
gemini-2.5-flash-lite
The lightweight model in the Gemini 2.5 family—fastest, lowest cost, and equipped with multimodal processing capabilities.
2025-07-23
Input:
$0.1/1M tokens
Output:
$0.4/1M tokens
Bulk order? Contact your manager for exclusive deals
API Overview
Basic Information
- Developer: Google DeepMind; a member of the Gemini 2.5 model family, with the stable version released and fully available on July 22, 2025.
- Positioning: The fastest and most cost-effective model in the Gemini 2.5 series, optimized for high-throughput, low-latency scenarios.
- Access Method: Developers can invoke it by specifying “gemini-2.5-flash-lite” in their code, supporting Google AI Studio and Vertex AI platforms.
Core Features
- Cost Efficiency: Input tokens priced at $0.10 per million, output tokens at $0.40 per million; audio input prices are 40% lower than in the preview version.
- Low Latency: Latency is lower than that of Gemini 2.0 Flash-Lite and 2.0 Flash; in a certain automotive diagnostic scenario, latency is reduced by 45%, and power consumption is lowered by 30%.
- Superior Performance: Outperforms 2.0 Flash-Lite in multiple benchmarks—for example, in the AIME 2025 math test, the reasoning pattern score is 63.1%, surpassing the 29.7% achieved by 2.0 Flash.
- Full Functionality: Supports a context window of up to 1 million tokens, as well as native tools such as thought budget control, Google Search grounding, and code execution.
Technical Highlights
- Controllable Reasoning Mode: Allows enabling inference capabilities on demand; in the reasoning mode, LiveCodeBench code generation scores 34.3%, compared to 33.7% in non-reasoning mode.
- Multi-modal Capabilities: MMMU visual reasoning score is 72.9%, higher than the 69.3% of 2.0 Flash; Vibe-Eval image understanding reasoning mode scores 57.5%.
- Long Context Processing: In the 128k average-length MRCR v2 test, the reasoning mode score is 30.6%, outperforming the 19.0% of 2.0 Flash.
Market Impact
- Completes Product Matrix: As the lowest-cost model in the 2.5 series, it facilitates the deployment of large-scale production-level applications.
- Empowers Enterprises to Reduce Costs: With its low pricing and low power consumption features, it helps enterprises control costs when handling massive request volumes.
- Improves Development Efficiency: Its rapid response capability enables developers to efficiently build dynamic insight-driven applications.
Application Scenarios
- Latency-Sensitive Tasks: High-frequency scenarios such as translation and classification—leveraging its low-latency advantage to enhance response speed.
- Code Development: Supports UI code writing and multi-language code editing; SWE-bench Verified multi-round attempts achieve a score of 44.9%.
- Data Processing: Enables quick scanning of massive outputs or conversion of large PDFs into interactive web applications.
- Domain-Specific Diagnostics: For instance, in automotive diagnostic scenarios, it delivers low-latency, low-power fault analysis.
Related Evaluations:
“Gemini-2.5-pro vs. Claude-3.7-Sonnet Frontend Programming Capability Real-World Duel”
Playground
Log in to explore more features! Click to Log In
API Analytics
API Reference (4)
API Pricing
$¥ 円 ₽