
gemini-2.5-flash-lite-preview-09-2025
Preview version, further iterated and optimized from Gemini 2.5 Flash-Lite
2025-09-26
Input: $0.10/1M tokens
Output: $0.40/1M tokens
API Overview
Basic Information
- Developer: Google DeepMind. Part of the Gemini 2.5 model family; the stable version was released and made generally available on July 22, 2025.
- Positioning: The fastest and most cost-effective model in the Gemini 2.5 series, optimized for high-throughput, low-latency scenarios.
- Access Method: Call it by specifying the model ID “gemini-2.5-flash-lite” in code; it is available through Google AI Studio and Vertex AI.
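A minimal sketch of a call to the Gemini API's REST `generateContent` endpoint, using only the Python standard library. It assumes an API key from Google AI Studio sent in the `x-goog-api-key` header; the helper name `build_request` is hypothetical, and Vertex AI uses a different endpoint and authentication scheme that this sketch does not cover. The request is built but not sent, so it can be inspected offline.

```python
import json
import urllib.request

# Public Gemini API (Google AI Studio) base URL; Vertex AI uses a different host.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt: str, api_key: str,
                  model: str = "gemini-2.5-flash-lite") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for the given model ID.

    Hypothetical helper for illustration; pass
    model="gemini-2.5-flash-lite-preview-09-2025" to target the preview build.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("Translate 'good morning' into French.", api_key="YOUR_API_KEY")
    print(req.full_url)
    # Sending it requires network access and a valid key:
    # with urllib.request.urlopen(req) as resp:
    #     reply = json.load(resp)
    #     print(reply["candidates"][0]["content"]["parts"][0]["text"])
```

Keeping request construction separate from sending makes it easy to swap in the preview model ID or log the exact payload before any network call.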
Core Features
- Cost Efficiency: Input tokens priced at $0.10 per million, output tokens at $0.40 per million; audio input prices are 40% lower than in the preview version.
- Low Latency: Lower latency than Gemini 2.0 Flash-Lite and 2.0 Flash. In one reported onboard diagnostics deployment, latency dropped by 45% and power consumption by 30%.
- Superior Performance: Outperforms 2.0 Flash-Lite on many benchmarks. For example, on the AIME 2025 math benchmark it scores 63.1% in reasoning mode, compared with 29.7% for 2.0 Flash.
- Full Feature Set: Supports a context window of up to 1 million tokens, plus native features such as thinking budget control, grounding with Google Search, and code execution.
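Grounding and code execution are requested per call through the `tools` field of the request body. The sketch below assembles that body, assuming the REST tool names `google_search` and `code_execution` used by the public Gemini API. The helper `build_body` is hypothetical, and whether both tools can be combined in a single request should be confirmed against the current API documentation.

```python
import json


def build_body(prompt: str, use_search: bool = False,
               use_code_execution: bool = False) -> dict:
    """Assemble a generateContent request body, optionally enabling native tools."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    tools = []
    if use_search:
        # Grounding with Google Search: the model decides when to issue queries.
        tools.append({"google_search": {}})
    if use_code_execution:
        # Lets the model write and run code on the server side.
        tools.append({"code_execution": {}})
    if tools:
        # Only attach the field when at least one tool is requested.
        body["tools"] = tools
    return body


if __name__ == "__main__":
    print(json.dumps(build_body("Summarize today's top tech headline.", use_search=True), indent=2))
```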
Technical Highlights
- Controllable Reasoning Mode: Reasoning (thinking) can be switched on as needed. On LiveCodeBench code generation, it scores 34.3% with reasoning on and 33.7% with it off.
- Multimodal Capabilities: Scores 72.9% on MMMU (visual reasoning), above 2.0 Flash’s 69.3%, and 57.5% on Vibe-Eval (image understanding) in reasoning mode.
- Long-Context Processing: On MRCR v2 at a 128k average context length, it scores 30.6% in reasoning mode, versus 19.0% for 2.0 Flash.
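The controllable reasoning mode maps to the thinking budget in the Gemini API's `generationConfig`. A minimal sketch, assuming the REST field names `thinkingConfig` and `thinkingBudget` and that a budget of 0 turns thinking off; the allowed upper range for this model is enforced by the API and is not checked here. The helpers `generation_config` and `build_body` are hypothetical.

```python
import json


def generation_config(thinking_budget: int) -> dict:
    """Return a generationConfig fragment that sets the thinking budget.

    A budget of 0 switches reasoning off; a positive value caps the number of
    thinking tokens. Negative values are rejected here for simplicity.
    """
    if thinking_budget < 0:
        raise ValueError("thinking_budget must be 0 or positive in this sketch")
    return {"thinkingConfig": {"thinkingBudget": thinking_budget}}


def build_body(prompt: str, thinking_budget: int) -> dict:
    """Assemble a generateContent body with reasoning on or off."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": generation_config(thinking_budget),
    }


if __name__ == "__main__":
    # Reasoning off for a quick classification; a modest budget for a harder task.
    fast = build_body("Classify the sentiment of: 'Great battery life.'", thinking_budget=0)
    careful = build_body("Write a function that merges overlapping intervals.", thinking_budget=2048)
    print(json.dumps(careful["generationConfig"]))
```

Choosing the budget per request lets latency-sensitive calls skip reasoning while harder prompts, such as code generation, opt in.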
Market Impact
- Completes the Product Lineup: As the lowest-cost model in the 2.5 series, it makes large-scale production deployments more practical.
- Helps Enterprises Reduce Costs: Low pricing and low power consumption help enterprises keep costs under control when serving very high request volumes.
- Improves Development Efficiency: Fast responses let developers efficiently build dynamic, insight-driven applications.
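To make the cost argument concrete, the listed text prices ($0.10 per 1M input tokens, $0.40 per 1M output tokens) can be turned into a rough budget estimate. A sketch with a hypothetical workload; it ignores audio input (priced differently) and any negotiated or batch discounts, and it assumes that thinking tokens, if reasoning is enabled, are billed as output and already included in `avg_out`.

```python
# List prices from this page, in USD per 1M text tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def monthly_cost(requests_per_day: int, avg_in: int, avg_out: int,
                 days: int = 30) -> float:
    """Estimate monthly spend for a steady workload (hypothetical helper)."""
    total_requests = requests_per_day * days
    return estimate_cost(total_requests * avg_in, total_requests * avg_out)


if __name__ == "__main__":
    # Hypothetical: 1M classification requests/day, ~500 input and ~20 output tokens each.
    print(f"${monthly_cost(1_000_000, 500, 20):,.2f}")
```

With these example numbers, 30M requests consume 15B input tokens ($1,500) and 600M output tokens ($240), about $1,740 per month.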
Application Scenarios
- Latency-Sensitive Tasks: High-frequency workloads such as translation and classification, where its low latency shortens response times.
- Code Development: Supports writing UI code and editing code across multiple languages; it scores 44.9% on SWE-bench Verified with multiple attempts.
- Data Processing: Quickly scans large volumes of output and can convert large PDFs into interactive web applications.
- Domain-Specific Diagnostics: For example, in onboard diagnostic scenarios it delivers low-latency, low-power fault analysis.