gemini-2.5-flash-lite

The lightweight model in the Gemini 2.5 family: the fastest and lowest-cost option, with multimodal processing capabilities.
2025-07-23
LLM
Model capabilities: image, function_call
Input: $0.1/1M tokens
Output: $0.4/1M tokens

API Overview

Basic Information

  • Developer: Google DeepMind. The model belongs to the Gemini 2.5 family; its stable version became generally available on July 22, 2025.
  • Positioning: The fastest and most cost-effective model in the Gemini 2.5 series, optimized for high-throughput, low-latency scenarios.
  • Access Method: Developers invoke it by specifying “gemini-2.5-flash-lite” as the model name in their code. It is available through Google AI Studio and Vertex AI.
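
As a rough illustration of the access method above, the sketch below builds a request for the Gemini API's public `generateContent` REST endpoint using only the standard library. The helper name `build_request` and the environment variable `GEMINI_API_KEY` are assumptions made for this example; Vertex AI uses different endpoints and authentication, and a third-party gateway may expose its own URL.

```python
import json
import os
import urllib.request

MODEL = "gemini-2.5-flash-lite"
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request.

    Hypothetical helper for illustration only.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{MODEL}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # assumed variable name
    if key:
        req = build_request("Translate 'good morning' into French.", key)
        # The network call only runs when a key is configured.
        with urllib.request.urlopen(req) as resp:
            reply = json.load(resp)
        print(reply["candidates"][0]["content"]["parts"][0]["text"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any tokens are billed.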

Core Features

  • Cost Efficiency: Input tokens priced at $0.10 per million, output tokens at $0.40 per million; audio input prices are 40% lower than in the preview version.
  • Low Latency: Latency is lower than that of both Gemini 2.0 Flash-Lite and 2.0 Flash. In one reported automotive diagnostic use case, latency fell by 45% and power consumption by 30%.
  • Superior Performance: Outperforms 2.0 Flash-Lite on multiple benchmarks. For example, on the AIME 2025 math test it scores 63.1% in reasoning mode, compared with 29.7% for 2.0 Flash.
  • Full Functionality: Supports a context window of up to 1 million tokens, as well as thinking budget control and native tools such as Google Search grounding and code execution.
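
To make the pricing above concrete, here is a small cost estimator. It assumes only the listed text-token rates ($0.10 input and $0.40 output per million tokens); the workload figures are hypothetical, and audio input, which the listing prices separately, is not modeled.

```python
from decimal import Decimal

# Rates taken from the listing above (USD per 1M tokens).
INPUT_USD_PER_M = Decimal("0.10")
OUTPUT_USD_PER_M = Decimal("0.40")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    return (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    ) / TOKENS_PER_M


# Hypothetical workload: 10,000 requests of 2,000 input and 200 output tokens each.
total = estimate_cost(10_000 * 2_000, 10_000 * 200)
print(f"${total:.2f}")  # prints $2.80
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.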

Technical Highlights

  • Controllable Reasoning Mode: Reasoning can be enabled on demand. On LiveCodeBench code generation, the model scores 34.3% in reasoning mode, compared to 33.7% in non-reasoning mode.
  • Multi-modal Capabilities: The MMMU visual reasoning score is 72.9%, higher than the 69.3% of 2.0 Flash; on Vibe-Eval image understanding, reasoning mode scores 57.5%.
  • Long Context Processing: On the MRCR v2 test (128k average length), the reasoning mode score is 30.6%, outperforming the 19.0% of 2.0 Flash.
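
The on-demand reasoning and image input described above could be expressed in a request body roughly as follows. This is a sketch: the field names (`generationConfig.thinkingConfig.thinkingBudget`, `inline_data`) follow the public Gemini REST API as best understood here and should be checked against the current reference, while `build_payload` and the budget value are hypothetical.

```python
import base64
from typing import Optional


def build_payload(
    prompt: str,
    thinking_budget: Optional[int] = None,
    image_bytes: Optional[bytes] = None,
    mime_type: str = "image/png",
) -> dict:
    """Assemble a generateContent request body (hypothetical helper)."""
    parts = [{"text": prompt}]
    if image_bytes is not None:
        # Images are sent inline as base64-encoded data.
        parts.append(
            {
                "inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }
            }
        )
    payload = {"contents": [{"parts": parts}]}
    if thinking_budget is not None:
        # A token budget for reasoning; omitted means the model default applies.
        payload["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return payload


# Text-only request with an illustrative reasoning budget.
with_reasoning = build_payload("Solve: 17 * 23", thinking_budget=1024)
# Image request that leaves the reasoning setting at its default.
with_image = build_payload("Describe this image.", image_bytes=b"\x89PNG")
```

Leaving `thinking_budget` unset keeps the payload minimal for latency-sensitive calls, while setting it opts a single request into reasoning.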

Market Impact

  • Completes Product Matrix: As the lowest-cost model in the 2.5 series, it facilitates the deployment of large-scale production-level applications.
  • Empowers Enterprises to Reduce Costs: With its low pricing and low power consumption features, it helps enterprises control costs when handling massive request volumes.
  • Improves Development Efficiency: Its rapid response capability enables developers to efficiently build dynamic insight-driven applications.

Application Scenarios

  • Latency-Sensitive Tasks: High-frequency scenarios such as translation and classification, where its low latency improves response speed.
  • Code Development: Supports UI code writing and multi-language code editing; scores 44.9% on SWE-bench Verified with multiple attempts.
  • Data Processing: Enables quick scanning of massive outputs or conversion of large PDFs into interactive web applications.
  • Domain-Specific Diagnostics: For instance, in automotive diagnostic scenarios, it delivers low-latency, low-power fault analysis.


API Reference (4)

API Description | Request Method | Stability
v1beta (Official Format - Chat) | POST | Stable
Chat (Talk) | POST | Stable
Chat (Analyze image) | POST | Stable
Chat (Image Generation) | POST | Stable

API Pricing

Model | Description | Context | Official Price | 302.AI Price
gemini-2.5-flash-lite | - | 1,000,000 tokens | Input: $0.1 / 1M tokens; Output: $0.4 / 1M tokens | Input: $0.1 / 1M tokens; Output: $0.4 / 1M tokens (original price)