gemini-2.0-flash-lite

gemini-2.0-flash-lite

The fastest Gemini 2.0 model improves cost-effectiveness and reduces latency.
2025-06-16
LLM
Model capability: image
Input:
$0.075/1M tokens
Output:
$0.3/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Basic Information

Model Name: Gemini 2.0 Flash-Lite (model code: gemini-2.0-flash-lite

Design Purpose: Cost and efficiency optimization for large-scale text generation and high-throughput tasks—this is currently the “most cost-efficient” model variant in the Gemini series.

Core Features

Large Context Window: Supports an input context window of up to 1,048,576 tokens (approximately 1 million tokens), making it ideal for handling large blocks of text or long documents.

Multimodal Input: Supports multiple input types including text, audio, images, and video (though output is text), facilitating integration with diverse content sources.

Structured Output & Function Calling Capability: Supports structured output and API function calls, enabling seamless integration with programmatic systems and backend services.

Cost Optimization & Low Latency: As a key design goal of Flash-Lite, it significantly reduces usage costs and response latency while maintaining reasonable model capabilities, making it suitable for large-scale and frequent invocation scenarios.

Technical Highlights

Superior Performance Compared to Previous Generation: Compared to the previous-generation Gemini 1.5 Flash, Flash-Lite demonstrates better or equally stable performance across multiple benchmarks (reasoning, factuality, math, SQL conversion, etc.), especially well-suited for large-scale text processing and structured tasks.

Simplified Pricing/Billing Model: Both Flash-Lite and 2.0 Flash adopt a “single price per input type” pricing mechanism, eliminating the distinction between short and long-context requests. Compared to earlier versions that handled mixed-context needs, this approach can reduce overall costs in most use cases.

Perfect for Large-Scale, High-Throughput Scenarios: Thanks to its cost and efficiency optimizations, Flash-Lite is particularly well-suited for tasks involving massive text processing, batch content generation, summarization, classification, log processing, search index building, and other similar applications—making it an ideal choice for enterprise-level, batch-processing scenarios.


Note: Native Gemini format calls are now supported


Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (4)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
v1beta(Official Format - Chat)
POST
Stable
View Details
Chat(Talk)
POST
Stable
View Details
Chat(Analyze image)
POST
Stable
View Details
Chat(Image Generation)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

gemini-2.0-flash-lite

-
2000000

Input$0.075 / 1M tokens
Output$0.3 / 1M tokens

Input$0.075/ 1M tokens
Output$0.3/ 1M tokens
Original Price