gemini-2.5-flash-preview-09-2025

gemini-2.5-flash-preview-09-2025

Tasks suitable for large-scale processing, low latency, high data volume, and those requiring reasoning, as well as proxy application scenarios.
2025-09-26
LLM
Model capability: imageModel capability: function_call
Input:
$0.3/1M tokens
Output:
$2.5/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Basic Information

gemini-2.5-flash-preview-09-2025 is a preview version of the Gemini 2.5 Flash series model released by Google on September 25, 2025. The model identifier is gemini-2.5-flash-preview-09-2025. Its knowledge cutoff date is January 2025. It supports multi-modal inputs including text, images, and audio, and stands out for its cost-effectiveness and high performance, making it ideal for developers building complex agent applications and high-throughput scenarios.

Core Features

  • Enhanced Agent Tool Usage: Optimized tool invocation logic has improved the score from 48.9% to 54% in the SWE-bench Verified coding benchmark, significantly boosting the ability to handle multi-step and complex agent tasks.
  • Cost and Efficiency Optimization: When the reasoning mode is enabled, output token consumption is reduced by 24%, lowering latency and invocation costs while maintaining high-quality outputs, making it well-suited for cost-sensitive large-scale applications.
  • Strengthened Multi-modal Capabilities: Improved accuracy in audio transcription, enhanced image understanding, and optimized translation quality enable better handling of cross-modal tasks and meet diverse development needs.

Technical Highlights

  • Dynamic Reasoning Mechanism: Supports adjusting the reasoning budget via parameters, allowing adaptive tuning of inference depth based on task complexity, striking a balance between speed and accuracy without having to compromise between “fast models” and “accurate models.”
  • Sparse Mixture of Experts (MoE) Architecture: Only activates the expert modules that match the task at hand, achieving a balance between large-model capabilities and low computational costs, ensuring million-level context processing capacity while controlling resource consumption.
  • Strong Ecosystem Compatibility: Seamlessly integrates with Google’s developer ecosystem tools, facilitating rapid deployment of agent workflows. Early testing feedback shows a 15% performance improvement in long-term agent tasks, helping accelerate scalable application deployment.

Note: Currently supports calling via Gemini’s native format.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (4)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
v1beta(Official Format - Chat)
POST
Stable
View Details
Chat(Talk)
POST
Stable
View Details
Chat(Analyze image)
POST
Stable
View Details
Chat(Image Generation)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

gemini-2.5-flash

-
1000000

Input$0.3 / 1M tokens
Output$2.5 / 1M tokens

Input$0.3/ 1M tokens
Output$2.5/ 1M tokens
Original Price