
gemini-2.5-flash-nothink
gemini-2.5-flash forcibly shuts down thinking.
2025-06-24
Input:
$0.3/1M tokens
Output:
$2.5/1M tokens
Bulk order? Contact your manager for exclusive deals
API Overview
Basic Information
Gemini 2.5 Flash is a high-performance model version within the Gemini 2.X model series, optimized for cost-to-latency ratio. Developed by Google DeepMind, this series focuses on achieving “excellent reasoning capabilities combined with lower computational and latency requirements.” In the Gemini 2.X series, the Flash version is positioned as a hybrid model designed to “control reasoning budgets,” balancing multimodal understanding with response efficiency.
Core Features
- Supports multimodal inputs, including text, images, audio, video, and more, enabling cross-media comprehension of complex content.
- Features long-context processing capability, capable of handling input scenarios exceeding one million tokens—enabling it to process entire books, code repositories, or even videos lasting several hours.
- A carefully designed “Thinking Budget” mechanism allows users and systems to flexibly adjust the model’s computational budget for reasoning, striking a balance between quality, cost, and speed.
- Targeted at real-world application scenarios, the Flash version maintains strong reasoning capabilities while placing greater emphasis on low latency and reduced computational resource usage, making it particularly suitable for cost-sensitive or real-time applications.
Technical Highlights
- The model adopts a sparse mixture-of-experts (MoE) transformer architecture, decoupling model capacity from computational cost, thereby making the Flash version significantly more efficient in terms of resource utilization.
- Major improvements have been made across pre-training, fine-tuning, and reinforcement learning (RL) stages—for example, larger-scale training foundations, improved data filtering and deduplication, and richer multimodal training samples—thus enhancing the model’s general comprehension, tool invocation, and reasoning-chain capabilities.
- Significant performance gains have been achieved in multimodal tasks such as video understanding, audio generation, and long-text reasoning: for instance, the Flash version has surpassed the full capabilities of the Gemini 1.5 series, delivering breakthrough progress in reasoning, encoding, multilingual support, and multimedia comprehension.
- Through the “thinking” mechanism, the model can dynamically allocate computational budgets during reasoning and autonomously determine the duration of thought processes, achieving higher accuracy in complex tasks. The Flash version retains this mechanism but optimizes it for lower latency and cost, making it ideal for fast-response scenarios.
Note: Native Gemini format calls are now supported.
Playground
Log in to explore more features! Click to Log In
API Analytics
API Reference (3)
API Pricing
$¥ 円 ₽