
xiaomimimo/mimo-v2-flash
API Overview
MiMo-V2-Flash is an open-source, general-purpose flagship model from Xiaomi, designed primarily for reasoning, coding, and agent scenarios. It combines a hybrid attention architecture with multi-token prediction to deliver top-tier intelligence at high speed and very low cost.
- Ultra-fast Inference: Native Multi-Token Prediction (MTP) enables self-speculative decoding, reaching inference speeds of up to 150 tokens per second with low response latency.
- Exceptional Cost-Effectiveness: With extremely low API call costs, it is one of the most cost-efficient high-performance models currently available on the market.
- Industry-Leading Programming Capabilities: Scoring 73.4% on the SWE-bench Verified benchmark, it ranks first among open-source models, approaching the level of GPT-5-High. It supports one-click generation of runnable HTML web pages and complex code.
- Hybrid Thinking Mode: Supports switching between “thinking” and “direct answer” modes, enabling it to handle complex mathematical reasoning as well as engage in smooth, everyday conversations as a general-purpose assistant.
- Long-Context Optimization: A Mixture-of-Experts (MoE) architecture with 309B total parameters and 15B active parameters, combined with sliding-window attention using a 128-token window, supports contexts of up to 256k tokens.
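
To make the last bullet concrete, here is a minimal Python sketch of a causal sliding-window attention mask and of the active-parameter fraction implied by the figures above. The function names are hypothetical and chosen for illustration, and the sketch covers only the sliding-window layers; how those layers are combined with other attention layers in the hybrid architecture is not described here.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    A query sees itself and at most the (window - 1) positions before it.
    """
    return [
        [0 <= q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


def attended_pairs(seq_len: int, window: int) -> int:
    """Number of (query, key) pairs allowed by the mask, in closed form."""
    ramp = min(seq_len, window)          # early positions see 1, 2, ..., ramp keys
    steady = max(seq_len - window, 0)    # later positions each see `window` keys
    return ramp * (ramp + 1) // 2 + steady * window


# Figures quoted in the overview: 309B total and 15B active parameters.
TOTAL_PARAMS_B = 309
ACTIVE_PARAMS_B = 15
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # share of weights used per token

if __name__ == "__main__":
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print(f"active fraction: {active_fraction:.1%}")
```

With a fixed window, the work in these layers grows linearly with sequence length (about `window` keys per query) instead of quadratically, which is the usual motivation for sliding-window attention at long context lengths.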
───────────────────────────────────────────────────────────────────
Core Capabilities
💻 Professional-Level Code Generation
It leads open-source models on benchmarks such as SWE-bench. With support for vibe-coding workflows, it can generate complete HTML web pages, operating system interfaces, and code in multiple programming languages in a single step, and it handles complex software engineering tasks.
⚡ Ultra-High Speed and Efficiency
It uses MTP for parallel decoding. By pairing a lightweight draft model with a verification model, it achieves up to 2.6x effective speedup without adding to the memory bottleneck, balancing high performance and low cost.
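
The draft-and-verify idea can be illustrated with a toy greedy loop in Python. This is a sketch under stated assumptions, not MiMo-V2-Flash's implementation: `draft_next` and `target_next` are hypothetical stand-ins for a cheap draft predictor and the full model, and the verification loop is written sequentially for clarity, whereas a real system would check all drafted positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Draft k tokens cheaply, then keep the longest prefix the target agrees with."""
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft_next(prefix + proposed))

    accepted: List[int] = []
    for tok in proposed:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: use the target's token and stop
            return accepted
        accepted.append(tok)
    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prefix: List[int], draft_next: NextToken, target_next: NextToken,
             k: int, max_new: int) -> Tuple[List[int], int]:
    """Return (tokens, number of verification steps) for max_new generated tokens."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[:len(prefix) + max_new], steps


# Toy stand-ins (assumptions for illustration only).
def target_next(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[int]) -> int:
    # Agrees with the target except right after a 3 or a 7.
    return 0 if ctx[-1] % 4 == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, steps = generate([0], draft_next, target_next, k=4, max_new=12)
    print(tokens, steps)
```

In this toy run, 12 tokens are produced in 3 verification steps, and the output is identical to decoding with `target_next` alone. The realized wall-clock speedup depends on how often drafts are accepted and on the cost of drafting, so tokens per step is only a rough proxy for the speedup quoted above.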
🧠 Powerful Hybrid Reasoning
Supports switching freely between “thinking” and “direct answer” modes. Built on the MOPD post-training paradigm, it performs strongly on the AIME 2025 math competition and the GPQA-Diamond science benchmark, offering both deep reasoning and sub-second response times.