
sophnet/MiMo-V2-Flash
API Overview
MiMo-V2-Flash is an open-source, general-purpose flagship model from Xiaomi, designed primarily for reasoning, coding, and agent scenarios. It combines a hybrid attention architecture with multi-token prediction to deliver strong intelligence at high speed and low cost.
- Ultra-fast Inference: Native Multi-Token Prediction (MTP) enables self-speculative decoding at up to 150 tokens/second, keeping response latency low.
- Unmatched Cost Efficiency: With exceptionally low API call costs, it stands as one of the most cost-effective high-performance models currently available on the market.
- Industry-Leading Programming Capabilities: Scoring 73.4% on the SWE-bench Verified benchmark, it ranks first among open-source models, approaching the level of GPT-5-High. It supports one-click generation of runnable HTML webpages and complex code.
- Hybrid Thinking Mode: Offers seamless switching between “thinking” and “direct response” modes, enabling it to handle intricate mathematical reasoning as well as engage in smooth, everyday conversations as a general-purpose assistant.
- Long Context Optimization: A Mixture-of-Experts (MoE) architecture with 309B total parameters and 15B active parameters, paired with sliding-window attention over a 128-token window, supports contexts of up to 256k tokens.
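To make the sliding-window and MoE figures above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the model's implementation: the function names are hypothetical, the mask assumes a causal window that includes the current token, and it covers only the sliding-window part of the hybrid attention design.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Causal sliding-window mask (assumed convention).

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is one of the `window` most recent positions up to and including i.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in a Mixture-of-Experts model."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Small window so the pattern is visible; the page quotes a 128-token window.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))

    # 15B active out of 309B total, using the figures quoted above.
    print(f"active share per token: {active_fraction(309, 15):.1%}")
```

With a fixed window, each token attends to at most `window` positions no matter how long the context grows, which is one reason a windowed layer stays cheap at long context lengths.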
───────────────────────────────────────────────────────────────────
Core Capabilities
💻 Professional-Level Code Generation
Leads open-source models on benchmarks such as SWE-bench. Supports vibe-coding workflows, generating complete HTML webpages, operating system interfaces, and code in multiple programming languages in a single pass to handle complex software engineering tasks.
⚡ Ultra-High Speed and Efficiency
Uses MTP for parallel decoding. Lightweight draft models propose tokens and verification models check them, giving up to 2.6x effective acceleration without adding memory bottlenecks and balancing high performance with low cost.
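The draft-and-verify idea described above can be sketched with a toy greedy example in Python. This is a generic illustration, not MiMo-V2-Flash's actual MTP heads or acceptance rule, which this page does not describe. The names `toy_target`, `toy_draft`, `speculative_step`, and `generate` are hypothetical, and the "models" are plain functions over integer tokens.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Stand-in for the full model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Stand-in for a cheap draft: agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One round: the draft proposes k tokens, the target verifies them."""
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # A real system would score all proposals in one batched target pass;
    # here the target is simply called position by position.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target(ctx))  # every proposal accepted: one extra token
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> Tuple[List[int], int]:
    """Run verification rounds until max_new tokens have been produced."""
    out = list(prefix)
    rounds = 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
        rounds += 1
    return out[: len(prefix) + max_new], rounds


if __name__ == "__main__":
    tokens, rounds = generate([0], toy_draft, toy_target, k=3, max_new=8)
    print(tokens, rounds)
```

In this toy run, 8 new tokens are produced in 3 verification rounds, and the output matches what the target alone would produce greedily. Real speedups depend on how often the draft agrees with the verifier, so the toy does not reproduce the 2.6x figure quoted above.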
🧠 Powerful Hybrid Reasoning
Switches freely between “thinking” and “direct response” modes. Built on the MOPD post-training paradigm, it performs strongly on the AIME 2025 math competition and the GPQA-Diamond science benchmark, delivering both deep reasoning and sub-second response times.
───────────────────────────────────────────────────────────────────