sophnet/MiMo-V2-Flash

sophnet/MiMo-V2-Flash

Xiaomi’s open-source general-purpose language model excels in programming and reasoning.
2025-12-12
LLM
Input:
$0.1/1M tokens
Output:
$0.3/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

MiMo-V2-Flash is an open-source, general-purpose, flagship product launched by Xiaomi, primarily designed for inference, programming, and agent scenarios. It leverages a unique hybrid attention architecture and multi-token prediction technology to deliver top-tier intelligence while achieving ultra-high speed and extremely low costs.

  • Ultra-fast Inference: Utilizing native Multi-Token Prediction (MTP) technology, it enables self-optimizing decoding with an inference speed of up to 150 tokens/second, ensuring rapid responses with zero latency.
  • Unmatched Cost Efficiency: With exceptionally low API call costs, it stands as one of the most cost-effective high-performance models currently available on the market.
  • Industry-Leading Programming Capabilities: Scoring 73.4% on the SWE-bench Verified benchmark, it ranks first among open-source models, approaching the level of GPT-5-High. It supports one-click generation of runnable HTML webpages and complex code.
  • Hybrid Thinking Mode: Offers seamless switching between “thinking” and “direct response” modes, enabling it to handle intricate mathematical reasoning as well as engage in smooth, everyday conversations as a general-purpose assistant.
  • Long Context Optimization: Featuring a hybrid expert (MoE) architecture with 309B total parameters and 15B activation parameters, paired with a 128-token sliding-window attention mechanism, it perfectly supports ultra-long contexts of up to 256k tokens.

───────────────────────────────────────────────────────────────────

Core Capabilities

💻 Professional-Level Code Generation

Leading the open-source community in benchmarks such as SWE-bench. Supports the Vibe-coding workflow, enabling the one-time generation of complete HTML webpages, operating system interfaces, and multilingual code, effortlessly tackling complex software engineering tasks.

⚡ Ultra-High Speed and Efficiency

Employs MTP technology for parallel decoding. By combining lightweight draft models with verification models, it achieves up to 2.6x effective acceleration without increasing memory bottlenecks, striking a balance between high performance and low cost.

🧠 Powerful Hybrid Reasoning

Allows free switching between “thinking” and “direct response” modes. Based on the MOPD post-training paradigm, it excels in the AIME 2025 math competition and the GPQA-Diamond science knowledge challenge, delivering both deep reasoning capabilities and sub-second response times.

───────────────────────────────────────────────────────────────────

Model Comparison

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SophNet)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

sophnet/MiMo-V2-Flash

-
256000

Input$0.1 / 1M tokens
Output$0.3 / 1M tokens

Input$0.1/ 1M tokens
Output$0.3/ 1M tokens
Original Price