
MiniMax-M1
API Overview
MiniMax-M1 is the world’s first open-source large-scale hybrid-architecture inference model launched by MiniMax (Shanghai Xiyu Technology), with a core focus on being a productivity-grade large model offering “millions of tokens in context plus ultra-high cost-effectiveness.”
- Ultra-long context support: The industry’s highest input window at 1 million tokens and an output length of 80,000 tokens—far surpassing mainstream models such as DeepSeek R1.
- Innovative hybrid architecture: Based on the proprietary “Lightning Attention Mechanism,” it requires only about 30% of the computational power needed by DeepSeek R1 for deep reasoning on long texts.
- Efficient training via reinforcement learning: Utilizing the self-developed CISPO algorithm, it achieves twice the convergence speed compared to DAPO, completing the RL phase in just three weeks using 512 H800 GPUs.
- Leading performance in productivity scenarios: Outperforms Gemini 2.5 Pro and most closed-source models in tool usage and engineering tasks such as SWE-bench and TAU-bench.
───────────────────────────────────────────────────────────────────
Core Capabilities
⚡ Million-token-level efficient context processing: Easily handles entire novels, lengthy codebases, or multi-hour meeting transcripts without losing information or slowing down inference.
🧠 Deep tool collaboration capability: Natively understands API calls, code execution, and environmental feedback, enabling high-success-rate closed-loop operations in agent-based tasks.
🧩 Hybrid-architecture compute optimization: Through dynamic sparse computation and attention compression, it significantly reduces deployment costs while maintaining high performance.
🔓 Open-source and commercially usable: Complete model weights are now available on Hugging Face and GitHub, supporting deployment via mainstream frameworks such as vLLM and Transformers.
Playground
Log in to explore more features! Click to Log In