MiniMax-M3

MiniMax’s new-generation flagship: a trillion-parameter, multimodal MoE large model.
2026-06-01
LLM
Model capabilities: audio, image, video, thinking, function_call
Input: starting from $0.6 / 1M tokens
Output: starting from $2.4 / 1M tokens

API Overview

MiniMax-M3 is a next-generation, trillion-parameter flagship multimodal large model built on a Mixture-of-Experts (MoE) architecture. It delivers a major step up in complex tool invocation, very long text processing, and high-concurrency production deployment. As a native, all-in-one multimodal model, M3 rivals leading frontier models in multilingual text and long-range logical reasoning, and it also performs strongly in native multimodal streaming interaction (end-to-end integration of speech, video, and text). It is designed for next-generation, highly realistic intelligent agents and core enterprise business applications.

───────────────────────────────────────────────────────────────────

Core Capabilities


Million-Token Context Retrieval (Needle in a Haystack): By default, M3 supports an ultra-long context window of up to 1 million tokens and maintains a 100% recall rate on the widely used “Needle In A Haystack” benchmark. It can process entire technical monographs, enterprise codebases spanning tens of thousands of lines across many files, or hours-long meeting recordings, and perform precise information extraction and vulnerability audits over them.

Native End-to-End Multimodal Interaction: M3 uses a true omnimodal (Omni) fusion architecture rather than the traditional approach of stitching a speech plugin onto a text model. It supports end-to-end streaming input and output of text, highly realistic speech (including control of emotional expression, breathing sounds, dialect, and metaphor), and images. It delivers millisecond-level latency in real-time spoken practice, bidirectional audio-video interaction, and multimodal content generation.

Large-Scale Tool Use and Complex Planning: To address model “hallucinations” and the fragmented execution of complex tasks, M3 strengthens its tool-calling and long-horizon task-planning capabilities. It can orchestrate and invoke hundreds or even thousands of enterprise private APIs within a single workflow, which suits highly complex production tasks such as automated financial audits and cross-border, multi-platform supply-chain collaboration.

Ultra-High-Concurrency Enterprise Deployment: With an optimized MoE dynamic routing algorithm and in-house high-performance inference operators, M3 reduces time to first token (TTFT) by 40% compared with the previous generation while maintaining flagship-level output quality, and it significantly increases overall throughput. It is built for high-concurrency, high-frequency business lines and offers enterprises stable output and strong cost-effectiveness in production.

API Reference (1)

API Description: Chat(Minimax)
Request Method: POST
Stability: Stable

API Pricing

Prices are in USD per 1M tokens.

Model | Description | Context | Official Price | 302.AI Price
MiniMax-M3 | ≤512K | 1000000 | Input $0.6, Output $2.4 | Input $0.6, Output $2.4 (original price)
MiniMax-M3 | 512K-1M | 1000000 | Input $1.2, Output $4.8 | Input $1.2, Output $4.8 (original price)
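
The following is a minimal sketch of how the two price tiers listed above might translate into a per-request cost estimate. It rests on two assumptions that this page does not state: that the tier is selected by the number of input tokens in the request, and that "512K" means 512,000 tokens. The function name estimate_cost_usd and the constant names are hypothetical and are not part of any API.

```python
# Prices in USD per 1M tokens, copied from the pricing listed on this page.
PRICES = {
    "le_512k": {"input": 0.6, "output": 2.4},   # tier "≤512K"
    "512k_1m": {"input": 1.2, "output": 4.8},   # tier "512K-1M"
}

# Assumption: "512K" means 512,000 tokens (the page does not define it).
TIER_BOUNDARY_TOKENS = 512_000
# Context limit as listed on the page.
MAX_CONTEXT_TOKENS = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: the price tier is chosen by the input token count.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("input exceeds the listed 1,000,000-token context")

    tier = "le_512k" if input_tokens <= TIER_BOUNDARY_TOKENS else "512k_1m"
    rates = PRICES[tier]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # 100K input tokens and 2K output tokens fall in the lower tier.
    print(f"{estimate_cost_usd(100_000, 2_000):.4f}")   # prints 0.0648
    # 800K input tokens fall in the higher tier.
    print(f"{estimate_cost_usd(800_000, 10_000):.4f}")  # prints 1.0080
```

If billing turns out to select the tier differently (for example, by total tokens), only the line that picks the tier would need to change.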