
qwen3-max-2026-01-23
API Overview
qwen3-max-2026-01-23 is Alibaba Cloud’s flagship language model. Compared to the snapshot version released on September 23, 2025, this version achieves an effective integration of thinking and non-thinking modes, resulting in a comprehensive and significant improvement in overall model performance. In thinking mode, it simultaneously supports web search, web information extraction, and code interpreter tools, enabling the model, while engaging in slow thinking, to leverage external tools for higher accuracy in tackling more challenging problems. Its core positioning is as a high-level reasoning engine equipped with adaptive tool invocation and runtime expansion capabilities, specifically designed for complex tasks and agent-based scenarios.
- Performance Benchmark: On 19 authoritative benchmarks, it rivals GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro. It achieves a 90.2% win rate on Arena-Hard v2, significantly outperforming others.
- Core Innovation: It uniquely supports adaptive tool invocation (search/memory/code interpreter) and runtime expansion technology, boosting both inference efficiency and accuracy.
- Applicable Scenarios: It is well-suited for advanced AI agent applications such as intelligent programming, scientific research Q&A, multi-hop search, and automated planning.
- Developer-Friendly: Its API is compatible with OpenAI and Anthropic protocols; a single line of configuration allows you to switch models effortlessly and quickly integrate them into existing workflows.
- Measured Improvement: After enabling runtime expansion, the GPQA score rose from 90.3 to 92.8, and HLE (with tools) reached 58.3, surpassing Gemini 3 Pro.
───────────────────────────────────────────────────────────────────
Core Capabilities
⚡ Ultimate Reasoning Engine:
- It supports multi-round iterative self-reflection, avoiding redundant computations and improving token utilization efficiency.
- On the HMMT Feb 25 math competition benchmark, it scored 98.0 points, approaching the top human-level performance.
🔍 Adaptive Tool Invocation:
- It automatically determines when to invoke the search engine, code interpreter, or long-term memory without manual intervention.
- It effectively mitigates hallucinations, providing real-time information and personalized responses, thus enhancing dialogue credibility.
🧩 Agent-Ready Architecture:
- Its Agentic Search (HLE w/ tools) score is 49.8, significantly outperforming competitors.
- It comes with built-in support for the Deep Planning agent benchmark, specially designed for complex task chains.
🔌 Seamless Development Experience:
- The API is compatible with OpenAI and Anthropic protocols, making it plug-and-play across Python, Node.js, and other ecosystems.
───────────────────────────────────────────────────────────────────
Test Data
Playground
Log in to explore more features! Click to Log In