
sophnet/Qwen3-235B-A22B
API Overview
Qwen3-235B-A22B is Alibaba’s flagship large language model based on the Mixture-of-Experts (MoE) architecture, featuring a “235-billion-parameter ultra-large scale” and an “extremely efficient 22-billion-parameter activation.” It provides enterprise-grade AI solutions for complex tasks through a dual-mode inference architecture.
- Performance Benchmark: In benchmark tests such as programming (LiveCodeBench 85.7), mathematics (AIME25 93.8), and general capabilities (MMLU-Pro 71.9), Qwen3-235B-A22B outperforms competitors like DeepSeek-R1 and Gemini 2.5 Pro, setting a new standard for open-source models.
- Dual-Mode Intelligence: It supports both deep-thinking mode (step-by-step reasoning for complex problems) and fast-response mode (instant answers to simple questions). Users can dynamically control the “thinking budget” via the
enable_thinkingtoggle or the/thinkcommand. - Ultra-Large-Scale Architecture: Leveraging MoE technology, the model boasts a total parameter count of 235 billion, yet only 22 billion parameters are activated during each inference, striking a balance between performance and efficiency. Inference costs are reduced by 70% compared to similar dense models.
- Ultra-Long Context Support: Natively supporting a 32K-token context, it can be scaled up to 131K tokens via YaRN technology, effortlessly handling ultra-long text tasks.
───────────────────────────────────────────────────────────────────
Core Capabilities
🧠 Dual-Track Inference Engine: Dynamically switches between deep-thinking and fast-response modes, precisely breaking down complex problems and delivering instant feedback for simple queries.
🚀 Performance Leap: Outperforms top-tier competitors in programming, mathematics, and multilingual tasks, setting a new benchmark for enterprise-level AI applications.
⚡ Cost-Effective Architecture: MoE technology significantly reduces computational resource consumption; even with a 90% reduction in activated parameters, high performance is maintained.
📏 Ultra-Long Text Processing: Natively supports a 32K-token context, which can be expanded to 131K tokens via YaRN technology.
Playground
Log in to explore more features! Click to Log In