sophnet/Qwen3-235B-A22B

sophnet/Qwen3-235B-A22B

Alibaba has launched its flagship hybrid expert (MoE) large language model, featuring "235 billion parameters—ultra-large scale" and "22 billion activated parameters—extreme efficiency."
2025-07-08
LLM
Input:
$0.57/1M tokens
Output:
$1.71/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-235B-A22B is Alibaba’s flagship large language model based on the Mixture-of-Experts (MoE) architecture, featuring a “235-billion-parameter ultra-large scale” and an “extremely efficient 22-billion-parameter activation.” It provides enterprise-grade AI solutions for complex tasks through a dual-mode inference architecture.

  • Performance Benchmark: In benchmark tests such as programming (LiveCodeBench 85.7), mathematics (AIME25 93.8), and general capabilities (MMLU-Pro 71.9), Qwen3-235B-A22B outperforms competitors like DeepSeek-R1 and Gemini 2.5 Pro, setting a new standard for open-source models.
  • Dual-Mode Intelligence: It supports both deep-thinking mode (step-by-step reasoning for complex problems) and fast-response mode (instant answers to simple questions). Users can dynamically control the “thinking budget” via the enable_thinking toggle or the /think command.
  • Ultra-Large-Scale Architecture: Leveraging MoE technology, the model boasts a total parameter count of 235 billion, yet only 22 billion parameters are activated during each inference, striking a balance between performance and efficiency. Inference costs are reduced by 70% compared to similar dense models.
  • Ultra-Long Context Support: Natively supporting a 32K-token context, it can be scaled up to 131K tokens via YaRN technology, effortlessly handling ultra-long text tasks.

───────────────────────────────────────────────────────────────────

Core Capabilities

🧠 Dual-Track Inference Engine: Dynamically switches between deep-thinking and fast-response modes, precisely breaking down complex problems and delivering instant feedback for simple queries.

🚀 Performance Leap: Outperforms top-tier competitors in programming, mathematics, and multilingual tasks, setting a new benchmark for enterprise-level AI applications.

Cost-Effective Architecture: MoE technology significantly reduces computational resource consumption; even with a 90% reduction in activated parameters, high performance is maintained.

📏 Ultra-Long Text Processing: Natively supports a 32K-token context, which can be expanded to 131K tokens via YaRN technology.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SophNet)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

sophnet/Qwen3-235B-A22B

-
128000

Input$0.57 / 1M tokens
Output$1.71 / 1M tokens

Input$0.57/ 1M tokens
Output$1.71/ 1M tokens
Original Price