
qwq-32b-preview
API Overview
QwQ-32B is a reinforcement-learning-driven reasoning model launched by Alibaba, featuring “32 billion parameters rivaling 671 billion-parameter models” and “integration of critical thinking and tool invocation capabilities,” providing a cost-effective enterprise-level solution for complex reasoning tasks.
- Performance on par with top-tier models: In benchmark tests such as programming (LiveCodeBench 83.9), mathematics (AIME24 79.8), and general abilities (MMLU-Pro 71.6), QwQ-32B matches DeepSeek-R1 and outperforms competitors like o1-mini.
- Breakthrough in reinforcement learning: Through cold-start data and multi-stage training, combined with answer correctness verification and code execution feedback, QwQ-32B achieves continuous improvement in mathematical and programming capabilities.
- Two-mode reasoning: Supports both critical thinking (breaking down complex problems into steps) and tool invocation (adjusting based on environmental feedback), dynamically balancing deep reasoning and real-time responsiveness.
- Open-source and open: Released under the Apache 2.0 license on Hugging Face and ModelScope, offering both API access and local deployment options.
- Enterprise-grade compatibility: Supports deployment on consumer-grade GPUs (such as RTX 3090), reducing inference costs by 70% compared to hundred-billion-parameter models.
───────────────────────────────────────────────────────────────────
Core Capabilities
🧠 Reinforcement Learning Engine: Based on answer verification and code execution feedback, QwQ-32B continuously evolves its mathematical and programming capabilities, breaking through traditional training bottlenecks.
🚀 Two-track reasoning mode: Dynamically switches between critical thinking (breaking down complex problems into steps) and tool invocation (adjusting based on environmental feedback), balancing depth and efficiency.
⚡ Ultra-high cost-effectiveness: With 32 billion parameters, QwQ-32B delivers “small size, big power,” enabling smooth operation on consumer-grade devices and lowering the barrier to entry for enterprise AI applications by 60%.
🌐 Full-scenario coverage: Matches top-tier competitors in tasks such as programming, mathematics, and general question answering, suitable for high-frequency interaction scenarios like e-commerce customer service and financial risk control.
───────────────────────────────────────────────────────────────────
Benchmark Tests

Playground
Log in to explore more features! Click to Log In