
baidu/ernie-4.5-21B-a3b
API Overview
ERNIE-4.5-21B-A3B is a high-performance Mixture-of-Experts (MoE) large language model released by Baidu. It has a total of 21 billion parameters, with only 3 billion activated parameters. Its core design philosophy is “small activation, great capability”—a lightweight flagship text model that strikes a balance between high-performance inference and low computational costs.
- Efficient Parameter Design: Among the 21 billion total parameters, only 3 billion are activated per token, significantly reducing inference resource consumption and offering superior cost-effectiveness compared to dense 30-billion-parameter models.
- Long Context Support: The maximum context length reaches 131,072 tokens, easily handling long documents, complex dialogues, and other such scenarios.
- Multi-modal Architecture Reuse: Although it’s a pure text model, it reuses the Wenxin 4.5 multi-modal MoE architecture, featuring 64 text experts (with 6 activated) plus 2 shared experts.
- Superior Performance Compared to Competitors: On inference and mathematical benchmarks such as BBH and CMATH, it outperforms Qwen3-30B-A3B, achieving “smaller yet stronger” results.
- Full Ecosystem Compatibility: It provides both PyTorch and PaddlePaddle formats, supports vLLM and OpenAI protocols, enables one-line deployment via FastDeploy, and is open-sourced under the Apache 2.0 license for commercial use.
───────────────────────────────────────────────────────────────────
Core Capabilities
⚡ Efficient MoE Inference: Achieves 21-billion-parameter model performance with just 3 billion activated parameters, significantly lowering inference costs compared to comparable dense models.
📚 131K Ultra-long Context: Supports ultra-long-text understanding and generation, making it ideal for complex scenarios such as legal, scientific research, and customer service.
🧠 Expert Collaboration Architecture: Featuring 64 text experts plus a dynamic routing mechanism, it precisely matches task requirements and enhances generation quality.
🏆 SOTA Inference Capability: Outperforms larger-parameter competitors in tasks such as mathematics, logic, and knowledge-based question answering.
🛠️ Ready-to-use Ecosystem: Compatible with transformers/vLLM; FastDeploy supports OpenAI API, enabling rapid integration into existing systems.
🔓 Commercially Friendly Open Source: Released under the Apache 2.0 license, supporting fine-tuning (SFT/DPO/UPO), quantization, and private deployment.
Playground
Log in to explore more features! Click to Log In