baidu/ernie-4.5-21B-a3b

baidu/ernie-4.5-21B-a3b

Baidu’s high-performance Mixture-of-Experts (MoE) large language model is primarily positioned as a lightweight flagship text model characterized by “small activation, great capability.”
2025-08-04
LLM
Model capability: function_call
Input:
$0.0715/1M tokens
Output:
$0.286/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

ERNIE-4.5-21B-A3B is a high-performance Mixture-of-Experts (MoE) large language model released by Baidu. It has a total of 21 billion parameters, with only 3 billion activated parameters. Its core design philosophy is “small activation, great capability”—a lightweight flagship text model that strikes a balance between high-performance inference and low computational costs.

  • Efficient Parameter Design: Among the 21 billion total parameters, only 3 billion are activated per token, significantly reducing inference resource consumption and offering superior cost-effectiveness compared to dense 30-billion-parameter models.
  • Long Context Support: The maximum context length reaches 131,072 tokens, easily handling long documents, complex dialogues, and other such scenarios.
  • Multi-modal Architecture Reuse: Although it’s a pure text model, it reuses the Wenxin 4.5 multi-modal MoE architecture, featuring 64 text experts (with 6 activated) plus 2 shared experts.
  • Superior Performance Compared to Competitors: On inference and mathematical benchmarks such as BBH and CMATH, it outperforms Qwen3-30B-A3B, achieving “smaller yet stronger” results.
  • Full Ecosystem Compatibility: It provides both PyTorch and PaddlePaddle formats, supports vLLM and OpenAI protocols, enables one-line deployment via FastDeploy, and is open-sourced under the Apache 2.0 license for commercial use.

───────────────────────────────────────────────────────────────────

Core Capabilities

Efficient MoE Inference: Achieves 21-billion-parameter model performance with just 3 billion activated parameters, significantly lowering inference costs compared to comparable dense models.

📚 131K Ultra-long Context: Supports ultra-long-text understanding and generation, making it ideal for complex scenarios such as legal, scientific research, and customer service.

🧠 Expert Collaboration Architecture: Featuring 64 text experts plus a dynamic routing mechanism, it precisely matches task requirements and enhances generation quality.

🏆 SOTA Inference Capability: Outperforms larger-parameter competitors in tasks such as mathematics, logic, and knowledge-based question answering.

🛠️ Ready-to-use Ecosystem: Compatible with transformers/vLLM; FastDeploy supports OpenAI API, enabling rapid integration into existing systems.

🔓 Commercially Friendly Open Source: Released under the Apache 2.0 license, supporting fine-tuning (SFT/DPO/UPO), quantization, and private deployment.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(PPIO)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

baidu/ernie-4.5-21B-a3b

-
120000

Input$0.0715 / 1M tokens
Output$0.286 / 1M tokens

Input$0.0715/ 1M tokens
Output$0.286/ 1M tokens
Original Price