baidu/ernie-4-5-vl-424b-a47b

baidu/ernie-4-5-vl-424b-a47b

Baidu’s flagship multimodal large model—a high-performance visual-language understanding engine that supports both thinking and non-thinking modes.
2025-08-04
LLM
Model capability: function_call
Input:
$0.429/1M tokens
Output:
$1.29/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

ERNIE-4.5-VL-424B-A47B is Baidu’s flagship multimodal large model, built on the Wenxin 4.5 series’ mixture-of-experts (MoE) architecture. It boasts a total parameter count of up to 424 billion, with 47 billion active parameters, and is primarily positioned as a “high-performance visual-language understanding engine that supports both thinking and non-thinking modes.”

  • Leading in Dual-Mode Inference: In thinking mode, it approaches or even surpasses OpenAI-o1 on challenging multimodal reasoning tasks such as MathVista and MMMU; in non-thinking mode, it maintains top-tier performance on perception-based tasks like CV-Bench.
  • Architectural Innovation: It adopts a multimodal heterogeneous MoE structure, achieving synergistic enhancement of text and vision capabilities through cross-modal parameter sharing plus dedicated spatial retention.
  • Excellent Performance and Efficiency: Compared to the Qwen2.5-VL series, the lightweight version (28B-A3B) already demonstrates strong competitiveness, while the flagship version comprehensively leads the multimodal SOTA rankings.
  • Industry-Friendly: Trained and deployed based on PaddlePaddle, it supports lossless quantization at 4-bit and 2-bit levels, is compatible with OpenAI protocols, and comes ready-to-use with FastDeploy deployment.
  • Open Source and Open: The model weights are open-sourced under the Apache 2.0 license, supporting both academic research and commercial applications, accompanied by the ERNIEKit fine-tuning suite.

───────────────────────────────────────────────────────────────────

Core Capabilities  

👁️ Dual-Mode Intelligent Understanding: Uniquely supports switching between “thinking” and “non-thinking” modes, seamlessly balancing complex reasoning and real-time perception.

Efficient MoE Architecture: With 47 billion active parameters, it achieves the full-scale performance of 424 billion parameters, significantly reducing inference costs compared to comparable dense models.

📊 Multimodal SOTA: It comprehensively outperforms competitors on over 10 authoritative benchmarks including MathVista, MMMU, and VisualPuzzle.

🛠️ Industry-Grade Deployment: FastDeploy enables deployment with just one line of code, is compatible with vLLM/OpenAI protocols, and adapts to multiple chip platforms.

🔐 End-to-End Open Source: Under the Apache 2.0 license, the model weights are openly available, and ERNIEKit provides fine-tuning tools such as LoRA, DPO, and quantization.

🇨🇳Optimized for Chinese Scenarios: Deeply adapted for Chinese text-and-image understanding, delivering outstanding performance in localized tasks such as chart interpretation and document question-answering.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(PPIO)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

baidu/ernie-4.5-vl-424b-a47b

-
123000

Input$0.429 / 1M tokens
Output$1.29 / 1M tokens

Input$0.429/ 1M tokens
Output$1.29/ 1M tokens
Original Price