deepseek-v4-pro

deepseek-v4-pro

The latest flagship AI model released by the DeepSeek series represents the current highest standard in both scale and performance among open-source models.
2026-04-24
LLM
Model capability: thinkingModel capability: function_call
Input:
$1.72/1M tokens
Output:
$3.43/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

 DeepSeek-V4-Pro is DeepSeek’s flagship inference large model, marking the official entry of open-source models into the era of million-token context windows. As a groundbreaking achievement in DeepSeek’s transition from a “strongest inference” model to an “all-around foundational large model,” V4-Pro leverages an innovative hybrid attention mechanism and underlying structural optimizations to significantly reduce computational costs for long-text inference while maintaining a knowledge base at the trillion-parameter level. It is not only an efficiency tool for programming and engineering but also the preferred core engine for enterprises handling massive document analysis, complex agent orchestration, and multi-step reasoning tasks.

───────────────────────────────────────────────────────────────────

Core Capabilities

Million-level Context: Equipped with a pioneering token-dimensional compression mechanism and DSA (DeepSeek Sparse Attention) sparse attention technology, it achieves a standard ultra-long context window of 1M tokens. The model can process dozens of lengthy novels or an entire medium-sized project’s codebase in one go, completely overcoming the computational and memory bottlenecks faced by traditional models when dealing with long sequences, with inference FLOPs reduced to just 27% of the previous generation.

Benchmark in Agentic Coding for the Open-Source Domain: Specifically optimized for agent scenarios, it excels in code generation, cross-file bug diagnosis, and engineering tasks. Its delivery quality has reached the best level in the open-source community.

Deep Inference and Authoritative Verification Engine: Integrating the Engram memory architecture and dual-modal reasoning support, it not only supports non-thinking modes but also boasts a powerful “thinking mode.” Through the reasoning_effort parameter, it can deeply enhance complex logical reasoning. This model has already matched the performance of top global closed-source models in STEM, mathematics, and competitive programming evaluations, ensuring that its outputs feature high information density and rigorous logic.

Ultimate Production-Level Energy Efficiency: Thanks to the new Hybrid Attention architecture, KV Cache usage during long-task processing is reduced to as low as 10% of the previous generation. This significant performance boost enables enterprises to deploy and handle high-level tasks—such as ultra-long contract reviews, cross-disciplinary research report summaries, and complex workflow scheduling—at a more cost-effective computing expense, dramatically accelerating commercial deployment speed.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (3)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (DeepSeek)
POST
Stable
View Details
Chat (DeepSeek)
POST
Stable
View Details
Messages (for Claude Code)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

deepseek-chat

-
1000000

Input$1.72 / 1M tokens
Output$3.43 / 1M tokens

Input$1.72/ 1M tokens
Output$3.43/ 1M tokens
Original Price