
deepseek/deepseek-v3/community
API Overview
DeepSeek-V3 is the DeepSeek team's flagship open-source language model. It is designed to deliver high performance while significantly reducing training cost through architectural and training innovations.
- Strong performance: It scores highly on benchmarks such as MMLU and GPQA, surpasses some closed-source models on code and mathematics tasks, and excels at Chinese factual knowledge tasks.
- Reduced cost: Training took 2.788 million H800 GPU hours, reported as less than 1/3 of the cost of traditional solutions.
- Faster inference: Inference is more than twice as fast as the previous generation.
- Long-text support: A 128K-token context length supports long-text processing.
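To put the GPU-hour figure in perspective, the following minimal sketch converts it into a rough cost and wall-clock duration. The hourly rental price and the cluster size are assumptions chosen for illustration, not values stated on this page; substitute your own numbers.

```python
# Back-of-the-envelope conversion of GPU hours into cost and wall-clock time.
# The hourly rate and cluster size are assumptions for illustration only.

GPU_HOURS = 2_788_000            # figure quoted above (H800 GPU hours)
ASSUMED_USD_PER_GPU_HOUR = 2.0   # assumption: hypothetical rental price
ASSUMED_CLUSTER_GPUS = 2048      # assumption: hypothetical cluster size


def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Total rental cost if every GPU hour is billed at a flat rate."""
    return gpu_hours * usd_per_gpu_hour


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Elapsed days if the work is spread evenly over num_gpus GPUs."""
    return gpu_hours / num_gpus / 24


cost = training_cost_usd(GPU_HOURS, ASSUMED_USD_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, ASSUMED_CLUSTER_GPUS)

print(f"Estimated cost: ${cost:,.0f}")         # Estimated cost: $5,576,000
print(f"Estimated duration: {days:.1f} days")  # Estimated duration: 56.7 days
```

Both results scale linearly with the assumed inputs, so halving the hourly rate halves the estimated cost, and doubling the cluster size halves the estimated duration.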
───────────────────────────────────────────────────────────────────
Core capabilities
⚙️ Efficient architecture: Multi-head Latent Attention (MLA) reduces inference memory usage, and the DeepSeekMoE architecture balances load across experts.
🚀 Multi-token prediction: The model predicts multiple future tokens at each position, which speeds up inference by 1.8x.
💪 FP8 training: FP8 training is validated for the first time on an ultra-large-scale model, reducing memory usage with little loss in performance.
⚡ Parallel framework: Bidirectional pipeline scheduling reduces communication overhead, bringing training efficiency close to the theoretical upper limit.
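As a rough illustration of what "predicting multiple future tokens at each position" means, the sketch below builds the targets for each position in a short token sequence. The function name `mtp_targets` and the `depth` parameter are hypothetical names introduced here; this shows only how the targets are laid out, not DeepSeek-V3's actual prediction modules or how the speed-up is obtained.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mtp_targets(tokens: Sequence[T], depth: int) -> List[List[T]]:
    """For each position i, return the next `depth` tokens as targets.

    Positions that do not have `depth` tokens after them are skipped.
    With depth=1 this reduces to ordinary next-token prediction.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    targets = []
    for i in range(len(tokens) - depth):
        # Tokens i+1 .. i+depth are what the model would predict at position i.
        targets.append(list(tokens[i + 1 : i + 1 + depth]))
    return targets


tokens = ["The", "cat", "sat", "on", "the", "mat"]
for position, target in enumerate(mtp_targets(tokens, depth=2)):
    print(tokens[position], "->", target)
# The -> ['cat', 'sat']
# cat -> ['sat', 'on']
# sat -> ['on', 'the']
# on -> ['the', 'mat']
```

Each position now carries two targets instead of one, which is the sense in which a single forward pass can propose more than one token.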