inclusionAI/Ling-flash-2.0

inclusionAI/Ling-flash-2.0

A large language model launched by Alibaba, built on a Mixture of Experts (MoE) architecture.
2025-09-17
LLM
Model capability: thinkingModel capability: function_call
Input:
$0.143/1M tokens
Output:
$0.572/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Ling-flash-2.0 is the third model in the Ling 2.0 architecture series, released by Ant Group’s Bailing team. It is a Mixture-of-Experts (MoE) model with a total parameter size of 100 billion, yet each token activates only 6.1 billion parameters—specifically, 4.8 billion non-word-vector activations. As a lightly configured model, Ling-flash-2.0 has demonstrated performance in multiple authoritative benchmarks that rivals or even surpasses that of 40-billion-parameter Dense models and larger-scale MoE models. Designed to explore a high-efficiency path under the prevailing consensus that "larger models equate to more parameters," this model leverages cutting-edge architectural design and advanced training strategies.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SiliconFlow)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

inclusionAI/Ling-flash-2.0

-
128000

Input$0.143 / 1M tokens
Output$0.572 / 1M tokens

Input$0.143/ 1M tokens
Output$0.572/ 1M tokens
Original Price