sophnet/LongCat-Flash-Chat

sophnet/LongCat-Flash-Chat

The efficient Mixture-of-Experts (MoE) language model released and open-sourced by Meituan's technical team.
2025-09-01
LLM
Input:
$0.143/1M tokens
Output:
$0.714/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

LongCat-Flash-Chat is an efficient Mixture-of-Experts (MoE) language model officially released and open-sourced by Meituan’s tech team. Its core positioning is as a next-generation AI foundational model characterized by “dynamic computation, ultra-fast inference, and agent-first architecture,” specifically designed for complex, long-duration, and tool-intensive tasks.

  • Ultra-large-scale MoE architecture: With a total parameter count reaching 560B, each token activates only between 18.6B and 31.3B parameters (averaging around 27B). This is achieved through an innovative “Zero-Computation Experts” mechanism that enables on-demand allocation of computational resources.
  • Industry-leading inference efficiency: On NVIDIA H800, it achieves a generation speed of over 100 tokens/s, significantly faster than mainstream models of similar or even smaller scale.
  • Comprehensively superior agent capabilities: It excels in the τ2-Bench (tool usage) and VitaBench (complex-scenario agents), making it particularly well-suited for multi-step agent applications that demand longer processing times.

───────────────────────────────────────────────────────────────────

Core Capabilities

🧠 Strong instruction-following ability: Ranked first in IFEval (89.65), and also achieved top scores in the Chinese instruction benchmarks COLLIE and Meeseeks-zh.

📚 Rigorous general knowledge: Scored 89.71 on MMLU, 90.44 on CEval, and 86.50 on ArenaHard-V2—its overall performance rivals that of China’s top models.

🧩 Natively trained for agent-based tasks: Built its own agent-specific evaluation dataset and adopted a multi-agent approach to generate high-quality trajectory data, optimizing the entire process from tool calls to environmental interactions.

System-level engineering optimization: Introduced technologies such as cross-layer communication parallelism and customized low-level operators, enabling efficient training within just 30 days and achieving extremely low inference latency.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SophNet)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

sophnet/LongCat-Flash-Chat

-
128000

Input$0.143 / 1M tokens
Output$0.714 / 1M tokens

Input$0.143/ 1M tokens
Output$0.714/ 1M tokens
Original Price