kimi-k2-turbo-preview

kimi-k2-turbo-preview

The high-speed version of kimi-k2, with model parameters consistent with kimi-k2
2025-08-01
LLM
Model capability: function_call
Input:
$1.257/1M tokens
Output:
$9.119/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

The Kimi K2 Turbo Preview is a high-speed version of the Kimi K2 series model launched by Moonshot AI. Its core positioning is **“a revolutionary speed upgrade while maintaining powerful reasoning capabilities at the trillion-parameter scale,”** aiming to fundamentally address the longstanding challenge of balancing large-model performance with inference speed.

  • A 4x Speed Revolution: The output speed has significantly increased from 10 tokens per second in the original version to 40 tokens per second, representing a 300% boost. This makes real-time AI interactions, long-text generation, and complex logical reasoning exceptionally smooth.
  • Trillion-Parameter Scale Retained: While achieving extreme acceleration, the model still maintains a total parameter scale of 1T (1 trillion) and 32B active parameters. It adopts a 384-expert mixture-of-experts (MoE) architecture, ensuring that the reasoning depth remains fully consistent with the original Kimi K2 model.
  • Ultra-Long Context Support: Continuing the Kimi family’s strength, it supports a context length of 256K tokens, effortlessly handling long-document analysis, large-scale codebase reviews, and multi-turn complex dialogues.
  • Advanced Reasoning Optimization Techniques: Through dynamic expert routing enhancement (reducing computational overhead), memory-access optimization (improving cache efficiency), and computation-graph simplification, the model achieves a dramatic leap in throughput without sacrificing output quality.
  • Seamless Integration and Replacement: It remains fully compatible with the original Kimi K2’s API, allowing developers to upgrade directly without modifying their code. Additionally, during specific promotional periods, it offers a cost-effective tiered pricing strategy.

───────────────────────────────────────────────────────────────────

Core Capabilities

Real-Time Smooth Conversations: With extremely low first-token latency and lightning-fast subsequent-generation speeds, it’s the ideal brain for online customer service, real-time tech support, and interactive teaching.

📄 Ultra-Fast Content Creation: It can complete the writing, editing, or polishing of 10,000-word documents within seconds, dramatically boosting productivity in creative writing and administrative tasks.

💻 Efficient Development Assistance: For complex code logic, it delivers instant responses, supporting faster code reviews, debugging suggestions, and automatic generation of project-level documentation.

🤖 Intelligent Agent Real-Time Decision-Making: As the core engine for agents, it processes feedback from the environment more swiftly, shortening the closed-loop time for task planning and tool invocation.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (Moonshot AI)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

kimi-k2-turbo-preview

-
256000

Input$1.143 / 1M tokens
Output$8.29 / 1M tokens

Input$1.257/ 1M tokens
Output$9.119/ 1M tokens
10%