kimi-k2.5

kimi-k2.5

Kimi is currently the most versatile and intelligent multimodal model, excelling in agent capabilities, code, and visual understanding.
2026-01-27
LLM
Model capability: imageModel capability: videoModel capability: thinkingModel capability: function_call
Input:
$0.627/1M tokens
Output:
$3.3/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

⚠️ After our special processing, when 302.AI calls this model, it can directly pass in images and videos in URL format , which greatly improves development efficiency and code readability.


Kimi K2.5 is Moonshot AI’s flagship open-source multimodal language model product,and is Kimi’s most intelligent model to date, achieving open-source SoTA performance in Agent, code, visual understanding, and a range of general-purpose intelligence tasks. At the same time, Kimi K2.5 is also Kimi’s most versatile model, featuring a nativelymultimodal architectural design that supports both visual and text inputs, thinking and non-thinking modes, as well as conversational and Agent tasks. Its core positioning is as a “natively visual intelligence agent engine,” specifically designed for autonomous collaboration in coding, office work, and complex tasks.

  • Performance breakthrough: It delivers strong results on three major agent benchmarks—HLE, BrowseComp, and SWE-Verified—and its cost is only a fraction of that of competing products.
  • Architectural innovation: It’s the world’s first open-source model that supports self-organizing “agent swarms,” dynamically scheduling up to 100 sub-agents and concurrently executing up to 1,500 tool calls.
  • Coding revolution: It’s the strongest open-source multimodal coding model, enabling direct generation of front-end interfaces with interactive animations from conversations or videos, realizing “what you see is what you get” visual programming.
  • Office efficiency boost: It processes end-to-end documents of up to 10,000 words, reports of 100 pages, financial models, and LaTeX formulas, reducing complex office tasks from hours to minutes.
  • Efficiency leap: Compared to single-agent mode, swarm mode reduces execution time by up to 80%, compressing critical path steps by 3–4.5 times, truly achieving “second-level response”.

───────────────────────────────────────────────────────────────────

Core Capabilities

👁️ Visual-driven coding:

  • Natively jointly trained on 15T visual-text tokens, simultaneously enhancing image/video understanding and code generation capabilities.
  • Supports visual debugging: Input an artwork (such as Matisse’s “Dance”), and automatically generate a visually consistent webpage and iteratively optimize it.
  • Can reverse-engineer entire websites from videos, significantly lowering the barrier to front-end development.

🐝 Agent swarm architecture:

  • No need for pre-defined workflows—tasks are automatically decomposed and specialized sub-agents such as AI researchers and fact-checkers are created.
  • Trained based on PARL (Parallel Agent Reinforcement Learning), avoiding “serial collapse” and ensuring high concurrency.
  • Uses “critical step” metrics to optimize latency; the more sub-tasks there are, the shorter the overall execution time.

💼 Professional office agents:

  • Built-in K2.5 Agent mode directly outputs professional documents such as Word annotations, Excel pivot tables, and PDF formulas.
  • Outperforms K2 Thinking by 59.3% on the AI Office Benchmark, delivering remarkable results in real-world office scenarios.
  • Supports ultra-long outputs like 10,000-word papers or 100-page reports while maintaining structural and logical consistency.

───────────────────────────────────────────────────────────────────

Demonstration of Results

  • K2.5 can transform simple conversations into complete front-end interfaces, realizing interactive layouts and rich animation effects, such as scroll-triggered effects.


  • K2.5 also excels in visual coding. By reasoning over images and videos, K2.5 improves the generation and visual debugging of code from images/videos, lowering the barrier for users to express their intentions visually.


Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(Moonshot kimi AI-Vision)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

kimi-k2.5

-
256000

Input$0.57 / 1M tokens
Output$3 / 1M tokens

Input$0.627/ 1M tokens
Output$3.3/ 1M tokens
10%