
moonshotai/kimi-k2.5
API Overview
⚠️After special processing by us, when calling this model via 302.AI, you can directly pass images and videos in URL format, greatly improving development efficiency and code readability
Kimi K2.5 is Moonshot AI’s flagship open-source multimodal language model product,and is Kimi’s most intelligent model to date, achieving open-source SoTA performance in Agent, code, visual understanding, and a range of general intelligence tasks. At the same time, Kimi K2.5 is also Kimi’s most versatile model to date, featuring a nativelymultimodal architectural design that supports both visual and text inputs, thinking and non-thinking modes, as well as conversational and Agent tasks. Its core positioning is as a “natively visual intelligence agent engine,” specifically designed for autonomous collaboration in coding, office work, and complex tasks.
- Performance breakthrough: It delivers strong results on three major agent benchmarks—HLE, BrowseComp, and SWE-Verified—and comes at a significantly lower cost than competitors.
- Architectural innovation: It’s the world’s first open-source model to support self-organizing “agent swarms,” dynamically scheduling up to 100 sub-agents and concurrently executing up to 1,500 tool calls.
- Coding revolution: It’s the strongest open-source multimodal coding model, enabling direct generation of front-end interfaces with interactive animations from conversations or videos, realizing “what you see is what you get” visual programming.
- Office productivity boost: It handles end-to-end processing of 10,000-word documents, 100-page reports, financial models, and LaTeX formulas, reducing complex office tasks from hours to minutes.
- Efficiency leap: Compared to single-agent mode, swarm mode reduces execution time by up to 80%, compressing critical path steps by 3–4.5 times, truly achieving “second-level response”.
───────────────────────────────────────────────────────────────────
Core Capabilities
👁️ Visual-driven coding:
- It undergoes native joint training on 15T visual-text tokens, simultaneously enhancing image/video understanding and code generation capabilities.
- It supports visual debugging: input an artwork (such as Matisse’s “Dance”), and it automatically generates a visually consistent webpage and iteratively optimizes it.
- It can reverse-engineer entire websites from videos, dramatically lowering the barrier to front-end development.
🐝 Agent swarm architecture:
- It requires no pre-defined workflows; it automatically decomposes tasks and creates specialized sub-agents such as AI researchers and fact-checkers.
- Trained based on PARL (Parallel Agent Reinforcement Learning), it avoids “serial collapse” and ensures high-concurrency execution.
- It uses “critical step” metrics to optimize latency—more sub-tasks mean shorter overall execution time.
💼 Professional office agents:
- It comes with a built-in K2.5 Agent mode, directly outputting professional documents such as Word annotations, Excel pivot tables, and PDF formulas.
- In the AI Office Benchmark, it outperforms K2 Thinking by 59.3%, delivering remarkable results in real-world office scenarios.
- It supports ultra-long outputs like 10,000-word papers or 100-page reports while maintaining structural and logical consistency.
───────────────────────────────────────────────────────────────────
Effect Demonstrations
- K2.5 can transform simple conversations into complete front-end interfaces, realizing interactive layouts and rich animation effects, such as scroll-triggered effects
- K2.5 also excels in visual coding. By reasoning over images and videos, K2.5 improves the generation and visual debugging of code from images/videos, lowering the barrier for users to express their intentions visually
Playground
Log in to explore more features! Click to Log In