
step-3.5-flash
API Overview
StepFun's (Jieyue Xingchen) flagship reasoning language model combines top-tier reasoning with fast, reliable execution. It can decompose and plan complex tasks, invoke tools quickly and reliably to carry them out, and handle a wide range of demanding work, including logical reasoning, mathematics, software engineering, and deep research. It supports a 256K-token context window, and its core design is a “light-activation, high-density, efficient agent engine” built for real-time interaction, complex reasoning, and coding.
- Sparse and Efficient Architecture: A Mixture-of-Experts (MoE) model with 196B total parameters that activates only 11B per token, balancing large-model capability with small-model speed.
- Ultra-High-Speed Generation Engine: With three-way multi-token prediction (MTP-3), it reaches 100–300 tokens/s in typical scenarios and peaks at 350 tokens/s on single-stream coding tasks, delivering near-real-time responses.
- Leading Agent Performance: It scores 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0, demonstrating stable, reliable agentic behavior.
- Long-Context Optimization: To support its 256K context, it interleaves sliding-window attention (SWA) and full-attention layers at a 3:1 ratio, substantially reducing compute overhead without sacrificing performance.
- Local Deployment: It can run on high-end consumer-grade hardware such as the Mac Studio (M4 Max) and NVIDIA DGX Spark, keeping data private and latency low.
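The 3:1 hybrid attention layout can be illustrated with a short Python sketch. The source gives only the ratio; the layer count (48), the window size (4,096 tokens), and the ordering of three SWA layers followed by one full-attention layer are hypothetical choices for illustration, not the model's published configuration. The sketch counts how many key/value positions each layer must cache at a 256K context, showing why SWA layers are cheaper at long context: they attend to at most `window` recent tokens instead of the full history.

```python
"""Sketch: interleaving sliding-window (SWA) and full-attention layers at 3:1.

Layer count, window size, and layer ordering are hypothetical; the model card
gives only the 3:1 ratio.
"""
from typing import List


def layer_pattern(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Per-layer attention types: `swa_per_full` SWA layers, then one full layer, repeated."""
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Causal mask where query i may attend to keys j with i - window < j <= i."""
    return [[(i - window < j <= i) for j in range(seq_len)] for i in range(seq_len)]


def cached_positions(pattern: List[str], context: int, window: int) -> int:
    """Total KV-cache positions across layers; SWA layers keep at most `window` keys."""
    return sum(context if kind == "full" else min(context, window) for kind in pattern)


if __name__ == "__main__":
    NUM_LAYERS = 48       # hypothetical
    WINDOW = 4096         # hypothetical
    CONTEXT = 256 * 1024  # 256K tokens

    pattern = layer_pattern(NUM_LAYERS)
    print(pattern[:8])  # ['swa', 'swa', 'swa', 'full', 'swa', 'swa', 'swa', 'full']

    hybrid = cached_positions(pattern, CONTEXT, WINDOW)
    dense = NUM_LAYERS * CONTEXT
    print(f"KV positions, hybrid vs all-full: {hybrid / dense:.3f}")  # 0.262

    # Tiny mask example: each row is one query position, "x" marks visible keys.
    for row in sliding_window_mask(5, 2):
        print("".join("x" if ok else "." for ok in row))
```

With these made-up numbers the hybrid stack caches about 26% of the key/value positions an all-full-attention stack would; the real figure depends on the model's actual layer count and window size.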
───────────────────────────────────────────────────────────────────
Core Capabilities
⚡ Ultra-Fast Reasoning Engine:
- MTP-3 predicts four tokens at once in each forward pass, dramatically accelerating generation and making coding tasks lightning-fast.
- With just 11B activated parameters, its inference cost is about 1/6 to 1/18 that of comparable MoE models, offering outstanding cost-effectiveness.
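One common way multi-token prediction speeds up decoding is self-speculative decoding: extra heads draft several tokens, and the main model verifies them in a single pass, keeping the longest correct prefix plus one token of its own. The source does not describe how MTP-3 is used at inference time, so the Python sketch below is an assumption-labeled toy: `target` and `draft` are hypothetical stand-in functions, and verification is simulated position by position rather than in one batched forward pass. With three draft tokens, each verification pass yields between one and four tokens.

```python
"""Toy self-speculative decoding with 3 draft tokens per step (MTP-3 style).

`target` and `draft` are hypothetical stand-ins for the main model and the MTP
heads; verification is simulated per position instead of one batched pass.
"""
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token prefix to the next token
NUM_DRAFT = 3  # extra tokens proposed per step: 1 + 3 = up to 4 tokens per pass


def greedy_decode(target: Model, prompt: List[int], n: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int], n: int) -> Tuple[List[int], int]:
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < n:
        # 1. Draft NUM_DRAFT tokens cheaply and autoregressively.
        ctx = list(seq)
        proposal = []
        for _ in range(NUM_DRAFT):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Verify with the target (a single forward pass in a real system).
        passes += 1
        ctx = list(seq)
        for tok in proposal:
            if target(ctx) != tok:
                break  # first mismatch: discard this and all later draft tokens
            ctx.append(tok)
        # 3. The verification pass also yields the target's own next token.
        ctx.append(target(ctx))
        seq = ctx
    return seq[len(prompt):len(prompt) + n], passes


if __name__ == "__main__":
    def target(s: List[int]) -> int:
        return (3 * s[-1] + len(s)) % 10

    def draft(s: List[int]) -> int:
        # Agrees with the target except when the prefix length is a multiple of 5.
        guess = target(s)
        return (guess + 1) % 10 if len(s) % 5 == 0 else guess

    prompt = [1, 2]
    tokens, passes = speculative_decode(target, draft, prompt, 20)
    assert tokens == greedy_decode(target, prompt, 20)
    print(f"{len(tokens)} tokens in {passes} verification passes")
```

Because every emitted token is checked against the target, the output matches plain greedy decoding exactly; only the number of target passes changes, from one per token down to as few as one per four tokens when the drafts are accurate.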
🧰 Professional Agent Foundation:
- It integrates a scalable reinforcement learning (RL) framework that supports continuous self-improvement, and it excels at long-horizon, multi-step tasks.
- It posts leading scores on Chinese and English benchmarks, including BrowseComp-ZH (73.7%), GAIA (84.5%), and AIME 2025 (97.3%).
🧠 High-Density Intelligence:
- A fine-grained routing system with 288 routed experts plus 1 shared expert retains the knowledge capacity of the 196B model while running as efficiently as an 11B model.
- It supports a Parallel Thinking mechanism that further improves performance on benchmarks such as xbench-DeepSearch.
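The routing described above can be sketched as top-k gating over the routed experts plus an always-on shared expert. Only the expert counts (288 routed, 1 shared) come from the source; `TOP_K = 8`, the toy hidden size, the scalar stand-in "experts", and the choice of a softmax over the selected router scores are hypothetical illustrations, not step-3.5-flash's actual router.

```python
"""Toy fine-grained MoE layer: top-k routed experts plus one shared expert.

Only NUM_ROUTED (288) and the single shared expert come from the model card;
TOP_K, DIM, and the scalar experts are hypothetical.
"""
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

NUM_ROUTED = 288  # routed experts (from the model card)
TOP_K = 8         # hypothetical number of routed experts active per token
DIM = 4           # toy hidden size


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(scores: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalize their scores into gate weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    return list(zip(top, gates))


def make_expert(scale: float) -> Expert:
    """A stand-in expert that scales its input (real experts are small FFNs)."""
    return lambda x: [scale * v for v in x]


def moe_layer(x: Vector, router: List[Vector], experts: List[Expert], shared: Expert) -> Vector:
    # Router logits: one dot product per routed expert.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router]
    out = shared(x)  # the shared expert processes every token
    for idx, gate in route(scores, TOP_K):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_ROUTED)]
    experts = [make_expert(rng.uniform(0.5, 1.5)) for _ in range(NUM_ROUTED)]
    shared = make_expert(1.0)

    token = [0.2, -0.1, 0.4, 0.3]
    print(moe_layer(token, router, experts, shared))
    print(f"experts run per token: {TOP_K + 1} of {NUM_ROUTED + 1}")
```

Only `TOP_K + 1` of the 289 experts run for each token, which is the mechanism behind the gap between total and activated parameters. The 11B/196B split also counts attention and embedding weights, so it is not simply `(TOP_K + 1) / 289`.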
───────────────────────────────────────────────────────────────────