step-3.5-flash

step-3.5-flash

The flagship language reasoning model of Stepfun
2026-02-02
LLM
Model capability: thinkingModel capability: function_call
Input:
$0.11/1M tokens
Output:
$0.33/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

The flagship language reasoning model from Jueyue Xingchen boasts top-tier reasoning capabilities and fast, reliable execution. It can decompose and plan complex tasks, quickly and reliably invoke tools to carry out tasks, and excel at a wide range of challenging assignments—including logical reasoning, mathematics, software engineering, and deep research. With a context length of 256K, its core design is a “light-activation, high-density, efficient agent engine” specifically tailored for real-time interaction, complex reasoning, and coding tasks.

  • Sparse and Efficient Architecture: Based on an MoE model with 196B total parameters, this model activates only 11B parameters per token, striking the ultimate balance between large-model capabilities and small-model speed.
  • Ultra-High-Speed Generation Engine: Featuring 3-path multi-token prediction (MTP-3), it achieves speeds of 100–300 tokens/s in typical scenarios and peaks at 350 tokens/s in single-stream encoding, delivering near-real-time responses.
  • Leading Agent Performance: It excels in authoritative benchmarks such as SWE-bench Verified (74.4%) and Terminal-Bench 2.0 (51.0%), demonstrating exceptional stability and reliability.
  • Long-Context Optimization: Supporting a 256K context length, it employs a 3:1 sliding-window attention (SWA) mechanism, significantly reducing computational overhead without sacrificing performance.
  • Locally Friendly Deployment: It can run on consumer-grade high-end hardware such as Mac Studio M4 Max and NVIDIA DGX Spark, ensuring data privacy and low latency.

───────────────────────────────────────────────────────────────────

Core Capabilities

Ultra-Fast Reasoning Engine:

  • The MTP-3 technology performs forward prediction on four tokens at once, dramatically accelerating generation and making encoding tasks lightning-fast.
  • With just 11B activated parameters, its inference cost is about 1/6 to 1/18 that of comparable MoE models, offering outstanding cost-effectiveness.

🧰 Professional Agent Foundation:

  • It integrates a scalable RL framework that supports continuous self-improvement and excels at long-term, multi-step tasks.
  • It leads comprehensively in Chinese and English reasoning benchmarks including BrowseComp-ZH (73.7%), GAIA (84.5%), and AIME 2025 (97.3%).

🧠 High-Density Intelligence:

  • A fine-grained routing system featuring 288 expert specialists plus 1 shared expert retains the “memory” of the 196B model while achieving execution efficiency comparable to an 11B model.
  • It supports the Parallel Thinking mechanism, further enhancing performance on tasks like xbench-DeepSearch.

───────────────────────────────────────────────────────────────────

Test Data

step-bar-chart.png

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (Stepfun Multimodal)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

step-3.5-flash

-
256000

Input$0.1 / 1M tokens
Output$0.3 / 1M tokens

Input$0.11/ 1M tokens
Output$0.33/ 1M tokens
10%