qwen3-coder-flash-2025-07-28

qwen3-coder-flash-2025-07-28

A code generation model based on Qwen3, inheriting the coding agent capabilities of Qwen3-Coder-Plus.
2025-07-28
LLM
Model capability: function_call
Input:
$0.143/1M tokensstarting from
Output:
$0.58/1M tokensstarting from
Bulk order? Contact your manager for exclusive deals

API Overview

qwen3-coder-flash is a lightweight MoE-architected code model released by Alibaba’s Tongyi Qwen3-Coder series. Its core positioning is as a “low-barrier, high-performance programming assistant,” achieving powerful agent-like coding capabilities through efficient parameter configuration, making it well-suited for small-to-medium-scale development and localized deployment scenarios.

  • Efficient MoE Architecture Configuration: Total parameters: 30.5B (with 3.3B activated); 128 expert layers (8 experts activated per token); GQA attention mechanism (32 Q heads and 4 KV heads), balancing performance and computational resource consumption.
  • Ultra-long Context Support: Native context length of 262,144 tokens, expandable up to 1 million tokens via YaRN, ideal for repository-level code understanding and long-text programming tasks.
  • Cross-Scenario Coding Capabilities: Outstanding performance in tasks such as agentic coding, browser use, and tool use; supports multi-language programming, code generation, debugging, and tool invocation.
  • Deployment-Friendly Features: Supports single-card H100/A100 operation (requiring 80 GB VRAM); provides an FP8 quantized version (reducing memory usage by 70%); compatible with frameworks including transformers, vLLM, and llama.cpp.
  • Tool Ecosystem Compatibility: Adapts to programming tools such as Qwen Code, Cline, and Claude Code; supports OpenAI SDK calls and Alibaba Cloud DashScope API; offers customizable function call formats.

───────────────────────────────────────────────────────────────────

Core Capabilities

💻 Professional Code Generation: Supports multi-language code writing (e.g., quicksort algorithms), full-stack development, and code fixing; generates runnable code with a Pass@1 rate close to that of large-parameter models.

🤖 Agent-Based Programming: Independently plans multi-step development tasks, invokes command-line tools and browser utilities, and handles complex workflows such as cross-file refactoring and CI feedback-based debugging.

📚 Long-Text Code Understanding: Parses codebases up to 1 million tokens, precisely identifies cross-file dependencies, and is well-suited for large-scale project development and maintenance.

🔧 Flexible Tool Invocation: Supports custom tool functions (e.g., numerical computation), is compatible with mainstream programming toolchains, and can be integrated into IDEs and development workflows.

🌍 Multi-Language Adaptation: Natively supports multi-language programming, excels in Chinese language processing, and is well-suited for cross-border development and multilingual project requirements.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (Tongyi Qianwen)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

qwen3-coder-flash-2025-07-28

0<Token≤32K
1000000

Input$0.143 / 1M tokens
Output$0.58 / 1M tokens

Input$0.143/ 1M tokens
Output$0.58/ 1M tokens
Original Price

qwen3-coder-flash-2025-07-28

32K<Token≤128K
1000000

Input$0.22 / 1M tokens
Output$0.86 / 1M tokens

Input$0.22/ 1M tokens
Output$0.86/ 1M tokens
Original Price

qwen3-coder-flash-2025-07-28

128K<Token≤256K
1000000

Input$0.36 / 1M tokens
Output$1.43 / 1M tokens

Input$0.36/ 1M tokens
Output$1.43/ 1M tokens
Original Price

qwen3-coder-flash-2025-07-28

256K<Token≤1M
1000000

Input$0.72 / 1M tokens
Output$3.58 / 1M tokens

Input$0.72/ 1M tokens
Output$3.58/ 1M tokens
Original Price