qwen/qwen3-235b-a22b-fp8

qwen/qwen3-235b-a22b-fp8

Flagship-level Mixture of Experts (MoE) large model optimized based on FP8 quantization technology
2025-06-10
LLM
Input:
$0.2/1M tokens
Output:
$0.8/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-235B-A22B-FP8 is the FP8-quantized version of the ultra-large-scale Mixture-of-Experts (MoE) language model released by Alibaba’s Tongyi Lab, primarily positioned as a high-performance enterprise-grade foundation model that delivers “extreme inference efficiency combined with top-tier general capabilities.”

  • Flagship MoE Architecture: With a total parameter count of 235 billion and only 22 billion active parameters, it achieves state-of-the-art (SOTA) performance in authoritative benchmarks such as MMLU, GSM8K, and HumanEval.
  • FP8 Quantization Acceleration: Adopting FP8 precision for both storage and computation, it achieves 2–3 times higher inference throughput on hardware platforms like NVIDIA H100 and A100, significantly reducing latency and costs.
  • Long Context Support: Natively supports context lengths of up to 128,000 tokens, making it ideal for long-document summarization, complex task decomposition, and multi-turn deep dialogues.
  • Multi-language and Code Enhancement: Covers dozens of languages including Chinese, English, Japanese, and French, and excels in specialized tasks such as code generation and mathematical reasoning.

───────────────────────────────────────────────────────────────────

Core Capabilities

High Throughput and Low Latency Inference: FP8 quantization dramatically reduces memory usage and computational overhead, enabling a single GPU to support highly concurrent enterprise-level applications.

🧠 Strong Logical Reasoning and Generalization Abilities: Maintains high accuracy and stability in scenarios such as complex instruction following, multi-hop question answering, and tool invocation.

🌍 Global Language Support: Delivers natural and fluent outputs while taking into account cultural contexts and specialized terminology, making it suitable for international business and localized scenarios.

🛡️ Secure, Controllable, and Auditable: Supports content filtering, sensitive word interception, and logging of inference processes, meeting compliance requirements in sectors such as finance and government services.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(PPIO)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

qwen/qwen3-235b-a22b-fp8

-
40960

Input$0.2 / 1M tokens
Output$0.8 / 1M tokens

Input$0.2/ 1M tokens
Output$0.8/ 1M tokens
Original Price