qwen/qwen3-32b-fp8

qwen/qwen3-32b-fp8

FP8 Quantized Version of the High-Performance Dense Language Model Qwen3 - 32B
2025-06-10
LLM
Input:
$0.1/1M tokens
Output:
$0.5/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-32B-FP8 is the FP8 quantized version of a high-performance dense language model released by Alibaba’s Tongyi Lab. Its core positioning is as a flagship enterprise-grade inference model that delivers “high cost-effectiveness, low latency, and strong general-purpose capabilities.”

  • Efficient and Stable Dense Architecture: With full 32B parameter activation, it delivers balanced and reliable performance across tasks such as code generation, mathematical reasoning, and multilingual processing, without the uncertainty introduced by MoE routing.
  • FP8 Quantization Acceleration: Leveraging FP8 precision optimization, it achieves a more than 2x increase in inference speed on NVIDIA H100/A100 GPUs, significantly reducing deployment costs.
  • Ultra-long Context of 128K Tokens: Natively supports long-text inputs, making it ideal for scenarios such as technical document parsing, legal contract review, and multi-turn complex dialogues.
  • Deep Multilingual Coverage: It excels at understanding Chinese contexts while also delivering high-quality generation in dozens of languages including English, Japanese, French, and Spanish.

───────────────────────────────────────────────────────────────────

Core Capabilities

High Throughput and Real-Time Response: FP8 quantization dramatically reduces memory usage and computational overhead, enabling a single GPU to support highly concurrent API services.

🧠 Stable and Strong Inference Capability: It performs exceptionally well in benchmarks such as HumanEval, GSM8K, and C-Eval, making it suitable for production environments with high requirements for determinism.

🌍 Expert-Level Bilingual Output in Chinese and English: Whether drafting technical proposals, marketing copy, or academic abstracts, it maintains natural-sounding language and rigorous logic.

🛡️ Enterprise Security and Compliance: Supports private deployment, content filtering, and audit logs, meeting regulatory requirements in industries such as finance, government, and healthcare.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(PPIO)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

qwen/qwen3-32b-fp8

-
128000

Input$0.1 / 1M tokens
Output$0.5 / 1M tokens

Input$0.1/ 1M tokens
Output$0.5/ 1M tokens
Original Price