qwen/qwen3-30b-a3b-fp8

qwen/qwen3-30b-a3b-fp8

Non-thinking mode FP8 quantized version of Qwen3 - 30B - A3B model
2025-06-10
LLM
Input:
$0.1/1M tokens
Output:
$0.5/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-30B-A3B-Instruct-2507-FP8 is the FP8 quantized version of the high-efficiency Mixture of Experts (MoE) language model launched by Alibaba Tongyi Lab, with its core positioning as an enterprise-level intelligent inference engine featuring "small activation, large capabilities, and low latency".

  • Exquisite Design of MoE Architecture: With approximately 30B total parameters and only 3B (A3B) activation parameters, it significantly reduces computational overhead while maintaining high intelligent density.
  • Acceleration by FP8 Quantization: Using FP8 precision, it achieves more than 2x improvement in inference throughput on hardware such as NVIDIA H100/A100, with significantly reduced memory footprint.
  • 128K Ultra-Long Context: Natively supports long text input, suitable for scenarios such as document summarization, multi-turn dialogue, and complex task planning.
  • Optimized Instruction Tuning: Specifically trained for high-quality instruction following, it outputs accurately and reliably in logical reasoning, code generation, and multilingual Q&A.

───────────────────────────────────────────────────────────────────

Core Capabilities

Extreme Energy Efficiency Ratio: Achieves 30B-level comprehensive capabilities with a computational cost close to that of a 7B dense model, resulting in higher output per unit of computing power.

🧠 Precise Task Execution: After fine-grained alignment training, it can accurately understand fine-grained instruction requirements such as format, style, and multi-step tasks.

🌍 Multilingual Natural Expression: Covers mainstream languages such as Chinese, English, Japanese, and French, with outputs conforming to local cultural contexts and professional practices.

🛡️ Production Environment Ready: FP8 quantization + MoE architecture, balancing performance, cost, and stability, suitable for deployment in high-concurrency enterprise-level applications.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(PPIO)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

qwen/qwen3-30b-a3b-fp8

-
128000

Input$0.1 / 1M tokens
Output$0.5 / 1M tokens

Input$0.1/ 1M tokens
Output$0.5/ 1M tokens
Original Price