qwen/qwen3-4b-fp8

qwen/qwen3-4b-fp8

Alibaba has launched a lightweight model with 4 billion parameters, specifically optimized for edge devices such as smartphones and IoT devices.
2025-06-10
LLM
Input:
Free
Output:
Free
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-4B-FP8 is a lightweight model with 4 billion parameters launched by Alibaba. Its core purpose is to serve as an ultra-efficient edge inference engine, optimized specifically for edge devices such as smartphones and IoT devices, while balancing performance and low power consumption.

  • Lightweight Benchmark: With only 4 billion parameters, it delivers performance on par with Qwen2.5-72B (a 72-billion-parameter model), leading in accuracy among models of similar size.
  • FP8 Black Technology: Leveraging FP8 mixed-precision quantization, the model’s size is compressed to about 500 MB, boosting inference speed by three times and reducing power consumption by 60%.
  • Long-Text Support: Natively supports a 128K context length, making it easy to handle tasks such as long-document summarization and code-base analysis.
  • Multi-Language Coverage: Supports 119 languages (including Chinese dialects), meeting the needs of global applications.
  • Out-of-the-Box Compatibility: Compatible with mainstream inference frameworks such as ONNX Runtime and TensorRT, allowing deployment to Raspberry Pi or mobile devices within just 5 minutes.

───────────────────────────────────────────────────────────────────

Core Capabilities

⚡ Ultra-Fast Inference: FP8 quantization technology achieves “zero-loss” compression, delivering an inference speed of 25 tokens per second on Raspberry Pi 5 with latency below 50 ms.

📱 All-Purpose Edge Computing: The 4-billion-parameter model requires only 1.2 GB of memory to run on mobile devices, enabling real-time speech translation, offline summarization, and other scenarios—completely freeing users from cloud dependency.

🌐 Language Without Boundaries: Equipped with a built-in multilingual adaptation layer, it achieves over 92% accuracy in recognizing dialects such as Cantonese and Minnan, breaking down barriers for AI applications in minority languages.

🔧 Frictionless Ecosystem: Offers Android/iOS SDKs and lightweight Python APIs that can be called with just one line of code, perfectly compatible with cross-platform frameworks like Flutter and React Native.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(PPIO)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContext302.AI Price

qwen/qwen3-4b-fp8

-
128000

InputFree
OutputFree
Original Price