Qwen/Qwen3-VL-32B-Instruct

Qwen/Qwen3-VL-32B-Instruct

The non-inference version of the largest Dense model in the Qwen3-VL series, with overall performance second only to Qwen3-VL-235B-Instruct.
2025-10-21
LLM
Model capability: imageModel capability: function_call
Input:
$0.143/1M tokens
Output:
$0.572/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-VL-32B-Instruct is a high-performance multimodal instruction-tuned model released by Alibaba’s Tongyi Lab. Its core positioning is as a versatile visual-language powerhouse focused on “highly reliable image-and-text understanding + enterprise-grade multimodal interaction.”

  • Dense architecture for stable outputs: With 32 billion full parameters activated, it avoids the uncertainty introduced by MoE routing, delivering consistent and reliable performance in tasks such as visual question answering and document parsing.
  • Ultra-long 128K multimodal context: Natively supports mixed inputs of images, videos, PDFs, web page screenshots, and ultra-long texts, making it ideal for complex cross-modal scenarios.
  • Deeply optimized for instruction following: Fine-tuned with high-quality human preference data, it precisely responds to detailed instruction requirements regarding format, style, multiple constraints, and more.
  • Enhanced multilingual and specialized content understanding: Demonstrates outstanding comprehension capabilities in localized and professional contexts, including Chinese interfaces, technical charts, and illustrations in academic papers.

───────────────────────────────────────────────────────────────────

Core Capabilities

👁️ Precise image-and-text alignment: Accurately identifies price tags in product images, button positions in UI interfaces, key data in tables, and outputs results in a structured manner according to instructions.

🧠 Complex task execution: Supports end-to-end multimodal workflows such as “organizing to-do lists from meeting screenshots” and “generating operational steps from experimental flowcharts.”

🌍 Natural bilingual expression in Chinese and English: Produces outputs that align with local cultural contexts, making it suitable for global applications such as cross-border e-commerce, intelligent customer service, and educational tutoring.

🛡️ Ready for production environments: Supports content filtering, structured JSON output, and audit logs, meeting compliance deployment requirements in industries such as finance, government, and healthcare.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SiliconFlow)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

Qwen/Qwen3-VL-32B-Instruct

-
256000

Input$0.143 / 1M tokens
Output$0.572 / 1M tokens

Input$0.143/ 1M tokens
Output$0.572/ 1M tokens
Original Price