qwen3-vl-32b-instruct

qwen3-vl-32b-instruct

The non-inference version of the largest Dense model in the Qwen3-VL series, whose overall performance is second only to Qwen3-VL-235B-Instruct.
2025-10-22
LLM
Model capability: imageModel capability: function_call
Input:
$0.29/1M tokens
Output:
$1.143/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-VL-32B-Instruct is a high-performance multimodal instruction-tuned model released by Alibaba’s Tongyi Lab. Its core positioning is as a versatile visual-language powerhouse focused on “highly reliable image-and-text understanding + enterprise-grade multimodal interaction.”

  • Dense architecture for stable outputs: With 32 billion full parameters activated, it avoids the uncertainty inherent in MoE routing, delivering consistent and reliable performance in tasks such as visual question answering and document parsing.
  • Ultra-long 128K multimodal context: Natively supports mixed inputs of images, videos, PDFs, web screenshots, and ultra-long texts, making it ideal for complex cross-modal scenarios.
  • Deeply optimized for instruction following: Fine-tuned with high-quality human preference data, it precisely responds to detailed instruction requirements regarding format, style, and multiple constraints.
  • Enhanced multilingual and specialized content understanding: Demonstrates outstanding comprehension capabilities in localized and professional scenarios, including Chinese interfaces, technical charts, and illustrations in academic papers.

───────────────────────────────────────────────────────────────────

Core Capabilities

👁️ Precise image-and-text alignment: Accurately identifies price tags in product images, button positions in UI interfaces, key data in tables, and outputs results in a structured manner according to instructions.

🧠 Complex task execution: Supports end-to-end multimodal workflows such as “organizing to-do lists from meeting screenshots” and “generating operational steps from experimental flowcharts.”

🌍 Natural bilingual expression in Chinese and English: Produces outputs that align with local cultural contexts, making it suitable for global applications such as cross-border e-commerce, intelligent customer service, and educational tutoring.

🛡️ Ready for production environments: Supports content filtering, structured JSON output, and audit logs, meeting compliance deployment requirements in industries such as finance, government, and healthcare.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(Qwen2.5)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

qwen3-vl-32b-instruct

-
128000

Input$0.29 / 1M tokens
Output$1.143 / 1M tokens

Input$0.29/ 1M tokens
Output$1.143/ 1M tokens
Original Price