qwen3-vl-32b-thinking

qwen3-vl-32b-thinking

The inference version of the largest Dense model in the Qwen3-VL series, with multimodal reasoning capabilities second only to Qwen3-VL-235B-Thinking.
2025-10-22
LLM
Model capability: imageModel capability: function_call
Input:
$0.29/1M tokens
Output:
$2.86/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-VL-32B-Thinking is an efficient multimodal reasoning model launched by Alibaba’s Tongyi Lab. Its core positioning is as a “lightweight visual-language deep-thinking engine,” specifically designed for image-and-text joint tasks that require step-by-step reasoning.

  • Dense Architecture + Thinking Mode: With 32 billion full parameters activated, it natively supports Chain-of-Thought reasoning while maintaining stable inference performance, automatically breaking down complex visual problems into manageable steps.
  • Ultra-long 128K Multimodal Context: It supports mixed inputs of images, videos, PDFs, and ultra-long texts, making it suitable for scenarios such as cross-page document analysis and multi-turn image-and-text dialogues.
  • High-Precision Visual Understanding: It can accurately identify interface elements, chart data, handwritten formulas, product labels, and more, and then correlate them semantically to perform logical inferences.
  • Ready for Tool Collaboration: It supports invoking code interpreters, calculators, or search modules to validate intermediate results, ensuring the reliability of the final output.

───────────────────────────────────────────────────────────────────

Core Capabilities

🧠 Autonomous Step-by-Step Visual Reasoning: When faced with tasks such as “calculating year-on-year growth rate from financial report screenshots and generating an analysis paragraph,” it can sequentially execute OCR → extraction → calculation → summarization.

👁️ Pixel-Level Semantic Association: It not only recognizes “there’s a bar chart in the image” but also understands “the blue bars represent Q3 revenue, which is higher than the red bars (Q2).”

🧩 Agent-Friendly Output: It can generate structured JSON or natural language explanations, seamlessly integrating into AI workflows such as GUI automation, educational tutoring, and data analysis.

Efficient Local Deployment: It can be smoothly deployed on a single RTX 4090 or Mac Studio, striking a balance between performance and cost, making it ideal for edge-side multimodal applications.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(Qwen2.5)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

qwen3-vl-32b-thinking

-
128000

Input$0.29 / 1M tokens
Output$2.86 / 1M tokens

Input$0.29/ 1M tokens
Output$2.86/ 1M tokens
Original Price