qvq-max-2025-05-15

qvq-max-2025-05-15

Tongyi QVQ Visual Reasoning Model, supporting visual input and chain-of-thought output.
2025-05-15
LLM
Model capability: imageModel capability: video
Input:
$1.15/1M tokens
Output:
$4.58/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

QVQ-Max is a next-generation visual reasoning large model launched by Alibaba Tongyi, with a core positioning as a “multimodal all-rounder.” It’s specifically designed for deep image and video understanding, cross-modal reasoning, and creative generation.

  • Four Core Capabilities:
  • Image Analysis: Accurately identifies key elements in charts, paper illustrations, and product images within 0.3 seconds—leaving no detail behind, not even error bars on coordinate axes.
  • Video Analysis: Understands dynamic scenes and can infer subsequent plot developments or user intentions based on the current frame.
  • Deep Reasoning: Combines visual content with background knowledge to perform logical inference (e.g., extracting data from financial report screenshots and analyzing trends).
  • Creative Generation: Automatically generates e-commerce-focused short-video scripts from product images, covering the entire pipeline—from shot composition and camera movements to copywriting.
  • Technological Leadership: As the official upgraded version of QVQ-72B-Preview, it continuously sets new accuracy records on visual reasoning benchmarks such as MathVision.
  • Broad Applicability: Covers diverse needs across learning (solving math problems), work (data analysis, report interpretation), and daily life (styling advice, content creation).

───────────────────────────────────────────────────────────────────

Core Value

👁️ Breaking Through the “Retinal Limitation”: Liberates humans from tedious visual information filtering, enabling AI to complete understanding, reasoning, and output in a single “gaze.”

🧠 Synthetic Perception Across Text, Images, and Video: Whether it’s scientific charts, e-commerce interfaces, or short-video assets, everything can be modeled uniformly and linked across modalities seamlessly.

🚀 A Creative Acceleration Engine: From “seeing an image” to “generating a script”—the entire process is fully automated, dramatically shortening content production cycles.

📌 Project Homepage: https://qwenlm.github.io/blog/qvq-max-preview/

QVQ-Max is not just about “talking about images”—it’s a next-generation AI visual intelligence foundation that uses vision to drive decision-making and creativity.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (3)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (Tongyi Qianwen)
POST
Stable
View Details
Chat (Tongyi Qianwen-VL)
POST
Stable
View Details
Chat(Tongyi Qianwen-OCR)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

qvq-max-2025-05-15

-
128000

Input$1.15 / 1M tokens
Output$4.58 / 1M tokens

Input$1.15/ 1M tokens
Output$4.58/ 1M tokens
Original Price