
qwen-vl-plus-2025-07-10
Image recognition model from Alibaba Qwen
2025-07-10
Input:
$0.022/1M tokens
Output:
$0.22/1M tokens
Bulk order? Contact your manager for exclusive deals
API Overview
Qwen VL-Plus (qwen-vl-plus) is the enhanced version of Qwen, a large-scale visual language model. It significantly improves both detail recognition and text recognition capabilities, supporting images with resolutions exceeding one million pixels and featuring arbitrary aspect ratios. The model delivers outstanding performance across a wide range of visual tasks.
Application Scenarios
- Image Question Answering: Describe the content in an image or classify and tag it—for example, identifying people, locations, flowers, birds, fish, and other creatures.
- Solving Math Problems: Provide solutions to math problems depicted in images, suitable for students at primary, secondary, university levels, as well as adult education programs.
- Video Understanding: Analyze video content, such as pinpointing specific events and extracting timestamps, or generating summaries of key time periods.
- Object Localization: Locate objects within an image and return the coordinates of the top-left and bottom-right corners of their bounding boxes, or the coordinates of their center points.
- Document Parsing: Convert image-based documents (such as scanned copies or image-based PDFs) into QwenVL HTML format. This format not only accurately recognizes text but also captures the positional information of elements like images and tables.
- Text Recognition and Information Extraction: Identify text and mathematical formulas within images, or extract information from documents like receipts, IDs, and forms, with support for formatted text output. Supported languages include Chinese, English, Japanese, Korean, Arabic, Vietnamese, French, German, Italian, Spanish, and Russian.
Playground
Log in to explore more features! Click to Log In
API Analytics
API Reference (2)
API Pricing
$¥ 円 ₽