qwen2-vl-2b-instruct

qwen2-vl-2b-instruct

From Tongyi Qianwen, with context expanded to 32k, enhanced image understanding capabilities, and improved ability to recognize multilingual text and handwriting in images.
2024-08-30
LLM
Model capability: image
Pricing:
Limited-time free
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen2-VL-2B-Instruct is a lightweight multimodal instruction-tuned model released by Alibaba’s Tongyi Lab. Its core mission is to serve as an edge-side visual-language assistant that delivers “efficient image-and-text understanding combined with low-resource deployment.”

  • Ultra-lightweight design: With only 2 billion parameters, it significantly reduces computational and memory overhead while maintaining essential multimodal capabilities.
  • Native multimodal support: It can directly process mixed inputs of images and text, making it suitable for common scenarios such as screenshot-based question answering, simple chart recognition, and product image description.
  • Long-context compatibility: It supports context lengths of up to 32,000 tokens (with some implementations supporting even longer), meeting the needs of basic image-and-text dialogues and short-document analysis.
  • Optimized for Chinese scenarios: It has been specially trained on Chinese interfaces, advertising images, social media photos, and other content, delivering outputs that better align with local user habits.

───────────────────────────────────────────────────────────────────

Core Capabilities

👁️ Basic visual understanding: It can identify main objects in images, text (combined with built-in OCR capabilities), simple layouts, and common UI elements.

🧠 Image-and-text collaborative response: It can complete tasks such as “Describe this food photo” or “Extract the phone number from the screenshot” based on given instructions, producing concise and accurate outputs.

Efficient local execution: It can perform smooth inference on consumer-grade devices like RTX 3060 and MacBook M-series, making it ideal for mobile or embedded applications.

🧩 Quick and easy integration: It provides a standard Transformers interface, making it easy to integrate into existing applications and use as a cost-effective multimodal module.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (Tongyi Qianwen)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContext302.AI Price

qwen2-vl-2b-instruct

-
32000

Limited-time free