sophnet/Qwen3-VL-235B-A22B-Instruct

sophnet/Qwen3-VL-235B-A22B-Instruct

The flagship multimodal mixture-of-experts (MoE) model launched by the Tongyi Qianwen.
2025-09-25
LLM
Model capability: image
Input:
$0.286/1M tokens
Output:
$1.143/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen3-VL-235B-A22B-Instruct is a super-large-scale multimodal instruction-tuned model released by Alibaba’s Tongyi Lab, primarily positioned as the “most powerful open-source visual-language intelligence foundation model,” specifically designed for complex image-and-text understanding and tool-collaboration scenarios.

  • Flagship MoE Architecture: With a total of 235 billion parameters, it activates only 22 billion parameters, striking a balance between cutting-edge multimodal capabilities and efficient inference costs.
  • Natively Supports Long Videos and Documents: It can handle various types of inputs, including images, videos, PDFs, web page screenshots, and more, supporting ultra-long context fusion and analysis in a single inference.
  • Comprehensively Enhanced Agent Capabilities: It leads in tasks such as GUI operations, frontend code generation, and chart-based question answering, and supports Function Calling and structured output.
  • Built-in Deep Thinking Mode: It can automatically enable Chain-of-Thought reasoning, breaking down complex visual tasks into step-by-step subtasks and solving them sequentially.

───────────────────────────────────────────────────────────────────

Core Capabilities

👁️ Pixel-Level Semantic Understanding: Accurately identifies interface elements, chart data, document layouts, and correlates them with text instructions for high-level reasoning.

🧠 Autonomous Task Planning: When faced with tasks such as “writing e-commerce detail pages based on product images” or “generating analytical reports from financial statement screenshots,” it can automatically plan the process of parsing → extraction → generation.

🌍 Multi-Language Image and Text Generation: It supports image description, interpretation, and creation in multiple languages including Chinese and English, producing outputs that are culturally appropriate and contextually relevant.

🧩 Seamless Agent Collaboration: Natively compatible with tool-call protocols, it can directly drive browsers, code executors, or design software to build end-to-end automated workflows.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SophNet)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

sophnet/Qwen3-VL-235B-A22B-Instruct

-
128000

Input$0.286 / 1M tokens
Output$1.143 / 1M tokens

Input$0.286/ 1M tokens
Output$1.143/ 1M tokens
Original Price