THUDM/GLM-4.1V-9B-Thinking

THUDM/GLM-4.1V-9B-Thinking

THUDM, a leading AI company, has launched a high-performance vision-language model that supports chain-of-thought reasoning.
2025-07-22
LLM
Model capability: imageModel capability: thinking
Pricing:
Free for a limited time
Bulk order? Contact your manager for exclusive deals

API Overview

GLM-4.1V-9B-Thinking is an inference-first multimodal large model released by Zhipu AI (THUDM). Built on the GLM-4-9B architecture, it has approximately 9 billion parameters and is primarily positioned as a high-performance vision-language model that supports chain-of-thought reasoning. It achieves state-of-the-art performance among 10-billion-parameter VLMs, even surpassing its 72-billion-parameter competitors.

  • Breakthrough in Reasoning Capabilities: For the first time in the GLM-V series, we introduce the “Thinking Paradigm,” enhancing complex task reasoning through reinforcement learning. In 28 benchmarks, it outperforms 10-billion-parameter models in 23 of them.
  • Performance Surpassing Previous Models: On 18 tasks, it surpasses Qwen2.5-VL-72B, demonstrating an efficient design that delivers “great intelligence from a small model.”
  • Ultra-Long Context Support: It supports context lengths up to 64K, making it well-suited for complex scenarios such as long documents and multi-turn image-and-text dialogues.
  • High-Resolution Compatibility: It accepts images with arbitrary aspect ratios, supporting resolutions up to 4K for more precise detail capture.
  • Bilingual Open Source and Open: It offers bilingual Chinese-English understanding and generation capabilities and is open-sourced under the Apache License (the base version, GLM-4.1V-9B-Base, is also available).

───────────────────────────────────────────────────────────────────

Core Capabilities

🧠 Chain-of-Thought Reasoning: Our exclusive “Thinking Mode” significantly improves the accuracy, logic, and interpretability of answers.

📊 SOTA Multimodal Performance: Among 10-billion-parameter models, it ranks first in 23 benchmarks and outperforms 72-billion-parameter models in 18 benchmarks.

🖼️ 4K High-Resolution Understanding: It supports images of any aspect ratio, with maximum input resolution up to 4K, enabling precise interpretation of charts, documents, and scenes.

📚 64K Long Context: It easily handles mixed inputs of multiple images and long texts, making it ideal for applications in education, research, customer service, and more.

🇨🇳 Natively Supports Chinese and English: Optimized specifically for Chinese-language scenarios while maintaining strong English comprehension capabilities.

🔓 Open Source and Commercially Usable: Both the base model and the inference model are open-source, supporting both research and industrial-scale deployment.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SiliconFlow)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContext302.AI Price

THUDM/GLM-4.1V-9B-Thinking

Free for a limited time
64000

Free for a limited time