
zai-org/glm-4.6v
API Overview
GLM-4.6V is a flagship multimodal visual understanding model launched by Zhipu AI. Its core focus is on efficient visual reasoning, and it boasts powerful capabilities such as deep thinking and streaming output.
- Clear Positioning: Supports multiple types of input and output, with a context window of up to 128K tokens, and is dedicated to flagship-level visual reasoning.
- Diverse Capabilities: Features core abilities including deep thinking, visual understanding, and streaming output.
- Rich Scenarios: Suitable for complex tasks such as image understanding, video understanding, and document question-answering.
- Technological Leadership: Natively supports multimodal tool calls and achieves state-of-the-art performance on multimodal evaluation benchmarks.
───────────────────────────────────────────────────────────────────
Core Capabilities
🔍 Visual Understanding: Supports multiple types of input, accurately identifies content, attributes, and scenes, and can be used for tasks such as invoice OCR.
🤔 Deep Thinking: The thinking mode can be flexibly enabled or disabled, enhancing the reasoning and analytical capabilities for handling complex tasks.
💬 Streaming Output: Generates responses in real time, optimizing the user interaction experience in scenarios such as dialogue systems.
🛠 Function Call: Natively supports tool calls and integrates external tools, enabling features such as mixed-text-and-image output.
⚡ Context Caching: Smart caching optimizes performance for long conversations, supporting efficient processing of long texts and videos.
Playground
Log in to explore more features! Click to Log In