glm-4v-plus

glm-4v-plus

High-performance version optimized based on GLM-4v
2024-08-29
LLM
Model capability: image
Input:
$1.4/1M tokens
Output:
$1.4/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

GLM-4V is the flagship multimodal model launched by Zhipu AI. Its core positioning is as a vision-understanding engine based on the MOE architecture, featuring high-resolution input of 1120×1120 and a deep-thinking mode to achieve precise analysis of images, videos, and documents, as well as cross-modal task processing.

  • Multimodal Fusion: Supports three types of inputs—images, videos, and files—and delivers accurate text-analysis results, covering enterprise-level scenarios such as front-end replication and security quality inspection.
  • High-Resolution Processing: Exclusively supports input resolutions of 1120×1120. By employing downsampling techniques, it reduces token overhead and enhances parsing accuracy.
  • Performance on Par with GPT-4V: Outperforms comparable open-source models in multiple benchmarks, achieving state-of-the-art (SOTA) overall performance, especially excelling in complex visual reasoning tasks.
  • Structured Output: Natively supports JSON format, enabling direct integration with business systems and reducing the need for secondary development.
  • Enterprise-Level Customization: Supports LoRA fine-tuning, increasing model availability from 60% to 89%.
  • Open-Source Ecosystem: The open-source version has been downloaded over 13 million times, ranking first among domestically developed models, and supports developers in deploying lightweight applications locally.

───────────────────────────────────────────────────────────────────

Core Capabilities

🧠 MOE Architecture Powered: With a total parameter count of 10.6B and 1.2B activation parameters, it achieves the highest performance among comparable open-source models, boosting visual reasoning efficiency by 40%.

🔍 Full-Modal Analysis: Exclusively supports three modalities—video, image, and file—as input, enabling multi-source information fusion and analysis in a single call.

⚡ Deep-Thinking Mode: Dynamically activates complex reasoning chains to tackle advanced tasks such as subject-specific problem-solving and logical deduction, with an accuracy rate exceeding 92%.

🌐 High-Resolution Processing: With an input resolution of 1120×1120 and intelligent downsampling, detail-capturing capability is enhanced by 50%, minimizing information loss.

🛠️ Automated Execution: A GUI Agent precisely identifies interface elements and automatically performs office operations such as PPT editing and data entry.

📊 Structured Output: Natively supports JSON format and includes coordinate-based localization (such as Grounding), directly generating interactive code or structured data.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (Zhipu GLM-4V)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

glm-4v-plus

-
32000

Input$1.4 / 1M tokens
Output$1.4 / 1M tokens

Input$1.4/ 1M tokens
Output$1.4/ 1M tokens
Original Price