GLM-Image

Zhipu’s flagship image generation model, built on a hybrid architecture that pairs an autoregressive model with a diffusion decoder.
2026-01-14
Image Generations
Pricing: $0.016/call

API Overview

GLM-Image is the flagship image generation model from Zhipu AI. It is positioned as a next-generation multimodal generative foundation model built around "cognitive generation": understanding the instruction globally, then refining details locally.

  • Innovative Hybrid Architecture: Pairs a 9B autoregressive model with a 7B DiT diffusion decoder. The design balances semantic understanding with the restoration of high-frequency detail, improving the accuracy of text-to-image generation.
  • Leading Performance in Text-Dense Scenarios: Achieves state-of-the-art (SOTA) results among open-source models on the CVTG-2K and LongText-Bench benchmarks, with Chinese text accuracy of 0.9788 and English text accuracy of 0.9524. It is well suited to knowledge-intensive generation tasks such as posters, PPT slides, and popular-science illustrations.
  • Domestically Developed Full-Stack Training: The model was entirely trained using Ascend Atlas 800T A2 chips and the MindSpore framework, making it the first SOTA multimodal generative model fully trained on domestically developed chips.
  • Flexible Resolution Support: Natively supports aspect ratios such as 1:1, 3:4, and 16:9, with image sizes from 512×512 to 2048×2048. Dimensions must be integer multiples of 32. This range covers the display requirements of most platforms.
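
The size rule above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the 512 to 2048 range and the multiple-of-32 rule apply to width and height independently; the helper names (`is_valid_size`, `snap_side`, `size_for_ratio`) are hypothetical and are not part of the GLM-Image API.

```python
# Hypothetical client-side helpers; the limits below are taken from the
# listing text, and applying them per side is an assumption.
MIN_SIDE = 512
MAX_SIDE = 2048
STEP = 32


def is_valid_size(width: int, height: int) -> bool:
    """Return True if both sides are in range and are multiples of 32."""
    return all(
        MIN_SIDE <= side <= MAX_SIDE and side % STEP == 0
        for side in (width, height)
    )


def snap_side(value: float) -> int:
    """Round a side length to the nearest multiple of 32, then clamp it."""
    snapped = int(round(value / STEP)) * STEP
    return max(MIN_SIDE, min(MAX_SIDE, snapped))


def size_for_ratio(ratio_w: int, ratio_h: int, long_side: int = 1024) -> tuple[int, int]:
    """Pick a (width, height) pair close to ratio_w:ratio_h.

    The long side is snapped first; the short side is derived from it and
    snapped as well, so the resulting ratio is only approximate.
    """
    long_px = snap_side(long_side)
    short_px = snap_side(long_px * min(ratio_w, ratio_h) / max(ratio_w, ratio_h))
    return (long_px, short_px) if ratio_w >= ratio_h else (short_px, long_px)


if __name__ == "__main__":
    for ratio in [(1, 1), (3, 4), (16, 9)]:
        print(ratio, size_for_ratio(*ratio))
```

Because each side is snapped and clamped separately, extreme ratios may drift from the requested proportion; whether the service accepts every in-range combination is not stated in the listing.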

───────────────────────────────────────────────────────────────────

Core Capabilities

🖋️ Accurate Text Rendering: Renders Chinese and English text with precise typography and clean strokes, including in commercial posters, signage, and complex dialogue boxes.

📽️ Commercial Poster Expert: Strong visual composition and a clear sense of visual hierarchy make it well suited to holiday posters, brand promotion images, and social media content.

🔬 Science Illustration with Logical Precision: Follows complex prompt logic to draw annotated diagrams of scientific principles and flowcharts that are both visually clean and informative.

👥 High-Quality Realistic Portraits: Drawing on the detail rendering of the DiT decoder, it generates photorealistic portraits with natural skin texture, nuanced light and shadow, and finely rendered hair strands.

📖 Coherent Multi-Panel Creations: For e-commerce product displays or sequential story illustrations, it keeps the subject consistent across panels while refining details and text in each one.

───────────────────────────────────────────────────────────────────

API Reference (1)

API Description: image (Text-to-Image Generation)
Request Method: POST
Stability: Stable

API Pricing

Model: glm-image
Description: Text-to-Image Generation
302.AI Price: $0.016/call
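
For budgeting, the per-call price turns into a simple multiplication. The sketch below assumes a flat rate of $0.016 for every call regardless of image size; the function name `estimate_cost` is hypothetical and used only for illustration.

```python
from decimal import Decimal

# Listed price per call in USD; treating it as a flat rate is an assumption.
PRICE_PER_CALL_USD = Decimal("0.016")


def estimate_cost(calls: int) -> Decimal:
    """Estimate the total cost in USD for a given number of API calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return PRICE_PER_CALL_USD * calls


if __name__ == "__main__":
    for n in (1, 250, 1000):
        print(f"{n} calls: ${estimate_cost(n)}")
```

`Decimal` is used instead of `float` so that totals do not pick up binary rounding error when they are summed or compared.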