GLM-OCR Layout analysis

A layout analysis model released by Zhipu that parses the layout of documents and images and extracts their text content.
2026-02-03
Data Processing
Pricing:
$0.03/M Tokens

API Overview

GLM-OCR is a lightweight, professional OCR model released by Zhipu. It is positioned as a “small-size, high-precision professional document parsing engine,” designed to understand and extract complex document content through efficient, accurate text recognition.

  • SOTA Performance: At the time of release, it ranked first on OmniDocBench V1.5 with a score of 94.62 and achieved state-of-the-art results on multiple mainstream document understanding benchmarks, including table and formula recognition, approaching the performance of Gemini-3-Pro.
  • Optimized for Real-World Business Scenarios: It has been optimized for complex scenarios such as code documents, intricate tables, and seals, maintaining outstanding recognition accuracy even in cases of complicated layouts, diverse fonts, or mixed text-and-image arrangements.
  • High Efficiency and Cost-Effectiveness: With only 0.9B parameters, it supports deployment via vLLM and SGLang, offering low inference latency and minimal computational overhead, at a cost roughly one-tenth that of traditional OCR solutions.
  • Multi-Language Support: It supports multiple languages, including Chinese and English, making it suitable for global users.
  • Batch Processing and RAG Support: It enables large-scale document recognition and parsing, providing a solid foundation for Retrieval-Augmented Generation (RAG).
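As a rough illustration of how parsed output might feed a RAG pipeline, the sketch below splits a Markdown document into retrieval chunks at heading boundaries. The function name `chunk_markdown`, the `max_chars` budget, and the sample text are assumptions made for this example; they are not part of the GLM-OCR API.

```python
import re


def chunk_markdown(md, max_chars=1000):
    """Split Markdown at headings, then greedily pack sections up to max_chars."""
    # Split just before every heading line (1 to 6 '#' characters followed by a space).
    sections = re.split(r"^(?=#{1,6} )", md, flags=re.MULTILINE)
    chunks, current = [], ""
    for section in sections:
        if not section.strip():
            continue
        # Start a new chunk when adding this section would exceed the budget.
        if current and len(current) + len(section) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += section
    if current.strip():
        chunks.append(current.strip())
    return chunks


# Hypothetical Markdown, standing in for a parsed document.
sample_md = "# A\nalpha\n\n## B\nbeta\n\n## C\ngamma\n"
small_chunks = chunk_markdown(sample_md, max_chars=15)
one_chunk = chunk_markdown(sample_md, max_chars=1000)
```

A section longer than `max_chars` is kept whole rather than cut mid-paragraph; a production pipeline would likely add a secondary split for that case.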

───────────────────────────────────────────────────────────────────

Core Capabilities

🔍 Precise Structured Output:

  • Returns JSON data conforming to predefined formats, ensuring clear structure for easy subsequent processing and integration.

📄 High-Precision Document Parsing:

  • Recognizes special content such as handwritten text, seals, and code, and intelligently extracts key fields from various types of cards, receipts, and forms.

📊 Complex Table Parsing:

  • Accurately understands complex table structures, including merged cells and multi-level headers, directly outputting HTML code without the need for secondary table formatting, greatly improving efficiency in table entry and conversion.
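Because tables are returned as HTML, downstream code often needs to flatten merged cells into a regular grid. The following is a minimal sketch using only the Python standard library; the sample HTML is hypothetical and only illustrates the kind of markup (with `rowspan` and `colspan`) such a table might contain, not actual GLM-OCR output.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect <td>/<th> cells row by row, keeping their rowspan/colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []      # each row: list of (text, rowspan, colspan)
        self._cell = None   # text fragments of the cell being read, else None
        self._span = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            self.rows[-1].append((text, *self._span))
            self._cell = None


def html_table_to_grid(html):
    """Expand merged cells so every grid position holds the covering cell's text."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical table with a multi-level header and merged cells.
sample = """
<table>
  <tr><th rowspan="2">Region</th><th colspan="2">Revenue</th></tr>
  <tr><th>Q1</th><th>Q2</th></tr>
  <tr><td>North</td><td>10</td><td>12</td></tr>
</table>
"""
grid = html_table_to_grid(sample)
```

Repeating the merged cell's text in every position it covers keeps each row the same length, which makes the grid easy to load into a spreadsheet or data frame.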

🚀 Quick Deployment Experience:

  • Supports PDF and image inputs (JPG, PNG). Individual images must be ≤ 10 MB and PDFs ≤ 50 MB, and up to 100 pages can be processed.
  • Offers rich output modalities, including text, image links, and Markdown documents, meeting diverse user needs.
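The input limits above can be checked on the client before uploading. The sketch below is one possible pre-flight check; the function name `check_input` is hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes, that `.jpeg` is accepted alongside `.jpg`, and that the 100-page limit applies per PDF. None of these details are confirmed by this page.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: binary megabytes
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # ".jpeg" is an assumption
MAX_IMAGE_BYTES = 10 * MB
MAX_PDF_BYTES = 50 * MB
MAX_PDF_PAGES = 100  # assumption: limit applies per PDF


def check_input(filename, size_bytes, pages=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    ext = Path(filename).suffix.lower()
    problems = []
    if ext in IMAGE_EXTS:
        if size_bytes > MAX_IMAGE_BYTES:
            problems.append("image exceeds 10 MB")
    elif ext == ".pdf":
        if size_bytes > MAX_PDF_BYTES:
            problems.append("PDF exceeds 50 MB")
        # The page count is optional because reading it requires a PDF library.
        if pages is not None and pages > MAX_PDF_PAGES:
            problems.append("PDF exceeds 100 pages")
    else:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    return problems
```

For example, `check_input("report.pdf", 60 * MB, pages=120)` reports both the size and the page-count problem, so a caller can show every issue at once instead of failing on the first.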

🌐 Multiple Application Scenarios:

  • General Text Recognition: Applicable in education, research, office settings, and other fields, supporting various document input formats such as photos, screenshots, and scans.
  • Complex Table Parsing: Suitable for industries like finance and insurance, handling table data with complex structures.
  • Structured Information Extraction: Used in systems across banking, government services, logistics, and other sectors to automatically extract and standardize key information from documents.

───────────────────────────────────────────────────────────────────

API Reference (1)

API: glm-ocr
Request Method: POST
Stability: Stable

API Pricing

Model: glm-ocr
Description: -
Context: 32000
302.AI Price: $0.03/M Tokens
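
To make the listed price of $0.03/M Tokens concrete, the sketch below estimates the cost of a job from its token count. The function name `estimate_cost_usd` is hypothetical, and how tokens are counted (for example, whether input and output are both billed) is not stated on this page, so the token total is treated as a given.

```python
from decimal import Decimal

# Listed price in USD per one million tokens.
PRICE_PER_MILLION_TOKENS = Decimal("0.03")


def estimate_cost_usd(total_tokens):
    """Estimate cost in USD for a given number of billed tokens."""
    return PRICE_PER_MILLION_TOKENS * Decimal(total_tokens) / Decimal(1_000_000)


# Example: a batch that consumes 2.5 million tokens.
batch_cost = estimate_cost_usd(2_500_000)
```

`Decimal` is used instead of floating point so that small per-token amounts add up without rounding drift.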