
GLM-OCR Layout Analysis
A layout analysis model released by Zhipu that parses the layout of documents and images and extracts their text content.
2026-02-03
API Overview
GLM-OCR is a lightweight, professional OCR model released by Zhipu. It is positioned as a “small-size, high-precision professional document parsing engine,” designed to understand and extract complex document content through efficient, accurate text recognition.
- SOTA Performance: At the time of release, it ranked first on OmniDocBench V1.5 with a score of 94.62 and achieved state-of-the-art results on multiple mainstream document understanding benchmarks, including tables and formulas, approaching the performance of Gemini-3-Pro.
- Optimized for Real-World Business Scenarios: It has been optimized for complex scenarios such as code documents, intricate tables, and seals, maintaining outstanding recognition accuracy even in cases of complicated layouts, diverse fonts, or mixed text-and-image arrangements.
- High Efficiency and Cost-Effectiveness: With only 0.9B parameters, it supports deployment via vLLM and SGLang, offering low inference latency and minimal computational overhead, at a cost roughly one-tenth that of traditional OCR solutions.
- Multi-Language Support: It supports multiple languages, including Chinese and English, making it suitable for global users.
- Batch Processing and RAG Support: It enables large-scale document recognition and parsing, providing a solid foundation for Retrieval-Augmented Generation (RAG).
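As one illustration of the RAG use case above, parsed Markdown can be split into heading-based chunks before indexing. This is a minimal sketch, assuming the parsed document is already available as a Markdown string; the function name `chunk_markdown` and the sample text are hypothetical and are not part of the GLM-OCR API.

```python
import re


def chunk_markdown(markdown: str) -> list:
    """Split Markdown into chunks, one per heading section.

    Hypothetical helper for illustration. It does not special-case
    fenced code blocks, so a '#' comment inside code would be read
    as a heading.
    """
    chunks = []
    title, body = "", []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the previous section, if it had any content.
            if title or any(s.strip() for s in body):
                chunks.append({"title": title, "text": "\n".join(body).strip()})
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    # Flush the final section.
    if title or any(s.strip() for s in body):
        chunks.append({"title": title, "text": "\n".join(body).strip()})
    return chunks


# Made-up sample standing in for parsed output.
sample = "# Invoice\nTotal: 42\n\n## Items\n- Widget\n- Gadget\n"
chunks = chunk_markdown(sample)
for chunk in chunks:
    print(chunk["title"], "->", repr(chunk["text"]))
```

Each chunk keeps its heading as a title, which can be stored as metadata alongside the embedded text.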
───────────────────────────────────────────────────────────────────
Core Capabilities
🔍 Precise Structured Output:
- Returns JSON data conforming to predefined formats, ensuring clear structure for easy subsequent processing and integration.
📄 High-Precision Document Parsing:
- Recognizes special content such as handwritten text, seals, and code, and intelligently extracts key fields from various types of cards, receipts, and forms.
📊 Complex Table Parsing:
- Accurately understands complex table structures, including merged cells and multi-level headers, directly outputting HTML code without the need for secondary table formatting, greatly improving efficiency in table entry and conversion.
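Because tables are returned as HTML, merged cells can be expanded into a rectangular grid with a standard parser. The sketch below assumes the output is an ordinary `<table>` using `rowspan` and `colspan` attributes; the class `TableGridParser`, the function `table_to_grid`, and the sample table are hypothetical and only illustrate one way to consume such output.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect <td>/<th> cells together with their rowspan/colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []      # each row: list of (text, rowspan, colspan)
        self._cell = None   # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def table_to_grid(html: str) -> list:
    """Expand merged cells so every grid slot holds the covering cell's text."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Made-up sample with a multi-level header.
sample = (
    "<table>"
    "<tr><th rowspan='2'>Region</th><th colspan='2'>Sales</th></tr>"
    "<tr><th>Q1</th><th>Q2</th></tr>"
    "<tr><td>North</td><td>10</td><td>12</td></tr>"
    "</table>"
)
grid = table_to_grid(sample)
for row in grid:
    print(row)
```

Repeating the merged cell's text in every slot it covers keeps the grid rectangular, which makes it easier to load into a spreadsheet or dataframe.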
🚀 Quick Deployment Experience:
- Supports PDF and image inputs (JPG, PNG). Individual images must be ≤ 10 MB and PDFs ≤ 50 MB, and documents of up to 100 pages can be processed.
- Offers rich output modalities, including text, image links, and Markdown documents, meeting diverse user needs.
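The input limits above can be checked on the client side before a file is submitted. This is a minimal sketch rather than part of the service: the function `check_input` is hypothetical, and it assumes that "MB" means 1024 × 1024 bytes, that `.jpeg` is accepted alongside `.jpg`, and that the 100-page limit applies to PDFs.

```python
from pathlib import Path
from typing import List, Optional

MB = 1024 * 1024  # assumption: binary megabytes

# Size limits from the text above; treating ".jpeg" like ".jpg" is an assumption.
SIZE_LIMITS = {".jpg": 10 * MB, ".jpeg": 10 * MB, ".png": 10 * MB, ".pdf": 50 * MB}
MAX_PAGES = 100  # assumption: the page limit applies to PDFs


def check_input(filename: str, size_bytes: int,
                page_count: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    limit = SIZE_LIMITS.get(suffix)
    if limit is None:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
        return problems
    if size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes; limit for {suffix} is {limit}")
    # The page count is only checked when the caller already knows it.
    if suffix == ".pdf" and page_count is not None and page_count > MAX_PAGES:
        problems.append(f"PDF has {page_count} pages; limit is {MAX_PAGES}")
    return problems


print(check_input("scan.png", 5 * MB))
print(check_input("report.pdf", 60 * MB, page_count=120))
```

Returning every problem at once, rather than raising on the first, lets a batch job report all rejected files in a single pass.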
🌐 Multiple Application Scenarios:
- General Text Recognition: Applicable in education, research, office settings, and other fields, supporting various document input formats such as photos, screenshots, and scans.
- Complex Table Parsing: Suitable for industries like finance and insurance, handling table data with complex structures.
- Structured Information Extraction: Used in systems across banking, government services, logistics, and other sectors to automatically extract and standardize key information from documents.
───────────────────────────────────────────────────────────────────