
GLM OCR

By Z.ai
GLM-OCR is an open-source document understanding model that combines OCR with layout-aware multimodal parsing. The repository describes it as using a CogViT visual encoder, a lightweight cross-modal connector, and a GLM-0.5B language decoder, with Multi-Token Prediction and reinforcement learning used to improve recognition accuracy and generalization. It is paired with a two-stage pipeline based on PP-DocLayout-V3 for layout analysis and parallel recognition, and reports a score of 94.62 on OmniDocBench V1.5. It supports deployment through vLLM, SGLang, and Ollama, with SDK and API options for production use.
New Multimodal Gen 3
Released: March 12, 2026
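
Since the model is described as deployable through vLLM and SGLang, which expose OpenAI-compatible chat endpoints, an OCR call would typically send an image plus an instruction in a chat-completions payload. Below is a minimal sketch of building such a payload; the model id `glm-ocr` and the prompt wording are assumptions for illustration, not values documented by the repository.

```python
import base64


def build_ocr_request(image_bytes: bytes, model: str = "glm-ocr") -> dict:
    """Build an OpenAI-style chat-completions payload carrying an image.

    The image is inlined as a base64 data URL, which is how
    OpenAI-compatible servers (e.g. vLLM, SGLang) accept image input.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,  # assumed model id; check the actual served name
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                    {
                        "type": "text",
                        # illustrative prompt, not from the repository
                        "text": "Extract the document content as structured Markdown.",
                    },
                ],
            }
        ],
    }
```

The resulting dictionary can be POSTed to the server's `/v1/chat/completions` route with any HTTP client; the exact launch command and model name should be taken from the repository's deployment instructions.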

Overview

GLM-OCR is Z.ai's open-source multimodal OCR model for complex document understanding. Built on the GLM-V encoder-decoder architecture, it is designed to read and structure difficult real-world documents such as tables, formulas, code-heavy pages, and sealed or irregular layouts, while staying lightweight enough for efficient deployment at only 0.9B parameters.

About Z.ai

Z.ai (formerly Zhipu AI) is a Chinese AI company that develops the GLM series of large language models, which combine reasoning, coding, and agent capabilities, and offers both open models and APIs.

Industry: Artificial Intelligence
Company Size: 500
Location: Beijing, Beijing, CN
Website: chat.z.ai

Tools using GLM OCR

Last updated: March 31, 2026