GLM OCR

By Z.ai
GLM-OCR is an open-source document understanding model that combines OCR with layout-aware multimodal parsing. The repository describes it as using a CogViT visual encoder, a lightweight cross-modal connector, and a GLM-0.5B language decoder, plus Multi-Token Prediction and reinforcement learning to improve recognition accuracy and generalization. It is paired with a two-stage pipeline based on PP-DocLayout-V3 for layout analysis and parallel recognition, and the repo reports a 94.62 score on OmniDocBench V1.5. It supports deployment through vLLM, SGLang, and Ollama, with SDK and API options for production use.
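The two-stage flow described above (layout analysis first, then parallel recognition of each detected region) can be illustrated with a minimal Python sketch. `Region`, `detect_layout`, `recognize` and `run_two_stage` are hypothetical stand-ins, not the GLM-OCR SDK's actual API. In a real deployment, stage 1 would presumably be PP-DocLayout-V3 and stage 2 would be calls to the GLM-OCR model, for example behind a vLLM or SGLang server; here both are stubbed so the control flow can be run on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Region:
    # Hypothetical schema for one region returned by the layout stage.
    label: str                          # e.g. "text", "table", "formula"
    bbox: Tuple[int, int, int, int]     # (x0, y0, x1, y1) in page pixels
    order: int                          # reading-order index from stage 1


def run_two_stage(
    page: object,
    detect_layout: Callable[[object], List[Region]],
    recognize: Callable[[object, Region], str],
    max_workers: int = 4,
) -> str:
    """Stage 1: detect layout regions. Stage 2: recognize every region
    concurrently, then join the results in reading order."""
    regions = detect_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so texts[i] matches regions[i].
        texts = list(pool.map(lambda r: recognize(page, r), regions))
    ordered = sorted(zip(regions, texts), key=lambda pair: pair[0].order)
    return "\n\n".join(text for _, text in ordered)


# Hypothetical stubs standing in for the layout model and the OCR model.
def fake_layout(page: object) -> List[Region]:
    # Deliberately returned out of reading order to show the final sort.
    return [
        Region("table", (0, 50, 100, 90), order=1),
        Region("text", (0, 0, 100, 40), order=0),
    ]


def fake_recognize(page: object, region: Region) -> str:
    return f"[{region.label}]"


print(run_two_stage(None, fake_layout, fake_recognize))
# prints "[text]", then a blank line, then "[table]"
```

Threads are used here because, in a served setup, each recognition call would mostly wait on network I/O to the inference server; the final sort keeps the output in reading order regardless of which region finishes first.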
New Multimodal Gen 3
Released: March 12, 2026

Overview

GLM-OCR is Z.ai's open-source multimodal OCR model for complex document understanding. Built on the GLM-V encoder-decoder architecture, it is designed to read and structure difficult real-world documents such as tables, formulas, code-heavy pages, and sealed or irregular layouts, while staying lightweight enough for efficient deployment at only 0.9B parameters.

About Z.ai

Z.ai (formerly Zhipu AI) is a Chinese AI company developing large language models (GLM series), combining reasoning, coding, and agent capabilities, and offering open models and APIs.

Industry: Artificial Intelligence
Company Size: 800
Location: Beijing, Beijing, CN
Website: chat.z.ai

Tools using GLM OCR

  • Z.ai
    Free AI that builds, creates, and writes professionally.
    I have been using z.ai for two weeks for web development and it's mind-blowing for me. What amazes me more is its capacity to understand my existing code with very little context and suggest (or just write) lots of improvements. I did some tests to compare it with ChatGPT, Claude, etc., but the results were not even close. And I keep pushing the limits but it doesn't even blink.
Last updated: March 31, 2026