HunyuanOCR | AI Model

Overview

HunyuanOCR is Tencent Hunyuan’s 1B-parameter, end-to-end OCR expert VLM (vision-language model). It reads documents, screenshots, and video frames, and handles text detection, recognition, layout parsing, information extraction, subtitle extraction, and photo translation in a single inference pass, with strong multilingual support and state-of-the-art accuracy.

Description

HunyuanOCR is a lightweight, open-source vision-language model built on Hunyuan’s native multimodal architecture and specialized entirely for OCR. With only about 1B parameters, it reaches state-of-the-art results on benchmarks such as OCRBench and OmniDocBench, outperforming larger general-purpose VLMs and even commercial APIs on text spotting, complex document parsing, open-field information extraction, subtitle extraction, and image translation.

Instead of a multi-stage pipeline, HunyuanOCR uses a single-prompt, single-inference flow: one call covers detection, recognition, layout understanding, and translation, and returns structured output (JSON, HTML, LaTeX, Markdown, or coordinates). This cuts latency and avoids the error accumulation of chained stages. Its multilingual design supports 100+ languages across documents, street views, tickets, handwriting, and more, which makes it suitable for large-scale document automation, subtitle and photo translation, and multimodal RAG pipelines where accurate, structured OCR is the core requirement.
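
The single-prompt, single-inference idea can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the prompt wording in TASK_PROMPTS, the run_ocr helper, the file name, and the stub_model stand-in are hypothetical and are not HunyuanOCR’s actual prompts or API. The point is only that one prompt and one model call replace separate detection and recognition stages, and that the structured output can be parsed directly.

```python
import json
from typing import Callable

# Hypothetical task prompts; illustrative wording, not the model's official prompts.
TASK_PROMPTS = {
    "spotting": "Detect and recognize all text; return JSON with text and coordinates.",
    "parsing": "Parse the document and return it as Markdown.",
    "extraction": "Extract the requested fields and return them as JSON.",
}


def run_ocr(image_path: str, task: str, model: Callable[[str, str], str]) -> str:
    """Build one prompt and make one model call; no chained stages."""
    prompt = TASK_PROMPTS[task]
    return model(image_path, prompt)


def stub_model(image_path: str, prompt: str) -> str:
    """Stand-in for a real VLM call (assumption); returns a canned JSON answer."""
    return json.dumps([{"text": "TOTAL 12.50", "box": [10, 20, 110, 40]}])


# One call yields structured output that downstream code can consume directly.
result = json.loads(run_ocr("receipt.png", "spotting", stub_model))
print(result[0]["text"], result[0]["box"])
```

In a real deployment, stub_model would be swapped for whatever inference call the chosen serving stack provides; passing the model in as a callable keeps the sketch independent of any particular library.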

About Tencent

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life of people around the world.

Website: tencent.com

Last updated: November 28, 2025

Related Models

Claude (initial)

DeepSeek-R1

HyperCLOVA X SEED Vision
