Overview
HunyuanOCR is Tencent Hunyuan’s 1B-parameter, end-to-end OCR expert vision-language model (VLM). It reads documents, screenshots, and video frames, and handles text detection, recognition, layout parsing, information extraction, subtitle extraction, and photo translation in a single inference pass, with strong multilingual support and state-of-the-art accuracy.
Description
Instead of a multi-stage pipeline, HunyuanOCR uses a single-prompt, single-inference flow that covers detection, recognition, layout understanding, translation, and structured output (JSON, HTML, LaTeX, Markdown, coordinates). Running everything in one pass cuts latency and avoids the error accumulation of cascaded stages. Its multilingual design supports 100+ languages across documents, street views, tickets, handwriting, and more, making it suitable for large-scale document automation, subtitle and photo translation, and multimodal RAG where accurate, structured OCR is the core requirement.
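As a rough illustration of the single-prompt, single-inference idea, the sketch below sends one task prompt and one image through a single call and parses a JSON result. Everything here is an assumption made for illustration: the prompt wording in `TASK_PROMPTS`, the `run_ocr` and `parse_spotting` helpers, the `{"text", "box"}` schema, and the `stub_infer` stand-in are hypothetical and are not HunyuanOCR's actual API or output format.

```python
import json
from typing import Callable, Dict, List

# Hypothetical task prompts; the model's real prompt wording is not given here.
TASK_PROMPTS: Dict[str, str] = {
    "spotting": "Detect and recognize all text. Return JSON with text and box.",
    "parsing": "Convert the document in the image to Markdown.",
    "translation": "Translate the text in the image into English.",
}


def run_ocr(image_path: str, task: str, infer: Callable[[str, str], str]) -> str:
    """Send one prompt and one image through a single inference call."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return infer(image_path, TASK_PROMPTS[task])


def parse_spotting(raw: str) -> List[dict]:
    """Parse an assumed JSON list of {"text": str, "box": [x1, y1, x2, y2]}."""
    items = json.loads(raw)
    return [
        {"text": item["text"], "box": [int(v) for v in item["box"]]}
        for item in items
        if item.get("text")  # drop entries with empty text
    ]


def stub_infer(image_path: str, prompt: str) -> str:
    """Stand-in for a real model call; returns canned JSON for the demo."""
    return json.dumps(
        [
            {"text": "Invoice", "box": [10, 12, 120, 40]},
            {"text": "", "box": [0, 0, 0, 0]},
        ]
    )


result = parse_spotting(run_ocr("invoice.png", "spotting", stub_infer))
print(result)
```

In a real deployment, `stub_infer` would be replaced by whatever call actually runs the model. The point of the sketch is only that detection and recognition come back from one request, so there is no intermediate hand-off between stages where errors could compound.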
About Tencent
Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life of people around the world.
