HunyuanOCR

By Tencent
Released: November 25, 2025

Overview

HunyuanOCR is Tencent Hunyuan’s 1B-parameter, end-to-end OCR expert VLM (vision-language model). It reads documents, screenshots, and video frames, and handles text detection, recognition, layout parsing, information extraction, subtitle extraction, and photo translation in a single inference pass, with strong multilingual support and state-of-the-art accuracy.

Description

HunyuanOCR is a lightweight, open-source vision-language model built on Hunyuan’s native multimodal architecture and specialized entirely for OCR. With only about 1B parameters, it reaches state-of-the-art results on benchmarks such as OCRBench and OmniDocBench, outperforming larger general-purpose VLMs and even commercial APIs on tasks such as text spotting, complex document parsing, open-field information extraction, subtitle extraction, and image translation.

Instead of a multi-stage pipeline, HunyuanOCR uses a single-prompt, single-inference flow that covers detection, recognition, layout understanding, translation, and structured outputs (JSON, HTML, LaTeX, Markdown, coordinates) in one pass. This cuts latency and avoids the error accumulation that occurs when each stage's mistakes feed into the next. Its multilingual design supports 100+ languages across documents, street views, tickets, handwriting, and more, making it suitable for large-scale document automation, subtitle and photo translation, and multimodal RAG where accurate, structured OCR is the core.
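
The error-accumulation point can be illustrated with a small back-of-the-envelope sketch. It assumes, purely for illustration, that each stage of a cascaded OCR pipeline fails independently, so the end-to-end accuracy is the product of the per-stage accuracies. The function name and the stage accuracies below are hypothetical and are not measured figures for HunyuanOCR or any other system.

```python
def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy of a cascade, assuming independent stage failures."""
    total = 1.0
    for acc in stage_accuracies:
        # An item survives the cascade only if every stage handles it correctly.
        total *= acc
    return total


# Hypothetical per-stage accuracies: detection, recognition, layout parsing.
cascade = [0.98, 0.97, 0.96]

if __name__ == "__main__":
    print(f"3-stage cascade: {pipeline_accuracy(cascade):.4f}")
```

Under this simplified model, even stages that are each quite accurate compound into a noticeably lower end-to-end figure. A single-inference model still makes its own errors, but it does not pay this multiplicative cost across separate stages.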

About Tencent

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life of people around the world.

Website: tencent.com

Last updated: November 28, 2025