Tencent Hunyuan Open-Sources HunyuanOCR: 1B-Parameter Model Achieves Multiple State-of-the-Art (SOTA) Results, Supporting OCR Applications Across Multiple Scenarios
On November 25, Tencent Hunyuan officially announced the open-sourcing of its new OCR model, HunyuanOCR. The model has only 1 billion (1B) parameters and is built on Hunyuan's native multimodal architecture. It has achieved state-of-the-art (SOTA) results on multiple industry OCR benchmarks, providing a lightweight and efficient option for deploying OCR technology.
HunyuanOCR adopts a fully end-to-end design made up of three parts: a native-resolution visual encoder, an adaptive visual adapter, and a lightweight Hunyuan language model. Its core advantage is efficiency and convenience: its small size makes it easy to deploy, and it can reach its best result in a single forward pass, making it far more efficient than the cascaded solutions common in the industry.
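The three-part, single-pass flow described above can be pictured with a toy sketch. Every function here is a hypothetical stand-in written for illustration, not HunyuanOCR code or any real library's API. The sketch assumes that "native resolution" means the encoder's token count follows the input size instead of a fixed resize, and that the adapter shortens the token sequence before the language model reads it.

```python
# Toy illustration only: all names and behaviors below are hypothetical
# stand-ins, not part of HunyuanOCR or any real library.
from typing import List


def visual_encoder(image: List[List[int]]) -> List[int]:
    """Stand-in encoder: one token per pixel, so token count tracks image size."""
    return [pixel for row in image for pixel in row]


def visual_adapter(tokens: List[int], group: int = 2) -> List[int]:
    """Stand-in adapter: compress the sequence by summing each group of tokens."""
    return [sum(tokens[i:i + group]) for i in range(0, len(tokens), group)]


def language_model(tokens: List[int]) -> str:
    """Stand-in language model: map each token to a lowercase letter."""
    return "".join(chr(ord("a") + t % 26) for t in tokens)


def end_to_end_ocr(image: List[List[int]]) -> str:
    """One pass through encoder -> adapter -> language model, no separate stages."""
    return language_model(visual_adapter(visual_encoder(image)))


small = [[0, 1, 2, 3], [4, 5, 6, 7]]   # 2 x 4 toy "image"
large = [[0] * 4 for _ in range(4)]    # 4 x 4 toy "image"

print(end_to_end_ocr(small))
print(len(visual_adapter(visual_encoder(small))), len(visual_adapter(visual_encoder(large))))
```

A cascaded system, by contrast, would call separate detection, recognition, and layout models in sequence, each with its own inference step.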
HunyuanOCR also performs strongly on benchmarks. On OmniDocBench, a benchmark for complex document parsing, it scored 94.1, surpassing leading models such as Google's Gemini 3 Pro. On Tencent's in-house benchmark covering nine scenarios, including documents, handwriting, and street view, its text detection and recognition significantly outperform comparable open-source and commercial models. On the OCRBench leaderboard, it achieved a total score of 860, the state-of-the-art result among models with fewer than 3B total parameters. For translation of less widely used languages, the model supports translation in both directions between 14 frequently used such languages and Chinese or English, and it took first place in the small-model track of the ICDAR 2025 end-to-end document translation competition.
In terms of applications, HunyuanOCR can parse complex multilingual documents, extract invoice fields as JSON data, and automatically extract bilingual subtitles from videos, covering areas such as card and document processing, video creation, and cross-border communication. The model is currently available through web and mobile links and through its open-source GitHub and Hugging Face repositories. For a quick trial, users can go directly to the Hugging Face Space.
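For the invoice use case, downstream code typically has to turn the model's text response into structured data. The following is a minimal sketch of that step, assuming the response contains a single JSON object somewhere in its text. The field names, the sample response, and the helper `parse_invoice_json` are all hypothetical and not taken from HunyuanOCR's documentation; the model's real output format may differ.

```python
import json
import re
from typing import Any, Dict, Iterable


def parse_invoice_json(raw: str, required: Iterable[str]) -> Dict[str, Any]:
    """Pull the JSON object out of a model response and check required fields."""
    # Greedy match from the first "{" to the last "}" so nested objects survive.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    fields = json.loads(match.group(0))
    missing = [key for key in required if key not in fields]
    if missing:
        raise ValueError(f"missing invoice fields: {missing}")
    return fields


# Hypothetical response text, invented for this example.
sample_output = (
    "Here is the result:\n"
    '{"invoice_number": "INV-001", "total": "128.50", "date": "2025-11-25"}'
)

invoice = parse_invoice_json(sample_output, ["invoice_number", "total"])
print(invoice["invoice_number"], invoice["total"])
```

Checking required fields up front lets a pipeline reject an incomplete extraction early instead of passing partial invoice data downstream.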
Links:
https://hunyuan.tencent.com/vision/zh?tabIndex=0
https://github.com/Tencent-Hunyuan/HunyuanOCR