PP-OCR: A Practical Ultra Lightweight OCR System

09/21/2020
by   Yuning Du, et al.

Optical Character Recognition (OCR) systems are widely used in a range of application scenarios, such as office automation (OA) systems, factory automation, online education, and map production. However, OCR remains a challenging task because of the variety of text appearances and the demand for computational efficiency. In this paper, we propose a practical ultra lightweight OCR system, PP-OCR. The overall model size of PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols. We introduce a bag of strategies to either enhance the model ability or reduce the model size, and provide the corresponding ablation experiments on real data. Several pre-trained models for Chinese and English recognition are released, including a text detector (trained on 97K images), a direction classifier (600K images), and a text recognizer (17.9M images). In addition, the proposed PP-OCR is verified on several other language recognition tasks, including French, Korean, Japanese, and German. All of the above-mentioned models are open-sourced, and the code is available in the GitHub repository at https://github.com/PaddlePaddle/PaddleOCR.
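The abstract names three released components: a text detector, a direction classifier, and a text recognizer. As a rough illustration of how such stages could be chained, here is a minimal Python sketch. It is not the PaddleOCR API. Every name in it (`run_pipeline`, `OcrResult`, and the stage callables) is hypothetical, the toy "image" is just a list of strings, and the assumption that the direction classifier only decides whether a cropped region must be flipped is ours, not something the abstract states.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class OcrResult:
    """One recognized text region (hypothetical container, not from the paper)."""
    region: Any
    text: str


def run_pipeline(
    image: Any,
    detect: Callable[[Any], List[Any]],
    crop: Callable[[Any, Any], Any],
    is_reversed: Callable[[Any], bool],
    flip: Callable[[Any], Any],
    recognize: Callable[[Any], str],
) -> List[OcrResult]:
    """Chain the three stages: detect regions, fix direction, recognize text."""
    results = []
    for region in detect(image):
        patch = crop(image, region)
        # Assumed role of the direction classifier: flag reversed patches.
        if is_reversed(patch):
            patch = flip(patch)
        results.append(OcrResult(region=region, text=recognize(patch)))
    return results


def demo() -> List[OcrResult]:
    """Run the pipeline on a toy 'image' with stand-in stages."""
    image = ["", "olleh", "world"]  # each non-empty row plays a text line
    return run_pipeline(
        image,
        detect=lambda img: [i for i, row in enumerate(img) if row],
        crop=lambda img, i: img[i],
        is_reversed=lambda patch: patch == "olleh",  # stand-in classifier
        flip=lambda patch: patch[::-1],
        recognize=lambda patch: patch.upper(),       # stand-in recognizer
    )


if __name__ == "__main__":
    for result in demo():
        print(result.region, result.text)
```

Passing the stages in as callables keeps the sketch independent of any particular model. In a real system each callable would wrap one of the released models.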


research
09/07/2021

PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System

Optical Character Recognition (OCR) systems have been widely used in var...
research
06/07/2022

PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

Optical character recognition (OCR) technology has been widely used in v...
research
02/28/2018

Chinese Text in the Wild

We introduce Chinese Text in the Wild, a very large dataset of Chinese t...
research
10/25/2021

Ultra Light OCR Competition Technical Report

Ultra Light OCR Competition is a Chinese scene text recognition competit...
research
12/30/2021

Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study

The flourishing blossom of deep learning has witnessed the rapid develop...
research
08/04/2023

Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition

Optical Character Recognition (OCR) enables automatic text extraction fr...
research
06/06/2020

A Robust Attentional Framework for License Plate Recognition in the Wild

Recognizing car license plates in natural scene images is an important y...
