Traditional Chinese Synthetic Datasets Verified with Labeled Data for Scene Text Recognition

11/26/2021
by Yi-Chang Chen, et al.

Scene text recognition (STR) has been widely studied in academia and industry. Training a text recognition model often requires a large amount of labeled data, but data labeling can be difficult, expensive, or time-consuming, especially for Traditional Chinese text recognition. To the best of our knowledge, public datasets for Traditional Chinese text recognition are lacking. This paper presents a framework for a Traditional Chinese synthetic data engine that aims to improve text recognition model performance. We generated over 20 million synthetic samples and collected over 7,000 manually labeled samples, TC-STR 7k-word, as a benchmark. Experimental results show that a text recognition model achieves much better accuracy either when trained from scratch on our generated synthetic data or when further fine-tuned on TC-STR 7k-word.

research
04/07/2016

A CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data Engine

Scene text recognition plays an important role in many computer vision a...
research
07/29/2021

Why You Should Try the Real Data for the Scene Text Recognition

Recent works in the text recognition area have pushed forward the recogn...
research
04/16/2022

Pushing the Performance Limit of Scene Text Recognizer without Human Annotation

Scene text recognition (STR) attracts much attention over the years beca...
research
05/09/2023

Novel Synthetic Data Tool for Data-Driven Cardboard Box Localization

Application of neural networks in industrial settings, such as automated...
research
11/17/2018

Unnamed Entity Recognition of Sense Mentions

We consider the problem of recognizing mentions of human senses in text....
research
01/10/2022

Towards Boosting the Accuracy of Non-Latin Scene Text Recognition

Scene-text recognition is remarkably better in Latin languages than the ...
research
11/25/2022

Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images

Line Chart Data Extraction is a natural extension of Optical Character R...
