
Traditional Chinese Synthetic Datasets Verified with Labeled Data for Scene Text Recognition

by Yi-Chang Chen, et al.

Scene text recognition (STR) has been widely studied in academia and industry. Training a text recognition model often requires a large amount of labeled data, but data labeling can be difficult, expensive, or time-consuming, especially for Traditional Chinese text recognition. To the best of our knowledge, public datasets for Traditional Chinese text recognition are lacking. This paper presents a framework for a Traditional Chinese synthetic data engine that aims to improve text recognition model performance. We generated over 20 million synthetic samples and collected over 7,000 manually labeled samples, named TC-STR 7k-word, as the benchmark. Experimental results show that a text recognition model can achieve much better accuracy either by training from scratch with our generated synthetic data or by further fine-tuning with TC-STR 7k-word.


A CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data Engine

Scene text recognition plays an important role in many computer vision a...

Why You Should Try the Real Data for the Scene Text Recognition

Recent works in the text recognition area have pushed forward the recogn...

Pushing the Performance Limit of Scene Text Recognizer without Human Annotation

Scene text recognition (STR) attracts much attention over the years beca...

Unnamed Entity Recognition of Sense Mentions

We consider the problem of recognizing mentions of human senses in text....

UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection

Recent scene text detection methods are almost based on deep learning an...

Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images

Line Chart Data Extraction is a natural extension of Optical Character R...

Towards Boosting the Accuracy of Non-Latin Scene Text Recognition

Scene-text recognition is remarkably better in Latin languages than the ...
