
Traditional Chinese Synthetic Datasets Verified with Labeled Data for Scene Text Recognition

11/26/2021
by Yi-Chang Chen, et al.

Scene text recognition (STR) has been widely studied in academia and industry. Training a text recognition model typically requires a large amount of labeled data, but labeling can be difficult, expensive, or time-consuming, especially for Traditional Chinese text. To the best of our knowledge, public datasets for Traditional Chinese text recognition are scarce. This paper presents a framework for a Traditional Chinese synthetic data engine that aims to improve text recognition performance. We generated over 20 million synthetic samples and collected over 7,000 manually labeled samples, TC-STR 7k-word, as a benchmark. Experimental results show that a text recognition model achieves much higher accuracy either when trained from scratch on our synthetic data or when further fine-tuned on TC-STR 7k-word.
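The abstract reports accuracy gains without defining the metric. Two common STR metrics are exact-match word accuracy and 1 - normalized edit distance (NED), both computed over pairs of predicted and ground-truth strings such as those in a benchmark like TC-STR 7k-word. The sketch below computes both. The function names `levenshtein` and `str_metrics` are hypothetical, and the choice of these metrics is an assumption, not necessarily the paper's evaluation protocol.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions).

    Each Traditional Chinese character is one code point in a Python str,
    so it counts as a single edit unit.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def str_metrics(predictions, labels):
    """Return (word_accuracy, mean of 1 - normalized edit distance).

    Hypothetical helper: NED is normalized by the longer of the two strings.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty lists")
    exact = 0
    ned_score = 0.0
    for pred, gt in zip(predictions, labels):
        exact += pred == gt
        denom = max(len(pred), len(gt)) or 1  # avoid division by zero
        ned_score += 1 - levenshtein(pred, gt) / denom
    n = len(labels)
    return exact / n, ned_score / n


if __name__ == "__main__":
    preds = ["臺北車站", "中文字"]
    gts = ["臺北車站", "中文識別"]
    print(str_metrics(preds, gts))  # (0.5, 0.75)
```

Exact-match accuracy gives no credit for a nearly correct word, while 1 - NED rewards partial matches; in the example the second prediction misses two of four characters, so it scores 0.5 on NED but 0 on word accuracy.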

Code Repositories

traditional-chinese-text-recogn-dataset

Traditional Chinese text recognition dataset