Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study

12/30/2021
by Jingye Chen, et al.

Deep learning has driven rapid progress in text recognition in recent years. However, existing text recognition methods mainly target English texts and largely overlook Chinese texts. Chinese is also a widely spoken language, and Chinese text recognition has extensive applications. Based on our observations, we attribute the scarce attention paid to Chinese text recognition to the lack of reasonable dataset construction standards, unified evaluation methods, and results for existing baselines. To fill this gap, we manually collect Chinese text datasets from publicly available competitions, projects, and papers, then divide them into four categories: scene, web, document, and handwriting datasets. Furthermore, we evaluate a series of representative text recognition methods on these datasets with unified evaluation methods to provide experimental results. By analyzing the experimental results, we observe that state-of-the-art baselines for recognizing English texts do not perform well in Chinese scenarios. We consider that numerous challenges remain to be explored due to the characteristics of Chinese texts, which differ considerably from those of English texts. The code and datasets are publicly available at https://github.com/FudanVI/benchmarking-chinese-text-recognition.
