Revisiting Scene Text Recognition: A Data Perspective

07/17/2023
by Qing Jiang, et al.

This paper re-assesses scene text recognition (STR) from a data-oriented perspective. We begin by revisiting the six benchmarks commonly used in STR and observe a trend of performance saturation, whereby only 2.91% of the benchmark images cannot be accurately recognized by an ensemble of 13 representative models. These results are impressive and suggest that STR could be considered solved; however, we argue that this is primarily due to the less challenging nature of the common benchmarks, which conceals the underlying issues that STR faces. To this end, we consolidate a large-scale real STR dataset, Union14M, which comprises 4 million labeled images and 10 million unlabeled images, to assess the performance of STR models in more complex real-world scenarios. Our experiments demonstrate that the 13 models achieve an average accuracy of only 66.53% on the 4 million labeled images, indicating that STR still faces numerous challenges in the real world. By analyzing the error patterns of the 13 models, we identify seven open challenges in STR and develop a challenge-driven benchmark consisting of eight distinct subsets to facilitate further progress in the field. Our exploration demonstrates that STR is far from solved and that leveraging data may be a promising solution. In this regard, we find that utilizing the 10 million unlabeled images through self-supervised pre-training significantly improves the robustness of STR models in real-world scenarios and leads to state-of-the-art performance.
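The abstract relies on two different aggregate numbers: an ensemble-level failure rate on the common benchmarks and an average per-model accuracy on Union14M. The sketch below shows how such numbers could be computed from a per-model, per-image correctness matrix. It assumes that an image counts as unrecognized by the ensemble only when every model misreads it, and that "average accuracy" is the unweighted mean of per-model accuracies; the function names and toy data are hypothetical and not taken from the paper.

```python
from typing import Sequence


def average_accuracy(correct: Sequence[Sequence[bool]]) -> float:
    """Unweighted mean of per-model accuracies (assumed definition).

    correct[m][i] is True if model m recognizes image i correctly.
    """
    if not correct or not correct[0]:
        raise ValueError("need at least one model and one image")
    per_model = [sum(row) / len(row) for row in correct]
    return sum(per_model) / len(per_model)


def ensemble_failure_rate(correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of images that no model recognizes correctly (assumed ensemble rule)."""
    if not correct or not correct[0]:
        raise ValueError("need at least one model and one image")
    num_images = len(correct[0])
    if any(len(row) != num_images for row in correct):
        raise ValueError("every model must be scored on the same images")
    # An image is an ensemble failure only if all models get it wrong.
    failures = sum(1 for i in range(num_images) if not any(row[i] for row in correct))
    return failures / num_images


# Hypothetical toy data: 3 models scored on 4 images.
correct = [
    [True, True, False, False],
    [True, False, True, False],
    [True, True, True, False],
]

print(f"{average_accuracy(correct):.4f}")  # 0.5833
print(ensemble_failure_rate(correct))      # 0.25
```

The two metrics answer different questions: a low ensemble failure rate only says that at least one model reads almost every image, while average accuracy reflects how well a typical single model performs.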
