Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning

by Jingqun Tang, et al.
Huazhong University of Science & Technology
NetEase, Inc.
University of Rochester
Wuhan University
Hangzhou Dianzi University
NetEase, Inc.

Text detection and recognition are essential components of a modern OCR system. Most OCR approaches attempt to obtain accurate bounding boxes of text at the detection stage, which are then used as the input to the text recognition stage. We observe that, when given tight text bounding boxes as input, a text recognizer frequently fails to achieve optimal performance because of the inconsistency between the bounding boxes and the deep representations of text recognition. In this paper, we propose Box Adjuster, a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models. Additionally, when dealing with cross-domain problems such as synthetic-to-real, the proposed method significantly reduces mismatches in domain distribution between the source and target domains. Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when the adjusted bounding boxes are used as the ground truths for training. Specifically, on several benchmark datasets for scene text understanding, the proposed method outperforms state-of-the-art text spotters by an average of 2.0 4.6
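To make the idea of box adjustment concrete, the following is a minimal sketch, not the authors' Box Adjuster. The abstract does not specify the action space, the reward, or the policy, so everything here is an assumption: the actions move one box edge by a fixed step, a greedy search stands in for the learned reinforcement learning policy, and `reward_fn` is a hypothetical callback that would wrap a recognizer's score in a real setup. A toy IoU reward against a made-up "recognizer-friendly" box is used only so the example runs on its own.

```python
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

# Assumed action space: move exactly one edge by -1 or +1 step units.
ACTIONS = [
    tuple(sign if i == j else 0 for j in range(4))
    for i in range(4)
    for sign in (-1, 1)
]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def adjust_box(
    box: Box,
    reward_fn: Callable[[Box], float],  # hypothetical: e.g. a recognizer score
    step: int = 2,
    max_steps: int = 20,
) -> Box:
    """Greedy stand-in for a learned policy: take the best single-edge move
    while it strictly improves the reward, then stop."""
    best, best_reward = box, reward_fn(box)
    for _ in range(max_steps):
        candidates = []
        for action in ACTIONS:
            cand = tuple(c + step * d for c, d in zip(best, action))
            # Skip degenerate boxes with non-positive width or height.
            if cand[0] < cand[2] and cand[1] < cand[3]:
                candidates.append((reward_fn(cand), cand))
        if not candidates:
            break
        reward, cand = max(candidates)
        if reward <= best_reward:
            break  # no single move improves the reward
        best, best_reward = cand, reward
    return best


# Toy usage: pretend the recognizer prefers a slightly looser box.
tight_box = (10, 10, 50, 30)
preferred_box = (6, 8, 54, 32)  # made-up target for illustration only
adjusted = adjust_box(tight_box, lambda b: iou(b, preferred_box))
print(adjusted)
```

In this toy setting the search loosens the tight annotation until it matches the preferred box. With a real recognizer the reward would be noisy and non-convex, which is one reason a learned policy may be preferable to greedy search.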


Text Detection on Roughly Placed Books by Leveraging a Learning-based Model Trained with Another Domain Data

Text detection enables us to extract rich information from images. In th...

Text Detection & Recognition in the Wild for Robot Localization

Signage is everywhere and a robot should be able to take advantage of si...

Loss Guided Activation for Action Recognition in Still Images

One significant problem of deep-learning based human action recognition ...

Deep Cuboid Detection: Beyond 2D Bounding Boxes

We present a Deep Cuboid Detector which takes a consumer-quality RGB ima...

Learning Markov Clustering Networks for Scene Text Detection

A novel framework named Markov Clustering Network (MCN) is proposed for ...

Approximate Query Matching for Image Retrieval

Traditional image recognition involves identifying the key object in a p...

CHARTER: heatmap-based multi-type chart data extraction

The digital conversion of information stored in documents is a great sou...