ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

11/19/2022
by Shancheng Fang, et al.

Scene text spotting is of great importance to the computer vision community because of its wide range of applications. Recent methods attempt to introduce linguistic knowledge to handle challenging recognition cases, rather than relying on pure visual classification. However, how to effectively model linguistic rules in end-to-end deep networks remains an open research challenge. In this paper, we argue that the limited capacity of language models stems from 1) implicit language modeling, 2) unidirectional feature representation, and 3) a language model fed with noisy input. Correspondingly, we propose ABINet++, an autonomous, bidirectional and iterative network for scene text spotting. First, the autonomous design enforces explicit language modeling by decoupling the recognizer into a vision model and a language model and blocking the gradient flow between the two. Second, we propose a novel bidirectional cloze network (BCN) as the language model, based on bidirectional feature representation. Third, we propose an iterative-correction execution scheme for the language model that effectively alleviates the impact of noisy input. Finally, to improve ABINet++ on long text recognition, we aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module that integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, consistently demonstrating the superiority of our method in various environments, especially on low-quality images. In addition, extensive experiments in both English and Chinese show that a text spotter incorporating our language modeling method significantly improves in both accuracy and speed compared with commonly used attention-based recognizers.
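Two of the ideas named in the abstract can be made concrete in a few lines: a cloze-style attention mask, under which each character position attends to its left and right context but never to itself, and the iterative-correction loop that repeatedly feeds a refined prediction back into the language model. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The function names, the averaging `average_fuse` step and the `one_hot_argmax` toy "language model" are all hypothetical stand-ins, and the stop-gradient that realizes the autonomous decoupling is only noted in a comment, because plain Python has no autograd.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]  # rows = character positions


def cloze_mask(seq_len: int) -> Matrix:
    """Additive attention mask (assumed form): 0.0 where attention is allowed,
    -inf on the diagonal so position i never attends to itself."""
    if seq_len < 2:
        raise ValueError("a cloze mask needs at least two positions")
    return [[-math.inf if i == j else 0.0 for j in range(seq_len)]
            for i in range(seq_len)]


def masked_softmax(scores: Matrix, mask: Matrix) -> Matrix:
    """Row-wise softmax of scores + mask; masked entries get weight 0.0."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        logits = [s + m for s, m in zip(score_row, mask_row)]
        peak = max(logits)  # finite: every row keeps at least one entry
        exps = [math.exp(x - peak) for x in logits]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def iterative_correction(vision_probs: Matrix,
                         language_model: Callable[[Matrix], Matrix],
                         fuse: Callable[[Matrix, Matrix], Matrix],
                         num_iters: int = 3) -> Matrix:
    """Feed the fused prediction back into the language model num_iters times.
    In an autograd framework, the language-model input would be wrapped in a
    stop-gradient here to block gradient flow into the vision model."""
    fused = vision_probs
    for _ in range(num_iters):
        lm_probs = language_model(fused)
        fused = fuse(vision_probs, lm_probs)
    return fused


# Hypothetical stand-ins, for illustration only.
def one_hot_argmax(probs: Matrix) -> Matrix:
    """Toy 'language model': commit to the most likely class per position."""
    out = []
    for row in probs:
        best = max(range(len(row)), key=row.__getitem__)
        out.append([1.0 if k == best else 0.0 for k in range(len(row))])
    return out


def average_fuse(a: Matrix, b: Matrix) -> Matrix:
    """Toy fusion: element-wise mean of vision and language predictions."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


if __name__ == "__main__":
    attn = masked_softmax([[1.0] * 4 for _ in range(4)], cloze_mask(4))
    print(attn[0])
    refined = iterative_correction([[0.6, 0.4], [0.3, 0.7]],
                                   one_hot_argmax, average_fuse)
    print(refined)
```

With uniform scores, each row of `attn` spreads its weight evenly over the other positions and assigns zero to itself, which is the property a cloze model relies on. The fusion here is a plain average chosen for readability; a learned, gated fusion would be a more realistic choice in practice.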
