Context Perception Parallel Decoder for Scene Text Recognition

07/23/2023
by   Yongkun Du, et al.
0

Scene text recognition (STR) methods have struggled to attain high accuracy and fast inference speed. Autoregressive (AR)-based STR model uses the previously recognized characters to decode the next character iteratively. It shows superiority in terms of accuracy. However, the inference speed is slow also due to this iteration. Alternatively, parallel decoding (PD)-based STR model infers all the characters in a single decoding pass. It has advantages in terms of inference speed but worse accuracy, as it is difficult to build a robust recognition context in such a pass. In this paper, we first present an empirical study of AR decoding in STR. In addition to constructing a new AR model with the top accuracy, we find out that the success of AR decoder lies also in providing guidance on visual context perception rather than language modeling as claimed in existing studies. As a consequence, we propose Context Perception Parallel Decoder (CPPD) to decode the character sequence in a single PD pass. CPPD devises a character counting module and a character ordering module. Given a text instance, the former infers the occurrence count of each character, while the latter deduces the character reading order and placeholders. Together with the character prediction task, they construct a context that robustly tells what the character sequence is and where the characters appear, well mimicking the context conveyed by AR decoding. Experiments on both English and Chinese benchmarks demonstrate that CPPD models achieve highly competitive accuracy. Moreover, they run approximately 7x faster than their AR counterparts, and are also among the fastest recognizers. The code will be released soon.

READ FULL TEXT

page 3

page 4

page 8

page 11

page 12

research
04/30/2022

SVTR: Scene Text Recognition with a Single Visual Model

Dominant scene text recognition models commonly contain two building blo...
research
03/30/2023

TreePiece: Faster Semantic Parsing via Tree Tokenization

Autoregressive (AR) encoder-decoder neural networks have proved successf...
research
07/14/2022

Scene Text Recognition with Permuted Autoregressive Sequence Models

Context-aware STR methods typically use internal autoregressive (AR) lan...
research
05/18/2021

I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition

Leveraging the advances of natural language processing, most recent scen...
research
07/15/2020

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

The attention-based encoder-decoder framework has recently achieved impr...
research
11/22/2021

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The attention-based encoder-decoder framework is becoming popular in sce...
research
10/05/2022

Reading Chinese in Natural Scenes with a Bag-of-Radicals Prior

Scene text recognition (STR) on Latin datasets has been extensively stud...

Please sign up or login with your details

Forgot password? Click here to reset