NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition

06/04/2018
by   Fenfen Sheng, et al.
0

Scene text recognition has attracted a great many researches for decades due to its importance to various applications. Existing sequence-to-sequence (seq2seq) recognizers mainly adopt Connectionist Temporal Classification (CTC) or Attention based Recurrent or Convolutional networks, and have made great progresses in scene text recognition. However, we observe that current methods suffer from slow training speed because the internal recurrence of RNNs limits training parallelization, and high complexity because of stacking too many convolutional layers in order to extract features over long input sequence. To tackle the above problems, this paper presents a no-recurrence seq2seq model, named NRTR, that relies only on the attention mechanism dispensing with recurrences and convolutions entirely. NRTR consists of an encoder that transforms an input sequence to the hidden feature representation, and a decoder that generates an output sequence of characters from the encoder output. Both of the encoder and the decoder are based on self-attention module to learn positional dependencies, which could be trained with more parallelization and less complexity. Besides, we also propose a modality-transform block to effectively transform the input image to the corresponding sequence, which could be used by the encoder directly. NRTR is end-to-end trainable and is not confined to any predefined lexicon. Extensive experiments on various benchmarks, including the IIIT5K, SVT and ICDAR datasets, show that NRTR achieves the state-of-the-art or highly-competitive performances in both lexicon-free and lexicon-based scene text recognition tasks, while requiring only one order of magnitude less time for model training compared to current methods.

READ FULL TEXT

page 1

page 7

research
10/07/2019

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

Attention based scene text recognizers have gained huge success, which l...
research
11/04/2019

Scene Text Recognition with Temporal Convolutional Encoder

Texts from scene images typically consist of several characters and exhi...
research
05/09/2018

Edit Probability for Scene Text Recognition

We consider the scene text recognition problem under the attention-based...
research
04/15/2021

Rethinking Text Line Recognition Models

In this paper, we study the problem of text line recognition. Unlike mos...
research
08/26/2019

Adaptive Embedding Gate for Attention-Based Scene Text Recognition

Scene text recognition has attracted particular research interest becaus...
research
01/02/2023

Transformer Based Geocoding

In this paper, we formulate the problem of predicting a geolocation from...
research
07/21/2015

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Image-based sequence recognition has been a long-standing research topic...

Please sign up or login with your details

Forgot password? Click here to reset