Reading Scene Text with Attention Convolutional Sequence Modeling

09/13/2017
by   Yunze Gao, et al.
0

Reading text in the wild is a challenging task in the field of computer vision. Existing approaches mainly adopted Connectionist Temporal Classification (CTC) or Attention models based on Recurrent Neural Network (RNN), which is computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. Firstly, instead of RNN, we adopt the stacked convolutional layers to effectively capture the contextual dependencies of the input sequence, which is characterized by lower computational complexity and easier parallel computation. Compared to the chain structure of recurrent networks, the Convolutional Neural Network (CNN) provides a natural way to capture long-term dependencies between elements, which is 9 times faster than Bidirectional Long Short-Term Memory (BLSTM). Furthermore, in order to enhance the representation of foreground text and suppress the background noise, we incorporate the residual attention modules into a small densely connected network to improve the discriminability of CNN features. We validate the performance of our approach on the standard benchmarks, including the Street View Text, IIIT5K and ICDAR datasets. As a result, state-of-the-art or highly-competitive performance and efficiency show the superiority of the proposed approach.

READ FULL TEXT
research
01/06/2016

Memory Matters: Convolutional Recurrent Neural Network for Scene Text Recognition

Text recognition in natural scene is a challenging problem due to the ma...
research
06/05/2022

3D Convolutional with Attention for Action Recognition

Human action recognition is one of the challenging tasks in computer vis...
research
06/02/2018

SCAN: Sliding Convolutional Attention Network for Scene Text Recognition

Scene text recognition has drawn great attentions in the community of co...
research
12/01/2021

On-Device Spatial Attention based Sequence Learning Approach for Scene Text Script Identification

Automatic identification of script is an essential component of a multil...
research
09/29/2020

Lip-reading with Densely Connected Temporal Convolutional Networks

In this work, we present the Densely Connected Temporal Convolutional Ne...
research
06/14/2015

Reading Scene Text in Deep Convolutional Sequences

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text ...
research
01/21/2016

Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs

In this work, we tackle the problem of car license plate detection and r...

Please sign up or login with your details

Forgot password? Click here to reset