Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network

by Ankan Kumar Bhunia, et al.

Script identification facilitates many important applications in document and video analysis. This paper focuses on script identification in scene text images and video frames. Low image quality, complex backgrounds, and the similar character layouts shared by some scripts, such as Greek and Latin, make text recognition in these scenarios difficult. Most recent approaches either apply a patch-based CNN and sum the resulting features, or use a CNN-LSTM network alone to obtain the identification result. Others use a discriminative CNN to jointly optimize mid-level representations and deep features. In this paper, we propose a novel method that extracts local and global features with a CNN-LSTM framework and weights them dynamically for script identification. First, we split each image into patches and feed them into the CNN-LSTM framework. Attention-based patch weights are computed by applying a softmax layer after the LSTM. We then multiply these weights patch-wise with the corresponding CNN features to yield local features. Global features are extracted from the last cell state of the LSTM. A fusion technique then dynamically weights the local and global features for each individual patch. Experiments on two public script identification datasets, SIW-13 and CVSI2015, show that our learning procedure outperforms previous approaches.
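The pipeline described in the abstract (patches, CNN features, LSTM, softmax attention over patches, patch-wise weighting, a global feature from the last LSTM cell state, and per-patch fusion) can be sketched in NumPy as below. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the CNN is replaced by a single linear layer with ReLU, all weights are random, and the sigmoid fusion gate is an assumed form of the "dynamic weighting" the abstract mentions. The output size of 13 is chosen to match the 13 scripts in SIW-13.

```python
import numpy as np

def extract_patches(img, patch_w, stride):
    """Slide a full-height window of width patch_w across a grayscale image (H, W)."""
    _, w = img.shape
    starts = range(0, w - patch_w + 1, stride)
    return np.stack([img[:, s:s + patch_w] for s in starts])  # (T, H, patch_w)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

class AttnConvLSTMSketch:
    """Hypothetical, untrained sketch; real weights would be learned end-to-end."""

    def __init__(self, patch_h, patch_w, stride, dim=32, num_classes=13, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.patch_w, self.stride, self.dim = patch_w, stride, dim
        self.W_cnn = rng.normal(0, s, (patch_h * patch_w, dim))  # stand-in for a CNN
        self.W_lstm = rng.normal(0, s, (2 * dim, 4 * dim))       # [x, h] -> 4 gates
        self.b_lstm = np.zeros(4 * dim)
        self.w_att = rng.normal(0, s, dim)                       # attention scorer
        self.w_fuse = rng.normal(0, s, 2 * dim)                  # assumed fusion gate
        self.W_out = rng.normal(0, s, (dim, num_classes))

    def cnn(self, patches):
        flat = patches.reshape(len(patches), -1)
        return np.maximum(flat @ self.W_cnn, 0.0)  # (T, dim) patch features

    def lstm(self, feats):
        d = self.dim
        h, c = np.zeros(d), np.zeros(d)
        hs = []
        for x in feats:  # one time step per patch, left to right
            z = np.concatenate([x, h]) @ self.W_lstm + self.b_lstm
            i, f = sigmoid(z[:d]), sigmoid(z[d:2 * d])
            o, g = sigmoid(z[2 * d:3 * d]), np.tanh(z[3 * d:])
            c = f * c + i * g
            h = o * np.tanh(c)
            hs.append(h)
        return np.stack(hs), c  # all hidden states (T, dim), last cell state (dim,)

    def forward(self, img):
        f = self.cnn(extract_patches(img, self.patch_w, self.stride))
        hs, c_last = self.lstm(f)
        alpha = softmax(hs @ self.w_att)   # one attention weight per patch, sums to 1
        local = alpha[:, None] * f         # patch-wise weighting of CNN features
        glob = c_last                      # global feature from the last cell state
        pair = np.concatenate([local, np.broadcast_to(glob, local.shape)], axis=1)
        beta = sigmoid(pair @ self.w_fuse) # per-patch local-vs-global gate in (0, 1)
        fused = beta[:, None] * local + (1.0 - beta[:, None]) * glob
        probs = softmax(fused.sum(axis=0) @ self.W_out)  # script class probabilities
        return probs, alpha, beta
```

For example, `AttnConvLSTMSketch(patch_h=32, patch_w=16, stride=8).forward(img)` on a 32×100 image yields 11 patches, 11 attention weights, and a probability vector over 13 classes. The gate `beta` lets each patch lean toward its own attended local feature or toward the sequence-level global feature, which is one plausible reading of "dynamically weights the local and global features for an individual patch."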


