Sequence to sequence learning for unconstrained scene text recognition

In this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short term memory model (LSTM). The CNN learns visual features for the characters and uses them with a softmax layer to detect sequence of characters. While the CNN gives very good recognition results, it does not model relation between characters, hence gives rise to false positive and false negative cases (confusing characters due to visual similarities like "g" and "9", or confusing background patches with characters; either removing existing characters or adding non-existing ones) To alleviate these problems we leverage recent developments in LSTM architectures to encode contextual information. We show that the LSTM can dramatically reduce such errors and achieve state-of-the-art accuracy in the task of unconstrained natural scene text recognition. Moreover we manually remove all occurrences of the words that exist in the test set from our training set to test whether our approach will generalize to unseen data. We use the ICDAR 13 test set for evaluation and compare the results with the state of the art approaches [11, 18]. We finally present an application of the work in the domain of for traffic monitoring.

READ FULL TEXT

page 12

page 22

page 31

page 32

page 34

page 40

research
01/21/2016

Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs

In this work, we tackle the problem of car license plate detection and r...
research
12/30/2019

Text Steganalysis with Attentional LSTM-CNN

With the rapid development of Natural Language Processing (NLP) technolo...
research
12/18/2014

Deep Structured Output Learning for Unconstrained Text Recognition

We develop a representation suitable for the unconstrained recognition o...
research
09/14/2020

Adaptive Text Recognition through Visual Matching

In this work, our objective is to address the problems of generalization...
research
10/12/2021

Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes

In contrast to Connectionist Temporal Classification (CTC) approaches, S...
research
05/23/2018

Implicit Language Model in LSTM for OCR

Neural networks have become the technique of choice for OCR, but many as...
research
10/16/2014

Improve CAPTCHA's Security Using Gaussian Blur Filter

Providing security for webservers against unwanted and automated registr...

Please sign up or login with your details

Forgot password? Click here to reset