ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification

by Fangneng Zhan, et al.

Automated recognition of text in scenes has been a research challenge for years, largely because text appearance varies arbitrarily in perspective distortion, text line curvature, text style, and imaging artifacts. Recent deep networks can learn representations that are robust to imaging artifacts and text style changes, but they still struggle with scene text that suffers from perspective and curvature distortions. This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature, driven by better scene text recognition performance. An innovative rectification network is developed that employs a novel line-fitting transformation to estimate the pose of text lines in scenes. In addition, an iterative rectification pipeline is developed in which scene text distortions are corrected step by step towards a fronto-parallel view. ESIR is also robust to parameter initialization, and its training needs only scene text images and word-level annotations, as required by most scene text recognition systems. Extensive experiments over a number of public datasets show that the proposed ESIR rectifies scene text distortions accurately and achieves superior recognition performance both for normal scene text images and for those suffering from perspective and curvature distortions.
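
The iterative idea in the abstract can be illustrated with a toy sketch. This is not the ESIR rectification network: it assumes, purely for illustration, that a curved text center line is given as 2-D points, that its pose is estimated by a least-squares polynomial fit, and that each pass removes only a fraction of the estimated curvature. The function names and the `degree`, `step`, and `iterations` parameters are hypothetical.

```python
import numpy as np


def fit_center_line(xs, ys, degree=2):
    """Least-squares polynomial fit y = p(x) of the text center line (assumed model)."""
    return np.polynomial.Polynomial.fit(xs, ys, degree)


def rectify_step(xs, ys, degree=2, step=0.5):
    """One partial rectification pass: pull each point toward a flat, horizontal line."""
    line = fit_center_line(xs, ys, degree)
    fitted = line(xs)
    # Offset of the fitted curve from its mean height; removing it flattens the line.
    offset = fitted - fitted.mean()
    return ys - step * offset


def iterative_rectify(xs, ys, iterations=5, degree=2, step=0.5):
    """Apply rectify_step repeatedly and record the remaining vertical spread."""
    history = [float(np.ptp(ys))]
    for _ in range(iterations):
        ys = rectify_step(xs, ys, degree, step)
        history.append(float(np.ptp(ys)))
    return ys, history


# Synthetic curved and tilted "text line" (illustrative data, not from the paper).
xs = np.linspace(-1.0, 1.0, 21)
ys = 0.8 * xs**2 + 0.3 * xs
rectified, history = iterative_rectify(xs, ys)
print([round(h, 4) for h in history])
```

In this toy setting the vertical spread of the line shrinks on every pass, which mirrors the claim that distortions are corrected gradually rather than in a single transformation. In the actual system the rectification is learned end to end from the recognition loss, which this sketch does not model.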



A pooling based scene text proposal technique for scene text reading in the wild

Automatic reading texts in scenes has attracted increasing interest in r...

MSR: Multi-Scale Shape Regression for Scene Text Detection

State-of-the-art scene text detection techniques predict quadrilateral b...

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

Arbitrary text appearance poses a great challenge in scene text recognit...

Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes

The requirement of large amounts of annotated images has become one gran...

Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping

This paper presents a scene text detection technique that exploits boots...

Scones: Towards Conversational Authoring of Sketches

Iteratively refining and critiquing sketches are crucial steps to develo...

IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text Recognition

Although recent works based on deep learning have made progress in impro...
