Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

07/26/2021
by Ayan Kumar Bhunia, et al.

Although text recognition has evolved significantly over the years, state-of-the-art (SOTA) models still struggle in in-the-wild scenarios because of complex backgrounds, varying fonts, uncontrolled illumination, distortions and other artefacts. This is because such models depend solely on visual information for text recognition and thus lack semantic reasoning capabilities. In this paper, we argue that semantic information plays a role complementary to visual information. More specifically, we exploit semantic information by proposing a multi-stage, multi-scale attentional decoder that performs joint visual-semantic reasoning. Our novelty lies in the intuition that, for text recognition, the prediction should be refined in a stage-wise manner. Our key contribution is therefore the design of a stage-wise unrolling attentional decoder in which the non-differentiability introduced by discretely predicted character labels must be bypassed for end-to-end training. While the first stage predicts using visual features, subsequent stages refine its prediction using joint visual-semantic information. Additionally, we introduce multi-scale 2D attention, along with dense and residual connections between stages, to handle varying character sizes, improve performance and speed up convergence during training. Experimental results show that our approach outperforms existing SOTA methods by a considerable margin.
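The abstract does not say how the non-differentiable character prediction is bypassed. One widely used relaxation for discrete choices is Gumbel-softmax, and the sketch below assumes it purely for illustration; it is not necessarily the authors' mechanism. The names `gumbel_softmax` and `soft_embedding`, the temperature `tau`, the logits and the toy embedding table are all hypothetical. The point it shows is that a later stage can consume a probability-weighted character embedding instead of a hard argmax label, so that in an autograd framework gradients could flow back to the earlier stage.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return (soft, hard) relaxed samples from a categorical distribution.

    Assumption: Gumbel-softmax is used here only as an example relaxation.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u clamped away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Hard one-hot version of the same sample (what a discrete label would be).
    best = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == best else 0.0 for i in range(len(soft))]
    return soft, hard


def soft_embedding(weights, embedding_table):
    """Weighted sum of embedding rows: a differentiable stand-in for a lookup."""
    dim = len(embedding_table[0])
    return [
        sum(w * row[d] for w, row in zip(weights, embedding_table))
        for d in range(dim)
    ]


# Hypothetical first-stage logits over a 3-character alphabet.
rng = random.Random(0)
logits = [2.0, 0.5, -1.0]
soft, hard = gumbel_softmax(logits, tau=0.5, rng=rng)

# Toy character embeddings (one row per character, two dimensions each).
table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
semantic_input = soft_embedding(soft, table)  # what a later stage would read
```

In a framework with automatic differentiation, a straight-through variant would use `hard` in the forward pass and the gradient of `soft` in the backward pass; lowering `tau` pushes `soft` closer to one-hot at the cost of noisier gradients.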
