Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

07/26/2021
by Ayan Kumar Bhunia, et al.

Although text recognition has evolved significantly over the years, state-of-the-art (SOTA) models still struggle in in-the-wild scenarios due to complex backgrounds, varying fonts, uncontrolled illumination, distortions and other artefacts. This is because such models depend solely on visual information for text recognition and thus lack semantic reasoning capabilities. In this paper, we argue that semantic information plays a role complementary to visual information. More specifically, we exploit semantic information by proposing a multi-stage, multi-scale attentional decoder that performs joint visual-semantic reasoning. Our novelty lies in the intuition that, for text recognition, the prediction should be refined in a stage-wise manner. Our key contribution is therefore the design of a stage-wise unrolling attentional decoder in which the non-differentiability introduced by discretely predicted character labels must be bypassed for end-to-end training. While the first stage predicts using visual features, subsequent stages refine that prediction using joint visual-semantic information. Additionally, we introduce multi-scale 2D attention, along with dense and residual connections between stages, to handle varying character sizes and to achieve better performance and faster convergence during training. Experimental results show that our approach outperforms existing SOTA methods by a considerable margin.
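
The abstract does not say how the non-differentiability of discrete character predictions is bypassed. One common option, assumed here purely for illustration, is a Gumbel-softmax relaxation: instead of taking a hard argmax over character classes, a stage emits a soft, nearly one-hot weight vector, and the next stage consumes a weighted average of character embeddings. The sketch below is forward-pass only and uses plain Python; the function names, the three-class logits and the two-dimensional embedding table are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed (soft) one-hot sample from the categorical given by logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        u = max(rng.random(), 1e-12)  # clamp away from 0 to keep log() finite
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(weights, embedding_table):
    """Weighted average of character embeddings, replacing a hard argmax lookup."""
    dim = len(embedding_table[0])
    return [
        sum(w * row[d] for w, row in zip(weights, embedding_table))
        for d in range(dim)
    ]


# Hypothetical first-stage scores for three character classes.
stage1_logits = [2.0, 0.5, -1.0]
# Hypothetical 2-D embeddings, one row per character class.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rng = random.Random(0)
weights = gumbel_softmax(stage1_logits, temperature=0.5, rng=rng)
# Semantic input that a later stage could combine with visual features.
semantic_input = soft_embedding(weights, embedding_table)
```

In an automatic-differentiation framework, the same computation would let gradients from later stages reach the first stage through `weights`, which a hard argmax would block. Lower temperatures push `weights` closer to one-hot at the cost of noisier gradients.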

research
03/27/2020

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Scene text image contains two levels of contents: visual texture and sem...
research
11/24/2021

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

Semantic information has been proved effective in scene text recognition...
research
08/26/2019

SPGNet: Semantic Prediction Guidance for Scene Parsing

Multi-scale context module and single-stage encoder-decoder structure ar...
research
12/15/2022

Enhancing Indic Handwritten Text Recognition Using Global Semantic Information

Handwritten Text Recognition (HTR) is more interesting and challenging t...
research
03/13/2019

Visual Semantic Information Pursuit: A Survey

Visual semantic information comprises two important parts: the meaning o...
research
07/19/2020

Character Region Attention For Text Spotting

A scene text spotter is composed of text detection and recognition modul...
research
05/23/2023

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

Attention-based encoder-decoder (AED) models have shown impressive perfo...
