AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

08/03/2020
by Wenhai Wang, et al.

Scene text spotting aims to detect and recognize entire words or sentences with multiple characters in natural images. It remains challenging because ambiguity often arises when the spacing between characters is large or the characters are evenly spread over multiple rows and columns, which yields many visually plausible groupings of the characters (e.g., "BERLIN" is incorrectly detected as "BERL" and "IN" in Fig. 1(c)). Unlike previous works that employed only visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection. The proposed AE TextSpotter has three important benefits. 1) The linguistic representation is learned together with the visual representation in a single framework. To our knowledge, this is the first work to improve text detection by using a language model. 2) A carefully designed language module is used to reduce the detection confidence of incorrect text lines, so that they are easily pruned in the detection stage. 3) Extensive experiments show that AE TextSpotter outperforms other state-of-the-art methods by a large margin. For example, we carefully select a validation set of extremely ambiguous samples from the IC19-ReCTS dataset, where our approach surpasses other methods by more than 4%. Material for this validation set has been released at https://github.com/whai362/TDA-ReCTS.
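
The second benefit, lowering the detection confidence of implausible text lines so that they can be pruned, can be illustrated with a toy sketch. This is not the authors' implementation: the lexicon lookup in `linguistic_score`, the multiplicative combination of scores, the example confidences, and the 0.5 threshold are all assumptions chosen for illustration, standing in for a learned language module.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str            # recognized content of a candidate text line
    visual_score: float  # detection confidence from visual features alone


# Hypothetical toy lexicon standing in for a learned language model.
LEXICON = {"BERLIN"}


def linguistic_score(text: str) -> float:
    """Toy stand-in: full score for lexicon words, a low score otherwise."""
    return 1.0 if text in LEXICON else 0.2


def rescore_and_prune(candidates, threshold=0.5):
    """Scale each visual score by the linguistic score and drop low scorers."""
    kept = []
    for c in candidates:
        combined = c.visual_score * linguistic_score(c.text)
        if combined >= threshold:
            kept.append((c.text, combined))
    return kept


# The ambiguous grouping from the abstract: one correct line, two fragments.
candidates = [
    Candidate("BERLIN", 0.80),
    Candidate("BERL", 0.90),
    Candidate("IN", 0.85),
]
result = rescore_and_prune(candidates)
print(result)
```

In this sketch the fragments "BERL" and "IN" start with higher visual confidence than "BERLIN", yet they fall below the threshold once the assumed linguistic score is applied, so only the full word survives pruning.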
