Looking and Listening: Audio Guided Text Recognition

06/06/2023
by Wenwen Yu, et al.

Text recognition in the wild is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest that combining vision and language processing is effective for scene text recognition. Yet correcting edit errors, i.e., added, deleted, or replaced characters, remains the main challenge for existing approaches. The content of a text and its audio naturally correspond to each other: a single character error may result in a clearly different pronunciation. In this paper, we propose AudioOCR, a simple yet effective probabilistic audio decoder that predicts mel spectrogram sequences to guide scene text recognition. The decoder participates only in the training phase and adds no extra cost at inference. The underlying principle of AudioOCR can be easily applied to existing approaches. Experiments using 7 previous scene text recognition methods on 12 existing regular, irregular, and occluded benchmarks demonstrate that the proposed method brings consistent improvement. More importantly, our experiments show that AudioOCR generalizes to more challenging scenarios, including recognizing non-English text, out-of-vocabulary words, and text with various accents. Code will be available at https://github.com/wenwenyu/AudioOCR.
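To make the notion of edit errors concrete, the sketch below counts the minimum number of single-character additions, deletions, and replacements separating a predicted string from its ground truth (the standard Levenshtein distance). It is an illustration only and not part of AudioOCR; the function name `edit_distance` and the example words are assumptions chosen for this sketch.

```python
def edit_distance(pred: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn pred into target."""
    # prev[j] holds the distance between the processed prefix of pred
    # and target[:j]; it starts as the distance from the empty string.
    prev = list(range(len(target) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]  # distance from pred[:i] to the empty target
        for j, t in enumerate(target, start=1):
            cost = 0 if p == t else 1
            curr.append(min(
                prev[j] + 1,         # delete p from pred
                curr[j - 1] + 1,     # insert t into pred
                prev[j - 1] + cost,  # replace p with t, or keep a match
            ))
        prev = curr
    return prev[-1]


# One error of each kind against the ground truth "house".
for wrong in ("horse", "hose", "hoouse"):
    print(wrong, edit_distance(wrong, "house"))
```

Each of the three misreadings is a single edit away from "house", yet each would typically be pronounced differently, which is the kind of correspondence between text and audio that the abstract appeals to.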
