Levenshtein OCR

09/08/2022
by Cheng Da, et al.

A novel scene text recognizer based on a Vision-Language Transformer (VLT) is presented. Inspired by the Levenshtein Transformer from NLP, the proposed method (Levenshtein OCR, or LevOCR for short) explores an alternative way of automatically transcribing text from cropped natural images. Specifically, we cast scene text recognition as an iterative sequence refinement process. The initial prediction sequence produced by a pure vision model is encoded and fed into a cross-modal transformer, where it is fused with the visual features and progressively refined toward the ground truth. The refinement is carried out with two basic character-level operations, deletion and insertion, which are learned with imitation learning and allow for parallel decoding, dynamic length change, and good interpretability. Quantitative experiments demonstrate that LevOCR achieves state-of-the-art performance on standard benchmarks, and qualitative analyses verify the effectiveness and advantages of the proposed algorithm. Code will be released soon.
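To make the deletion-then-insertion refinement concrete, here is a minimal toy sketch in Python. It is not the LevOCR model: in the paper, both operations are predicted by a learned cross-modal transformer from fused visual and textual features, whereas this sketch uses an oracle that sees the ground truth, loosely analogous to the expert policy that imitation learning would try to mimic. All function names (`oracle_deletions`, `oracle_insertions`, `refine_once`, and so on) are hypothetical and chosen for illustration only.

```python
from difflib import SequenceMatcher


def oracle_deletions(pred: str, target: str) -> list[bool]:
    """Return a keep-mask over `pred` (assumed oracle, not a learned policy).

    Characters that fall inside a matching block shared with `target`
    are kept; all others are marked for deletion.
    """
    keep = [False] * len(pred)
    matcher = SequenceMatcher(None, pred, target, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            keep[i] = True
    return keep


def apply_deletions(pred: str, keep: list[bool]) -> str:
    """Drop every character whose keep flag is False."""
    return "".join(c for c, k in zip(pred, keep) if k)


def oracle_insertions(kept: str, target: str) -> list[str]:
    """Return the characters to insert into each of the len(kept) + 1 slots.

    Assumes `kept` is a subsequence of `target`, which holds after
    `apply_deletions` with the oracle mask above.
    """
    slots = [""] * (len(kept) + 1)
    j = 0  # index of the next kept character still to be matched
    for ch in target:
        if j < len(kept) and kept[j] == ch:
            j += 1
        else:
            slots[j] += ch
    return slots


def apply_insertions(kept: str, slots: list[str]) -> str:
    """Interleave the slot contents with the kept characters."""
    out = [slots[0]]
    for c, s in zip(kept, slots[1:]):
        out.append(c + s)
    return "".join(out)


def refine_once(pred: str, target: str) -> str:
    """One deletion pass followed by one insertion pass."""
    kept = apply_deletions(pred, oracle_deletions(pred, target))
    return apply_insertions(kept, oracle_insertions(kept, target))


if __name__ == "__main__":
    # A vision-only guess with one wrong character, refined toward the label.
    print(refine_once("hcllo", "hello"))
```

Because the oracle sees the ground truth, a single refinement pass suffices here; a learned model would instead predict the keep-mask and the slot contents, and could repeat the two passes for several iterations. Note that every position is decided independently within a pass, which is what permits parallel decoding, and that the sequence length changes freely between passes.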

research
03/31/2022

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Visual appearance is considered to be the most important cue to understa...
research
09/08/2022

Multi-Granularity Prediction for Scene Text Recognition

Scene text recognition (STR) has been an active research topic in comput...
research
07/25/2023

Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

Due to the enormous technical challenges and wide range of applications,...
research
05/23/2023

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

Pre-trained vision-language models are the de-facto foundation models fo...
research
11/09/2022

Pure Transformer with Integrated Experts for Scene Text Recognition

Scene text recognition (STR) involves the task of reading text in croppe...
research
05/08/2023

Scene Text Recognition with Image-Text Matching-guided Dictionary

Employing a dictionary can efficiently rectify the deviation between the...
