SVTR: Scene Text Recognition with a Single Visual Model

04/30/2022
by   Yongkun Du, et al.
0

Dominant scene text recognition models commonly contain two building blocks, a visual model for feature extraction and a sequence model for text transcription. This hybrid architecture, although accurate, is complex and less efficient. In this study, we propose a Single Visual model for Scene Text recognition within the patch-wise image tokenization framework, which dispenses with the sequential modeling entirely. The method, termed SVTR, firstly decomposes an image text into small patches named character components. Afterward, hierarchical stages are recurrently carried out by component-level mixing, merging and/or combining. Global and local mixing blocks are devised to perceive the inter-character and intra-character patterns, leading to a multi-grained character component perception. Thus, characters are recognized by a simple linear prediction. Experimental results on both English and Chinese scene text recognition tasks demonstrate the effectiveness of SVTR. SVTR-L (Large) achieves highly competitive accuracy in English and outperforms existing methods by a large margin in Chinese, while running faster. In addition, SVTR-T (Tiny) is an effective and much smaller model, which shows appealing speed at inference. The code is publicly available at https://github.com/PaddlePaddle/PaddleOCR.

READ FULL TEXT

page 2

page 6

research
07/23/2023

Context Perception Parallel Decoder for Scene Text Recognition

Scene text recognition (STR) methods have struggled to attain high accur...
research
09/03/2023

Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning

Scene text recognition has been studied for decades due to its broad app...
research
08/04/2023

Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition

Optical Character Recognition (OCR) enables automatic text extraction fr...
research
11/19/2019

KISS: Keeping It Simple for Scene Text Recognition

Over the past few years, several new methods for scene text recognition ...
research
08/23/2019

A BLSTM Network for Printed Bengali OCR System with High Accuracy

This paper presents a printed Bengali and English text OCR system develo...
research
12/20/2022

Unleashing the Power of Visual Prompting At the Pixel Level

This paper presents a simple and effective visual prompting method for a...
research
12/15/2020

Research on All-content Text Recognition Method for Financial Ticket Image

With the development of the economy, the number of financial tickets inc...

Please sign up or login with your details

Forgot password? Click here to reset