Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

11/24/2021
by Shota Orihashi, et al.

This paper presents a novel training method for end-to-end scene text recognition. End-to-end scene text recognition offers high recognition accuracy, especially with a Transformer-based encoder-decoder model. Training a highly accurate end-to-end model requires a large image-to-text paired dataset in the target language. However, such data is difficult to collect, especially for resource-poor languages. To overcome this difficulty, the proposed method uses well-prepared large datasets in resource-rich languages, such as English, to train the encoder-decoder model for the resource-poor language. The key idea is to build a model in which the encoder reflects knowledge of multiple languages while the decoder specializes in the resource-poor language alone. To this end, the proposed method pre-trains the encoder on a multilingual dataset that combines the resource-poor and resource-rich languages' datasets, so that the encoder learns language-invariant knowledge for scene text recognition. It also pre-trains the decoder on the resource-poor language's dataset to make the decoder better suited to that language. Experiments on Japanese scene text recognition using a small, publicly available dataset demonstrate the effectiveness of the proposed method.
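
The assignment of datasets to model components described in the abstract can be sketched as a small training plan. This is only an illustrative outline, not the authors' implementation: the `Stage` class, the `build_training_plan` function, the sample format, and the file names are all hypothetical, and the final fine-tuning stage is an assumption that the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sample format: (image path, transcription).
Sample = Tuple[str, str]


@dataclass
class Stage:
    name: str              # label for the training stage
    component: str         # which part of the model is trained
    samples: List[Sample]  # data used in this stage


def build_training_plan(rich: List[Sample], poor: List[Sample]) -> List[Stage]:
    """Map datasets to encoder/decoder stages as the abstract describes."""
    # The encoder sees both languages so it can learn language-invariant features.
    multilingual = rich + poor
    return [
        Stage("encoder pre-training", "encoder", multilingual),
        # The decoder sees only the resource-poor language.
        Stage("decoder pre-training", "decoder", poor),
        # Assumption (not stated in the abstract): a final stage that
        # fine-tunes the whole model on the resource-poor dataset.
        Stage("fine-tuning", "encoder+decoder", poor),
    ]


# Toy data with made-up file names, for illustration only.
english = [("en_0001.png", "EXIT"), ("en_0002.png", "OPEN")]
japanese = [("ja_0001.png", "出口")]

plan = build_training_plan(english, japanese)
for stage in plan:
    print(f"{stage.name}: {stage.component} on {len(stage.samples)} samples")
```

In this sketch the resource-rich data reaches only the encoder stage, which mirrors the idea of a multilingual encoder paired with a decoder specialized in the resource-poor language.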


