MSdocTr-Lite: A Lite Transformer for Full Page Multi-script Handwriting Recognition

03/24/2023
by Marwa Dhiaf, et al.

The Transformer has quickly become the dominant architecture for various pattern recognition tasks due to its capacity to model long-range dependencies. However, transformers are data-hungry and need large datasets for training. In Handwritten Text Recognition (HTR), collecting a massive amount of labeled data is a complicated and expensive task. In this paper, we propose a lite transformer architecture for full-page multi-script handwriting recognition. The proposed model comes with three advantages. First, to address the common problem of data scarcity, the lite model can be trained on a moderate amount of data, as is the case for most public HTR datasets, without the need for external data. Second, it can learn the reading order at page level thanks to a curriculum learning strategy, allowing it to avoid line segmentation errors, exploit a larger context, and reduce the need for costly segmentation annotations. Third, it can easily be adapted to other scripts through a simple transfer-learning process that uses only page-level labeled images. Extensive experiments on different datasets covering several languages and scripts (French, English, Spanish, and Arabic) show the effectiveness of the proposed model.
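
The abstract does not spell out the curriculum itself. The following is only an illustration of the general idea of a line-to-page curriculum. It rests on assumptions that do not come from the paper: each training page has line transcriptions already in reading order, and the target grows by one line every few epochs until it covers the whole page. The names Page, max_lines_for_epoch, and curriculum_targets, and the epochs_per_stage parameter, are hypothetical. Only the transcription side is shown; the matching image regions are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    # Hypothetical container: line transcriptions of one page, already in reading order.
    lines: List[str] = field(default_factory=list)


def max_lines_for_epoch(epoch: int, epochs_per_stage: int, page_len: int) -> int:
    """Hypothetical step schedule (not the paper's): start with single lines and
    allow one more line per target every `epochs_per_stage` epochs, capped at
    the page length."""
    if epochs_per_stage <= 0:
        raise ValueError("epochs_per_stage must be positive")
    return min(page_len, 1 + epoch // epochs_per_stage)


def curriculum_targets(page: Page, epoch: int, epochs_per_stage: int = 5) -> List[str]:
    """Split a page transcription into consecutive blocks of at most k lines.

    Lines keep their reading order and are joined with a newline, so early
    epochs see line-level targets and later epochs see multi-line or
    full-page targets.
    """
    if not page.lines:
        return []
    k = max_lines_for_epoch(epoch, epochs_per_stage, len(page.lines))
    return ["\n".join(page.lines[i:i + k]) for i in range(0, len(page.lines), k)]


if __name__ == "__main__":
    page = Page(["line one", "line two", "line three"])
    for epoch in (0, 5, 10):
        print(epoch, curriculum_targets(page, epoch))
```

With epochs_per_stage=5, a three-line page yields three single-line targets at epoch 0. At epoch 5 it yields a two-line block plus one remaining line, and from epoch 10 on it yields the full page. The step schedule is chosen here only for simplicity; smoother or performance-driven schedules are equally plausible.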
