The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

08/16/2022
by Silvia Cascianelli, et al.

Handwritten Text Recognition (HTR) is an open problem at the intersection of Computer Vision and Natural Language Processing. When dealing with historical manuscripts, the main challenges are the state of preservation of the paper support, the variability of the handwriting (even that of a single author over a long time span), and the scarcity of data for ancient, poorly represented languages. To foster research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic split and a date-based split that takes into account the age of the author. The first setting is intended for studying HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which no training data are available. For both configurations, we analyze quantitative and qualitative characteristics, also in comparison with other line-level HTR benchmarks, and report the recognition performance of state-of-the-art HTR architectures. The dataset is available for download at <https://aimagelab.ing.unimore.it/go/lam>.
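
To make the idea of a date-based split concrete, here is a minimal Python sketch. It is an illustration only, not the authors' procedure: the `LineSample` record, its `year` field, the `date_based_split` helper, and the example periods are all hypothetical, and the abstract does not say how the dataset's actual split is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineSample:
    """One text-line sample. All fields are assumed for illustration."""
    image_path: str  # hypothetical path to the line image
    text: str        # transcription of the line
    year: int        # year the line was written (assumed metadata)


def date_based_split(samples, held_out_periods):
    """Split samples so that held-out periods never appear in training.

    held_out_periods is a list of inclusive (start_year, end_year) tuples.
    Returns (train, test): test holds every sample written in a held-out
    period, train holds the rest.
    """
    def is_held_out(year):
        return any(start <= year <= end for start, end in held_out_periods)

    train = [s for s in samples if not is_held_out(s.year)]
    test = [s for s in samples if is_held_out(s.year)]
    return train, test


# Toy data with invented years and transcriptions.
samples = [
    LineSample("line_0001.png", "primo rigo", 1700),
    LineSample("line_0002.png", "secondo rigo", 1712),
    LineSample("line_0003.png", "terzo rigo", 1725),
    LineSample("line_0004.png", "quarto rigo", 1745),
]
train, test = date_based_split(samples, held_out_periods=[(1710, 1719), (1740, 1749)])
```

A model is trained on `train` and evaluated on `test`, so the evaluation only includes lines from periods the model has never seen. A basic split would instead assign lines to the two sets without regard to the year.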


research
02/16/2009

Using SLP Neural Network to Persian Handwritten Digits Recognition

This paper has been withdrawn by the author ali pourmohammad....
research
05/04/2023

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Recent advancements in Deep Learning-based Handwritten Text Recognition ...
research
01/19/2021

VML-MOC: Segmenting a multiply oriented and curved handwritten text lines dataset

This paper publishes a natural and very complicated dataset of handwritt...
research
03/16/2018

Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model

When extracting information from handwritten documents, text transcripti...
research
06/06/2018

NumtaDB - Assembled Bengali Handwritten Digits

To benchmark Bengali digit recognition algorithms, a large publicly avai...
research
06/19/2023

Handwritten Text Recognition from Crowdsourced Annotations

In this paper, we explore different ways of training a model for handwri...
research
07/07/2021

Handling Heavily Abbreviated Manuscripts: HTR engines vs text normalisation approaches

Although abbreviations are fairly common in handwritten sources, particu...
