A Few Shot Multi-Representation Approach for N-gram Spotting in Historical Manuscripts

09/21/2022
by Giuseppe De Gregorio, et al.

Despite recent advances in automatic text recognition, performance remains moderate on historical manuscripts. This is mainly due to the scarcity of labelled data available to train data-hungry Handwritten Text Recognition (HTR) models. Keyword Spotting (KWS) systems offer a valid alternative to HTR because they achieve lower error rates, but they are usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-grams) that requires only a small amount of labelled training data. We show that the recognition of important N-grams could reduce the system's dependency on the vocabulary: an out-of-vocabulary (OOV) word in an input handwritten line image can be treated as a sequence of N-grams that belong to the lexicon. An extensive experimental evaluation of the proposed multi-representation approach on a subset of Bentham's historical manuscript collections yields promising results in this direction.
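
To make the vocabulary argument concrete, the Python sketch below checks whether a word missing from a word-level lexicon can still be written as a sequence of N-grams taken from a character N-gram lexicon. It operates on plain text strings, not on handwritten line images, and it is not the authors' spotting method; the function name `segment_into_ngrams` and the toy lexicon are hypothetical illustrations. It uses dynamic programming over word prefixes and returns a segmentation with the fewest N-grams, or `None` if the word cannot be covered.

```python
from typing import Optional


def segment_into_ngrams(word: str, ngram_lexicon: set[str]) -> Optional[list[str]]:
    """Hypothetical helper: split `word` into the fewest N-grams from
    `ngram_lexicon`, or return None if no exact covering exists.

    best[i] holds the shortest list of lexicon N-grams that exactly
    covers the prefix word[:i] (None if that prefix cannot be covered).
    """
    if not ngram_lexicon:
        return None
    max_len = max(len(g) for g in ngram_lexicon)
    best: list[Optional[list[str]]] = [None] * (len(word) + 1)
    best[0] = []  # the empty prefix is covered by the empty sequence
    for end in range(1, len(word) + 1):
        # Only candidate N-grams no longer than the longest lexicon entry.
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if best[start] is not None and piece in ngram_lexicon:
                candidate = best[start] + [piece]
                # Keep the covering that uses the fewest N-grams.
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(word)]


# Toy example: "thinker" is treated as OOV, but its N-grams are known.
lexicon = {"th", "ink", "ing", "er"}
print(segment_into_ngrams("thinker", lexicon))  # ['th', 'ink', 'er']
```

In a real spotting system the decomposition would be driven by N-gram detections in the image rather than by the known transcription; the sketch only shows why a lexicon of N-grams can cover words that a word lexicon does not.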

Related research
03/10/2023

Marginalia and machine learning: Handwritten text recognition for Marginalia Collections

The pressing need for digitization of historical document collections ha...
05/26/2020

Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition

The advent of recurrent neural networks for handwriting recognition mark...
11/08/2018

Few-shot learning with attention-based sequence-to-sequence models

End-to-end approaches have recently become popular as a means of simplif...
05/04/2023

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Recent advancements in Deep Learning-based Handwritten Text Recognition ...
03/06/2023

ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents

Keyword spotting (KWS) in historical documents is an important tool for ...
08/18/2020

Robust Handwriting Recognition with Limited and Noisy Data

Despite the advent of deep learning in computer vision, the general hand...
04/09/2021

A Probabilistic Framework for Lexicon-based Keyword Spotting in Handwritten Text Images

Query by String Keyword Spotting (KWS) is here considered as a key techn...
