Handling Heavily Abbreviated Manuscripts: HTR engines vs text normalisation approaches

07/07/2021
by Jean-Baptiste Camps et al.

Although abbreviations are fairly common in handwritten sources, particularly in medieval and modern Western manuscripts, there has been little previous research on computational approaches to expanding them. Yet abbreviations pose particular challenges for computational tasks such as handwritten text recognition (HTR) and natural language processing. Pre-processing often aims ultimately to go from a digitised image of the source to a normalised text, which includes expanding the abbreviations. We explore different setups for obtaining such a normalised text: either directly, by training HTR engines on normalised (i.e., expanded, disabbreviated) text, or by decomposing the process into discrete steps, each using specialist models for recognition, word segmentation and normalisation. The case studies considered here are drawn from the medieval Latin tradition.
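
To make the decomposed setup more concrete, the following minimal Python sketch chains two of the discrete steps named above, word segmentation and abbreviation expansion, on a string assumed to have already come out of an HTR engine (the recognition step itself is not shown). It is not the authors' method: where the abstract mentions specialist models, this sketch swaps in a lexicon-based dynamic-programming segmenter and a plain lookup table. The lexicon, the expansion table and the function names (`segment`, `expand`, `normalise`) are hypothetical toy examples.

```python
from __future__ import annotations

# Hypothetical toy resources; a real pipeline would learn these from data.
LEXICON = {"dns", "ds", "&", "noster"}                    # forms as they appear in the HTR output
EXPANSIONS = {"dns": "dominus", "ds": "deus", "&": "et"}  # abbreviation -> expanded form


def segment(text: str, lexicon: set[str]) -> list[str]:
    """Split unspaced text into tokens, preferring known lexicon entries.

    Dynamic programming over prefixes: a segmentation's cost is
    (unknown characters, number of tokens), compared lexicographically,
    so the best split covers as much text as possible with known words
    and then uses as few tokens as possible.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    n = len(text)
    # best[i] holds (cost, tokens) for the best segmentation of text[:i].
    best: list[tuple[tuple[int, int], list[str]] | None] = [None] * (n + 1)
    best[0] = ((0, 0), [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev = best[start]
            if prev is None:
                continue
            piece = text[start:end]
            known = piece in lexicon
            # Unknown material is consumed one character at a time.
            if not known and len(piece) != 1:
                continue
            (unknown, count), tokens = prev
            cost = (unknown + (0 if known else 1), count + 1)
            current = best[end]
            if current is None or cost < current[0]:
                best[end] = (cost, tokens + [piece])
    return best[n][1]


def expand(tokens: list[str], table: dict[str, str]) -> list[str]:
    """Replace each abbreviated token with its expansion; leave others unchanged."""
    return [table.get(tok, tok) for tok in tokens]


def normalise(recognised: str, lexicon: set[str], table: dict[str, str]) -> str:
    """Segmentation followed by expansion, joined with single spaces."""
    return " ".join(expand(segment(recognised, lexicon), table))


if __name__ == "__main__":
    print(normalise("dnsnoster&ds", LEXICON, EXPANSIONS))
```

With these toy resources, `normalise("dnsnoster&ds", LEXICON, EXPANSIONS)` returns `"dominus noster et deus"`. A fixed lookup table cannot resolve abbreviations whose expansion depends on context, which is one reason to use trained normalisation models rather than a table like this one.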
