A tailored Handwritten-Text-Recognition System for Medieval Latin

08/18/2023
by   Philipp Koch, et al.

The Bavarian Academy of Sciences and Humanities aims to digitize its Medieval Latin Dictionary. The dictionary comprises record cards referring to lemmas in medieval Latin, a low-resource language. A crucial step of the digitization process is the Handwritten Text Recognition (HTR) of the handwritten lemmas found on these record cards. In our work, we introduce an end-to-end pipeline, tailored to the medieval Latin dictionary, for locating, extracting, and transcribing the lemmas. We employ two state-of-the-art (SOTA) image segmentation models to prepare the initial data set for the HTR task. We then run a set of experiments with transformer-based models to explore how different vision encoders perform when combined with a GPT-2 decoder. We also apply extensive data augmentation, which results in a highly competitive model. The best-performing setup achieves a Character Error Rate (CER) of 0.015, which outperforms the commercial Google Cloud Vision model and shows more stable performance.
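To make the reported metric concrete, the following is a minimal sketch of how a Character Error Rate is commonly computed: the Levenshtein (edit) distance between predicted and reference transcriptions, divided by the number of reference characters. The abstract does not state how the authors aggregate over lemmas, so the corpus-level averaging used here, the function names `levenshtein` and `cer`, and the sample lemmas are assumptions for illustration only.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to limit memory use;
    # the distance is symmetric, so swapping does not change the result.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance over total reference length.
    (Assumed aggregation; the source does not specify one.)"""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical lemma transcriptions, not taken from the dictionary data.
    refs = ["abbas", "abbatia"]
    preds = ["abbas", "abbatis"]
    print(f"CER = {cer(preds, refs):.3f}")
```

Under this definition, a CER of 0.015 corresponds to roughly three character errors per 200 reference characters.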


research
12/14/2021

Handwritten text generation and strikethrough characters augmentation

We introduce two data augmentation techniques, which, used with a Resnet...
research
05/09/2021

End-to-End Optical Character Recognition for Bengali Handwritten Words

Optical character recognition (OCR) is a process of converting analogue ...
research
03/28/2023

Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets

The paper discusses an approach to decipher large collections of handwri...
research
06/09/2022

Transformer based Urdu Handwritten Text Optical Character Reader

Extracting Handwritten text is one of the most important components of d...
research
07/10/2019

Fully Convolutional Networks for Handwriting Recognition

Handwritten text recognition is challenging because of the virtually inf...
research
09/11/2022

Lexicon and Attention based Handwritten Text Recognition System

The handwritten text recognition problem is widely studied by the resear...
research
03/05/2023

A Study of Augmentation Methods for Handwritten Stenography Recognition

One of the factors limiting the performance of handwritten text recognit...
