Handwritten Text Recognition from Crowdsourced Annotations

06/19/2023
by Solène Tarride, et al.

In this paper, we explore different ways of training a model for handwritten text recognition when multiple imperfect or noisy transcriptions are available. We consider various training configurations, such as selecting a single transcription, retaining all transcriptions, or computing an aggregated transcription from all available annotations. In addition, we evaluate the impact of quality-based data selection, where samples with low agreement are removed from the training set. Our experiments are carried out on municipal registers of the city of Belfort (France) written between 1790 and 1946. The results show that computing a consensus transcription or training on multiple transcriptions are good alternatives. However, selecting training samples based on the degree of agreement between annotators introduces a bias in the training data and does not improve the results. Our dataset is publicly available on Zenodo: https://zenodo.org/record/8041668.
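
The abstract does not say how agreement is measured or how the aggregated transcription is computed, so the following is only an illustrative sketch, not the authors' method. It assumes agreement can be approximated by one minus the mean pairwise normalized edit distance, and it uses the medoid (the transcription closest to all others) as a simple stand-in for a consensus transcription. The function names `agreement` and `medoid` are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the value lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def agreement(transcriptions: list[str]) -> float:
    """Assumed agreement score: 1 minus the mean pairwise normalized distance."""
    if len(transcriptions) < 2:
        return 1.0
    pairs = list(combinations(transcriptions, 2))
    return 1.0 - sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


def medoid(transcriptions: list[str]) -> str:
    """Pick the transcription with the smallest total distance to the others."""
    return min(transcriptions,
               key=lambda t: sum(levenshtein(t, o) for o in transcriptions))


# Hypothetical annotations for a single text line.
annotations = ["Belfort", "Belfort", "Belfart"]
print(medoid(annotations))
print(round(agreement(annotations), 3))
```

Under these assumptions, quality-based selection would amount to discarding samples whose `agreement` falls below a chosen threshold, which is the filtering step the abstract reports as introducing a bias without improving results.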

