HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition

01/22/2021
by Christian M. Dahl, et al.

Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Personal names are probably the single most important identifier for linking. However, personal names are prone to enumeration and transcription errors, and although modern linking methods are designed to handle such challenges, these sources of error are critical and should be minimized. For this purpose, improved transcription methods and large-scale databases are crucial components. This paper describes and provides documentation for HANA, a newly constructed large-scale database that consists of more than 1.1 million images of handwritten word-groups. The database is a collection of personal names, containing more than 105 thousand unique names with a total of more than 3.3 million examples. In addition, we present benchmark results for deep learning models that can automatically transcribe the personal names from the scanned documents. By focusing mainly on personal names, because of their vital role in linking, and by making more challenging large-scale databases publicly available, we hope to foster more sophisticated, accurate, and robust models for handwritten text recognition. This paper describes the data source, the collection process, and the image-processing procedures and methods involved in extracting the handwritten personal names, and handwritten text in general, from the forms.

research
10/02/2022

DARE: A large-scale handwritten date recognition system

Handwritten text recognition for historical documents is an important ta...
research
12/29/2017

Personal Names in Modern Turkey

We analyzed the most common 5000 male and 5000 female Turkish names base...
research
11/13/2018

Personal Names Popularity Estimation and its Application to Record Linkage

This study deals with a fairly simply formulated problem -- how to estim...
research
02/09/2021

Classification of Handwritten Names of Cities and Handwritten Text Recognition using Various Deep Learning Models

This article discusses the problem of handwriting recognition in Kazakh ...
research
04/27/2023

Large Scale Genealogical Information Extraction From Handwritten Quebec Parish Records

This paper presents a complete workflow designed for extracting informat...
research
06/29/2023

The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps

Scanned historical maps in libraries and archives are valuable repositor...
research
05/09/2022

Behind the Mask: Demographic bias in name detection for PII masking

Many datasets contain personally identifiable information, or PII, which...
