A Large-Scale Comparison of Historical Text Normalization Systems

04/03/2019
by Marcel Bollmann et al.

There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder-decoder models, but studies have used different datasets and different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization to date. We critically survey the existing literature and report experiments on eight languages, comparing systems that span all categories of proposed normalization techniques, analysing the effect of training data quantity, and applying different evaluation methods. The datasets and scripts are made publicly available.
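The abstract does not say which evaluation methods are compared. As a minimal sketch of what token-level evaluation of a normalization system can look like, the Python below computes two commonly used measures, word accuracy and character error rate (CER). Both the choice of measures and the names `levenshtein`, `word_accuracy`, and `character_error_rate` are assumptions made for illustration. They are not taken from the paper or its released scripts.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted normalization equals the gold form."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def character_error_rate(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Total edit distance divided by total number of gold characters."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    total_chars = sum(len(g) for g in gold)
    if total_chars == 0:
        return 0.0
    total_edits = sum(levenshtein(p, g) for p, g in zip(predicted, gold))
    return total_edits / total_chars


# Hypothetical system output and gold normalizations (illustrative only).
predicted = ["the", "olde", "king"]
gold = ["the", "old", "king"]
acc = word_accuracy(predicted, gold)
cer = character_error_rate(predicted, gold)
```

The two measures can rank systems differently: word accuracy counts a near miss as a full error, while CER gives partial credit for it. That is one reason studies using different evaluation methods can reach different conclusions.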
