Evaluating historical text normalization systems: How well do they generalize?

04/07/2018
by Alexander Robertson, et al.

We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice: on new datasets or languages, in comparison to more naïve systems, or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural models against a naïve baseline system. We show that the neural models generalize well to unseen words in tests on five languages; nevertheless, they provide no clear benefit over the naïve baseline for downstream POS tagging of an English historical collection. We conclude that future work should include more rigorous evaluation, including both intrinsic and extrinsic measures where possible.
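One intrinsic measure implied above is token-level normalization accuracy reported separately for words seen and unseen in training, set against a naïve baseline. The following is a minimal sketch under stated assumptions: the abstract does not define its naïve baseline, so this assumes a hypothetical memorize-or-copy scheme (`train_lookup_baseline`, `normalize`) that maps each training word to its most frequent normalization and leaves unseen words unchanged. All function names and example words are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict


def train_lookup_baseline(pairs):
    """Hypothetical naïve baseline: map each historical form seen in
    training to its most frequent modern normalization."""
    counts = defaultdict(Counter)
    for hist, modern in pairs:
        counts[hist][modern] += 1
    return {hist: c.most_common(1)[0][0] for hist, c in counts.items()}


def normalize(table, word):
    # Seen words are looked up; unseen words are copied unchanged.
    return table.get(word, word)


def accuracy_by_split(predict, train_vocab, test_pairs):
    """Token-level accuracy, reported for seen words, unseen words, and all."""
    totals, correct = Counter(), Counter()
    for hist, gold in test_pairs:
        split = "seen" if hist in train_vocab else "unseen"
        for key in (split, "all"):
            totals[key] += 1
            correct[key] += int(predict(hist) == gold)
    return {key: correct[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Illustrative (invented) Early Modern English spelling pairs.
    train = [("vnto", "unto"), ("vnto", "unto"), ("haue", "have"), ("loue", "love")]
    test = [("vnto", "unto"), ("haue", "have"), ("doe", "do"),
            ("kinge", "king"), ("the", "the")]
    table = train_lookup_baseline(train)
    scores = accuracy_by_split(lambda w: normalize(table, w), set(table), test)
    print(scores)
```

Because this kind of baseline copies unseen words unchanged, its unseen-word accuracy only counts words that already needed no normalization. Reporting the seen/unseen split makes it visible whether a model's overall score comes from genuine generalization or from memorizing frequent training words.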
