Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

10/25/2016
by Marcel Bollmann, et al.

Natural-language processing of historical documents is complicated by the abundance of variant spellings and the lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, specifically a deep bi-LSTM network applied at the character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can further improve our model's performance.
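The two ideas in the abstract — a character-level bi-directional recurrent encoder, and multi-task learning via a shared encoder with per-dataset output layers — can be illustrated with a small, untrained sketch. This is not the authors' implementation: a plain tanh-RNN cell stands in for the LSTM cell to keep it short, and all names, vocabulary, and dimensions are illustrative assumptions.

```python
import math
import random

random.seed(0)
VOCAB = list("abcdefghijklmnopqrstuvwxyz")
char2id = {c: i for i, c in enumerate(VOCAB)}
EMB, HID = 4, 6  # illustrative sizes, not from the paper

def mat(rows, cols):
    """Randomly initialized weight matrix (no training in this sketch)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

E = mat(len(VOCAB), EMB)                   # character embeddings
Wx_f, Wh_f = mat(HID, EMB), mat(HID, HID)  # forward recurrent weights
Wx_b, Wh_b = mat(HID, EMB), mat(HID, HID)  # backward recurrent weights

def rnn_pass(xs, Wx, Wh):
    """One directional pass; a tanh cell stands in for the LSTM cell."""
    h = [0.0] * HID
    states = []
    for x in xs:
        h = [math.tanh(a + b) for a, b in zip(matvec(Wx, x), matvec(Wh, h))]
        states.append(h)
    return states

def encode(word):
    """Shared bi-directional character encoder."""
    xs = [E[char2id[c]] for c in word]
    fwd = rnn_pass(xs, Wx_f, Wh_f)
    bwd = rnn_pass(xs[::-1], Wx_b, Wh_b)[::-1]
    # Concatenate forward and backward states for each character.
    return [f + b for f, b in zip(fwd, bwd)]

# Multi-task learning: the encoder above is shared across datasets;
# each normalization dataset gets its own output head. The task names
# here are hypothetical placeholders.
heads = {task: mat(2 * HID, len(VOCAB))
         for task in ("target_corpus", "auxiliary_corpus")}

def predict(word, task):
    """Predict one modern character per historical input character."""
    W = heads[task]
    out = []
    for h in encode(word):
        scores = [sum(h[i] * W[i][j] for i in range(len(h)))
                  for j in range(len(VOCAB))]
        out.append(VOCAB[max(range(len(scores)), key=scores.__getitem__)])
    return out

# "vnnd" is a historical spelling of modern German "und"; with trained
# weights the model would map it character by character to the modern form.
print(predict("vnnd", "target_corpus"))
```

The point of the shared encoder is that auxiliary normalization data from other texts updates the same character-level representations used for the target corpus, which is how the multi-task setup can help when the target's annotated data is scarce.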


Related research

03/12/2019 · Few-Shot and Zero-Shot Learning for Historical Text Normalization
Historical text normalization often relies on small training datasets. R...

06/13/2018 · An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
In this paper, we apply different NMT models to the problem of historica...

04/13/2021 · Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation
Understanding voluminous historical records provides clues on the past i...

04/13/2021 · UPB at SemEval-2021 Task 7: Adversarial Multi-Task Learning for Detecting and Rating Humor and Offense
Detecting humor is a challenging task since words might share multiple v...

02/21/2017 · Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization
In this paper we describe a dynamic normalization process applied to soc...

03/10/2016 · Part-of-Speech Tagging for Historical English
As more historical texts are digitized, there is interest in applying na...

11/15/2019 · Bootstrapping NLU Models with Multi-task Learning
Bootstrapping natural language understanding (NLU) systems with minimal ...
