Dialect Text Normalization to Normative Standard Finnish

05/25/2020
by Mika Hämäläinen, et al.

We compare different LSTMs and transformer models in terms of their effectiveness in normalizing dialectal Finnish into normative standard Finnish. As dialect is the common way of communication for people online in Finnish, such normalization is a necessary step to improve the accuracy of existing Finnish NLP tools, which are tailored for normative Finnish text. We work on a corpus consisting of dialectal data from 23 distinct Finnish dialect varieties. The best-performing BRNN approach lowers the word error rate of the corpus from an initial 52.89 to 5.73.
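The headline numbers above are word error rates (WER). The abstract does not say exactly how the metric was computed, so the following is only a minimal sketch of the conventional definition: word-level edit distance between system output and the normative reference, divided by the number of reference words and reported as a percentage. The function names and the example sentences are illustrative assumptions and are not taken from the paper or its corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER as a percentage (assumed definition, see lead-in)."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        hyp_tokens = hyp.split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / total


# Illustrative sentences only (not from the paper's corpus).
normative = ["minä olen täällä"]
dialectal = ["mie oon täällä"]
normalized = ["minä olen täällä"]  # hypothetical perfect model output

wer_before = word_error_rate(normative, dialectal)
wer_after = word_error_rate(normative, normalized)
```

Under this definition, scoring the raw dialectal text against the normative reference gives the initial WER, and scoring the model output against the same reference gives the WER after normalization.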


research
10/18/2020

hinglishNorm – A Corpus of Hindi-English Code Mixed Sentences for Text Normalization

We present hinglishNorm – a human annotated corpus of Hindi-English code...
research
04/08/2021

User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization

Morphological analysis (MA) and lexical normalization (LN) are both impo...
research
09/26/2020

Techniques to Improve Q&A Accuracy with Transformer-based models on Large Complex Documents

This paper discusses the effectiveness of various text processing techni...
research
06/13/2018

An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization

In this paper, we apply different NMT models to the problem of historica...
research
06/16/2022

Text normalization for endangered languages: the case of Ligurian

Text normalization is a crucial technology for low-resource languages wh...
research
02/12/2021

Neural Inverse Text Normalization

While there have been several contributions exploring state of the art t...
research
10/08/2020

Query-Key Normalization for Transformers

Low-resource language translation is a challenging but socially valuable...
