A Multitask Learning Approach for Diacritic Restoration

06/07/2020
by Sawsan Alqahtani, et al.

In many languages, such as Arabic, diacritics specify pronunciation as well as meaning. They are often omitted in written text, which increases the number of possible pronunciations and meanings for a word. The resulting text is more ambiguous, which makes computational processing more difficult. Diacritic restoration is the task of restoring missing diacritics in written text. Most state-of-the-art diacritic restoration models are built on character-level information, which helps the model generalize to unseen data but presumably loses useful information at the word level. To compensate for this loss, we investigate the use of multi-task learning to jointly optimize diacritic restoration with related NLP problems, namely word segmentation, part-of-speech tagging, and syntactic diacritization. We use Arabic as a case study since it has sufficient data resources for the tasks we consider in our joint modeling. Our joint models significantly outperform the baselines and are comparable to state-of-the-art models that are more complex, relying on morphological analyzers and/or considerably more data (e.g., dialectal data).
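To make the task concrete, the following is a minimal sketch of how undiacritized input and character-level targets could be derived from diacritized Arabic text. It is an illustration only, not the authors' pipeline: the function names `strip_diacritics` and `to_char_labels` are hypothetical, and the diacritic set is assumed to be the Unicode range U+064B to U+0652 (fathatan through sukun). It also shows the ambiguity described above, since two different words collapse to the same undiacritized form.

```python
# Assumption: the diacritics of interest are U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text):
    """Remove the assumed diacritic marks, leaving only base characters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def to_char_labels(diacritized):
    """Pair each base character with the (possibly empty) diacritics that follow it.

    This mirrors a character-level formulation: the model sees the base
    characters and predicts one label per character.
    """
    pairs = []
    for ch in diacritized:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
            # a mark with no preceding base character is dropped
        else:
            pairs.append((ch, ""))
    return pairs


# "kataba" (he wrote): kaf+fatha, ta+fatha, ba+fatha
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
# "kutub" (books): kaf+damma, ta+damma, ba
kutub = "\u0643\u064F\u062A\u064F\u0628"

# Both words share the same undiacritized spelling, which is the source of ambiguity.
same_surface_form = strip_diacritics(kataba) == strip_diacritics(kutub)
labels = to_char_labels(kataba)
```

Here `same_surface_form` is `True`, and `labels` holds one `(character, diacritics)` pair per base character, which is the kind of per-character target a restoration model would be trained to predict.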


research
10/05/2019

Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging

Semitic languages can be highly ambiguous, having several interpretation...
research
10/28/2019

Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling

Morphological tagging is challenging for morphologically rich languages ...
research
11/21/2018

Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction

Morphological analysis is an important first step in downstream tasks li...
research
08/10/2018

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

We present LemmaTag, a featureless recurrent neural network architecture...
research
12/14/2019

Efficient Convolutional Neural Networks for Diacritic Restoration

Diacritic restoration has gained importance with the growing need for ma...
research
10/14/2019

Restoring ancient text using deep learning: a case study on Greek epigraphy

Ancient history relies on disciplines such as epigraphy, the study of an...
research
12/10/2019

Homograph Disambiguation Through Selective Diacritic Restoration

Lexical ambiguity, a challenging phenomenon in all natural languages, is...
