Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling

10/28/2019
by Nasser Zalmout, et al.

Morphological tagging is challenging for morphologically rich languages because the target space is large and more training data is needed to reduce model sparsity. Dialectal variants of morphologically rich languages suffer more, as they tend to be noisier and have fewer resources. In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variation in the context of full morphological tagging. We use multitask learning for joint morphological modeling of the features within two dialects, and as a knowledge-transfer scheme for cross-dialectal modeling. We use adversarial training to learn dialect-invariant features that support knowledge transfer from the high-resource to the low-resource variant. As a case study, we work with two dialectal variants: Modern Standard Arabic (the high-resource "dialect") and Egyptian Arabic (the low-resource dialect). Our models achieve state-of-the-art results for both. Furthermore, adversarial training yields larger improvements when the training datasets are smaller.
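The abstract does not spell out how the multitask and adversarial objectives are combined, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a shared encoder whose loss is the sum of per-feature tagging losses minus a weighted dialect-discriminator loss. The feature names (`pos`, `gen`), the weight `lam`, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold label under a label->probability dict."""
    return -math.log(probs[gold])


def encoder_objective(feature_probs, gold_features, dialect_probs, gold_dialect, lam=0.1):
    """Loss a shared encoder would minimize in this sketch (assumed formulation).

    Multitask part: tagging losses are summed over morphological features.
    Adversarial part: the dialect-discriminator loss is subtracted, so the
    encoder is rewarded when the discriminator cannot identify the dialect.
    `lam` is a hypothetical trade-off weight.
    """
    tagging = sum(
        cross_entropy(feature_probs[name], gold_features[name])
        for name in gold_features
    )
    dialect = cross_entropy(dialect_probs, gold_dialect)
    return tagging - lam * dialect, tagging, dialect


# Toy predictions for one word (hypothetical features and values).
feature_probs = {
    "pos": {"noun": 0.8, "verb": 0.2},
    "gen": {"m": 0.5, "f": 0.5},
}
gold_features = {"pos": "noun", "gen": "f"}

# A discriminator that identifies the dialect easily vs. one that is confused.
confident = {"MSA": 0.9, "EGY": 0.1}
confused = {"MSA": 0.5, "EGY": 0.5}

loss_confident, tagging_loss, _ = encoder_objective(
    feature_probs, gold_features, confident, "MSA"
)
loss_confused, _, _ = encoder_objective(
    feature_probs, gold_features, confused, "MSA"
)
```

With identical tagging predictions, the encoder loss is lower when the discriminator is confused, which is the pressure toward dialect-invariant features that the abstract describes. In a neural model this sign flip is often realized with a gradient reversal layer; whether the paper does so is not stated in the abstract.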


