SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

06/20/2020
by Ekaterina Vylomova, et al.

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language, such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low-resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrated the utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems, which achieved over 90% accuracy on them, while others were more challenging.
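To make the reinflection task concrete: a system receives a lemma and a UniMorph-style feature bundle and must produce the inflected form. The sketch below is a toy, hand-written English inflector covering only two regular patterns; it is purely illustrative of the input/output format, not any submitted system, which instead learn such mappings from the training data.

```python
# Toy illustration of the morphological inflection task:
# (lemma, feature bundle) -> inflected form.
# Only two regular English patterns are handled; everything else
# falls back to the bare lemma.
def inflect(lemma: str, features: str) -> str:
    if features == "V;PST":  # regular past tense
        return lemma + ("d" if lemma.endswith("e") else "ed")
    if features == "N;PL":   # regular plural
        return lemma + ("es" if lemma.endswith(("s", "x", "ch", "sh")) else "s")
    return lemma

print(inflect("walk", "V;PST"))   # -> walked
print(inflect("watch", "N;PL"))  # -> watches
```

Hand-written rules like these scale poorly to the 90 typologically diverse languages of the task, which is why most submissions were neural models, though the abstract notes that carefully designed grammars still won on a few very low-resource languages.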


