Evaluating Multiway Multilingual NMT in the Turkic Languages

09/13/2021
by Jamshidbek Mirzakhalov, et al.

Despite the increasing number of large and comprehensive machine translation (MT) systems, evaluation of these methods in various languages has been constrained by the lack of high-quality parallel corpora and by limited engagement with the people who speak these languages. In this study, we present an evaluation of state-of-the-art approaches to training and evaluating MT systems in 22 languages from the Turkic language family, most of which are extremely under-explored. First, we adopt the TIL Corpus with a few key improvements to the training and evaluation sets. Then, we train 26 bilingual baselines as well as a multi-way neural MT (MNMT) model using the corpus, and perform an extensive analysis using both automatic metrics and human evaluations. We find that the MNMT model outperforms almost all bilingual baselines on the out-of-domain test sets, and that fine-tuning the model on a downstream task for a single language pair also yields a large performance boost in both low- and high-resource scenarios. Our close analysis of evaluation criteria for MT models in Turkic languages also points to the need for further research in this direction. We release the corpus splits, test sets, and models to the public.
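The abstract does not say which automatic metrics are used. As a hedged illustration of one character-level metric commonly applied to morphologically rich languages, the sketch below computes a simplified sentence-level chrF (character n-gram F-score). It follows the original formulation: precision and recall are averaged over n-gram orders 1 to 6, then combined with beta = 2. The function names `char_ngrams` and `chrf` are hypothetical. This is not claimed to be the paper's evaluation setup, and its scores may differ from library implementations such as sacrebleu.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (hypothetical helper)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average precision over orders
    r = sum(recalls) / len(recalls)        # average recall over orders
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta > 1 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Turkish example: the hypothesis drops a word from the reference.
    print(chrf("Bu kitap.", "Bu bir kitap."))
```

Matching on character n-grams rather than whole words gives partial credit when a hypothesis gets a stem right but a suffix wrong. This is a frequent situation in agglutinative languages such as those in the Turkic family.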
