A Large-Scale Study of Machine Translation in the Turkic Languages

09/09/2021
by Jamshidbek Mirzakhalov, et al.
Recent advances in neural machine translation (NMT) have raised translation quality to the point where NMT systems are widely adopted in practice. However, a large number of languages have yet to reap the benefits of NMT. In this paper, we present the first large-scale case study of the practical application of MT to the Turkic language family, aiming to realize the gains of NMT for Turkic languages in scenarios ranging from high-resource to extremely low-resource. In addition to an extensive analysis that identifies the bottlenecks to building competitive systems that ameliorate data scarcity, our study makes several key contributions: i) a large parallel corpus covering 22 Turkic languages, combining common public datasets with new datasets of approximately 2 million parallel sentences; ii) bilingual baselines for 26 language pairs; iii) novel high-quality test sets in three different translation domains; and iv) human evaluation scores. All models, scripts, and data will be released to the public.


