Improving Multilingual Neural Machine Translation System for Indic Languages

09/27/2022
by Sudhansu Bala Das, et al.

A Machine Translation System (MTS) serves as an effective tool for communication by translating text or speech from one language to another. The need for an efficient translation system becomes obvious in a large multilingual environment like India, where English and a set of Indian Languages (ILs) are officially used. In contrast with English, ILs are still treated as low-resource languages due to the unavailability of corpora. To address this asymmetry, multilingual neural machine translation (MNMT) has emerged as an ideal approach. In this paper, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, one for English-Indic (one-to-many) and the other for Indic-English (many-to-one), with a shared encoder-decoder covering 15 language pairs (30 translation directions). Since most IL pairs have only a scanty amount of parallel corpora, insufficient for training any machine translation model, we explore various augmentation strategies to improve overall translation quality through the proposed model. A state-of-the-art transformer architecture is used to realize the proposed model. Trials over a good amount of data reveal its superiority over conventional models. In addition, the paper addresses the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages. Moreover, the experimental results show the advantage of back-translation and domain adaptation for ILs in enhancing the translation quality of both source and target languages. Using all these key approaches, our proposed model proves more efficient than the baseline model in terms of the evaluation metric, i.e., the BLEU (BiLingual Evaluation Understudy) score, for a set of ILs.
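
The abstract names back-translation and a one-to-many setup but does not give implementation details. The following is only a minimal sketch of how such data preparation is commonly done, not the authors' pipeline. The function names, the `<2hi>`-style target-language token, and the toy dictionary standing in for a trained reverse model are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_for_target(pairs: Iterable[Pair], target_lang: str) -> List[Pair]:
    """Prefix each source sentence with a target-language token.

    The "<2xx>" token format is an assumed convention for telling a
    shared one-to-many model which language to produce.
    """
    token = f"<2{target_lang}>"
    return [(f"{token} {src}", tgt) for src, tgt in pairs]


def back_translate(
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    `reverse_translate` is a hypothetical target-to-source translator;
    in practice it would be a trained model, here it is any callable.
    """
    synthetic: List[Pair] = []
    for tgt in target_monolingual:
        src = reverse_translate(tgt)  # machine-generated source side
        if src.strip():  # skip sentences the reverse model could not translate
            synthetic.append((src, tgt))
    return synthetic


if __name__ == "__main__":
    # Toy stand-in for a Hindi-to-English model (illustrative only).
    toy_hi_en = {"नमस्ते": "hello", "धन्यवाद": "thank you"}
    hindi_monolingual = ["नमस्ते", "धन्यवाद"]

    # Synthetic English-Hindi pairs for the English-Indic direction.
    synthetic_pairs = back_translate(hindi_monolingual, lambda s: toy_hi_en.get(s, ""))

    # Tag the synthetic sources so a shared model knows the target is Hindi.
    for src, tgt in tag_for_target(synthetic_pairs, "hi"):
        print(src, "->", tgt)
```

In this sketch the genuine human-written text always sits on the target side, so the model learns to produce fluent output even when the synthetic source side is noisy.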


