Cost-Effective Training in Low-Resource Neural Machine Translation

01/14/2022
by   Sai Koneru, et al.
0

While Active Learning (AL) techniques are explored in Neural Machine Translation (NMT), only a few works focus on tackling low annotation budgets where a limited number of sentences can get translated. Such situations are especially challenging and can occur for endangered languages with few human annotators or having cost constraints to label large amounts of data. Although AL is shown to be helpful with large budgets, it is not enough to build high-quality translation systems in these low-resource conditions. In this work, we propose a cost-effective training procedure to increase the performance of NMT models utilizing a small number of annotated sentences and dictionary entries. Our method leverages monolingual data with self-supervised objectives and a small-scale, inexpensive dictionary for additional supervision to initialize the NMT model before applying AL. We show that improving the model using a combination of these knowledge sources is essential to exploit AL strategies and increase gains in low-resource conditions. We also present a novel AL strategy inspired by domain adaptation for NMT and show that it is effective for low budgets. We propose a new hybrid data-driven approach, which samples sentences that are diverse from the labelled data and also most similar to unlabelled data. Finally, we show that initializing the NMT model and further using our AL strategy can achieve gains of up to 13 BLEU compared to conventional AL methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/28/2019

Revisiting Low-Resource Neural Machine Translation: A Case Study

It has been shown that the performance of neural machine translation (NM...
research
05/11/2020

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Over the last few years two promising research directions in low-resourc...
research
09/17/2019

Pointer-based Fusion of Bilingual Lexicons into Neural Machine Translation

Neural machine translation (NMT) systems require large amounts of high q...
research
06/09/2022

Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages

Neural Machine Translation (NMT) models have been effective on large bil...
research
11/30/2020

Dynamic Curriculum Learning for Low-Resource Neural Machine Translation

Large amounts of data has made neural machine translation (NMT) a big su...
research
06/08/2021

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Self-supervised pre-training of text representations has been successful...
research
10/05/2021

Sicilian Translator: A Recipe for Low-Resource NMT

With 17,000 pairs of Sicilian-English translated sentences, Arba Sicula ...

Please sign up or login with your details

Forgot password? Click here to reset