Low-Resource Machine Translation using Interlinear Glosses

11/07/2019
by Zhong Zhou, et al.

Neural Machine Translation (NMT) handles low-resource translation poorly because NMT is data-hungry and low-resource languages, by their nature, have limited parallel data. Many low-resource languages are also morphologically rich, which further increases data sparsity. However, a good linguist can build a morphological analyzer in far fewer hours than it would take to collect and translate the amount of parallel data needed for conventional NMT. Our work combines the benefits of NMT and linguistic information. We use a morphological analyzer to automatically generate interlinear glosses from a dictionary or parallel data, translate the source text into an interlinear gloss as an interlingua representation, and finally translate the gloss into the target text using an NMT model trained on the ODIN dataset, a large collection of interlinear glosses paired with their target translations. Translating from the interlinear gloss to the target text on the entire ODIN dataset achieves a BLEU score of 35.07, and our qualitative results are positive in a low-resource scenario of Turkish-English translation using only 865 lines of training data. Our translation system yields better results than training NMT directly from the source language to the target language in a constrained-data setting, and helps produce translations with sufficiently good content and fluency when data is scarce.
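To make the gloss-as-interlingua pipeline concrete, here is a minimal, self-contained sketch in Python. The toy analyzer lexicon, the single example entry, and the translate_gloss lookup are hypothetical stand-ins for illustration only: the actual system described in the abstract uses a full morphological analyzer and an NMT model trained on ODIN gloss-to-English pairs.

```python
# Sketch of the two-stage pipeline: source word -> interlinear gloss -> target text.
# Everything below is a toy illustration, not the authors' implementation.

# Stage 1: a morphological analyzer segments a source word into morphemes and
# maps each morpheme to a gloss tag (here, a tiny lexicon covering one Turkish
# word; a real analyzer covers the language's full morphology).
TOY_ANALYZER = {
    "ev": "house",   # stem
    "ler": "PL",     # plural suffix
    "de": "LOC",     # locative case suffix
}

def to_gloss(word: str, morphemes: list[str]) -> str:
    """Replace each morpheme with its gloss tag, joined by '-' as in
    standard interlinear glossed text (IGT)."""
    assert "".join(morphemes) == word, "segmentation must cover the word"
    return "-".join(TOY_ANALYZER[m] for m in morphemes)

def translate_gloss(gloss: str) -> str:
    """Placeholder for stage 2: an NMT model trained on ODIN gloss-to-English
    pairs would decode here; we fake it with a one-entry lookup."""
    lookup = {"house-PL-LOC": "in the houses"}
    return lookup.get(gloss, "<unk>")

# Turkish "evlerde" -> gloss "house-PL-LOC" -> English "in the houses"
gloss = to_gloss("evlerde", ["ev", "ler", "de"])
print(gloss)                   # house-PL-LOC
print(translate_gloss(gloss))  # in the houses
```

The gloss acts as the interlingua: stage 1 needs only a linguist-built analyzer and a dictionary, while stage 2 can be trained once on ODIN and reused, which is what makes the approach attractive when source-target parallel data is scarce.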


Related research

05/31/2021
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data
The scarcity of parallel data is a major obstacle for training high-qual...

05/28/2019
Revisiting Low-Resource Neural Machine Translation: A Case Study
It has been shown that the performance of neural machine translation (NM...

08/16/2021
Active Learning for Massively Parallel Translation of Constrained Text into Low Resource Languages
We translate a closed text that is known in advance and available in man...

04/12/2021
Family of Origin and Family of Choice: Massively Parallel Lexiconized Iterative Pretraining for Severely Low Resource Machine Translation
We translate a closed text that is known in advance into a severely low ...

11/30/2021
Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages
We conduct an empirical study of neural machine translation (NMT) for tr...

03/24/2021
Low-Resource Machine Translation for Low-Resource Languages: Leveraging Comparable Data, Code-Switching and Compute Resources
We conduct an empirical study of unsupervised neural machine translation...

06/29/2021
Neural Machine Translation for Low-Resource Languages: A Survey
Neural Machine Translation (NMT) has seen a tremendous spurt of growth i...
