Language Model Prior for Low-Resource Neural Machine Translation

by Christos Baziotis et al.

The scarcity of large parallel corpora is an important obstacle for neural machine translation. A common solution is to exploit the knowledge of language models (LM) trained on abundant monolingual data. In this work, we propose a novel approach to incorporate an LM as a prior in a neural translation model (TM). Specifically, we add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior, while avoiding wrong predictions when the TM "disagrees" with the LM. This objective relates to knowledge distillation, where the LM can be viewed as teaching the TM about the target language. The proposed approach does not compromise decoding speed, because the LM is used only at training time, unlike previous work that requires it during inference. We present an analysis of the effects that different methods have on the distributions of the TM. Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
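The objective described above can be sketched as a standard token-level cross-entropy loss plus a distillation-style regularizer that penalizes divergence between the TM's output distribution and the LM prior. The snippet below is a minimal, hypothetical illustration (not the paper's exact formulation): it assumes the regularizer is a temperature-smoothed KL term `KL(LM || TM)` with weight `lam`, and the function names and hyperparameters are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-smoothed softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def lm_prior_loss(tm_logits, lm_logits, targets, lam=0.5, tau=2.0):
    """Hypothetical training objective: translation cross-entropy plus a
    KL regularizer pushing the TM's distribution toward the LM prior.

    tm_logits, lm_logits: (num_tokens, vocab_size) arrays
    targets: (num_tokens,) gold target-token indices
    lam: regularizer weight; tau: distillation temperature (both illustrative)
    """
    p_tm = softmax(tm_logits)
    # standard token-level cross-entropy on the gold target tokens
    ce = -np.log(p_tm[np.arange(len(targets)), targets]).mean()
    # KL(LM || TM) over temperature-smoothed distributions,
    # as in knowledge distillation with the LM as teacher
    p_lm_t = softmax(lm_logits, tau)
    p_tm_t = softmax(tm_logits, tau)
    kl = (p_lm_t * (np.log(p_lm_t) - np.log(p_tm_t))).sum(axis=-1).mean()
    return ce + lam * kl
```

Because the LM enters only through this training loss, decoding uses the TM alone, which is why the approach adds no inference-time cost.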




