Improving Multilingual Translation by Representation and Gradient Regularization

09/10/2021
by Yilin Yang, et al.

Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including directions that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low-quality translations, commonly failing even to produce output in the right target language. In this work, we observe that off-target translation is dominant even in strong multilingual systems trained on massive multilingual corpora. To address this issue, we propose a joint approach that regularizes NMT models at both the representation level and the gradient level. At the representation level, we leverage an auxiliary target language prediction task to regularize decoder outputs so that they retain information about the target language. At the gradient level, we leverage a small amount of direct data (on the order of thousands of sentence pairs) to regularize model gradients. Our results demonstrate that our approach is highly effective both in reducing off-target translation occurrences and in improving zero-shot translation performance, by +5.59 and +10.38 BLEU on the WMT and OPUS datasets, respectively. Moreover, experiments show that our method also works well when the small amount of direct data is not available.
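
The two regularizers described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not say how decoder outputs are pooled, what classifier is used, how the auxiliary loss is weighted, or what rule is applied to the gradients. The mean pooling, the linear classifier, the `aux_weight` value, and the projection rule in `project_gradient` are all assumptions made for illustration, and every function name here is hypothetical.

```python
import math

def mean_pool(states):
    """Average decoder states over time: list of T vectors -> one vector.
    (Mean pooling is an assumption; the abstract does not specify it.)"""
    steps = len(states)
    return [sum(col) / steps for col in zip(*states)]

def language_logits(pooled, weights, bias):
    """Hypothetical linear classifier: one logit per candidate target language."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, target_index):
    """Numerically stable softmax cross-entropy for a single example."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target_index]

def joint_loss(translation_loss, states, weights, bias, target_lang,
               aux_weight=0.1):
    """Translation loss plus a weighted target-language prediction loss.
    aux_weight is an illustrative value, not one taken from the paper."""
    logits = language_logits(mean_pool(states), weights, bias)
    return translation_loss + aux_weight * cross_entropy(logits, target_lang)

def project_gradient(train_grad, direct_grad):
    """Assumed gradient-level rule: if the training gradient conflicts with
    the gradient from the small direct dataset (negative dot product),
    remove its component along the direct-data gradient; otherwise
    return it unchanged."""
    dot = sum(g * d for g, d in zip(train_grad, direct_grad))
    norm_sq = sum(d * d for d in direct_grad)
    if dot >= 0 or norm_sq == 0:
        return list(train_grad)
    scale = dot / norm_sq
    return [g - scale * d for g, d in zip(train_grad, direct_grad)]
```

In a real system the decoder states and gradients would come from the NMT model and its optimizer; plain lists are used here only to keep the arithmetic visible.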


research
04/24/2020

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Massively multilingual models for neural machine translation (NMT) are t...
research
06/30/2022

Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations

Multilingual Neural Machine Translation (MNMT) enables one system to tra...
research
11/10/2019

Translationese as a Language in "Multilingual" NMT

Machine translation has an undesirable propensity to produce "translatio...
research
10/10/2022

Improving Retrieval Augmented Neural Machine Translation by Controlling Source and Fuzzy-Match Interactions

We explore zero-shot adaptation, where a general-domain model has access...
research
06/18/2018

A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation

Recently, neural machine translation (NMT) has been extended to multilin...
research
09/14/2021

Efficient Inference for Multilingual Neural Machine Translation

Multilingual NMT has become an attractive solution for MT deployment in ...
research
06/20/2023

GIO: Gradient Information Optimization for Training Dataset Selection

It is often advantageous to train models on a subset of the available tr...
