Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation

08/11/2022
by Muhammad ElNokrashy, et al.

This paper proposes a simple yet effective method to improve direct (X-to-Y) translation, both in the zero-shot setting and when direct data is available. We modify the input tokens at both the encoder and the decoder to include signals for the source and target languages. We show a performance gain when training from scratch or when finetuning a pretrained model with the proposed setup. In our experiments, the method gains up to nearly 10.0 BLEU points on in-house datasets, depending on the checkpoint selection criteria. In a WMT evaluation campaign, From-English performance improves by 4.17 BLEU points in the zero-shot setting and by 2.87 when direct data is available for training, while X-to-Y improves by 1.29 BLEU over the zero-shot baseline and by 0.44 over the many-to-many baseline. In the low-resource setting, we see a 1.5 to 1.7 point improvement when finetuning on X-to-Y domain data.
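The core idea, as the abstract describes it, is to prefix the token streams fed to both the encoder and the decoder with tags identifying the source and target languages. The sketch below illustrates one plausible way to do this; the tag format (`<src:xx>`, `<tgt:yy>`) and the helper name are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch (assumed tag format, not the paper's exact scheme):
# prepend source- and target-language tokens to both the encoder and
# decoder inputs so the model conditions on the full direction X -> Y.

def add_language_tokens(src_tokens, tgt_tokens, src_lang, tgt_lang):
    """Prefix encoder and decoder token lists with language tags."""
    lang_pair = [f"<src:{src_lang}>", f"<tgt:{tgt_lang}>"]
    encoder_input = lang_pair + list(src_tokens)
    decoder_input = lang_pair + list(tgt_tokens)
    return encoder_input, decoder_input

# Example: a German -> French pair.
enc, dec = add_language_tokens(["Guten", "Tag"], ["Bonjour"], "de", "fr")
# enc == ["<src:de>", "<tgt:fr>", "Guten", "Tag"]
# dec == ["<src:de>", "<tgt:fr>", "Bonjour"]
```

In practice the language tags would be added to the model's vocabulary as special tokens, so the same preprocessing applies uniformly whether the model is trained from scratch or finetuned.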


Related research

- Improving Zero-Shot Translation of Low-Resource Languages (11/04/2018): Recent work on multilingual neural machine translation reported competit...
- Zero-Shot Dual Machine Translation (05/25/2018): Neural Machine Translation (NMT) systems rely on large amounts of parall...
- Low-Resource Neural Machine Translation for Southern African Languages (04/01/2021): Low-resource African languages have not fully benefited from the progres...
- Self-Learning for Zero Shot Neural Machine Translation (03/10/2021): Neural Machine Translation (NMT) approaches employing monolingual data a...
- Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022 (05/12/2022): This paper describes the SLT-CDT-UoS group's submission to the first Spe...
- Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation (05/16/2023): This paper studies the impact of layer normalization (LayerNorm) on zero...
- Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion (07/13/2021): Understanding the behavior and vulnerability of pre-trained deep neural ...
