Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

05/20/2021
by Xiao Pan, et al.

Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mCOLT, a training method that yields a single unified multilingual translation model. mCOLT is empowered by two techniques: (i) a contrastive learning scheme to close the gap among representations of different languages, and (ii) data augmentation on both parallel and monolingual data to further align token representations. For English-centric directions, mCOLT achieves competitive or even better performance than the strong pre-trained model mBART on tens of WMT benchmarks. For non-English directions, mCOLT achieves an average improvement of 10+ BLEU over the multilingual baseline.
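
To make the contrastive component concrete, here is a minimal PyTorch sketch of such a loss, assuming average-pooled encoder states serve as sentence representations and other translations in the batch serve as negatives. The function name, pooling choice, and temperature are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(src_states, tgt_states, src_mask, tgt_mask, temperature=0.1):
    """InfoNCE-style loss over a batch of parallel sentence pairs.

    src_states, tgt_states: (batch, seq_len, hidden) encoder outputs
    src_mask, tgt_mask:     (batch, seq_len), 1 for real tokens, 0 for padding
    """
    src_mask = src_mask.float()
    tgt_mask = tgt_mask.float()

    # Average-pool token states into one vector per sentence, ignoring padding.
    src_vec = (src_states * src_mask.unsqueeze(-1)).sum(1) / src_mask.sum(1, keepdim=True)
    tgt_vec = (tgt_states * tgt_mask.unsqueeze(-1)).sum(1) / tgt_mask.sum(1, keepdim=True)

    # Cosine similarities between every source and every target in the batch.
    src_vec = F.normalize(src_vec, dim=-1)
    tgt_vec = F.normalize(tgt_vec, dim=-1)
    sim = src_vec @ tgt_vec.t() / temperature  # (batch, batch)

    # Each source's aligned translation sits on the diagonal: it is the
    # positive pair, and all other targets in the batch act as negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

In training, a term like this would typically be added to the standard cross-entropy translation loss with a weighting coefficient; the exact pooling, temperature, and weighting used by mCOLT are details of the paper, not of this sketch.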

