Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

by Xiao Pan et al.

Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mCOLT, a training method that yields a single unified multilingual translation model. mCOLT is empowered by two techniques: (i) a contrastive learning scheme to close the gap among representations of different languages, and (ii) data augmentation on both parallel and monolingual data to further align token representations. For English-centric directions, mCOLT achieves competitive or even better performance than the strong pre-trained model mBART on tens of WMT benchmarks. For non-English directions, mCOLT achieves an average improvement of 10+ BLEU over the multilingual baseline.
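The contrastive learning scheme described above can be sketched as an InfoNCE-style objective that pulls together the representations of parallel sentences across languages while pushing apart non-parallel ones. The following is a minimal illustrative sketch, not the paper's exact formulation: it assumes average-pooled encoder states as sentence representations and uses in-batch negatives; the function name, pooling choice, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_lingual_contrastive_loss(src_repr, tgt_repr, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of parallel sentence pairs.

    src_repr, tgt_repr: (batch, dim) pooled encoder representations of the
    source sentences and their translations. Row i of each tensor comes from
    the same parallel pair, so positives lie on the diagonal of the
    similarity matrix; all other rows in the batch serve as negatives.
    """
    src = F.normalize(src_repr, dim=-1)            # unit-length vectors so the
    tgt = F.normalize(tgt_repr, dim=-1)            # dot product is cosine sim
    logits = src @ tgt.t() / temperature           # (batch, batch) similarities
    labels = torch.arange(src.size(0))             # positive index = diagonal
    return F.cross_entropy(logits, labels)

# toy usage: 4 parallel pairs with 8-dimensional sentence representations
loss = cross_lingual_contrastive_loss(torch.randn(4, 8), torch.randn(4, 8))
```

Minimizing this loss drives representations of mutual translations closer together than those of unrelated sentences, which is the mechanism by which a shared cross-lingual space is encouraged.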

