
Lego-MT: Towards Detachable Models in Massively Multilingual Machine Translation

by Fei Yuan, et al.

Traditional multilingual neural machine translation (MNMT) uses a single model to translate in all directions. However, as the number of language pairs grows, a single model for massive MNMT faces new challenges: parameter tension and heavy computation. In this paper, we revisit multi-way structures by assigning an individual branch to each language (group). Despite the simplicity of this architecture, training such decentralized models is challenging because there is no constraint aligning the representations of all languages. We propose a localized training recipe that maps the different branches into a unified space, yielding an efficient detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build the first large-scale open-source translation benchmark covering 7 language-centric datasets, each containing 445 language pairs. Experiments show that Lego-MT (1.2B) achieves gains of more than 4 BLEU while outperforming M2M-100 (12B). (We will release all training data, models, and checkpoints.)
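The multi-way idea described in the abstract — one branch per language (group), all routed through a shared unified space, so that only the branches for a given translation direction need to be loaded — can be illustrated with a minimal sketch. This is a hypothetical toy structure, not the paper's actual architecture or API; class and method names (`Branch`, `DetachableMT`, `load_direction`) are invented for illustration.

```python
class Branch:
    """A per-language encoder or decoder branch (stub)."""

    def __init__(self, lang):
        self.lang = lang

    def encode(self, text):
        # In the real model this would map source text into the
        # shared unified representation space.
        return ("unified", self.lang, text)

    def decode(self, hidden):
        # Maps a unified-space representation into target-language text.
        return f"[{self.lang}] {hidden[2]}"


class DetachableMT:
    """Loads only the branches needed for a given translation direction."""

    def __init__(self):
        self.encoders = {}
        self.decoders = {}

    def load_direction(self, src, tgt):
        # Detachability: branches for other languages never need
        # to be in memory for this direction.
        self.encoders.setdefault(src, Branch(src))
        self.decoders.setdefault(tgt, Branch(tgt))

    def translate(self, text, src, tgt):
        self.load_direction(src, tgt)
        hidden = self.encoders[src].encode(text)
        return self.decoders[tgt].decode(hidden)


mt = DetachableMT()
print(mt.translate("hello", "en", "fr"))  # → "[fr] hello"
```

The key design point the sketch captures is the routing: because every branch reads from and writes to the same unified space, any encoder can be paired with any decoder, and unused branches can stay on disk.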


Related research:

- Beyond English-Centric Multilingual Machine Translation
- University of Cape Town's WMT22 System: Multilingual Machine Translation for Southern African Languages
- Learning Language Specific Sub-network for Multilingual Machine Translation
- Improving Neural Machine Translation of Indigenous Languages with Multilingual Transfer Learning
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information
- Findings of the Covid-19 MLIA Machine Translation Task
- Scalable and Efficient MoE Training for Multitask Multilingual Models