Scaling Laws for Multilingual Neural Machine Translation

02/19/2023
by Patrick Fernandes, et al.

In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in model size affect model performance and investigate how the composition of the training mixture influences the scaling behavior. We find that changing the weightings of the individual language pairs in the training mixture affects only the multiplicative factor of the scaling law. In particular, we observe that multilingual models trained with different mixing rates all exhibit the same scaling exponent. Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models. We find little evidence that language similarity has any impact. In contrast, the direction of the multilinguality plays a significant role, with models translating from multiple languages into English having a larger number of effective parameters per task than their reversed counterparts. Finally, we leverage our observations to predict the performance of multilingual models trained with any language weighting at any scale, significantly reducing the effort required for language balancing in large multilingual models. Our findings apply to both in-domain and out-of-domain test sets and to multiple evaluation metrics, such as ChrF and BLEURT.
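Concretely, the behavior described above is consistent with a per-language-pair power law of the form L_p(N) = β_p N^(-α) + L_∞, where N is the model size, α is the scaling exponent shared across mixture weightings, and β_p is the multiplicative factor that the weighting modulates. Below is a minimal Python sketch of how such a law might be fit to (model size, loss) measurements with scipy; the functional form, the sample data, and all symbol names (beta, alpha, L_inf, f_p) are illustrative assumptions, not the paper's exact parameterization.

    # Minimal sketch: fitting a per-pair scaling law of the assumed form
    #   L_p(N) = beta_p * N**(-alpha) + L_inf
    # where N is the model size, alpha is the scaling exponent (reported to be
    # shared across mixture weightings), beta_p is the multiplicative factor,
    # and L_inf is the irreducible loss. All values below are hypothetical.
    import numpy as np
    from scipy.optimize import curve_fit

    def scaling_law(N, beta, alpha, L_inf):
        """Per-language-pair loss as a function of model size N."""
        return beta * N ** (-alpha) + L_inf

    # Hypothetical (model size, dev loss) measurements for one language pair.
    sizes = np.array([1e8, 3e8, 1e9, 3e9])
    losses = np.array([2.10, 1.95, 1.82, 1.74])

    params, _ = curve_fit(
        scaling_law, sizes, losses,
        p0=[100.0, 0.3, 1.5],                         # rough initial guesses
        bounds=([0.0, 0.0, 0.0], [np.inf, 1.0, np.inf]),
    )
    beta, alpha, L_inf = params
    print(f"beta={beta:.2f}  alpha={alpha:.3f}  L_inf={L_inf:.2f}")

    # Under the assumed form, a mixture that effectively allocates a fraction
    # f_p of the N parameters to pair p satisfies beta_p = beta_full * f_p**(-alpha),
    # so the effective-parameter fraction can be recovered as
    #   f_p = (beta_full / beta_p) ** (1.0 / alpha)
    # which is one way to operationalize an "effective number of parameters"
    # per language pair.

With β_p fitted per mixture and α held fixed across mixtures, comparing the recovered f_p values for many-to-English models against their English-to-many counterparts would surface the directionality effect the abstract describes.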

Related research

01/06/2016 · Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
We propose multi-way, multilingual neural machine translation. The propo...

09/13/2022 · Revisiting Neural Scaling Laws in Language and Vision
The remarkable progress in deep learning in recent years is largely driv...

09/16/2021 · Scaling Laws for Neural Machine Translation
We present an empirical study of scaling properties of encoder-decoder T...

09/22/2021 · Scalable and Efficient MoE Training for Multitask Multilingual Models
The Mixture of Experts (MoE) models are an emerging class of sparsely ac...

12/05/2022 · Impact of Domain-Adapted Multilingual Neural Machine Translation in the Medical Domain
Multilingual Neural Machine Translation (MNMT) models leverage many lang...

06/30/2020 · GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Neural network scaling has been critical for improving the model quality...

04/06/2023 · On the Pareto Front of Multilingual Neural Machine Translation
In this work, we study how the generalization performance of a given dir...
