Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation

12/15/2022
by Maha Elbayad, et al.

Sparsely gated Mixture of Experts (MoE) models have been shown to be a compute-efficient method to scale model capacity for multilingual machine translation. However, on low-resource tasks, MoE models severely over-fit. We show that effective regularization strategies, namely dropout techniques for MoE layers (EOM and FOM), Conditional MoE Routing, and Curriculum Learning methods, prevent over-fitting and improve the performance of MoE models on low-resource tasks without adversely affecting high-resource tasks. On a massively multilingual machine translation benchmark, our strategies yield an improvement of about +1 chrF++ on very low-resource language pairs. We perform an extensive analysis of the learned MoE routing to better understand the impact of our regularization methods and how they can be improved further.
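To make the output-masking idea concrete, the sketch below adds token-level masking of routed expert outputs to a top-2 gated MoE feed-forward layer in PyTorch. The class name, the p_mask hyperparameter, and the placement of the mask are illustrative assumptions; this is a minimal regularizer in the spirit of EOM/FOM, not the paper's exact formulation.

import torch
import torch.nn as nn


class MaskedTop2MoELayer(nn.Module):
    """Top-2 gated MoE feed-forward layer with token-level expert-output
    masking during training (illustrative sketch, not the paper's method)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, p_mask: float = 0.2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.p_mask = p_mask  # fraction of tokens whose routed expert output is dropped

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); flatten (batch, seq_len) before calling.
        probs = self.gate(x).softmax(dim=-1)             # (num_tokens, num_experts)
        top_p, top_i = probs.topk(2, dim=-1)             # top-2 routing
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalize the two gate weights

        out = torch.zeros_like(x)
        for k in range(2):                               # each of the two routed experts
            gate_k = top_p[:, k:k + 1]                   # (num_tokens, 1)
            if self.training and self.p_mask > 0:
                # Expert-output masking: zero this expert's contribution for a
                # random fraction of tokens, so the model cannot rely too heavily
                # on any single expert (helps regularize low-resource directions).
                keep = (torch.rand(x.size(0), 1, device=x.device) > self.p_mask).float()
                gate_k = gate_k * keep
            for e, expert in enumerate(self.experts):
                sel = top_i[:, k] == e                   # tokens routed to expert e in slot k
                if sel.any():
                    out[sel] = out[sel] + gate_k[sel] * expert(x[sel])
        return out

In practice such a layer would replace the dense feed-forward block in selected Transformer layers, with the masking probability tuned against how strongly the low-resource directions over-fit.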

Related research

Learning Policies for Multilingual Training of Neural Machine Translation Systems (03/11/2021)
Low-resource Multilingual Neural Machine Translation (MNMT) is typically...

Competence-based Curriculum Learning for Multilingual Machine Translation (09/09/2021)
Currently, multilingual machine translation is receiving more and more a...

Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation (12/24/2022)
In this paper, we study the use of deep Transformer translation model fo...

No Language Left Behind: Scaling Human-Centered Machine Translation (07/11/2022)
Driven by the goal of eradicating language barriers on a global scale, m...

Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model (12/19/2022)
Compared to conventional bilingual translation systems, massively multil...

Adaptive Scheduling for Multi-Task Learning (09/13/2019)
To train neural machine translation models simultaneously on multiple ta...

Demystify Optimization Challenges in Multilingual Transformers (04/15/2021)
Multilingual Transformer improves parameter efficiency and crosslingual ...
