Log In Sign Up

More Parameters? No Thanks!

by   Zeeshan Khan, et al.

This work studies the long-standing problems of model capacity and negative interference in multilingual neural machine translation MNMT. We use network pruning techniques and observe that pruning 50-70 trained MNMT model results only in a 0.29-1.98 drop in the BLEU score. Suggesting that there exist large redundancies even in MNMT models. These observations motivate us to use the redundant parameters and counter the interference problem efficiently. We propose a novel adaptation strategy, where we iteratively prune and retrain the redundant parameters of an MNMT to improve bilingual representations while retaining the multilinguality. Negative interference severely affects high resource languages, and our method alleviates it without any additional adapter modules. Hence, we call it parameter-free adaptation strategy, paving way for the efficient adaptation of MNMT. We demonstrate the effectiveness of our method on a 9 language MNMT trained on TED talks, and report an average improvement of +1.36 BLEU on high resource pairs. Code will be released here.


page 1

page 2

page 3

page 4


Learning Language Specific Sub-network for Multilingual Machine Translation

Multilingual neural machine translation aims at learning a single transl...

Rapid Adaptation of Neural Machine Translation to New Languages

This paper examines the problem of adapting neural machine translation s...

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Modern multilingual models are trained on concatenated text from multipl...

Adapting Multilingual Neural Machine Translation to Unseen Languages

Multilingual Neural Machine Translation (MNMT) for low-resource language...

The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains

Recent model pruning methods have demonstrated the ability to remove red...

Causes and Cures for Interference in Multilingual Translation

Multilingual machine translation models can benefit from synergy between...

Does Interference Exist When Training a Once-For-All Network?

The Once-For-All (OFA) method offers an excellent pathway to deploy a tr...