Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

05/23/2023
by Haoran Xu, et al.

Incorporating language-specific (LS) modules is a proven way to boost performance in multilingual machine translation. Like Mixture-of-Experts (MoE), this approach adds capacity without inflating FLOPs. However, scaling it to hundreds of languages (experts) becomes unmanageable because the full-rank matrices in the fully-connected layers introduce a prohibitive number of parameters. In this work, we introduce the Language-Specific Matrix Synthesis (LMS) method, which constructs LS modules by generating low-rank matrices from two significantly smaller matrices that approximate the full-rank matrix. Furthermore, we condense multilingual knowledge from multiple LS modules into a single shared module with the Fuse Distillation (FD) technique, improving the efficiency of both inference and model serialization. We show that LMS significantly outperforms previous LS and MoE methods with the same number of extra parameters, e.g., by 1.73 BLEU points over the Switch Transformer on many-to-many multilingual machine translation. Importantly, LMS achieves comparable translation performance with far fewer parameters.
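The parameter saving behind low-rank matrix synthesis can be illustrated with a minimal NumPy sketch. The dimensions and variable names below are illustrative, not taken from the paper: a language-specific matrix is synthesized as the product of two much smaller matrices, so its parameter count grows with the rank r rather than with the hidden size d.

```python
import numpy as np

# Hypothetical dimensions: hidden size d and low rank r (illustrative values).
d, r = 1024, 32

rng = np.random.default_rng(0)

# A full-rank language-specific projection would need d * d parameters.
full_rank_params = d * d

# LMS-style synthesis: build the LS matrix as W_ls = A @ B,
# where A has shape (d, r) and B has shape (r, d), so rank(W_ls) <= r.
A = rng.standard_normal((d, r))
B = rng.standard_normal((r, d))
W_ls = A @ B

low_rank_params = A.size + B.size  # 2 * d * r parameters per language

print(f"full-rank params: {full_rank_params}")  # 1048576
print(f"low-rank params:  {low_rank_params}")   # 65536
```

With these illustrative numbers, each extra language costs 2dr = 65,536 parameters instead of d^2 = 1,048,576, roughly a 16x reduction per LS module, which is what makes scaling to hundreds of languages feasible.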


Related research

- 04/14/2020: Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders. State-of-the-art multilingual machine translation relies on a universal ...
- 02/27/2019: Multilingual Neural Machine Translation with Knowledge Distillation. Multilingual machine translation, which translates multiple languages wi...
- 05/03/2023: Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity. Mixture-of-experts (MoE) models that employ sparse activation have demon...
- 05/04/2023: Learning Language-Specific Layers for Multilingual Machine Translation. Multilingual Machine Translation promises to improve translation quality...
- 12/19/2022: Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model. Compared to conventional bilingual translation systems, massively multil...
- 12/27/2021: Parameter Differentiation based Multilingual Neural Machine Translation. Multilingual neural machine translation (MNMT) aims to translate multipl...
- 05/22/2022: Multilingual Machine Translation with Hyper-Adapters. Multilingual machine translation suffers from negative interference acro...
