Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation

12/06/2022
by fcq, et al.

A common scenario for Multilingual Neural Machine Translation (MNMT) is that translation tasks arrive sequentially and the training data of previous tasks is unavailable. In this scenario, current methods suffer heavily from catastrophic forgetting (CF). To alleviate CF, we investigate life-long learning methods based on knowledge distillation. Specifically, in the one-to-many scenario, we propose a multilingual distillation method that lets the new model (student) jointly learn the multilingual output of the old model (teacher) and the new task. In the many-to-one scenario, we find that direct distillation faces an extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. Experimental results on twelve translation tasks show that the proposed methods better consolidate previous knowledge and sharply alleviate CF.
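At a high level, the distillation objective described in the abstract can be pictured as interpolating the new-task cross-entropy with a term that keeps the student close to the frozen teacher's output distribution. The PyTorch sketch below is only an illustration under that assumption, not the authors' implementation; the function name lifelong_kd_loss and the hyperparameters alpha and temperature are hypothetical.

```python
# Minimal sketch (not the authors' code) of a knowledge-distillation
# objective for life-long NMT: the new model (student) fits the new
# task with cross-entropy while matching the old model's (teacher's)
# output distribution to retain knowledge of previous tasks.
import torch
import torch.nn.functional as F


def lifelong_kd_loss(student_logits, teacher_logits, target_ids,
                     pad_id, alpha=0.5, temperature=1.0):
    """Interpolate new-task cross-entropy with a distillation term."""
    # Cross-entropy on the new translation task.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_id,
    )

    # KL divergence between student and frozen teacher distributions,
    # which consolidates the previously learned translation directions.
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits.detach() / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    return (1.0 - alpha) * ce + alpha * kd
```

In such a sketch, an alpha closer to 1 would favor preserving previous tasks, while an alpha closer to 0 would favor fitting the newly arrived task.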

Related research

02/27/2019
Multilingual Neural Machine Translation with Knowledge Distillation
Multilingual machine translation, which translates multiple languages wi...

10/17/2021
Reminding the Incremental Language Model via Data-Free Self-Distillation
Incremental language learning with pseudo-data can alleviate catastrophi...

02/06/2017
Ensemble Distillation for Neural Machine Translation
Knowledge distillation describes a method for training a student network...

05/03/2022
OmniKnight: Multilingual Neural Machine Translation with Language-Specific Self-Distillation
Although all-in-one-model multilingual neural machine translation (MNMT)...

04/01/2021
Sampling and Filtering of Neural Machine Translation Distillation Data
In most of neural machine translation distillation or stealing scenarios...

06/23/2023
GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models
Knowledge distillation is commonly used for compressing neural networks ...

04/19/2023
An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models
Knowledge distillation (KD) is a well-known method for compressing neura...
