Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation

11/02/2020
by Shuhao Gu, et al.

Neural machine translation (NMT) models usually suffer from catastrophic forgetting during continual training: the model gradually forgets previously learned knowledge and shifts toward fitting the newly added data, which may follow a different distribution, e.g., come from a different domain. Although many methods have been proposed to mitigate this problem, the cause of the phenomenon is still not well understood. In the setting of domain adaptation, we investigate the cause of catastrophic forgetting from the perspectives of modules and parameters (neurons). The investigation of the NMT model's modules shows that some modules are closely tied to general-domain knowledge, while others are more essential for domain adaptation. The investigation of the parameters shows that some parameters are important for both general-domain and in-domain translation, and that large changes to these parameters during continual training cause the performance decline on the general domain. We conduct experiments across different language pairs and domains to ensure the validity and reliability of our findings.
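The abstract does not specify how parameter importance is measured, but a common probe in this line of work is a Fisher-style score that averages the squared gradient of the translation loss over a sample of data; the sketch below illustrates the idea. Here `model`, `general_loader`, `in_domain_loader`, and `loss_fn` are hypothetical placeholders for an NMT model, batched parallel data from each domain, and a token-level loss; they are not taken from the paper.

```python
# A minimal sketch (assumed, not the authors' exact procedure) of a
# gradient-based parameter importance score: average the squared gradient
# of the loss w.r.t. each parameter over a data sample.

import torch

def parameter_importance(model, data_loader, loss_fn):
    """Return, per named parameter, the mean squared gradient of the loss."""
    scores = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
    n_batches = 0
    for src, tgt in data_loader:
        model.zero_grad()
        loss = loss_fn(model(src, tgt), tgt)  # hypothetical forward/loss API
        loss.backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                scores[name] += p.grad.detach() ** 2
        n_batches += 1
    return {name: s / max(n_batches, 1) for name, s in scores.items()}

# Score each parameter under both data distributions:
# general_scores = parameter_importance(model, general_loader, loss_fn)
# in_domain_scores = parameter_importance(model, in_domain_loader, loss_fn)
```

Parameters that score high under both distributions correspond to the "important for both" parameters the abstract describes; tracking how far they drift during continual training is one way to connect their change to the general-domain performance drop.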
