
Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation

by Shuhao Gu, et al.

Neural machine translation (NMT) models usually suffer from catastrophic forgetting during continual training: the model gradually forgets previously learned knowledge as it swings to fit newly added data that may follow a different distribution, e.g., a different domain. Although many methods have been proposed to mitigate this problem, the cause of the phenomenon is still not well understood. In the setting of domain adaptation, we investigate the cause of catastrophic forgetting from the perspectives of modules and of parameters (neurons). The investigation of the NMT model's modules shows that some modules are closely tied to general-domain knowledge, while others are more essential for domain adaptation. The investigation of the parameters shows that some parameters are important for both general-domain and in-domain translation, and that large changes to these parameters during continual training bring about the performance decline on the general domain. We conduct experiments across different language pairs and domains to ensure the validity and reliability of our findings.
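The kind of parameter-level analysis described above can be illustrated with a minimal sketch. This is not the paper's actual procedure: the toy linear model, the two synthetic "domains", and the first-order Taylor importance score |g_i · w_i| are all illustrative assumptions standing in for an NMT model and its real importance criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model standing in for NMT parameters (illustrative only).
d = 8
w = rng.normal(size=d)

def grad_mse(w, X, y):
    """Gradient of the mean-squared error of a linear model w.r.t. w."""
    return 2 * X.T @ (X @ w - y) / len(y)

# Two synthetic "domains" with different input distributions,
# mimicking general-domain vs. in-domain data.
X_gen = rng.normal(size=(100, d))
y_gen = X_gen @ rng.normal(size=d)
X_in = rng.normal(size=(100, d)) * np.linspace(0.1, 2.0, d)
y_in = X_in @ rng.normal(size=d)

# First-order Taylor importance per parameter and domain: |g_i * w_i|.
imp_gen = np.abs(grad_mse(w, X_gen, y_gen) * w)
imp_in = np.abs(grad_mse(w, X_in, y_in) * w)

# Parameters ranked important for BOTH domains are the candidates whose
# large drift during continual training would hurt general-domain quality.
k = 3
top_gen = set(np.argsort(imp_gen)[-k:])
top_in = set(np.argsort(imp_in)[-k:])
shared = top_gen & top_in
print("shared important parameters:", sorted(shared))
```

In a real NMT model the same idea would be applied per weight tensor (or per module, for the module-level analysis), with importance accumulated over batches rather than computed from a single gradient.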




Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation

Domain Adaptation is widely used in practical applications of neural mac...

Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey

The development of deep learning techniques has allowed Neural Machine T...

Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions

This paper considers continual learning of large-scale pretrained neural...

Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation

Neural networks tend to gradually forget the previously learned knowledg...

Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters

Adapter layers are lightweight, learnable units inserted between transfo...

Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation

Continual Test-Time Adaptation (CTTA) aims to adapt the source model to ...

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing

Many multi-domain neural machine translation (NMT) models achieve knowle...