Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation

09/13/2021
by Michalis Korakakis, et al.

Despite strong performance on many sequence-to-sequence tasks, autoregressive models trained with maximum likelihood estimation suffer from exposure bias, i.e., a discrepancy between the ground-truth prefixes used during training and the model-generated prefixes used at inference time. Scheduled sampling is a simple and often empirically successful approach that addresses this issue by incorporating model-generated prefixes into training. However, it has been argued that scheduled sampling is an inconsistent training objective that leads models to ignore the prefixes altogether. In this paper, we conduct systematic experiments and find that scheduled sampling ameliorates exposure bias by increasing the model's reliance on the input sequence. We also observe that, as a side effect, it degrades performance when the model-generated prefix is correct, a form of catastrophic forgetting. We propose using Elastic Weight Consolidation as a trade-off between mitigating exposure bias and retaining output quality. Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU compared to standard scheduled sampling.
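To make the abstract's two ingredients concrete, here is a minimal PyTorch sketch (not the authors' released code) of a scheduled-sampling decoding pass combined with an EWC penalty. The `model(src, prefix)` interface, the `fisher` and `old_params` dictionaries (a diagonal Fisher estimate and the frozen MLE weights), and the mixing probability `epsilon` are all illustrative assumptions.

```python
# Minimal sketch: scheduled sampling + EWC penalty (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def scheduled_sampling_logits(model, src, tgt, epsilon):
    """Decode tgt left-to-right; at each step the prefix token is the
    ground truth with probability epsilon, otherwise the model's own
    greedy prediction from the previous step."""
    prefix = tgt[:, :1]  # start-of-sequence token
    per_step_logits = []
    for t in range(1, tgt.size(1)):
        step_logits = model(src, prefix)[:, -1]          # (batch, vocab)
        per_step_logits.append(step_logits)
        use_gold = torch.rand(tgt.size(0), 1, device=tgt.device) < epsilon
        sampled = step_logits.argmax(-1, keepdim=True)   # model-generated token
        next_tok = torch.where(use_gold, tgt[:, t:t+1], sampled)
        prefix = torch.cat([prefix, next_tok], dim=1)
    return torch.stack(per_step_logits, dim=1)           # (batch, len-1, vocab)

def ewc_penalty(model, fisher, old_params, lam):
    """Quadratic penalty anchoring parameters to the MLE-trained values,
    weighted by a (diagonal) Fisher information estimate."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

def training_loss(model, src, tgt, epsilon, fisher, old_params, lam):
    logits = scheduled_sampling_logits(model, src, tgt, epsilon)
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))
    return ce + ewc_penalty(model, fisher, old_params, lam)
```

In typical scheduled-sampling setups, epsilon is annealed from 1 toward 0 over training, and `fisher` and `old_params` would be computed once from the converged MLE model before fine-tuning with scheduled sampling begins.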
