An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

08/17/2023
by Yun Luo et al.

Catastrophic forgetting (CF) occurs in machine learning when a model forgets previously learned information as it learns new information. Given the strong performance of large language models (LLMs), it is worth investigating whether CF arises during their continual fine-tuning. In this study, we empirically evaluate forgetting in LLMs from the perspectives of domain knowledge, reasoning, and reading comprehension. The experiments demonstrate that catastrophic forgetting is generally observed in LLMs ranging from 1B to 7B parameters, and that forgetting becomes more severe as model scale increases. Comparing the decoder-only model BLOOMZ with the encoder-decoder model mT0, we find that BLOOMZ suffers less forgetting and retains more knowledge. We also observe that LLMs can mitigate language bias (e.g., gender bias) during continual fine-tuning. Moreover, ALPACA retains more knowledge and capability than LLAMA during continual fine-tuning, which suggests that general instruction tuning helps mitigate forgetting in subsequent fine-tuning.
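To make the experimental protocol concrete, below is a minimal sketch (not the authors' released code) of continual fine-tuning with a forgetting probe: a causal LM is fine-tuned on a sequence of tasks, and after each task the perplexity on a fixed, task-independent probe set is re-measured, with rising probe perplexity serving as a rough proxy for catastrophic forgetting. The model checkpoint, task texts, probe sentences, and hyperparameters are placeholder assumptions.

# Minimal sketch (not the authors' code) of continual fine-tuning with a
# forgetting probe. Model checkpoint, task texts, probe sentences, and
# hyperparameters below are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-1b1"   # assumed checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Hypothetical data: a stream of fine-tuning tasks plus a fixed, task-independent
# probe set standing in for the domain-knowledge/reasoning/reading-comprehension evals.
task_stream = [
    ["task 1 training example ...", "another task 1 example ..."],
    ["task 2 training example ...", "another task 2 example ..."],
]
probe_texts = ["Paris is the capital of France.", "Water freezes at 0 degrees Celsius."]

def lm_loss(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True,
                      truncation=True, max_length=128).to(device)
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100   # ignore padded positions
    return model(**batch, labels=labels).loss

@torch.no_grad()
def probe_perplexity():
    model.eval()
    return torch.exp(lm_loss(probe_texts)).item()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
print(f"probe perplexity before tuning: {probe_perplexity():.2f}")

for task_id, task_texts in enumerate(task_stream, start=1):
    model.train()
    for start in range(0, len(task_texts), 2):    # tiny batches, one pass per task
        loss = lm_loss(task_texts[start:start + 2])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    # A probe perplexity that keeps rising across tasks is a rough signal of forgetting.
    print(f"after task {task_id}: probe perplexity = {probe_perplexity():.2f}")

In the study itself the probes are full benchmark suites and the models range from 1B to 7B parameters; perplexity on a few sentences is used here only to keep the sketch self-contained.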


