Understanding Catastrophic Forgetting in Language Models via Implicit Inference

09/18/2023
by   Suhas Kotha, et al.

Fine-tuning (via methods such as instruction tuning or reinforcement learning from human feedback) is a crucial step in training language models to robustly carry out tasks of interest. However, we lack a systematic understanding of the effects of fine-tuning, particularly on tasks outside the narrow fine-tuning distribution. In a simplified scenario, we demonstrate that improving performance on tasks within the fine-tuning data distribution comes at the expense of suppressing model capabilities on other tasks. This degradation is especially pronounced for tasks "closest" to the fine-tuning distribution. We hypothesize that language models implicitly infer which task a prompt corresponds to, and that fine-tuning predominantly skews this task inference towards tasks in the fine-tuning distribution. To test this hypothesis, we propose Conjugate Prompting as a way to recover pretrained capabilities. Conjugate prompting artificially makes the task look farther from the fine-tuning distribution while requiring the same capability. We find that conjugate prompting systematically recovers some of the pretrained capabilities in our synthetic setup. We then apply conjugate prompting to real-world LLMs using the observation that fine-tuning distributions are typically heavily skewed towards English. We find that simply translating the prompts into different languages can cause the fine-tuned models to respond like their pretrained counterparts instead. This allows us to recover the in-context learning abilities lost via instruction tuning and, more concerningly, to recover harmful content generation suppressed by safety fine-tuning in chatbots like ChatGPT.
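The translation-based instance of conjugate prompting described above can be sketched as a simple wrapper: transform the prompt away from the fine-tuning distribution, query the model, then invert the transform on the response. The helper names and toy stand-ins below are illustrative assumptions, not the paper's actual implementation; in practice `translate`/`back_translate` would call a real translation system and `model` a real LLM.

```python
def conjugate_prompt(prompt, translate, model, back_translate):
    """Sketch of conjugate prompting via translation: apply a
    capability-preserving transform, query the model, and invert
    the transform on the output."""
    transformed = translate(prompt)   # move the prompt away from the (English-heavy) fine-tuning distribution
    response = model(transformed)     # the model behaves more like its pretrained counterpart here
    return back_translate(response)   # map the answer back to the original language

# Toy stand-ins so the sketch runs end to end (a real setup would use
# an actual translator and LLM):
to_french = lambda s: "[fr] " + s
from_french = lambda s: s.removeprefix("[fr] ")
echo_model = lambda s: s  # placeholder for a real model call

print(conjugate_prompt("Complete the pattern: 2, 4, 6", to_french, echo_model, from_french))
```

The transform must preserve the underlying capability (e.g. pattern completion works the same in French), which is what distinguishes conjugate prompting from simply changing the task.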
