Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models

10/16/2019 · by Tianxing He, et al.

In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named "mix-review". We find that mix-review effectively regularizes the fine-tuning process, and that the forgetting problem is largely alleviated. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.




1 Introduction

Large-scale unsupervised pre-training (Peters et al., 2018; Yang et al., 2019; Liu et al., 2019; Devlin et al., 2018; Song et al., 2019) has recently been shown to greatly boost the performance of natural language processing (NLP) models, and has attracted much research interest. Despite its huge success, there is a fundamental question remaining to be answered:

Is there some crucial weakness in the standard NLP pretrain-finetune framework?

In this work, we take the viewpoint of language generation and show that the answer is, to some extent, yes. In particular, we find that the key to answering this question is a concept we denote as data separation.

Although various unsupervised pre-training strategies have been proposed for better utilization of large-scale text data, on a high level the pretrain-finetune framework can be viewed as a simple two-stage procedure: (1) use large-scale text data to pre-train the model, and (2) use target task data to fine-tune the model. Data separation refers to (almost) zero-overlapping data usage of the two stages.

In this work we study the pretrain-finetune framework from the viewpoint of neural language generation (NLG). In particular, we focus on the open-domain dialogue response task, for the following reasons: (1) There is high similarity between the target dialogue response task (conditional NLG) and the pre-training language modeling (LM) objective, so we expect that language generation skills learnt during pre-training can be well transferred to the down-stream target task. (2) The sequence-to-sequence (seq2seq) nature of the model allows us to characterize the model’s generation behavior in various ways (e.g. context sensitivity).

We briefly summarize our contributions as follows. To study how pretrain-finetuning changes the model’s behavior, we conduct a behavior analysis from the perspectives of context sensitivity and knowledge transfer. Our main finding is that in the fine-tuning stage, data separation causes the model to forget important language generation skills acquired during pre-training. Motivated by this analysis, we adopt the concept of data mixing and propose a mix-review fine-tuning strategy, where we combine the pre-training and fine-tuning objectives. We find that mix-review effectively regularizes the fine-tuning process, and that the forgetting problem is largely alleviated. Finally, we demonstrate and discuss interesting behavior of the resulting dialogue model and its implications.

2 Training Objective for Seq2seq Tasks

End-to-end dialogue response generation (Li et al., 2016) can be formulated as a sequence-to-sequence (seq2seq) task: given a dialogue context (previous utterances), the model is asked to generate a high-quality response. In this work we adopt the encoder-decoder model architecture (Sutskever et al., 2014; Cho et al., 2014; Mikolov et al., 2010), which is widely used in NLG applications like dialogue response generation (Li et al., 2016), machine translation (Luong et al., 2015), etc. In particular, we use the transformer model (Vaswani et al., 2017), which has currently become the most popular encoder-decoder model architecture (Young et al., 2017). We use the same configuration as Vaswani et al. (2017), which has 6 encoder/decoder layers, 16 attention heads, with an embedding dimension of 1024 and a feed-forward dimension of 4096.

During baseline training, the Adam optimizer (Kingma and Ba, 2014) is used to minimize the negative log-likelihood (NLL) of the reference target sentence $y$ given the input sentence $x$ in the data distribution (denoted as $P_{\text{data}}$):

$$\mathcal{L}_{\text{NLL}}(\theta) = -\mathbb{E}_{(x,y)\sim P_{\text{data}}} \sum_{t=1}^{|y|} \log P_\theta(y_t \mid y_{<t}, x), \qquad (1)$$

where $y_{<t}$ refers to $\{y_0, y_1, \dots, y_{t-1}\}$, in which $y_0$ is set to a begin-of-sentence token <BOS>, and $y_{|y|}$ is an end-of-sentence token <EOS>. In the dialogue response setting, the input $x$ is a concatenation of previous utterances. We truncate the length of $x$ to be at most 128 words, which typically includes around 6 previous utterances.
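As a concrete illustration, the per-sequence NLL above can be computed from per-step model distributions in a few lines. This is a minimal sketch with our own names, not the paper's code; `step_probs[t]` stands for the model's distribution over the vocabulary at step t, conditioned on the input and the previous reference tokens:

```python
import math

def sequence_nll(step_probs, target_ids):
    """Negative log-likelihood of a reference target sequence.

    step_probs[t] is the model's probability distribution over the
    vocabulary at decoding step t (conditioned on x and y_{<t});
    target_ids[t] is the reference token at step t, ending with <EOS>.
    """
    return -sum(math.log(step_probs[t][y]) for t, y in enumerate(target_ids))
```

For example, a uniform distribution over a 4-word vocabulary yields an NLL of log(4) per token.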

Given a trained seq2seq model, to generate a response for some contextual input, one needs to choose a decoding method. Recent research (Holtzman et al., 2019; Radford et al., 2019; Fan et al., 2018) has shown that a strategy called top-k sampling, in which the next word is sampled from the top k most probable choices, is a better choice than traditional beam-search decoding. Our preliminary experiments (Appendix A) have also verified this claim in the open-domain dialogue response setting. As a result, in this work, unless otherwise mentioned, we use top-k sampling as the default decoding method. In particular, we set k to 30 (we find this value to work well in preliminary experiments).
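A single decoding step of top-k sampling is straightforward to implement. The following is a minimal sketch (function and variable names are ours, not from the paper's implementation): keep the k highest-scoring vocabulary entries, renormalize with a softmax restricted to them, and sample.

```python
import math
import random

def top_k_sample(logits, k=30, rng=None):
    """Sample the next word from the k most probable choices.

    `logits` is a list of unnormalized scores, one per vocabulary index.
    """
    if rng is None:
        rng = random.Random()
    # indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    # softmax restricted to the top k (shifted by the max for stability)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

With k = 1 this reduces to greedy decoding; larger k trades determinism for diversity.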

3 The Pretrain-Finetune Framework

In this section we review the pretrain-finetune framework for encoder-decoder models. More importantly, we discuss the language generation skills the model can acquire during pre-training, and how well they are transferred to the target task. This discussion leads to the proposition of the mix-review fine-tuning strategy.

3.1 Pre-training

Context Input: what did you do yesterday ? <eou> i watched the avengers movie .
Target Output: wow ! i am crazy about iron man !
Next-sentence Pre-training
Context Input: the avengers are super hot currently . <eou> the next movie will be on in April .
Target Output: fans are talking about what iron man will do on the internet .
MASS Pre-training
Context Input: fans are talking about <MASK> <MASK> <MASK> will do on the internet .
Target Output: what iron man
Table 1: Illustrations of input-output pairs for typical dialogue response training, next-sentence pre-training, or MASS pre-training.

In this work, we consider pre-training the seq2seq model using large-scale unsupervised text data, and afterwards fine-tuning it using target dialogue data. We compare two representative strategies: next-sentence (NS) pre-training and masked sequence-to-sequence (MASS) pre-training (Song et al., 2019). Next-sentence pre-training is a natural extension of GPT-style LM training (Radford et al., 2019; Kiros et al., 2015) for encoder-decoder models. For every sentence in a given training document, we set the previous sentences as the contextual input, and ask the model to generate the next sentence. We omit the formulation of NS because it is very similar to Equation (1).
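The construction of NS pre-training pairs can be sketched as follows. This is a simplification under our own assumptions: `document` is a list of sentences, sentences are joined with the <eou> separator shown in Table 1, and we cap the context at a fixed number of previous sentences rather than at 128 words as in the paper:

```python
def next_sentence_pairs(document, max_context=6):
    """Build (context, target) pairs for next-sentence pre-training:
    for each sentence, the previous sentences form the contextual input
    and the sentence itself is the generation target."""
    pairs = []
    for i in range(1, len(document)):
        context = document[max(0, i - max_context):i]
        pairs.append((" <eou> ".join(context), document[i]))
    return pairs
```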

Masked sequence-to-sequence pre-training (MASS) can be regarded as an extension of the “BERT” (Devlin et al., 2018) pre-training for encoder-decoder models. For each sentence, a random segment of the sentence is masked, and the model is trained to generate the masked words on the decoder side. We refer readers to Song et al. (2019) for more details.

In Table 1, we illustrate the similarity between NS pre-training and typical dialogue response training. Compared to NS pre-training, MASS has the disadvantage that it focuses on a single sentence at a time, whereas the context of multiple previous sentences is very important for dialogue response generation.

There are two important generation capabilities that the model can acquire in the pre-training stage, which will be useful for the target dialogue setting. One is the acquisition of knowledge (studied in Section 5.3): the large-scale pre-training text data contains a large amount of knowledge, and can be used to make dialogue responses more informative and engaging (e.g. the model can learn about the “Avengers” movie, and use it as a topic). The other is the utilization of contextual input (studied in Section 5.2): as shown by Sankar et al. (2019), the current open-domain dialogue models (without pre-training) are insensitive to contextual input, which gives rise to the generic response problem (Li et al., 2016). In our preliminary experiments with NS pre-training, we find that, similarly to the GPT model (Radford et al., 2019), the pre-trained model has the ability to generate closely related responses given the previous sentences as input. Ideally during fine-tuning, the model can transfer this skill to the target dialogue task.

3.2 The Mix-review Fine-tuning Strategy

Although recently a number of pre-training strategies (Peters et al., 2018; Devlin et al., 2018; Song et al., 2019; Yang et al., 2019; Liu et al., 2019) have been proposed for various NLP tasks, the fine-tuning stage remains simple and straightforward: simply fine-tune all parameters with a relatively small learning rate.

(a) Mix-review
(b) WD($\lambda$)
Figure 1: Model’s performance on different evaluation sets during the fine-tuning stage, for the Dailydialogue data-set (described in Section 4.1).

In Figure 1(a), we show the model’s negative log-likelihood (NLL) on different evaluation sets during the fine-tuning stage. We identify two potential issues during fine-tuning. (1) Over-fitting: the gap between training-set NLL and validation-set NLL increases quickly. (2) Forgetting: the performance on the pre-training CCNEWS data (described in Section 4.1) drops drastically. Note that the forgetting phenomenon here is not necessarily “catastrophic” as in the sequential learning case (Atkinson et al., 2018; Robins, 1995), because the goal is to achieve the best performance on the target dialogue data-set, and the model does not need to maintain fidelity to the pre-training data. However, it leads us to suspect that the model has lost some important skills learned during pre-training (verified in Sections 5.2 and 5.3).

To address the forgetting phenomenon, we propose a fine-tuning strategy named “mix-review”: for each fine-tuning epoch, we mix the target dialogue data with a random subset of the pre-training data. This process introduces two hyper-parameters: mix-ratio, which controls how much pre-training data is mixed in, and mix-decay, which decays the amount of mixed data with each epoch. For example, assume the target dialogue training set has 100k utterances, mix-ratio = 4, and mix-decay = 0.9; then in the first epoch of mix-review fine-tuning, 400k pre-training utterances will be mixed in, and in the second epoch the amount will be reduced to 360k utterances, etc.

We formulate the mix-review objective as below:

$$\mathcal{L}_{\text{mix-review}}(\theta) = \mathcal{L}_{\text{NLL}}(\theta; D_{\text{target}}) + \mathcal{L}_{\text{NLL}}(\theta; D_{\text{mix}}), \qquad (2)$$

where $D_{\text{mix}}$ is the random subset of pre-training data mixed in for the current epoch. Note that the augmented mixing term can be viewed as a regularization term.
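The per-epoch data mixing described above can be sketched as follows. This is a minimal illustration; the function names and the details of subsampling (without replacement, per-epoch reseeding) are our own assumptions:

```python
import random

def mix_review_epoch(target_data, pretrain_data, epoch,
                     mix_ratio=4, mix_decay=0.9, seed=0):
    """Build the training set for one mix-review epoch (epochs count from 0).

    The amount of mixed-in pre-training data is
    mix_ratio * mix_decay**epoch * len(target_data).
    """
    n_mix = round(mix_ratio * (mix_decay ** epoch) * len(target_data))
    rng = random.Random(seed + epoch)
    # sample a fresh random subset of the pre-training data each epoch
    batch = target_data + rng.sample(pretrain_data, n_mix)
    rng.shuffle(batch)
    return batch
```

With a 100k-utterance target set, mix-ratio = 4 and mix-decay = 0.9, the first epoch mixes in 400k pre-training utterances and the second 360k, matching the worked example above.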

In our experiments, we tune the hyper-parameters (mix-ratio and mix-decay) over a small grid (using the same learning rate and other hyper-parameters as standard fine-tuning), and report the best model based on the perplexity (PPL) performance on the validation set of the target task. We find that the performance gain of mix-review is not sensitive to hyper-parameter tuning: a small mix-ratio of 4 typically works well, which means the computational cost of mix-review is comparable to that of standard fine-tuning.

In Figure 1(a), we show the loss curve for mix-review fine-tuning with a mix-ratio of 4 and a mix-decay of 0.7. We observe that the performance on the pre-training CCNEWS data is preserved, which strongly supports the motivation of mix-review. Furthermore, we observe a regularization effect from mix-review (a narrowing of the gap between training and testing performance).

We compare mix-review with regularization (weight decay) toward the pre-trained parameters (Wiese et al., 2017). We denote it as WD($\lambda$) and formulate it as follows:

$$\mathcal{L}_{\text{WD}}(\theta) = \mathcal{L}_{\text{NLL}}(\theta; D_{\text{target}}) + \lambda \, \lVert \theta - \theta_{\text{pre}} \rVert_2^2, \qquad (3)$$

where $\theta_{\text{pre}}$ denotes the pre-trained parameters.
In our experiments, we tune $\lambda$ over a grid of five values and report the best model based on PPL on the validation set.

In Figure 1(b) we show the loss curve for WD($\lambda$) with the best-performing $\lambda$. We observe that WD($\lambda$) also has a regularization effect, but it is not as strong as that of mix-review.
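The WD($\lambda$) penalty is a standard L2 pull of the current parameters back toward their pre-trained values. The following is a minimal sketch over a flat list of scalar parameters (our own simplification; a real implementation would apply this tensor-wise inside the training loop):

```python
def wd_loss(finetune_loss, params, pretrained_params, lam):
    """Fine-tuning loss plus an L2 penalty pulling the parameters
    back toward their pre-trained values (the WD(lambda) objective).

    params and pretrained_params are flat lists of scalar weights.
    """
    penalty = sum((p - p0) ** 2 for p, p0 in zip(params, pretrained_params))
    return finetune_loss + lam * penalty
```

When the parameters coincide with the pre-trained ones, the penalty vanishes and the objective reduces to plain fine-tuning.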

Additionally, we tried the following two basic regularization techniques: (1) Increase the rate of dropout; (2) Freeze the bottom layers of the model during fine-tuning. We find that these two techniques show little or no improvement. We believe the reason is that the transformer is already a well-tuned model (e.g. it features dropout and layer normalization (Lei Ba et al., 2016)).

4 Data-sets and Implementation Details

4.1 Data-sets

For pre-training, we use the large-scale CCNEWS data (Bakhtin et al., 2019), a de-duplicated subset of the English portion of the CommonCrawl news data-set (http://commoncrawl.org/2016/10/news-dataset-available). The dataset contains news articles published worldwide between September 2016 and February 2019: in total around 1 billion sentences, or 27 billion words. To be able to complete experiments in a reasonable amount of time, we use the first 10 percent of the CCNEWS data for pre-training, which contains 100 million sentences and 2.7 billion words.

For fine-tuning, three open-domain conversational dialogue data-sets are used: Dailydialogue (1.3 million words) (Li et al., 2017), Switchboard (1.2 million words), and Cornell Movie (Danescu-Niculescu-Mizil and Lee, 2011) (4.5 million words). To save space, we defer the details of the data-sets to Appendix B.

To construct the vocabulary, we learn codes of Byte Pair Encoding (BPE) (Sennrich et al., 2016) from the CCNEWS-100m data with 50k merges. This results in a vocabulary of size 62k. We then apply the same BPE codes to all target dialogue data-sets.

4.2 Implementation

Our code is based on the Fairseq toolkit (Ott et al., 2019). The Adam optimizer (Kingma and Ba, 2014) is used for all experiments. For pre-training of both MASS and NS, we use a mini-batch size of 2048, with the learning rate (LR) set to 0.0001. Following Vaswani et al. (2017), the “inverse square root” LR scheduler with a warm-up stage is used. Pre-training is conducted on 32 GPUs with half-precision (float16) speed-up. For both MASS and NS, we stop pre-training after the CCNEWS data has been swept 20 times; although the perplexity is still improving, we stop for practical reasons, to control the duration of the experiments. For all our experiments, a dropout rate of 0.1 is applied to the transformer model. We follow Song et al. (2019) for the recommended hyper-parameter setting of MASS (e.g. how to select the mask span).

Fine-tuning (with or without mix-review) is done on 2 GPUs without float16 speed-up. The learning rate is halved when the PPL on the validation set does not improve. In almost all fine-tuning experiments over-fitting is observed, and we do an early stop when performance on the validation set starts to deteriorate. We tune the learning rate over three values, and report the best model based on validation-set perplexity.

5 Experiment Results

In this section, we first present results for the standard dialogue model evaluation. We then conduct a detailed behavior analysis, characterising how different training strategies change the model’s behavior. In particular, we aim to answer the crucial question about whether the model forgets precious language generation skills during standard fine-tuning, and more importantly, whether mix-review helps the model remember the skills.

5.1 Standard Dialogue Model Evaluation

Training Test-PPL/AMT Rating
Dailydialogue Switchboard Cornell Movie
Baseline (from scratch) 24.83/0.539 51.14/0.298 49.48/0.442
MASS+finetune 12.78/0.633 28.41/0.290 30.25/0.489
NS+finetune 11.54/0.636 26.37/0.324 28.06/0.471
NS+WD($\lambda$) 11.19/0.658 26.25/0.381 27.80/0.499
NS+mix-review 11.07/0.666 25.92/0.371 27.54/0.510
Reference NA/0.787 NA/0.477 NA/0.553
Table 2: Perplexity and AMT-rating evaluation for different training processes on the three dialogue data-sets. The rating score is the average of the fluency, consistency, and engagingness ratings.

In addition to perplexity, we use the Amazon Mechanical Turk (AMT) platform for human evaluation of different training processes on the three dialogue data-sets. For the AMT rating, each turker is given a dialogue context, and a randomly permuted set of model sample responses. The turker is then asked to rate each sample response according to its fluency, consistency, and engagingness respectively, from {-2,-1,0,+1,+2}. The reference response is also rated for comparison. For each data-model pair, we collect 2,500 ratings. The results are shown in Table 2. Since we use top-k sampling, the BLEU score is not directly suitable for our setting (Liu et al., 2016).

We first observe a huge improvement in perplexity (larger than 40%) for the pre-trained models compared to the baseline models trained from scratch. Compared to MASS, NS pre-training gives more than a 7% relative improvement. This confirms our discussion in Section 3.1 that the model pre-trained by NS better utilizes the contextual input (further verified in Section 5.2). Based on this observation, we focus our analysis below on NS pre-training.

Compared to standard fine-tuning, mix-review gives a further solid improvement. The gain is due to its strong regularization effect (which we study in the next two sections). However, the performance gap between mix-review and WD($\lambda$) is not significant. We conjecture that mix-review could be too “aggressive” in regularizing the model’s generative behavior, and that more sophisticated regularization techniques are worth investigating. We revisit this discussion in Section 6.

Input: i ’d like a beer . what kind of beer do you have ?
Baseline: i ’m not sure . could you recommend some ? i ’m not sure .
MASS: we have various kinds of beer , beer and soft drinks .
NS: we have a beer , but we don ’ t have a lager and a black beer .
WD($\lambda$): i ’ d like a beer of medium pace .
Mix-review: we have a malty and belgian ale . it is a little special .
Input: draft beer and cans beer . which do you prefer ?
Baseline: well , two large fruit cakes .
MASS: how about a soft beer ? it ’ s very good .
NS: a bottle of beer with some meat , sir .
WD($\lambda$): do you have any beer with a clear finish ?
Mix-review: can you make a can , please ?
Table 3: Samples of different models on the Dailydialogue test-set. The samples are consecutive (input of the next sample is the reference response for the previous one). More samples are given in Appendix C.

In Table 3 we compare samples from different models on the Dailydialogue test-set. Compared to the baseline model, we find that the pre-trained models’ responses are more related to the context. For example, in the second response, the baseline model talks about “fruit cake”, while the pre-trained models talk about beer, which is the main subject of this conversation. The samples from mix-review are interesting in that the model is able to describe beer with specific phrases like “malty” or “belgian ale”, which we attribute to knowledge obtained during pre-training (these phrases do not exist in the Dailydialogue training data).

5.2 Behavior Analysis: Context Sensitivity

Sensitivity to context is an important property for NLG models. However, as shown by Sankar et al. (2019), dialogue response models trained from scratch are typically insensitive to artificial distortions of the context input, showing that these models make poor use of the dialogue context. In this section, we repeat their experiments with pre-trained dialogue models.

Following Sankar et al. (2019), we use two methods to distort the context input:

  • word-drop: We randomly drop 30% of the words in the context input.

  • word-shuffle: We randomly shuffle the words in the context input.
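The two distortions can be sketched as follows (a minimal illustration with our own names; the fallback that keeps at least one token under word-drop is our choice, to avoid an empty context):

```python
import random

def word_drop(tokens, p=0.3, rng=None):
    """Randomly drop a fraction p of the words in the context input."""
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() >= p]
    return kept or tokens[:1]  # never return an empty context

def word_shuffle(tokens, rng=None):
    """Randomly shuffle the words in the context input."""
    rng = rng or random.Random()
    out = list(tokens)
    rng.shuffle(out)
    return out
```

Word-shuffle preserves the bag of words but destroys order; word-drop removes content. A context-sensitive model should suffer a large perplexity increase under either distortion.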

We use the relative drop in test-set perplexity to quantify the sensitivity. The results are presented in Table 4, where the result of the pre-trained model is also included. First, we observe that the baseline model trained from scratch is relatively insensitive to context, which agrees well with Sankar et al. (2019). The model with the standard pretrain-finetune process is much more sensitive, showing that pre-training effectively changes the model’s behavior. Compared to MASS, the NS pre-trained model makes better use of the context, which explains its superior performance (Section 5.1).

Model(Data-set) PPL(normal) PPL(word-shuffle) PPL(word-drop)
NS Pre-trained(CCNEWS) 17.33 36.56(+110.96%) 35.56(+105.19%)
Baseline(Dailydialogue) 24.83 27.87(+12.2%) 31.87(+28.3%)
MASS+finetune(Dailydialogue) 12.78 15.85(+24.0%) 18.13(+41.8%)
NS+finetune(Dailydialogue) 11.54 16.30(+41.2%) 19.01(+64.7%)
NS+WD($\lambda$)(Dailydialogue) 11.19 14.16(+26.5%) 16.37(+46.2%)
NS+Mix-review(Dailydialogue) 11.07 17.81(+60.8%) 23.05(+108.2%)
Table 4: The model’s PPL performance when word-shuffle or word-drop is applied to the context input. The left column describes the training process used and the test set on which PPL is evaluated. Note that MASS/NS refers to MASS/NS pre-training followed by standard fine-tuning. To save space, the results on the Switchboard and Cornell Movie data-sets are deferred to Appendix D.

Somewhat surprisingly, the standard fine-tuned dialogue models are much less sensitive to the context input than the pre-trained model before fine-tuning. This confirms our concern from Section 3.2 that the model forgets some important generation skills during standard fine-tuning. Further, we find that the mix-review fine-tuning strategy can effectively alleviate this problem: its sensitivity is much greater than that of standard fine-tuning, and is close to that of the pre-trained model.

5.3 Behavior Analysis: Knowledge Transfer

As argued in Section 3.1, ideally the model can acquire “knowledge” from the large-scale pre-training data, which will be useful for the downstream open-domain dialogue task. In this section, we design a process to quantify how much knowledge the model has, and use it to monitor how the pretrain-finetune framework changes the model’s behavior.

Since the pre-training CCNEWS data is in the public news domain, we expect the model to have knowledge about “big news”. So, we utilize the Google trend data of the year 2016 (https://www.google.com/intl/en-US/trends/2016records/), which contains 365 trending terms (e.g. iPhone 7, Deadpool, etc.) and their corresponding descriptions.

News-style Triggers Dialogue-style Triggers
now, some opinions about pokemon . what do you think about pokemon ?
let me tell you about pokemon . please tell me about pokemon .
here’s some news about pokemon . do you have news about pokemon ?
Reference Description: Pokemon first took the world by storm in the mid-90s, doing so once
again this year with the release of Pokemon Go.
NS Pre-trained: the game , titled pokemon go : pocket camp , can be played in person …
Standard Fine-tuned: it ’s a new game that can be played with kids .
WD($\lambda$): pokemon go , it ’s a type of game that only exists in the us .
Mix-review: pokemon go is a popular mobile game , where you ’re expected to catch pokemon .
Reference Description: Deadpool: The wisecracking antihero, played by Ryan Reynolds in a
movie of the same name, became the highest-grossing R-rated film of all time.
NS Pre-trained: ryan reynolds teased his upcoming movie as the character of deadpool .
Standard Fine-tuned: it ’s a popular movie .
WD($\lambda$): yes , i really like him . he is a very funny character .
Mix-review: ryan reynolds .
Table 5: Example of trigger inputs for the knowledge term “pokemon”. Followed by reference description and model samples for “pokemon” and “deadpool”. Note that the pre-trained model’s sample is from news-style triggers, and the other samples are from dialogue-style triggers.

To query whether the model has knowledge of a certain term, we design three news-style and three dialogue-style “trigger templates” to prompt the model to generate responses related to the knowledge term. We collect 10 samples for each trigger (30 samples from news-style triggers and 30 from dialogue-style triggers for each term), and then compute the BLEU score of the generated samples against the reference descriptions. We show some example trigger inputs in Table 5.
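BLEU is built from clipped n-gram precisions. The following is a minimal sketch of the per-order precision against a single reference (our simplification: it omits the brevity penalty and the multi-reference handling of full BLEU):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate token list against
    a single reference token list, for n-grams of order n."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # each candidate n-gram counts at most as often as it appears in the reference
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(cand.values()), 1)
```

BLEU-2/BLEU-3 as reported in Table 6 combine the precisions up to order 2 or 3; a standard implementation (e.g. the geometric mean with brevity penalty) would be used in practice.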

Dailydialogue Switchboard
Model Dialogue Triggers News Triggers Dialogue Triggers News Triggers
NS Pre-trained 0.245/0.089 0.347/0.153 0.245/0.089 0.347/0.153
Baseline 0.124/0.007 0.101/0.004 0.032/0.0003 0.046/0.002
NS+finetune 0.162/0.047 0.158/0.046 0.187/0.052 0.170/0.044
NS+WD($\lambda$) 0.226/0.080 0.235/0.085 0.203/0.070 0.204/0.060
NS+Mix-review 0.261/0.108 0.322/0.135 0.223/0.079 0.341/0.151
Table 6: Average BLEU-2/BLEU-3 scores of the model’s samples w.r.t. the reference descriptions. We highlight the pre-trained model’s performance for news triggers and the performance of the best dialogue-data fine-tuned model for dialogue triggers. The results on the Cornell Movie data-set are deferred to Appendix D.

The BLEU scores are shown in Table 6. Note that we should compare the pre-trained model’s scores for the news triggers with the other dialogue models’ scores for the dialogue triggers. We first observe that, for the pre-trained model, the news-style triggers elicit much more relevant output than the dialogue-style triggers. This matches our intuition, because the pre-trained model is trained on news data. Although the fine-tuned model is more knowledgeable than the baseline model, its score is much lower than the pre-trained model’s. Similarly to the case of context sensitivity (Section 5.2), this again demonstrates the forgetting problem of standard fine-tuning.

We find that mix-review and WD($\lambda$) can effectively retain the knowledge acquired during pre-training, giving much higher BLEU scores than the standard fine-tuned model. Mix-review shows higher BLEU scores than WD($\lambda$), demonstrating its superiority in facilitating knowledge retention. We showcase samples from different models in Table 5. To save space, we manually select and show the most relevant sample out of the 30 samples for each knowledge term. The observations agree with the quantitative results: standard fine-tuning loses the detailed information about the knowledge term, while mix-review helps the model retain it. More importantly, the model is able to express the knowledge in a dialogue context.

6 Implications and Discussion

Figure 2: UMAP projection of checkpoints from different training processes.

In Figure 2, we show the UMAP (McInnes et al., 2018) projection of the models in function space (Erhan et al., 2010). The details and extra plots are given in Appendix E. We observe that the standard fine-tuned models are not close to the cluster of the pre-trained models, which suggests that their generative behavior is substantially different from that of the pre-trained ones. Mix-review regularizes the fine-tuning process to keep the model’s generative behavior close to the pre-trained model. These observations agree with our results in Sections 5.2 and 5.3. Figure 2 also suggests potential limitations of mix-review and WD($\lambda$): mix-review could be too “aggressive” and may not put enough attention on the target task, while WD($\lambda$) is not strong enough in regularizing the model’s generative behavior (discussed further in Appendix E).

User: have you been to china ?
Model: yes , i have .
User: which city did you go ?
Model: shanghai
User: how do you like it there ?
Model: i like it very much
User: i think the weather is not so good .
Model: but it is nice in spring and fall .
User: who is mark zukerberg ?
Model: he is the ceo of facebook .
User: what is that company about ?
Model: it is a social media company .
User: do you like him ?
Model: i think he is a good leader .
User: my friend works at facebook .
Model: what does your friend do there ?
User: when is the independence day ? Best-of-10: it ’s july fourth.
User: is coffee good for health ? Best-of-10: it may help reduce your risk of certain diseases.
User: should i do a phd or go to work ? Best-of-10: a phd is not enough if you want to
learn something new .
User: oh shut up you . Worst-of-10: shut up !
User: you ’re the problem . Worst-of-10: i hate you .
User: i will kill . Worst-of-10: i will kill you .
Table 7: Multi-turn and single-turn examples of the model trained by mix-review on Dailydialogue data. The single-turn examples involve light cherry-picking.

From the viewpoint of the open-domain dialogue task, the sensitivity to dialogue context and the ability to transfer knowledge from pre-training open the possibility of a data-driven knowledgeable chat-bot. In Table 7, we show multi-turn and single-turn interaction examples with the model trained by mix-review. For demonstration purposes, for the single-turn examples we manually select the most interesting response out of 10 samples from the model. We observe that the model is able to return interesting responses using the knowledge it acquired from pre-training. More interestingly, it has developed its own “opinions” and is able to give advice to the user.

Finally, we discuss the malicious response problem for open-domain dialogue models. As shown by He and Glass (2019b), it is relatively difficult to trigger the dialogue models trained from scratch to output malicious responses (note that the conversations from the Dailydialogue data tend to be very polite). However, as shown in Table 7, the pre-trained models are easily triggered to respond in a malicious way when “provoked”. This is because compared to the baseline models, the pre-trained models are more sensitive to the contextual input, making them easier to manipulate. This makes the malicious response problem a more relevant issue to solve (He and Glass, 2019a).

7 Related Work


As discussed in Section 3.2, in contrast to the “catastrophic forgetting” problem in sequential learning (Atkinson et al., 2018; Robins, 1995; Riemer et al., 2017), the performance drop on pre-training data is not necessarily bad for the NLP pretrain-finetune framework. In Sections 5.2 and 5.3, we confirm the “forgetting” of important language generation skills during standard fine-tuning. The proposed mix-review strategy is similar to the pseudo-rehearsal algorithm in sequential learning (Robins, 1995), with the difference that we assume we still have access to the pre-training data. Mix-review can also be viewed as a form of multi-task learning (Li et al., 2019), which has been shown to be useful in neural machine translation (NMT) (Niehues and Cho, 2017), speech recognition (Toshniwal et al., 2017), optical character recognition (OCR) (Liao et al., 2019), etc. However, these works mostly focus on supervised tasks. To the best of our knowledge, this is the first work to analyze the forgetting problem for NLG models under the unsupervised pretrain-finetune framework, and to address it using the concept of data mixing.

Pre-training for NLG Models

Unsupervised pre-training for NLG models has recently received much research attention (Wolf et al., 2019; Mehri et al., 2019; Song et al., 2019; Devlin et al., 2018), but how pre-training changes the behavior of a neural language generator is poorly understood. Several studies have shown that large-scale training teaches LMs common-sense knowledge (Petroni et al., 2019; Trinh and Le, 2019), where the captured knowledge is quantified by a cloze-style test. On the other hand, knowledge-grounded chat-bots (Liu et al., 2018; Zhu et al., 2017) have been an important topic for dialogue models. These studies usually involve additional retrieval modules to provide the model with relevant information. Unlike these works, we study whether fine-tuning preserves knowledge gained during large-scale pre-training.

8 Conclusion

In this work, we analyze the forgetting problem in the standard NLP pretrain-finetune framework from the viewpoint of language generation. We adopt the concept of “data mixing” and propose the mix-review fine-tuning strategy. We demonstrate that mix-review effectively helps the model remember important generation skills learned during pre-training.

Through a detailed behavior analysis, we find that beneath the performance boost on standard metrics, large-scale pre-training changes the model’s generative behavior in profound ways (e.g., context sensitivity). More importantly, the behavior change is shaped by the nature of the data itself. For example, we demonstrate that we can discuss news with the resulting dialogue model even when the fine-tuning data (Dailydialogue) is not about news. This opens the exciting possibility of a completely data-driven way to customize a language generator.


Acknowledgments

We sincerely thank Jingzhao Zhang, Hongzhao Huang, Zhe Liu, Ke Li, Yiren Wang, Lu Mi and Minghui Liao for useful discussions.


References

  • C. Atkinson, B. McCane, L. Szymanski, and A. V. Robins (2018) Pseudo-recursal: solving the catastrophic forgetting problem in deep neural networks. CoRR abs/1802.03875. External Links: Link, 1802.03875 Cited by: §3.2, §7.
  • A. Bakhtin, S. Gross, M. Ott, Y. Deng, M. Ranzato, and A. Szlam (2019) Real or fake? learning to discriminate machine from human generated text. CoRR abs/1906.03351. External Links: Link, 1906.03351 Cited by: §4.1.
  • K. Cho, B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734. Cited by: §2.
  • C. Danescu-Niculescu-Mizil and L. Lee (2011) Chameleons in imagined conversations: a new approach to understanding coordination of linguistic style in dialogs.. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011, Cited by: Appendix B, §4.1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805. External Links: Link, 1810.04805 Cited by: §1, §3.1, §3.2, §7.
  • D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent, and S. Bengio (2010) Why does unsupervised pre-training help deep learning?. J. Mach. Learn. Res. 11, pp. 625–660. External Links: ISSN 1532-4435, Link Cited by: §6.
  • A. Fan, M. Lewis, and Y. Dauphin (2018) Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 889–898. External Links: Link, Document Cited by: §2.
  • T. He and J. R. Glass (2019a) Negative training for neural dialogue response generation. CoRR abs/1903.02134. External Links: Link, 1903.02134 Cited by: Appendix A, §6.
  • T. He and J. Glass (2019b) Detecting egregious responses in neural sequence-to-sequence models. In International Conference on Learning Representations, Cited by: §6.
  • A. Holtzman, J. Buys, M. Forbes, and Y. Choi (2019) The curious case of neural text degeneration. CoRR abs/1904.09751. External Links: Link, 1904.09751 Cited by: §2.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. Note: cite arxiv:1412.6980Comment: Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015 External Links: Link Cited by: §2, §4.2.
  • R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler (2015) Skip-thought vectors. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 3294–3302. External Links: Link Cited by: §3.1.
  • J. Lei Ba, J. R. Kiros, and G. E. Hinton (2016) Layer Normalization. arXiv e-prints. External Links: 1607.06450 Cited by: §3.2.
  • J. Li, X. Liu, W. Yin, M. Yang, and L. Ma (2019) An empirical evaluation of multi-task learning in deep neural networks for natural language processing. CoRR abs/1908.07820. External Links: Link, 1908.07820 Cited by: §7.
  • J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan (2016) A diversity-promoting objective function for neural conversation models. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pp. 110–119. Cited by: Appendix A, §2, §3.1.
  • Y. Li, H. Su, X. Shen, W. Li, Z. Cao, and S. Niu (2017) DailyDialog: a manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Taipei, Taiwan, pp. 986–995. External Links: Link Cited by: Appendix B, §4.1.
  • M. Liao, P. Lyu, M. He, C. Yao, W. Wu, and X. Bai (2019) Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. CoRR abs/1908.08207. External Links: Link, 1908.08207 Cited by: §7.
  • C. Liu, R. Lowe, I. Serban, M. Noseworthy, L. Charlin, and J. Pineau (2016) How not to evaluate your dialogue system: an empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2122–2132. External Links: Document Cited by: §5.1.
  • S. Liu, H. Chen, Z. Ren, Y. Feng, Q. Liu, and D. Yin (2018) Knowledge diffusion for neural dialogue generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 1489–1498. External Links: Link, Document Cited by: §7.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692. External Links: Link, 1907.11692 Cited by: §1, §3.2.
  • T. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421. External Links: Document Cited by: §2.
  • L. McInnes, J. Healy, N. Saul, and L. Grossberger (2018) UMAP: uniform manifold approximation and projection. The Journal of Open Source Software 3 (29), pp. 861. Cited by: §6.
  • S. Mehri, E. Razumovskaia, T. Zhao, and M. Eskénazi (2019) Pretraining methods for dialog context representation learning. CoRR abs/1906.00414. External Links: Link, 1906.00414 Cited by: §7.
  • T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur (2010) Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp. 1045–1048. Cited by: §2.
  • J. Niehues and E. Cho (2017) Exploiting linguistic resources for neural machine translation using multi-task learning. CoRR abs/1708.00993. External Links: Link, 1708.00993 Cited by: §7.
  • M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, and M. Auli (2019) Fairseq: a fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations, Cited by: §4.2.
  • M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer (2018) Deep contextualized word representations. In Proc. of NAACL, Cited by: §1, §3.2.
  • F. Petroni, T. Rocktäschel, P. Lewis, A. Bakhtin, Y. Wu, A. H. Miller, and S. Riedel (2019) Language models as knowledge bases?. External Links: 1909.01066 Cited by: §7.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. Cited by: §2, §3.1, §3.1.
  • M. Riemer, E. Khabiri, and R. Goodwin (2017) Representation stability as a regularizer for improved text analytics transfer learning. CoRR abs/1704.03617. External Links: Link, 1704.03617 Cited by: §7.
  • A. Robins (1995) Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science 7, pp. 123–146. Cited by: §3.2, §7.
  • C. Sankar, S. Subramanian, C. J. Pal, S. Chandar, and Y. Bengio (2019) Do neural dialog systems use the conversation history effectively? an empirical study. CoRR abs/1906.01603. External Links: Link, 1906.01603 Cited by: §3.1, §5.2, §5.2, §5.2.
  • R. Sennrich, B. Haddow, and A. Birch (2016) Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1715–1725. External Links: Link, Document Cited by: §4.1.
  • K. Song, X. Tan, T. Qin, J. Lu, and T. Liu (2019) Mass: masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450. Cited by: §1, §3.1, §3.1, §3.2, §4.2, §7.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 3104–3112. Cited by: §2.
  • S. Toshniwal, H. Tang, L. Lu, and K. Livescu (2017) Multitask learning with low-level auxiliary tasks for encoder-decoder based speech recognition. CoRR abs/1704.01631. External Links: Link, 1704.01631 Cited by: §7.
  • T. H. Trinh and Q. V. Le (2019) Do language models have common sense?. External Links: Link Cited by: §7.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 5998–6008. External Links: Link Cited by: §2, §4.2.
  • G. Wiese, D. Weissenborn, and M. Neves (2017) Neural domain adaptation for biomedical question answering. arXiv preprint arXiv:1706.03610. Cited by: §3.2.
  • T. Wolf, V. Sanh, J. Chaumond, and C. Delangue (2019) TransferTransfo: a transfer learning approach for neural network based conversational agents. CoRR abs/1901.08149. External Links: Link, 1901.08149 Cited by: §7.
  • Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and Q. V. Le (2019) XLNet: generalized autoregressive pretraining for language understanding. CoRR abs/1906.08237. External Links: Link, 1906.08237 Cited by: §1, §3.2.
  • T. Young, D. Hazarika, S. Poria, and E. Cambria (2017) Recent trends in deep learning based natural language processing. CoRR abs/1708.02709. External Links: Link, 1708.02709 Cited by: §2.
  • W. Zhu, K. Mo, Y. Zhang, Z. Zhu, X. Peng, and Q. Yang (2017) Flexible end-to-end dialogue system for knowledge grounded conversation. CoRR abs/1709.04264. External Links: Link, 1709.04264 Cited by: §7.

Appendix A Beam-search vs. Top-k Sampling

                Beam Search                     Top-30 Sampling
Data-set        Entropy        Max-ratio        Entropy         Max-ratio
Dailydialogue   7.44 / 8.49    1.7% / 1.3%      9.04 / 10.81    0.6% / 0.4%
Switchboard     4.96 / 5.54    34.9% / 27.8%    8.47 / 10.45    8.4% / 7.9%
Cornell         6.10 / 6.56    10.2% / 9.9%     8.76 / 10.54    1.4% / 1.1%

Table 8: Average diversity metrics for models on the three dialogue data-sets. Entropy cells show bi-gram / tri-gram entropy; max-ratio cells show the frequency ratios of the most frequent / second most frequent response.

To compare beam search with top-k sampling (we set k to 30), we compute diversity metrics for samples from models trained by different procedures (from scratch or pre-trained). In particular, we compute bi-gram and tri-gram entropy, along with the frequency ratios of the most frequent and second most frequent responses (denoted as max-ratio) (He and Glass, 2019a). The results are shown in Table 8.
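These metrics can be sketched in a few lines of Python (a minimal implementation; we compute entropy in nats over the empirical n-gram distribution):

```python
from collections import Counter
import math

def ngram_entropy(responses, n):
    """Entropy (in nats) of the empirical n-gram distribution over a
    list of tokenized responses."""
    counts = Counter()
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def max_ratios(responses):
    """Frequency ratios of the most frequent and the second most
    frequent responses (the two max-ratio values)."""
    top_two = Counter(tuple(r) for r in responses).most_common(2)
    return [count / len(responses) for _, count in top_two]
```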

We observe that the responses given by top-k sampling are much more diverse than those from beam search. Beam search suffers heavily from the “generic response” problem (Li et al., 2016); for example, 34% of its responses for Switchboard are “um - hum”. Furthermore, in our multi-turn dialogue experiments, beam search tends to give repetitive responses. Finally, by manual inspection, we find that the sample quality of top-k sampling is not compromised. Based on these observations, we adopt top-k sampling as the main decoding method in this work.
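For reference, top-k sampling itself can be sketched as follows (a minimal NumPy version of the standard procedure: renormalize the softmax over the k most likely tokens and sample; k=30 in our experiments):

```python
import numpy as np

def top_k_sample(logits, k=30, rng=None):
    """Sample the next token id from the k highest-scoring candidates,
    after renormalizing their softmax probabilities."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    top = np.argsort(logits)[-k:]                    # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))
```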

Appendix B Details on Data-sets

Dailydialogue (Li et al., 2017) is a high-quality multi-turn dialogue data-set. The dialogues are human-written and relatively noise-free; they reflect everyday communication and cover a variety of daily-life topics. The training split has around 11k dialogues (1.3 million words), and both the validation and test splits have 1k dialogues (0.1 million words).

The Switchboard Dialogue Act Corpus (http://compprag.christopherpotts.net/swda.html) is a version of the Switchboard Telephone Speech Corpus, which is a collection of two-sided telephone conversations, annotated with utterance-level dialogue acts. In this work we only use the conversation text part of the data, and select 1.1k dialogues for training (181k sentences / 1.2 million words), 50 dialogues for validation, and 50 dialogues for testing.

The Cornell Movie Dialogue Corpus (https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html) (Danescu-Niculescu-Mizil and Lee, 2011) is a collection of movie scripts. When processing the data, we simply regard the whole script of a movie as one long dialogue. The training split contains 9k dialogues (4.5 million words), and both the validation and test splits have 180 dialogues (85k words).

Appendix C Model Samples

In Table 9, we show more samples from the different training procedures, for the three dialogue data-sets.

Dailydialogue
Input: why do you think you are qualified for this position ?
Baseline: i can tell you more about your company because your company has a working experience with a company and company company . i have been working with microsoft for four years . i can make good use of microsoft ’ s word pad and excel excel spreadsheet .
MASS: i think i have a strong sense of responsibility and i have had a high iq score in college . i am applying for some courses of study for some courses if i have a preference .
Mix-review: my background is in computer engineering and i have a strong commercial acumen in business - related application . i am in charge of all the things i like to do here .
WD(): my background , education , experience as an english professor , and the work ethic are strong .
Input: according to your advertisement , you want an experienced software engineer . i think my background meets the requirement of this position .
Baseline: yes . and you also need to know how to enjoy the work . and , here is a position to develop the company position under the company .
NS: do you know your qualifications ?
MASS: that ’ s a good answer . i can tell you if we have made a decision this afternoon .
Mix-review: i see . do you have some experience as an executive engineer ?
WD(): in your first year of operating your company , you had an impressive track record of product design , and achieved a high level of technical proficiency .
Switchboard
Input: that ’s going to be a really viable program if it continues to be funded at current levels
Baseline: oh really
NS: um it ’s going to be a lot of fun it certainly is
MASS: sure sure sure
Mix-review: well that that ’s what i ’m saying too that ’s that ’s not gonna be an issue
WD(): yeah oh sure sure
Input: uh - huh
Baseline: oh okay
NS: and uh i- i think there are some areas that if the government decides to close down that could be a real problem
MASS: and i really feel strongly that we don ’t need the space station because we think there ’s really no space
Mix-review: but uh that
WD(): i mean it ’s pretty i think it ’s it ’s pretty interesting to me that it ’s uh
Cornell Movie
Input: i expect you shortly .
Baseline: how so ?
NS: i ’ll make a reservation .
MASS: i ’ll come with you , doctor .
Mix-review: no , i ’m fine
WD(): okay .
Input: what shall we do ?
Baseline: we all don ’t know where the hell he is now , but if he talks to me first , i ’ll be back .
NS: nothing .
MASS: we send a cab ! i said i expect you to be fast .
Mix-review: david ’s not very lucid , but he ’s always lucid .
WD(): stop him .
Input: tea would be nice .
Baseline: i have no desire to go to bed .
NS: but you can ’t come here and have tea too ?
MASS: i ’ve been expecting you , doctor .
Mix-review: do you want me to drink some ?
WD(): what is it ?
Table 9: Samples of different models on the Dailydialogue/Switchboard/Cornell-Movie test-set. The samples are consecutive (input of the next sample is the reference response for the previous one).

Appendix D Supplementary Experiment Results

In this section we supplement results deferred from the main body due to space limits.

In Table 10 we show Fluency/Consistency/Engagingness scores of the AMT Rating.

In Table 11 we show context sensitivity results for Switchboard and Cornell Movie data-sets.

In Table 12 we show the knowledge transfer results for the Cornell Movie data-set.

AMT Ratings Fluency/Consistency/Engagingness
Model Dailydialogue Switchboard Cornell Movie
Baseline 0.549/0.453/0.615 0.293/0.297/0.303 0.480/0.362/0.483
MASS+finetune 0.661/0.560/0.679 0.314/0.306/0.249 0.514/0.424/0.528
NS+finetune 0.605/0.606/0.698 0.302/0.337/0.333 0.465/0.421/0.525
NS+WD() 0.660/0.587/0.728 0.353/0.396/0.396 0.500/0.463/0.533
NS+Mix-review 0.676/0.615/0.706 0.359/0.382/0.372 0.516/0.458/0.556
Reference 0.773/0.733/0.852 0.435/0.501/0.493 0.508/0.511/0.640
Table 10: The detailed rating scores for AMT.
Model(Data-set) PPL(normal) PPL(word-shuffle) PPL(word-drop)
NS Pre-trained(CCNEWS) 17.33 36.56(+110.96%) 35.56(+105.19%)
Baseline(Switchboard) 51.14 53.42(+4.4%) 53.94(+5.4%)
MASS+finetune(Switchboard) 28.41 32.68(+15.0%) 33.91(+19.3%)
NS+finetune(Switchboard) 26.37 30.87(+17.0%) 32.08(+21.6%)
NS+WD()(Switchboard) 26.25 31.31(+19.2%) 32.89(+25.2%)
NS+Mix-review(Switchboard) 25.92 31.10(+19.9%) 33.70(+30.0%)
Baseline(Cornell) 49.48 50.22(+1.4%) 50.85(+2.7%)
MASS+finetune(Cornell) 30.25 36.50(+20.6%) 36.36(+20.1%)
NS+finetune(Cornell) 28.06 36.88(+31.4%) 34.47(+22.8%)
NS+WD()(Cornell) 27.80 37.46(+34.7%) 35.10(+26.2%)
NS+Mix-review(Cornell) 27.54 36.94(+34.1%) 37.72(+36.9%)
Table 11: The model’s PPL performance when word-shuffle or word-drop is applied to the context input. The left column indicates which training process is used and on which test set PPL is evaluated. Note that MASS/NS refers to MASS/NS pre-training followed by standard fine-tuning.
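The two context perturbations in Table 11 are simple to implement. A minimal sketch follows, in the style of the perturbation study of Sankar et al. (2019); the word-drop rate of 0.3 is an illustrative choice, not necessarily the exact value used in our experiments:

```python
import random

def word_shuffle(context_tokens, rng=None):
    """Randomly permute the tokens of the context (the target response
    is left untouched)."""
    rng = rng if rng is not None else random.Random()
    shuffled = list(context_tokens)
    rng.shuffle(shuffled)
    return shuffled

def word_drop(context_tokens, drop_prob=0.3, rng=None):
    """Independently drop each context token with probability drop_prob
    (an assumed rate); never return an empty context."""
    rng = rng if rng is not None else random.Random()
    kept = [t for t in context_tokens if rng.random() >= drop_prob]
    return kept if kept else list(context_tokens)
```

A context-sensitive model should be hurt by both perturbations (large PPL increase), while an insensitive one barely notices them, which is what the Baseline rows show.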
Model Dialogue Triggers News Triggers
NS Pre-trained 0.245/0.089 0.347/0.153
Baseline 0.081/0.003 0.088/0.003
NS+finetune 0.207/0.071 0.207/0.063
NS+WD() 0.285/0.114 0.202/0.072
NS+Mix-review 0.396/0.190 0.212/0.065
Table 12: Average BLEU-2/BLEU-3 scores for the model’s samples w.r.t. the reference description. We highlight the pre-trained model’s performance for news triggers and the performance of the best model fine-tuned with dialogue data for dialogue triggers.

Appendix E Details and Auxiliary Plots of UMAP Projection

For function-space projection, the input to UMAP is the model’s output distributions. We collect the model’s output distributions on 10k words for the CCNEWS validation set and the Dailydialogue validation set (so the input is a concatenation of two long vectors). We use the default hyper-parameter setting of the Python implementation of UMAP. The result is shown in Figure 2 in the main body. Note that during pre-training on the CCNEWS data, 20 epochs constitute one entire data pass. We fine-tune from the pre-training checkpoints at epochs 100, 200, 300, 400, and 500.

In Figure 3 we show the parameter-space UMAP projection for the same set of models. In this case, the input to UMAP is the concatenation of the flattened weight matrices of the transformer model. A key observation is that the fine-tuned models are typically very close to their starting point (the pre-trained models). However, as shown in Figure 2, their behavior is very different. This suggests that a parameter-space regularization such as WD() may not be very effective for regularizing the model’s behavior.
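The assembly of the UMAP input can be sketched as follows. This is a minimal version: `checkpoint_features` is a hypothetical helper that stacks each checkpoint's collected output distributions into one long feature vector per checkpoint; the actual projection call (via the umap-learn package, assumed installed) is shown as a comment:

```python
import numpy as np

def checkpoint_features(output_dists):
    """Given a dict mapping a checkpoint name to an array of shape
    (n_positions, vocab_size) of output distributions collected on the
    validation data, flatten each into one long feature vector and
    stack them into a (n_checkpoints, n_positions * vocab_size) matrix."""
    names = sorted(output_dists)
    feats = np.stack([np.asarray(output_dists[n]).ravel() for n in names])
    return names, feats

# Projection with umap-learn, default hyper-parameters:
#   import umap
#   names, feats = checkpoint_features(dists)
#   coords = umap.UMAP(n_components=2).fit_transform(feats)
```

The parameter-space variant is analogous, with flattened weight matrices in place of output distributions.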

Figure 3: Parameter-space UMAP projection of checkpoints from different training processes.