Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study

03/02/2023
by Mingxu Tao, et al.

Large pre-trained language models help achieve state-of-the-art results on a variety of natural language processing (NLP) tasks; nevertheless, they still suffer from forgetting when incrementally learning a sequence of tasks. To alleviate this problem, recent works enhance existing models with sparse experience replay and local adaptation, which yield satisfactory performance. However, in this paper we find that pre-trained language models like BERT have an inherent ability to learn sequentially, even without any sparse memory replay. To verify BERT's ability to retain old knowledge, we train single-layer probe networks on top of the encoder while keeping the parameters of BERT fixed. We investigate the models on two types of NLP tasks: text classification and extractive question answering. Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay at all. We further introduce a series of novel methods to interpret the mechanism of forgetting and the significant role memory rehearsal plays in task-incremental learning, bridging the gap between our new findings and previous studies of catastrophic forgetting.
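The probing setup described above can be illustrated with a minimal sketch: freeze a pre-trained BERT encoder and train only a single linear probe on its [CLS] representation for a classification task. The model name, label count, learning rate, and training loop below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased").to(device)

# Fix BERT's parameters so that only the probe is updated.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

num_labels = 2  # hypothetical binary text-classification task
probe = nn.Linear(encoder.config.hidden_size, num_labels).to(device)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(texts, labels):
    """One probe-training step on a batch of (text, label) pairs."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    with torch.no_grad():  # frozen encoder: representations only
        hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] vectors
    logits = probe(hidden)
    loss = loss_fn(logits, torch.tensor(labels, device=device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: two sentences with hypothetical sentiment labels.
print(train_step(["great movie", "terrible plot"], [1, 0]))
```

Because the encoder is frozen, the quality of the probe on a task learned earlier in the sequence directly reflects how well BERT's representations for that task have been preserved, which is the measurement the paper relies on.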


