Continual Pre-Training Mitigates Forgetting in Language and Vision

05/19/2022
by Andrea Cossu, et al.

Pre-trained models are today a fundamental component of machine learning research. In continual learning, they are commonly used to initialize the model before training on the stream of non-stationary data. However, pre-training itself is rarely applied during continual learning. We formalize and investigate the characteristics of the continual pre-training scenario in both language and vision environments, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned on different downstream tasks. We show that continually pre-trained models are robust against catastrophic forgetting, and we provide strong empirical evidence that self-supervised pre-training is more effective at retaining previous knowledge than supervised protocols. Code is available at https://github.com/AndreaCossu/continual-pretraining-nlp-vision.
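The protocol can be illustrated with a short sketch. The snippet below is only a minimal, assumption-laden illustration of the scenario described above (self-supervised pre-training on each incoming chunk of the stream, followed after every chunk by a fine-tuning probe on a fixed downstream task); the toy SmallEncoder, the masked-reconstruction objective, and the helper names pretrain_on_chunk and finetune_and_eval are illustrative and are not the API of the linked repository.

# A minimal sketch of the continual pre-training protocol in PyTorch.
# All names below (SmallEncoder, pretrain_on_chunk, finetune_and_eval)
# are illustrative placeholders, not the repository's API.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallEncoder(nn.Module):
    """Toy backbone standing in for a pre-trained language/vision encoder."""
    def __init__(self, in_dim=32, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden))

    def forward(self, x):
        return self.body(x)

def pretrain_on_chunk(encoder, chunk, epochs=1, lr=1e-3):
    """One continual pre-training step: self-supervised masked reconstruction."""
    decoder = nn.Linear(64, chunk.shape[1])
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        mask = (torch.rand_like(chunk) > 0.15).float()   # hide ~15% of the input
        loss = F.mse_loss(decoder(encoder(chunk * mask)), chunk)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder

def finetune_and_eval(encoder, x, y, epochs=5, lr=1e-3):
    """Fine-tune a copy of the encoder plus a linear head on a downstream task."""
    model = nn.Sequential(copy.deepcopy(encoder), nn.Linear(64, int(y.max()) + 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

# Stream of pre-training chunks and a fixed downstream task used to probe forgetting.
stream = [torch.randn(256, 32) for _ in range(3)]
x_down, y_down = torch.randn(128, 32), torch.randint(0, 4, (128,))

encoder = SmallEncoder()
for t, chunk in enumerate(stream):
    encoder = pretrain_on_chunk(encoder, chunk)       # continual pre-training
    acc = finetune_and_eval(encoder, x_down, y_down)  # downstream probe
    print(f"after chunk {t}: downstream accuracy = {acc:.3f}")

Fine-tuning always operates on a deep copy of the backbone, so the downstream probe never alters the continually pre-trained weights; a drop in probe accuracy across chunks would then indicate forgetting in the pre-trained representation itself, which mirrors how the evaluation described in the abstract is set up.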



