An Empirical Investigation of the Role of Pre-training in Lifelong Learning

12/16/2021
by Sanket Vaibhav Mehta, et al.

The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning, but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel dataset of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness in order to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach leads to performance comparable to the state-of-the-art in task-sequential continual learning across multiple settings, without retaining a memory that scales in size with the number of tasks.
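The abstract does not name the optimizer, but the stated objective (minimizing the current task loss jointly with loss basin sharpness to encourage wider minima) matches sharpness-aware-minimization-style updates. Below is a minimal PyTorch sketch of one such update, assuming a standard classification model and base optimizer; the function name, the perturbation radius rho, and the overall structure are illustrative assumptions rather than the authors' released code.

    import torch

    def sharpness_aware_step(model, loss_fn, batch, optimizer, rho=0.05):
        # One sharpness-aware update: perturb weights toward higher loss within
        # an L2 ball of radius rho, then descend using the gradient computed at
        # the perturbed point. This penalizes sharp minima of the task loss.
        inputs, targets = batch

        # First pass: gradient at the current weights.
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()

        # Climb to the approximate worst-case point in the rho-ball.
        with torch.no_grad():
            grad_norm = torch.norm(torch.stack(
                [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]
            ))
            perturbations = []
            for p in model.parameters():
                if p.grad is None:
                    perturbations.append(None)
                    continue
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)
                perturbations.append(e)

        # Second pass: gradient at the perturbed weights.
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()

        # Restore the original weights, then apply the update with the
        # sharpness-aware gradient.
        with torch.no_grad():
            for p, e in zip(model.parameters(), perturbations):
                if e is not None:
                    p.sub_(e)
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()

During sequential fine-tuning, a loop of this kind would simply replace the usual single-pass gradient step for each task's batches; no per-task memory buffer is required, which is consistent with the abstract's claim of avoiding memory that scales with the number of tasks.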



Related research

05/19/2022 · Continual Pre-Training Mitigates Forgetting in Language and Vision
Pre-trained models are nowadays a fundamental component of machine learn...

06/26/2023 · Learning to Modulate pre-trained Models in RL
Reinforcement Learning (RL) has been successful in various domains like ...

11/23/2018 · Revisiting Pre-training: An Efficient Training Method for Image Classification
The training method of repetitively feeding all samples into a pre-defin...

04/18/2021 · Lifelong Learning of Few-shot Learners across NLP Tasks
Recent advances in large pre-trained language models have greatly improv...

06/28/2022 · Continual Learning with Transformers for Image Classification
In many real-world scenarios, data to train machine learning models beco...

05/22/2023 · Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Task arithmetic has recently emerged as a cost-effective and scalable ap...

06/29/2022 · Forgetting Data from Pre-trained GANs
Large pre-trained generative models are known to occasionally provide sa...
