Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning

10/09/2021
by Danruo Deng, et al.

Backpropagation networks are notably susceptible to catastrophic forgetting: upon learning new skills, they tend to forget previously learned ones. To address this 'sensitivity-stability' dilemma, most previous efforts have been devoted to minimizing the empirical risk with different parameter regularization terms and episodic memory, but rarely explore the weight loss landscape. In this paper, we investigate the relationship between the weight loss landscape and sensitivity-stability in the continual learning scenario, based on which we propose a novel method, Flattening Sharpness for Dynamic Gradient Projection Memory (FS-DGPM). In particular, we introduce a soft weight to represent the importance of each basis representing past tasks in GPM, which can be adaptively learned during the learning process, so that less important bases can be dynamically released to improve the sensitivity of new skill learning. We further introduce Flattening Sharpness (FS) to reduce the generalization gap by explicitly regulating the flatness of the weight loss landscape of all seen tasks. As demonstrated empirically, our proposed method consistently outperforms baselines, with a superior ability to learn new skills while effectively alleviating forgetting.
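To make the two ingredients concrete, below is a minimal, illustrative sketch (not the authors' released implementation) of how a sharpness-probing ascent step on the weight loss landscape can be combined with a gradient projection that softly removes directions spanned by stored GPM bases. The names `fs_dgpm_step`, `project_gradient`, `bases`, `soft_weights`, and the hyperparameters `lr` and `rho` are placeholders introduced here for illustration; in the actual method the soft importance weights are themselves learned adaptively during training rather than fixed.

```python
import torch

def project_gradient(grad, basis, soft_weight):
    """Softly remove the components of `grad` lying in the span of `basis`.

    grad:        (d,) flattened gradient of one parameter tensor
    basis:       (d, k) orthonormal columns spanning past-task gradient space (GPM)
    soft_weight: (k,) importance of each basis vector in [0, 1]; 1 = fully protected
    """
    coeffs = basis.T @ grad                       # projection coefficients onto each basis
    return grad - basis @ (soft_weight * coeffs)  # release less important directions

def fs_dgpm_step(model, loss_fn, batch, bases, soft_weights, lr=0.1, rho=0.05):
    """One illustrative update: (1) perturb weights toward higher loss to probe
    sharpness, (2) take the gradient at the perturbed point (a flatter-minima
    signal), (3) undo the perturbation and apply the softly projected gradient.

    `bases` and `soft_weights` are lists aligned with model.parameters().
    """
    x, y = batch
    params = list(model.parameters())

    # (1) ascent step on the weight loss landscape (sharpness probe)
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    eps = [rho * g / (g.norm() + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # (2) gradient evaluated at the perturbed weights
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)

    # (3) restore weights, then descend along the projected gradient
    with torch.no_grad():
        for p, e, g, B, w in zip(params, eps, grads, bases, soft_weights):
            p.sub_(e)
            g_proj = project_gradient(g.reshape(-1), B, w)
            p.sub_(lr * g_proj.reshape_as(p))
```

As a rough usage pattern, `bases` would be built per parameter tensor from the SVD of past-task representations (as in GPM), and `soft_weights` would start near 1 and be updated together with the network so that bases unimportant for past tasks are gradually released for new learning.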

Related research

11/16/2020 · Gradient Episodic Memory with a Soft Constraint for Continual Learning
Catastrophic forgetting in continual learning is a common destructive ph...

03/26/2022 · Continual learning of quantum state classification with gradient episodic memory
Continual learning is one of the many areas of machine learning research...

04/29/2020 · Continual Deep Learning by Functional Regularisation of Memorable Past
Continually learning new skills is important for intelligent systems, ye...

06/15/2021 · Natural continual learning: success is a journey, not (just) a destination
Biological agents are known to learn many different tasks over the cours...

02/02/2023 · Continual Learning with Scaled Gradient Projection
In neural networks, continual learning results in gradient interference ...

09/28/2022 · A simple but strong baseline for online continual learning: Repeated Augmented Rehearsal
Online continual learning (OCL) aims to train neural networks incrementa...

03/17/2021 · Gradient Projection Memory for Continual Learning
The ability to learn continually without forgetting the past tasks is a ...
