SketchOGD: Memory-Efficient Continual Learning

05/25/2023
by Benjamin Wright, et al.

When machine learning models are trained continually on a sequence of tasks, they are liable to forget what they learned on previous tasks – a phenomenon known as catastrophic forgetting. Proposed solutions to catastrophic forgetting tend to involve storing information about past tasks, meaning that memory usage is a chief consideration in determining their practicality. This paper proposes a memory-efficient solution to catastrophic forgetting, improving upon an established algorithm known as orthogonal gradient descent (OGD). OGD utilizes prior model gradients to find weight updates that preserve performance on prior datapoints. However, since the memory cost of storing prior model gradients grows with the runtime of the algorithm, OGD is ill-suited to continual learning over arbitrarily long time horizons. To address this problem, this paper proposes SketchOGD. SketchOGD employs an online sketching algorithm to compress model gradients as they are encountered into a matrix of a fixed, user-determined size. In contrast to existing memory-efficient variants of OGD, SketchOGD runs online without the need for advance knowledge of the total number of tasks, is simple to implement, and is more amenable to analysis. We provide theoretical guarantees on the approximation error of the relevant sketches under a novel metric suited to the downstream task of OGD. Experimentally, we find that SketchOGD tends to outperform current state-of-the-art variants of OGD given a fixed memory budget.
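To make the idea concrete, the snippet below is a minimal NumPy illustration of the mechanism the abstract describes, not a reimplementation of the paper's algorithm: past-task gradients are folded into a fixed-size streaming sketch, and new gradients are projected onto the orthogonal complement of the sketched subspace, in the spirit of OGD. The names (SketchedOGDProjector, sketch_cols, absorb_gradient) are invented for this example, and the simple Gaussian sketch stands in for whatever sketching construction the paper actually uses.

```python
# Minimal sketch-based OGD illustration (assumed names; simplified sketch operator).
import numpy as np


class SketchedOGDProjector:
    """Maintain a fixed-size randomized sketch of past model gradients and
    project new gradients onto the orthogonal complement of the sketched range,
    in the spirit of orthogonal gradient descent (OGD)."""

    def __init__(self, num_params: int, sketch_cols: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        # The sketch Y has fixed shape (num_params, sketch_cols), so memory does
        # not grow with the number of tasks or stored gradients.
        self.Y = np.zeros((num_params, sketch_cols))
        self.sketch_cols = sketch_cols

    def absorb_gradient(self, g: np.ndarray) -> None:
        """Fold one past-task gradient into the sketch (streaming update).

        Equivalent to appending g as a new column of the gradient matrix G and
        recomputing Y = G @ Omega for a Gaussian test matrix Omega."""
        omega = self.rng.standard_normal(self.sketch_cols)
        self.Y += np.outer(g, omega)

    def project(self, g: np.ndarray) -> np.ndarray:
        """Remove from g its component in the (approximate) span of past gradients."""
        if not np.any(self.Y):  # nothing absorbed yet: no constraint to enforce
            return g
        Q, _ = np.linalg.qr(self.Y)  # orthonormal basis for the sketched range
        return g - Q @ (Q.T @ g)


# Toy usage: past gradients lie in a low-dimensional subspace; the projected
# update is (numerically) orthogonal to that subspace.
if __name__ == "__main__":
    d, k = 1000, 20
    proj = SketchedOGDProjector(num_params=d, sketch_cols=k)
    basis = np.random.default_rng(1).standard_normal((d, 5))
    for _ in range(200):  # stream many past-task gradients through the sketch
        coeffs = np.random.default_rng().standard_normal(5)
        proj.absorb_gradient(basis @ coeffs)
    g_new = np.random.default_rng(2).standard_normal(d)
    g_proj = proj.project(g_new)
    print(np.abs(basis.T @ g_proj).max())  # ~0: past-task directions are preserved
```

The fixed sketch width plays the role of the user-determined memory budget: the sketch Y never grows no matter how many gradients are absorbed, which is the property the abstract contrasts with vanilla OGD's memory cost growing over time.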

research · 04/24/2019
Facilitating Bayesian Continual Learning by Natural Gradients and Stein Gradients
Continual learning aims to enable machine learning models to learn a gen...

research · 06/21/2020
Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent
In continual learning settings, deep neural networks are prone to catast...

research · 02/07/2023
Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning
Modern representation learning methods may fail to adapt quickly under n...

research · 08/10/2022
ATLAS: Universal Function Approximator for Memory Retention
Artificial neural networks (ANNs), despite their universal function appr...

research · 06/26/2020
Supermasks in Superposition
We present the Supermasks in Superposition (SupSup) model, capable of se...

research · 08/02/2019
Toward Understanding Catastrophic Forgetting in Continual Learning
We study the relationship between catastrophic forgetting and properties...

research · 06/20/2021
Memory Augmented Optimizers for Deep Learning
Popular approaches for minimizing loss in data-driven learning often inv...
