A simple but strong baseline for online continual learning: Repeated Augmented Rehearsal

by   Yaqian Zhang, et al.

Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through data. Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting. Despite its strong empirical performance, rehearsal methods still suffer from a poor approximation of the loss landscape of past data with memory samples. This paper revisits the rehearsal dynamics in online settings. We provide theoretical insights on the inherent memory overfitting risk from the viewpoint of biased and dynamic empirical risk minimization, and examine the merits and limits of repeated rehearsal. Inspired by our analysis, a simple and intuitive baseline, Repeated Augmented Rehearsal (RAR), is designed to address the underfitting-overfitting dilemma of online rehearsal. Surprisingly, across four rather different OCL benchmarks, this simple baseline outperforms vanilla rehearsal by 9 rehearsal-based methods MIR, ASER, and SCR. We also demonstrate that RAR successfully achieves an accurate approximation of the loss landscape of past data and high-loss ridge aversion in its learning trajectory. Extensive ablation studies are conducted to study the interplay between repeated and augmented rehearsal and reinforcement learning (RL) is applied to dynamically adjust the hyperparameters of RAR to balance the stability-plasticity trade-off online.


Rehearsal revealed: The limits and merits of revisiting samples in continual learning

Learning from non-stationary data streams and overcoming catastrophic fo...

Graph-Based Continual Learning

Despite significant advances, continual learning models still suffer fro...

Dark Experience for General Continual Learning: a Strong, Simple Baseline

Neural networks struggle to learn continuously, as they forget the old k...

Continual Learning with Tiny Episodic Memories

Learning with less supervision is a major challenge in artificial intell...

Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning

The backpropagation networks are notably susceptible to catastrophic for...

Please sign up or login with your details

Forgot password? Click here to reset