Natural continual learning: success is a journey, not (just) a destination

by Ta-Chu Kao, et al.

Biological agents are known to learn many different tasks over the course of their lives and to revisit previous tasks and behaviors with little or no loss in performance. In contrast, artificial agents are prone to 'catastrophic forgetting', whereby performance on previous tasks deteriorates rapidly as new ones are acquired. This shortcoming has recently been addressed using methods that encourage parameters to stay close to those used for previous tasks. This can be done by (i) using specific parameter regularizers that map out suitable destinations in parameter space, or (ii) guiding the optimization journey by projecting gradients into subspaces that do not interfere with previous tasks. However, parameter regularization has been shown to be relatively ineffective in recurrent neural networks (RNNs), a setting relevant to the study of the neural dynamics that support biological continual learning. Similarly, projection-based methods can reach capacity and fail to learn any further as the number of tasks increases. To address these limitations, we propose Natural Continual Learning (NCL), a new method that unifies weight regularization and projected gradient descent. NCL uses Bayesian weight regularization to encourage good performance on all tasks at convergence and combines this with gradient projections designed to prevent catastrophic forgetting during optimization. NCL formalizes gradient projection as a trust-region algorithm based on the Fisher information metric, and achieves scalability via a novel Kronecker-factored approximation strategy. Our method outperforms both standard weight regularization techniques and projection-based approaches when applied to continual learning problems in RNNs. The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
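The abstract does not spell out the update rule, so the following is only a rough illustration of how a Bayesian weight regularizer and a Fisher-style preconditioner can be combined in one step. It assumes a single weight matrix `W` whose prior from earlier tasks is Gaussian with mean `M` and a Kronecker-factored precision `A ⊗ G`, and it assumes a step that preconditions the regularized gradient by the damped inverse of that precision, so directions the prior marks as important for earlier tasks move little. The names `kron_damped_solve`, `ncl_step`, `lr` and `damping` are hypothetical, and the factored solve uses the generic eigendecomposition identity for Kronecker products, not the novel approximation strategy the paper proposes. This is a sketch, not the authors' implementation.

```python
import numpy as np


def kron_damped_solve(A, G, V, damping):
    """Compute (A ⊗ G + damping * I)^{-1} vec(V), returned with V's shape.

    Uses column-major vec, under which (A ⊗ G) vec(V) = vec(G @ V @ A.T).
    A (n_in x n_in) and G (n_out x n_out) are assumed symmetric PSD;
    damping > 0 keeps the inverse well defined.
    """
    a, U_A = np.linalg.eigh(A)
    g, U_G = np.linalg.eigh(G)
    V_rot = U_G.T @ V @ U_A                      # rotate into the joint eigenbasis
    V_rot = V_rot / (np.outer(g, a) + damping)   # eigenvalues of A ⊗ G are g_i * a_j
    return U_G @ V_rot @ U_A.T                   # rotate back


def ncl_step(W, grad_task, M, A, G, lr, damping):
    """One hypothetical update of W (n_out x n_in) on the current task.

    The prior from earlier tasks is assumed Gaussian with mean M and
    precision A ⊗ G. The regularized gradient is preconditioned by the
    damped inverse precision, so directions with large prior precision
    (important for earlier tasks) receive small steps.
    """
    # Gradient of 0.5 * vec(W - M)^T (A ⊗ G) vec(W - M) is G (W - M) A.
    grad_prior = G @ (W - M) @ A
    return W - lr * kron_damped_solve(A, G, grad_task + grad_prior, damping)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_out, n_in = 3, 4
    X = rng.standard_normal((n_in, n_in))
    Y = rng.standard_normal((n_out, n_out))
    A = X @ X.T + np.eye(n_in)     # random SPD input-side factor (stand-in for a Fisher factor)
    G = Y @ Y.T + np.eye(n_out)    # random SPD output-side factor
    V = rng.standard_normal((n_out, n_in))

    # Check the factored solve against an explicit dense solve on a small case.
    dense = np.kron(A, G) + 0.1 * np.eye(n_out * n_in)
    expected = np.linalg.solve(dense, V.reshape(-1, order="F"))
    result = kron_damped_solve(A, G, V, 0.1).reshape(-1, order="F")
    assert np.allclose(result, expected)
```

Working in the eigenbasis of the two factors avoids forming the (n_in·n_out)-dimensional precision matrix; the dense comparison in the `__main__` block only confirms the identity on a small case. In a real setting, `A` and `G` would be accumulated from (approximate) Fisher information of earlier tasks rather than drawn at random.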




