Layerwise Optimization by Gradient Decomposition for Continual Learning

05/17/2021
by   Shixiang Tang, et al.

Deep neural networks achieve state-of-the-art, and sometimes super-human, performance across various domains. However, when learning tasks sequentially, they easily forget the knowledge of previous tasks, a phenomenon known as "catastrophic forgetting". One effective way to keep the new task consistent with the old tasks is to modify the gradient used for the parameter update. Previous methods enforce independent gradient constraints for different tasks, whereas we observe that these gradients carry richer, inter-related information and propose to exploit it through gradient decomposition. In particular, the gradient of an old task is decomposed into a part shared by all old tasks and a part specific to that task. The gradient used for the update should be close to the gradient of the new task, consistent with the gradient shared by all old tasks, and orthogonal to the space spanned by the gradients specific to individual old tasks. In this way, our approach encourages consolidation of common knowledge without impairing task-specific knowledge. Furthermore, the optimization is performed on the gradients of each layer separately rather than on the concatenation of all gradients, as in previous works. This avoids the influence of varying gradient magnitudes across layers. Extensive experiments validate the effectiveness of both the gradient-decomposed optimization and the layer-wise updates, and our method achieves state-of-the-art results on various continual learning benchmarks.
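To make the update rule concrete, below is a minimal per-layer sketch in NumPy. It is an assumed formulation rather than the authors' exact solver: it treats the mean of the old-task gradients as the shared part, takes the residuals as the task-specific parts, removes the task-specific subspace from the new-task gradient via a QR basis, and applies a GEM-style correction if the result still conflicts with the shared direction. The function decompose_and_project and all variable names are illustrative.

import numpy as np

def decompose_and_project(g_new, old_grads, eps=1e-12):
    # g_new:     flattened gradient of the new task for one layer, shape (d,)
    # old_grads: list of flattened old-task gradients for the same layer
    G = np.stack(old_grads)            # (k, d): one row per old task
    g_shared = G.mean(axis=0)          # part shared by all old tasks (assumed: mean)
    specifics = G - g_shared           # task-specific parts, one per old task

    # Orthonormal basis of the task-specific subspace.
    Q, _ = np.linalg.qr(specifics.T)   # Q: (d, k)

    # Stay orthogonal to the task-specific directions so the update
    # does not impair knowledge specific to individual old tasks.
    g = g_new - Q @ (Q.T @ g_new)

    # Keep the update consistent with the shared old-task gradient:
    # if they conflict, remove the conflicting component (GEM-style correction).
    if g @ g_shared < 0:
        g = g - (g @ g_shared) / (g_shared @ g_shared + eps) * g_shared
    return g

In the layer-wise setting described in the abstract, such a projection would be applied to each layer's gradient separately (flattening each parameter tensor before the projection and reshaping afterwards) rather than to one concatenated gradient vector, so that layers with small gradient magnitudes are not dominated by layers with large ones.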

Related research

Facilitating Bayesian Continual Learning by Natural Gradients and Stein Gradients (04/24/2019)
Continual learning aims to enable machine learning models to learn a gen...

Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient (09/25/2019)
Current deep neural networks can achieve remarkable performance on a sin...

Continual Learning with Scaled Gradient Projection (02/02/2023)
In neural networks, continual learning results in gradient interference ...

TAG: Task-based Accumulated Gradients for Lifelong learning (05/11/2021)
When an agent encounters a continual stream of new tasks in the lifelong...

Optimizing Reusable Knowledge for Continual Learning via Metalearning (06/09/2021)
When learning tasks over time, artificial neural networks suffer from a ...

Multitask Learning with Single Gradient Step Update for Task Balancing (05/20/2020)
Multitask learning is a methodology to boost generalization performance ...

Learning Task-preferred Inference Routes for Gradient De-conflict in Multi-output DNNs (05/31/2023)
Multi-output deep neural networks (MONs) contain multiple task branches, ...
