Continual Learners are Incremental Model Generalizers

06/21/2023
by Jaehong Yoon, et al.

Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. In both supervised and unsupervised CL, we find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance. This is because CL models learn improved task-general features while readily forgetting task-specific knowledge. Based on this observation, we suggest a new unsupervised CL framework with masked modeling, which aims to capture fluent task-generic representations during training. Furthermore, we propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks. A model fine-tuned with GLAD achieves competitive performance and can also serve as a good pre-trained model itself. We believe this paper breaks the barriers between the pre-training and fine-tuning steps and leads to a sustainable learning framework in which the continual learner incrementally improves model generalization, yielding better transfer to unseen tasks.
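To make the masked-modeling idea concrete, below is a minimal sketch of sequential (continual) pre-training with a masked-reconstruction objective, written in PyTorch. This is an illustrative assumption, not the paper's released framework or the GLAD fine-tuning scheme: the MaskedAutoencoder module, its sizes, the feature-level masking, and the toy task loaders are all hypothetical stand-ins.

```python
# Hypothetical sketch: continual pre-training with masked modeling.
# Not the paper's implementation; names, sizes, and the masking scheme are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class MaskedAutoencoder(nn.Module):
    def __init__(self, dim=128, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.decoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x, mask_ratio=0.5):
        # Zero out a random subset of input features and reconstruct the original.
        keep = (torch.rand_like(x) > mask_ratio).float()
        recon = self.decoder(self.encoder(x * keep))
        # Reconstruction loss only on the masked positions pushes the encoder
        # toward task-generic features rather than task-specific shortcuts.
        masked = 1.0 - keep
        return ((recon - x) ** 2 * masked).sum() / masked.sum().clamp(min=1)


def continual_pretrain(model, task_loaders, epochs=1, lr=1e-3):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for task_id, loader in enumerate(task_loaders):  # tasks arrive one after another
        for _ in range(epochs):
            for x in loader:
                loss = model(x)
                opt.zero_grad()
                loss.backward()
                opt.step()
        print(f"finished task {task_id}")
    # The encoder is the reusable, incrementally improved pre-trained representation.
    return model.encoder


if __name__ == "__main__":
    # Toy "tasks": two streams of random 128-d vectors standing in for real datasets.
    tasks = [DataLoader(torch.randn(256, 128), batch_size=32) for _ in range(2)]
    encoder = continual_pretrain(MaskedAutoencoder(), tasks)
```

The encoder returned after the last task plays the role the abstract describes: a continually trained model whose representation can be handed off as a pre-trained starting point for fine-tuning on unseen downstream tasks.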

Related research

02/24/2023 · SGL-PT: A Strong Graph Learner with Graph Prompt Tuning
Recently, much exertion has been paid to design graph self-supervised me...

07/16/2023 · Tangent Model Composition for Ensembling and Continual Fine-tuning
Tangent Model Composition (TMC) is a method to combine component models ...

03/17/2023 · A Unified Continual Learning Framework with General Parameter-Efficient Tuning
The "pre-training → downstream adaptation" presents both new opportuniti...

03/20/2022 · Hierarchical Inductive Transfer for Continual Dialogue Learning
Pre-trained models have achieved excellent performance on the dialogue t...

05/06/2023 · On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code
Pre-trained language models (PLMs) have become a prevalent technique in ...

10/28/2022 · Elastic Weight Consolidation Improves the Robustness of Self-Supervised Learning Methods under Transfer
Self-supervised representation learning (SSL) methods provide an effecti...

10/12/2022 · Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks
The utilization of broad datasets has proven to be crucial for generaliz...
