Continual Learning with Transformers for Image Classification

06/28/2022
by Beyza Ermis et al.

In many real-world scenarios, the data used to train machine learning models becomes available over time. However, neural network models struggle to continually learn new concepts without forgetting what they have learnt in the past. This phenomenon, known as catastrophic forgetting, is often difficult to prevent due to practical constraints, such as the amount of data that can be stored or the limited computational resources that can be used. Moreover, training large neural networks, such as Transformers, from scratch is very costly and requires a vast amount of training data, which might not be available in the application domain of interest. A recent trend indicates that dynamic architectures based on parameter expansion can reduce catastrophic forgetting efficiently in continual learning, but such methods require complex tuning to balance the growing number of parameters and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we validate in the computer vision domain a recent solution called Adaptive Distillation of Adapters (ADA), which was developed to perform continual learning with pre-trained Transformers and Adapters on text classification tasks. We empirically demonstrate on different classification tasks that this method maintains good predictive performance without retraining the model or increasing the number of model parameters over time. In addition, it is significantly faster at inference time than state-of-the-art methods.
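The abstract describes continual learning with a frozen pre-trained Transformer and lightweight Adapters, so that only the small adapter and head parameters are trained when a new task arrives. The sketch below is a minimal illustration of that general idea, not the authors' ADA implementation; the Adapter bottleneck module, hidden_dim, and the stand-in backbone interface are assumptions made for illustration.

# Minimal sketch (assumption: NOT the paper's ADA code) of adapter-based
# continual learning: a frozen feature backbone plus a small trainable
# bottleneck adapter and classification head per task.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class AdapterClassifier(nn.Module):
    """Frozen backbone with a task-specific adapter and classification head."""
    def __init__(self, backbone: nn.Module, hidden_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # keep the pre-trained weights fixed
            p.requires_grad = False
        self.adapter = Adapter(hidden_dim)     # only these parameters are trained
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)               # assumed to return (batch, hidden_dim)
        return self.head(self.adapter(feats))

# Example with a stand-in backbone (any feature extractor returning (batch, hidden_dim)):
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
model = AdapterClassifier(backbone, hidden_dim=256, num_classes=10)
logits = model(torch.randn(8, 3, 32, 32))      # -> shape (8, 10)

ADA additionally keeps the total parameter count from growing over time (the abstract notes the model is not retrained and its size does not increase as tasks arrive) by consolidating adapters through distillation; that consolidation step is not part of this sketch.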

