Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

03/25/2023
by Yuliang Cai, et al.

The size and computational load of fine-tuning large-scale pre-trained neural networks are becoming two major obstacles to adopting machine learning in many applications. Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks, which relaxes the need to fine-tune all network weights from scratch. However, existing CL algorithms primarily consider learning unimodal vision-only or language-only tasks. We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks that dynamically increases the number of learnable parameters and uses knowledge distillation. The additional parameters are used to specialize the network for each task. Our approach enables sharing information between tasks while addressing the challenge of catastrophic forgetting. It scales to a large number of tasks because it requires little memory and time overhead. Our model reaches state-of-the-art performance on challenging vision-and-language tasks.
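The abstract describes the architecture only at a high level. As a rough, hypothetical illustration of the two ingredients it names, dynamically added task-specific parameters and knowledge distillation against a previous model, the following PyTorch-style sketch may help. It is not the authors' implementation: the module names (TaskSpecificAdapter, TaskAttentiveModel, distillation_loss), the bottleneck adapter design, and the temperature-scaled distillation loss are all assumptions made for illustration.

import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch (not the authors' code) of per-task parameter expansion
# plus knowledge distillation for continual learning.

class TaskSpecificAdapter(nn.Module):
    # Small bottleneck module added for each new task; its extra parameters
    # specialize the shared representation for that task.
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.relu(self.down(x)))  # residual adaptation

class TaskAttentiveModel(nn.Module):
    # Shared pre-trained vision-and-language encoder plus one adapter and
    # one classification head per task.
    def __init__(self, encoder, dim):
        super().__init__()
        self.encoder = encoder
        self.dim = dim
        self.adapters = nn.ModuleDict()
        self.heads = nn.ModuleDict()

    def add_task(self, task_id, num_classes):
        # Dynamically grow the model when a new task arrives.
        self.adapters[task_id] = TaskSpecificAdapter(self.dim)
        self.heads[task_id] = nn.Linear(self.dim, num_classes)

    def forward(self, inputs, task_id):
        feats = self.encoder(inputs)           # shared backbone features
        feats = self.adapters[task_id](feats)  # task-specific specialization
        return self.heads[task_id](feats)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Temperature-scaled KL divergence between the current model and a frozen
    # copy trained on earlier tasks, used to limit catastrophic forgetting.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

In a setup like this, a new adapter and head would be registered when a task arrives, and the distillation term would be added to the task loss so the shared backbone stays close to its behavior on earlier tasks.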

Related research:

- 03/30/2023: Practical self-supervised continual learning with continual fine-tuning
  Self-supervised learning (SSL) has shown remarkable performance in compu...
- 06/28/2022: Continual Learning with Transformers for Image Classification
  In many real-world scenarios, data to train machine learning models beco...
- 11/22/2021: DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
  Deep network architectures struggle to continually learn new tasks witho...
- 06/18/2022: CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
  Current state-of-the-art vision-and-language models are evaluated on tas...
- 06/02/2023: GateON: an unsupervised method for large scale continual learning
  The objective of continual learning (CL) is to learn tasks sequentially ...
- 02/15/2023: À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting
  We introduce À-la-carte Prompt Tuning (APT), a transformer-based scheme ...
- 11/15/2022: Exploring the Joint Use of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding
  Continual learning refers to a dynamical framework in which a model or a...
