Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

09/19/2020
by Jonathan Pilault, et al.

Multi-Task Learning (MTL) has emerged as a promising approach for transferring learned knowledge across different tasks. However, multi-task learning must deal with challenges such as overfitting to low-resource tasks, catastrophic forgetting, and negative task transfer (learning interference). Additionally, in Natural Language Processing (NLP), MTL alone has typically not reached the performance level possible through per-task fine-tuning of pretrained models. However, many fine-tuning approaches are both parameter inefficient, e.g. potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer-based architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach we are able to surpass single-task fine-tuning methods while being parameter and data efficient. With our base model, we attain a 2.2% improvement over a fully fine-tuned BERT Large model on the GLUE benchmark, adding only 5.6% trained parameters per task (whereas naive fine-tuning potentially adds 100% of the trained parameters per task) and needing only 64.6% of the data. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets.
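To make the core idea more concrete, the sketch below shows a minimal task-conditioned self-attention layer in the spirit the abstract describes: the pretrained query/key/value projections are kept frozen, and a small trainable task embedding adds a task-dependent bias to the attention logits. All names here (TaskConditionedAttention, bias_proj, num_tasks) are illustrative assumptions; the paper's exact conditional attention formulation may differ.

```python
# Illustrative sketch only: a self-attention layer whose logits are biased by a
# task embedding, with the pretrained query/key/value projections kept frozen.
# Names (TaskConditionedAttention, bias_proj, num_tasks) are assumptions for
# illustration, not the paper's exact CA-MTL formulation.
import math
import torch
import torch.nn as nn


class TaskConditionedAttention(nn.Module):
    def __init__(self, d_model: int, num_tasks: int):
        super().__init__()
        # Pretrained projections: frozen to mitigate catastrophic forgetting.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            proj.weight.requires_grad_(False)
            proj.bias.requires_grad_(False)
        # Trainable, task-conditioned parameters (the only per-task additions).
        self.task_emb = nn.Embedding(num_tasks, d_model)
        self.bias_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); task_id: (batch,)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # The task embedding produces an additive bias on the attention logits.
        t = self.bias_proj(self.task_emb(task_id))            # (batch, d_model)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        scores = scores + (q @ t.unsqueeze(-1))               # (batch, seq, 1) bias
        attn = scores.softmax(dim=-1)
        return attn @ v
```

A quick usage check: `TaskConditionedAttention(d_model=768, num_tasks=8)(torch.randn(2, 16, 768), torch.tensor([0, 3]))` returns a (2, 16, 768) tensor, with gradients flowing only through the task embedding and bias projection.

The abstract also mentions a new multi-task data sampling strategy but does not spell it out, so the following stand-in shows a common exponent-scaled sampler that over-samples low-resource tasks. It is not the authors' method, only a generic baseline for illustration.

```python
# Stand-in example: exponent-scaled task sampling, a common way to soften data
# imbalance across tasks. NOT the paper's sampling strategy, which the abstract
# does not specify.
import random


def sample_task(task_sizes: dict, alpha: float = 0.5) -> str:
    """Pick a task with probability proportional to (dataset size) ** alpha.

    Exponents below 1 flatten the distribution, so low-resource tasks are
    sampled more often than their raw share of the data.
    """
    tasks = list(task_sizes)
    weights = [task_sizes[t] ** alpha for t in tasks]
    return random.choices(tasks, weights=weights, k=1)[0]
```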

Related research

05/01/2020
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
Current approaches to solving classification tasks in NLP involve fine-t...

06/08/2021
Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
State-of-the-art parameter-efficient fine-tuning methods rely on introdu...

10/25/2018
K For The Price Of 1: Parameter Efficient Multi-task And Transfer Learning
We introduce a novel method that enables parameter-efficient transfer an...

02/07/2019
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Multi-task learning allows the sharing of useful information between mul...

10/31/2022
Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5
We compare sequential fine-tuning with a model for multi-task learning i...

11/07/2022
Multi-Head Adapter Routing for Data-Efficient Fine-Tuning
Parameter-efficient fine-tuning (PEFT) methods can adapt large language ...

03/15/2022
Hyperdecoders: Instance-specific decoders for multi-task NLP
We investigate input-conditioned hypernetworks for multi-tasking in NLP,...
