Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

06/08/2021
by Rabeeh Karimi Mahabadi, et al.

State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model. This parameter-efficient multi-task learning framework allows us to achieve the best of both worlds by sharing knowledge across tasks via hypernetworks while enabling the model to adapt to each individual task through task-specific adapters. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task. We additionally demonstrate substantial performance improvements in few-shot domain generalization across a variety of tasks. Our code is publicly available at https://github.com/rabeehk/hyperformer.
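The sketch below illustrates the idea described in the abstract in PyTorch: a single hypernetwork, shared across all tasks and layers, generates the down- and up-projection weights of a bottleneck adapter from learned task, layer-id, and adapter-position embeddings. It is a minimal illustration of the described architecture under assumed dimensions, not the authors' released implementation; all class and parameter names (SharedAdapterHypernetwork, d_adapter, apply_adapter, etc.) are illustrative.

```python
# Minimal sketch: one shared hypernetwork generates adapter weights for every
# (task, layer, position) combination, so knowledge is shared across tasks while
# each task still receives its own adapter parameters. Names and sizes here are
# assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn


class SharedAdapterHypernetwork(nn.Module):
    def __init__(self, num_tasks, num_layers, num_positions,
                 d_model=768, d_adapter=64, d_embed=64):
        super().__init__()
        # Learned embeddings for the conditioning inputs: task, layer id,
        # and adapter position (e.g. after attention vs. after feed-forward).
        self.task_emb = nn.Embedding(num_tasks, d_embed)
        self.layer_emb = nn.Embedding(num_layers, d_embed)
        self.pos_emb = nn.Embedding(num_positions, d_embed)
        self.proj = nn.Linear(3 * d_embed, d_embed)
        # Shared generators mapping the conditioning vector to the flattened
        # down- and up-projection weights of a bottleneck adapter.
        self.gen_down = nn.Linear(d_embed, d_model * d_adapter)
        self.gen_up = nn.Linear(d_embed, d_adapter * d_model)
        self.d_model, self.d_adapter = d_model, d_adapter

    def forward(self, task_id, layer_id, pos_id):
        cond = torch.cat([self.task_emb(task_id),
                          self.layer_emb(layer_id),
                          self.pos_emb(pos_id)], dim=-1)
        cond = torch.relu(self.proj(cond))
        w_down = self.gen_down(cond).view(self.d_adapter, self.d_model)
        w_up = self.gen_up(cond).view(self.d_model, self.d_adapter)
        return w_down, w_up


def apply_adapter(hidden, w_down, w_up):
    # Standard bottleneck adapter with a residual connection, using the
    # generated weights instead of separately trained per-task parameters.
    return hidden + torch.relu(hidden @ w_down.t()) @ w_up.t()


if __name__ == "__main__":
    hyper = SharedAdapterHypernetwork(num_tasks=8, num_layers=12, num_positions=2)
    w_down, w_up = hyper(torch.tensor(3), torch.tensor(5), torch.tensor(1))
    hidden = torch.randn(4, 16, 768)  # (batch, seq_len, d_model)
    out = apply_adapter(hidden, w_down, w_up)
    print(out.shape)  # torch.Size([4, 16, 768])
```

Because only the shared hypernetwork and the small conditioning embeddings are trained, adding a new task costs little more than one extra task embedding, which is what keeps the per-task parameter overhead low.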


Related research

- Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data (09/19/2020): Multi-Task Learning (MTL) has emerged as a promising approach for transf...
- Task Adaptive Parameter Sharing for Multi-Task Learning (03/30/2022): Adapting pre-trained models with broad capabilities has become standard ...
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections (07/12/2020): Achieving state-of-the-art performance on natural language understanding...
- Efficient Computation Sharing for Multi-Task Visual Scene Understanding (03/16/2023): Solving multiple visual tasks using individual models can be resource-in...
- Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer (02/22/2021): We propose UniT, a Unified Transformer model to simultaneously learn the...
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (02/07/2019): Multi-task learning allows the sharing of useful information between mul...
- Explaining the Effectiveness of Multi-Task Learning for Efficient Knowledge Extraction from Spine MRI Reports (05/06/2022): Pretrained Transformer based models finetuned on domain specific corpora...
