Multi-Head Adapter Routing for Data-Efficient Fine-Tuning

11/07/2022
by Lucas Caccia et al.

Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small number of newly added parameters. In multi-task settings, PEFT adapters are typically trained either on each task independently, which inhibits transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon (Ponti et al.) jointly learns an inventory of PEFT adapters and a routing function that shares variable-size sets of adapters across tasks. The adapters can subsequently be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. To this end, we propose less expressive variants that perform a weighted average of the adapters before few-shot adaptation (Poly-μ) instead of learning a routing function. We also introduce more expressive variants in which finer-grained task-adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants.
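
The routing idea summarized above is compact enough to sketch in code. The following is a minimal PyTorch illustration, assuming LoRA-style adapters, a sigmoid relaxation of the routing, and an even partition of the input dimension into heads; the class and parameter names (RoutedLoRALinear, n_skills, n_heads) and the way the LoRA B matrix is combined are illustrative assumptions, not the authors' reference implementation. Setting n_heads=1 corresponds to a single Polytropon-style routing vector per task, n_heads>1 gives the finer-grained per-head allocation in the spirit of Poly-S, and freezing the routing at uniform weights amounts to simple adapter averaging in the spirit of Poly-μ.

# Minimal sketch of routing over an inventory of LoRA adapters (assumptions noted above).
import torch
import torch.nn as nn

class RoutedLoRALinear(nn.Module):
    """Frozen linear layer plus a task-routed combination of LoRA adapters."""

    def __init__(self, in_dim, out_dim, n_tasks, n_skills, rank=4, n_heads=1):
        super().__init__()
        assert in_dim % n_heads == 0
        self.n_heads = n_heads
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)   # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        # Inventory of n_skills LoRA adapters; the A matrices are partitioned
        # into n_heads blocks along the input dimension.
        self.lora_A = nn.Parameter(
            0.02 * torch.randn(n_skills, n_heads, in_dim // n_heads, rank))
        self.lora_B = nn.Parameter(torch.zeros(n_skills, rank, out_dim))
        # Routing logits: one per (task, head, skill). n_heads == 1 recovers a
        # single routing vector per task; n_heads > 1 is the multi-head variant.
        self.router = nn.Parameter(torch.zeros(n_tasks, n_heads, n_skills))

    def forward(self, x, task_id):
        # Soft, variable-size adapter selection for this task: independent
        # sigmoids per skill, renormalized to sum to one (an assumed relaxation).
        probs = torch.sigmoid(self.router[task_id])                    # (H, S)
        probs = probs / probs.sum(dim=-1, keepdim=True)
        # Per-head mixture of the A blocks; a shared mixture for B (simplification).
        A = torch.einsum("hs,shdr->hdr", probs, self.lora_A)           # (H, d/H, r)
        B = torch.einsum("s,sro->ro", probs.mean(dim=0), self.lora_B)  # (r, out)
        # Summing over heads is equivalent to multiplying x by the
        # concatenation of the per-head A blocks.
        x_heads = x.reshape(*x.shape[:-1], self.n_heads, -1)           # (..., H, d/H)
        delta = torch.einsum("...hd,hdr->...r", x_heads, A) @ B        # (..., out)
        return self.base(x) + delta

# Usage: 8 tasks sharing an inventory of 4 adapters, 2 routing heads.
layer = RoutedLoRALinear(64, 64, n_tasks=8, n_skills=4, rank=4, n_heads=2)
y = layer(torch.randn(3, 64), task_id=5)
print(y.shape)  # torch.Size([3, 64])

In a full model, a layer of this kind would wrap each adapted linear projection; the adapter inventory and routing logits would be trained on the multi-task mixture, and the routing (and optionally the adapters) would then be re-combined and fine-tuned on a novel few-shot task, as described in the abstract.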

Related research

09/19/2020  Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Multi-Task Learning (MTL) has emerged as a promising approach for transf...

05/12/2023  A Comprehensive Analysis of Adapter Efficiency
Adapters have been positioned as a parameter-efficient fine-tuning (PEFT...

03/15/2022  Hyperdecoders: Instance-specific decoders for multi-task NLP
We investigate input-conditioned hypernetworks for multi-tasking in NLP,...

10/10/2019  Gumbel-Matrix Routing for Flexible Multi-task Learning
This paper proposes a novel per-task routing method for multi-task appli...

09/13/2023  Scaled Prompt-Tuning for Few-Shot Natural Language Generation
The increasingly Large Language Models (LLMs) demonstrate stronger langu...

05/24/2022  Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing
This work introduces ATTEMPT (Attentional Mixture of Prompt Tuning), a n...

03/31/2023  A Closer Look at Parameter-Efficient Tuning in Diffusion Models
Large-scale diffusion models like Stable Diffusion are powerful and find...
