AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models

05/24/2022
by Yaqing Wang, et al.

Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters. This not only increases the serving cost of storing a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation. Parameter-efficient techniques have been developed that tune small trainable components (e.g., adapters) injected into the large model while keeping most of the model weights frozen. The prevalent mechanism for increasing adapter capacity is to enlarge the bottleneck dimension, which increases the number of adapter parameters. In this work, we introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost, using two key techniques. (i) We introduce multiple shared adapter components in each layer of the Transformer architecture. We leverage sparse learning via random routing to update the adapter parameters (the encoder is kept frozen), resulting in the same computational cost (FLOPs) as training a single adapter. (ii) We propose a simple merging mechanism that averages the weights of the multiple adapter components to collapse them into a single adapter in each Transformer layer, thereby also keeping the overall parameter count the same while yielding significant performance improvement. We demonstrate that these techniques work well across multiple task settings, including fully supervised and few-shot Natural Language Understanding tasks. By only tuning 0.23% of the model parameters, our model outperforms full model fine-tuning and several competing methods.
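To make the two techniques concrete, here is a minimal PyTorch sketch of a mixture-of-adapters layer with stochastic routing and weight-averaged merging. It is an illustrative implementation under stated assumptions, not the authors' released code: the class MixtureOfAdapters, its parameter names (hidden_dim, bottleneck_dim, num_adapters), and the merge() helper are hypothetical, and other details of the full method are omitted.

```python
import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    """Illustrative mixture-of-adapters block (names are hypothetical).

    During training, one of `num_adapters` bottleneck adapters is chosen
    uniformly at random per forward pass (stochastic routing), so the FLOPs
    match training a single adapter. After training, `merge()` averages the
    adapter weights into a single adapter used for inference.
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int, num_adapters: int = 4):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Linear(hidden_dim, bottleneck_dim) for _ in range(num_adapters)
        )
        self.up = nn.ModuleList(
            nn.Linear(bottleneck_dim, hidden_dim) for _ in range(num_adapters)
        )
        self.act = nn.GELU()
        self.num_adapters = num_adapters
        self.merged = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and not self.merged:
            # Random routing: activate exactly one adapter per forward pass.
            i = torch.randint(self.num_adapters, (1,)).item()
        else:
            # After merging, adapter 0 holds the averaged weights.
            i = 0
        # Residual bottleneck adapter: x + up(act(down(x)))
        return x + self.up[i](self.act(self.down[i](x)))

    @torch.no_grad()
    def merge(self) -> None:
        """Average all adapter weights into adapter 0 (one adapter per layer)."""
        for params in (self.down, self.up):
            avg_w = torch.stack([p.weight for p in params]).mean(dim=0)
            avg_b = torch.stack([p.bias for p in params]).mean(dim=0)
            params[0].weight.copy_(avg_w)
            params[0].bias.copy_(avg_b)
        self.merged = True


# Example usage (shapes are illustrative):
adapter = MixtureOfAdapters(hidden_dim=768, bottleneck_dim=16)
adapter.train()
h = torch.randn(2, 128, 768)      # (batch, seq_len, hidden)
out = adapter(h)                  # routed through one randomly chosen adapter
adapter.merge(); adapter.eval()   # collapse to a single averaged adapter
```

In this sketch, training updates only the adapter parameters while the surrounding Transformer weights stay frozen, and calling merge() after training collapses the mixture so that serving stores and runs exactly one adapter per layer.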

Related research

10/31/2022 · AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
Standard fine-tuning of large pre-trained language models (PLMs) for dow...

03/23/2022 · Visual Prompt Tuning
The current modus operandi in adapting pre-trained models involves updat...

09/13/2023 · Scaled Prompt-Tuning for Few-Shot Natural Language Generation
The increasingly Large Language Models (LLMs) demonstrate stronger langu...

05/11/2022 · Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
Few-shot in-context learning (ICL) enables pre-trained language models t...

11/16/2022 · Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models
Conventional fine-tuning encounters increasing difficulties given the si...

10/14/2021 · UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning
Conventional fine-tuning of pre-trained language models tunes all model ...

05/23/2022 · Parameter-Efficient Sparsity for Large Language Models Fine-Tuning
With the dramatically increased number of parameters in language models,...
