Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

06/08/2021
by Rabeeh Karimi Mahabadi, et al.

Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful, as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared “slow” weights and “fast” rank-one matrices defined per Compacter layer. By training only 0.047% of a model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms fine-tuning in low-resource settings. Our code is publicly available at https://github.com/rabeehk/compacter/
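The key construction in the abstract is the weight parameterization: each adapter projection is expressed as a sum of Kronecker products between “slow” matrices shared across all adapter layers and “fast” rank-one matrices owned by each layer. The following is a minimal sketch of that idea, assuming PyTorch; it is not the authors' released implementation (see the linked repository for that), and the names LPHMLinear, CompacterAdapter, shared_A, n, and bottleneck are illustrative choices, not the paper's API.

```python
import torch
import torch.nn as nn


class LPHMLinear(nn.Module):
    """Linear map parameterized as W = sum_i kron(A_i, s_i @ t_i^T)."""

    def __init__(self, in_features, out_features, n=4, shared_A=None):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        # "Slow" weights A_i of shape (n, n), typically shared across all adapters.
        self.A = shared_A if shared_A is not None else nn.Parameter(
            torch.randn(n, n, n) * 0.01)
        # "Fast" rank-one factors s_i, t_i defined per layer.
        self.s = nn.Parameter(torch.randn(n, in_features // n, 1) * 0.01)
        self.t = nn.Parameter(torch.randn(n, 1, out_features // n) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Each B_i = s_i @ t_i^T is rank one: (n, in/n, out/n).
        B = torch.bmm(self.s, self.t)
        # Sum of Kronecker products reconstructs the full (in, out) weight.
        W = sum(torch.kron(self.A[i], B[i]) for i in range(self.A.shape[0]))
        return x @ W + self.bias


class CompacterAdapter(nn.Module):
    """Bottleneck adapter whose down/up projections use the parameterization above."""

    def __init__(self, d_model, bottleneck=24, n=4, shared_A=None):
        super().__init__()
        self.down = LPHMLinear(d_model, bottleneck, n, shared_A)
        self.up = LPHMLinear(bottleneck, d_model, n, shared_A)

    def forward(self, hidden_states):
        # Residual adapter: hidden + up(nonlinearity(down(hidden))).
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))
```

In this sketch, passing the same shared_A parameter to every adapter in the network is what makes the per-layer cost collapse to the rank-one s and t factors, which is the source of the very small trainable-parameter count reported in the abstract.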


Related research

10/31/2022 · AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
Standard fine-tuning of large pre-trained language models (PLMs) for dow...

08/07/2023 · LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
The low-rank adaptation (LoRA) method can largely reduce the amount of t...

06/10/2021 · Variational Information Bottleneck for Effective Low-Resource Fine-Tuning
While large-scale pretrained language models have obtained impressive re...

03/22/2022 · Task-guided Disentangled Tuning for Pretrained Language Models
Pretrained language models (PLMs) trained on large-scale unlabeled corpu...

05/23/2022 · Parameter-Efficient Sparsity for Large Language Models Fine-Tuning
With the dramatically increased number of parameters in language models,...

03/02/2023 · MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Recently, finetuning pretrained vision-language models (VLMs) has become...

05/03/2022 · Adaptable Adapters
State-of-the-art pretrained NLP models contain a hundred million to tril...
