Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

03/18/2023
by Qingru Zhang, et al.

Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in the pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods have been proposed to learn incremental updates of the pre-trained weights in a parameter-efficient way, e.g., as low-rank increments. These methods often distribute the budget of incremental updates evenly across all pre-trained weight matrices and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance scores. In particular, AdaLoRA parameterizes the incremental updates in the form of a singular value decomposition. This novel approach allows us to effectively prune the singular values of unimportant updates, which essentially reduces their parameter budget while circumventing intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language understanding, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA yields notable improvements over baselines, especially in low-budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA .
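To make the SVD-style parameterization concrete, below is a minimal PyTorch sketch of a linear layer whose incremental update is stored as P diag(lam) Q, where the least important singular values can be zeroed out to shrink that matrix's share of the budget. The class name, the simple |lam|-based importance proxy, and the pruning routine are illustrative assumptions for exposition, not the authors' exact implementation (AdaLoRA uses a sensitivity-based importance score and an orthogonality regularizer on P and Q, both omitted here).

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Frozen base weight W plus an SVD-form increment P @ diag(lam) @ Q.

    Hypothetical sketch of an AdaLoRA-style layer; names and the importance
    heuristic are assumptions, not the paper's exact implementation.
    """

    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        # Pre-trained weight stays frozen during fine-tuning.
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                    requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # Trainable factors of the incremental update (its "SVD" form).
        self.P = nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.lam = nn.Parameter(torch.zeros(rank))   # singular values
        self.Q = nn.Parameter(torch.randn(rank, in_features) * 0.01)

    def forward(self, x):
        # Low-rank increment added on top of the frozen pre-trained weight.
        delta = self.P @ torch.diag(self.lam) @ self.Q
        return x @ (self.weight + delta).T

    @torch.no_grad()
    def prune_to_budget(self, budget):
        """Keep only the `budget` most important singular values (proxy: |lam|)."""
        if budget >= self.lam.numel():
            return
        keep = self.lam.abs().topk(budget).indices
        mask = torch.zeros_like(self.lam)
        mask[keep] = 1.0
        self.lam.mul_(mask)   # zero out pruned singular-value triplets
```

In training, one would periodically call prune_to_budget with a per-matrix budget, so that more important weight matrices retain more singular values while less important ones are pruned down, which is the adaptive allocation the abstract describes.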

Related research

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning (08/23/2023)
With the increasing size of pre-trained language models (PLMs), fine-tun...

Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning (06/13/2022)
Freezing the pre-trained backbone has become a standard paradigm to avoi...

LoRA: Low-Rank Adaptation of Large Language Models (06/17/2021)
The dominant paradigm of natural language processing consists of large-s...

Fast Model Editing at Scale (10/21/2021)
While large pre-trained models have enabled impressive results on a vari...

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (06/25/2022)
Large Transformer-based models have exhibited superior performance in va...

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging (12/12/2022)
Knowledge Distillation (KD) is a commonly used technique for improving t...

Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks (01/18/2021)
Due to the success of pre-trained models (PTMs), people usually fine-tun...
