Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

05/28/2023
by Mingyang Zhang, et al.

Large pre-trained models (LPMs), such as LLaMA and ViT-G, have shown exceptional performance across various tasks. Although parameter-efficient fine-tuning (PEFT) has emerged to cheaply fine-tune these large models on downstream tasks, their deployment is still hindered by the vast model scale and computational costs. Neural network pruning offers a solution for model compression by removing redundant parameters, but most existing methods rely on computing parameter gradients. However, obtaining the gradients is computationally prohibitive for LPMs, which necessitates the exploration of alternative approaches. To this end, we propose a unified framework for efficient fine-tuning and deployment of LPMs, termed LoRAPrune. We first design a PEFT-aware pruning criterion, which utilizes the values and gradients of Low-Rank Adaption (LoRA), rather than the gradients of the pre-trained parameters, for importance estimation. We then propose an iterative pruning procedure to remove redundant parameters while maximizing the advantages of PEFT. Thus, our LoRAPrune delivers an accurate, compact model for efficient inference in a highly cost-effective manner. Experimental results on various tasks demonstrate that our method achieves state-of-the-art results. For instance, on the VTAB-1k benchmark, LoRAPrune utilizes only 0.76% of the trainable parameters and outperforms magnitude and movement pruning methods by a significant margin, achieving a mean Top-1 accuracy that is 5.7% higher. Moreover, our approach achieves comparable performance to PEFT methods, highlighting its efficacy in delivering high-quality results while benefiting from the advantages of pruning.
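To make the idea concrete, the following is a minimal PyTorch sketch, not the authors' released code: it assumes a Taylor-style importance score |w · ∂L/∂w| in which the gradient of the frozen pre-trained weight is replaced by a surrogate built only from the LoRA factors A, B and their gradients. The function names (lora_importance, prune_by_importance), the exact gradient surrogate, and the unstructured pruning step are illustrative assumptions; the precise criterion and the iterative schedule are defined in the paper.

```python
# Minimal sketch (not the authors' implementation): estimate per-weight importance
# for a linear layer carrying a LoRA adapter, using only the LoRA factors and their
# gradients, so the frozen pre-trained weight W never needs its own gradient.
# The helper names and the gradient surrogate below are assumptions for illustration.
import torch

def lora_importance(W, A, B, scale=1.0):
    # W: frozen pre-trained weight, shape (out_features, in_features)
    # A: LoRA down-projection, shape (r, in_features), with .grad populated
    # B: LoRA up-projection, shape (out_features, r), with .grad populated
    merged = W + scale * (B @ A)                      # effective weight after merging LoRA
    # Surrogate for dL/dW built from LoRA gradients only (assumed Taylor-style proxy).
    grad_proxy = scale * (B.grad @ A + B @ A.grad)
    return (merged * grad_proxy).abs()                # elementwise importance |w * dL/dw|

def prune_by_importance(W, importance, sparsity=0.5):
    # Zero out the lowest-importance entries of W (unstructured pruning for brevity;
    # structured pruning would first aggregate importance over rows, heads, or channels).
    k = max(1, int(importance.numel() * sparsity))
    threshold = importance.flatten().kthvalue(k).values
    mask = (importance > threshold).to(W.dtype)
    return W * mask, mask

if __name__ == "__main__":
    out_f, in_f, r = 64, 128, 8
    W = torch.randn(out_f, in_f)                      # frozen, no gradient required
    A = torch.randn(r, in_f, requires_grad=True)
    B = torch.randn(out_f, r, requires_grad=True)
    x = torch.randn(4, in_f)
    y = x @ (W + B @ A).t()                           # forward pass through merged weight
    y.sum().backward()                                # populates A.grad and B.grad only
    score = lora_importance(W, A, B)
    W_pruned, mask = prune_by_importance(W, score, sparsity=0.5)
    print(f"kept {int(mask.sum())} of {mask.numel()} weights")
```

Under this assumption, pruning never backpropagates into the pre-trained weights themselves, which is the property the abstract emphasizes: importance is read off the lightweight LoRA factors that are already being trained during PEFT.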
