Parameter-Efficient Sparsity for Large Language Models Fine-Tuning

05/23/2022
by Yuchao Li, et al.

As the number of parameters in language models has grown dramatically, sparsity methods have attracted increasing research attention as a way to compress and accelerate these models. While most work focuses on accurately retaining the appropriate weights while preserving the performance of the compressed model, sparse training itself incurs substantial computational overhead and memory footprint when applied to large-scale language models. To address this problem, we propose Parameter-efficient Sparse Training (PST), a method that reduces the number of trainable parameters during sparsity-aware training on downstream tasks. Specifically, we first combine data-free and data-driven criteria to measure the importance of weights efficiently and accurately. We then investigate the intrinsic redundancy of the data-driven weight importance and identify two notable characteristics, i.e., low-rankness and structuredness. Based on these, two groups of small matrices are introduced to compute the data-driven importance of weights in place of the original large importance score matrix, making sparse training both resource- and parameter-efficient. Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) on dozens of datasets demonstrate that PST performs on par with or better than previous sparsity methods, despite training only a small number of parameters. For instance, compared with previous sparsity methods, our PST requires only 1.5% of the trainable parameters.
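To make the decomposition described above concrete, the following is a minimal, hypothetical PyTorch sketch of the idea: the importance of each weight combines a data-free term (the magnitude of the frozen pretrained weight) with a data-driven term that is never materialized as a full score matrix, but is instead kept as a low-rank product plus structured per-row and per-column scores, which would be the only trainable importance parameters. The tensor names (A, B, R, C), the rank, and the exact combination rule are illustrative assumptions, not the paper's released implementation.

```python
import torch

def importance_scores(W, A, B, R, C):
    """Data-free term (|W|) plus a data-driven term kept in factored form."""
    # A @ B is the low-rank part; R and C broadcast as per-row / per-column scores.
    return W.abs() + A @ B + R + C

def topk_mask(scores, sparsity):
    """Binary mask keeping the (1 - sparsity) fraction of most important weights."""
    k = int(scores.numel() * (1.0 - sparsity))
    threshold = torch.topk(scores.flatten(), k).values.min()
    return (scores >= threshold).to(scores.dtype)

# Example: one 768 x 768 projection matrix, 50% sparsity, rank-8 factorization.
d_out, d_in, rank = 768, 768, 8
W = torch.randn(d_out, d_in)          # frozen pretrained weight (data-free term uses |W|)
A = torch.zeros(d_out, rank)          # trainable low-rank factor
B = 0.01 * torch.randn(rank, d_in)    # trainable low-rank factor
R = torch.zeros(d_out, 1)             # trainable per-row (structured) scores
C = torch.zeros(1, d_in)              # trainable per-column (structured) scores

mask = topk_mask(importance_scores(W, A, B, R, C), sparsity=0.5)
sparse_W = W * mask                   # sparse weight used in the forward pass

# Trainable importance parameters: rank * (d_out + d_in) + d_out + d_in = 13,824
# entries here, versus 589,824 for a full d_out x d_in importance score matrix.
```

In this illustrative setup, the small matrices replace the full importance score matrix during training, which is the source of the parameter and memory savings the abstract refers to; how gradients reach the score parameters through the masking step is left out of the sketch.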
