AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

10/08/2022 · by Se Jung Kwon, et al.

There is growing interest in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression have not been thoroughly explored. Model compression could provide the benefits of reducing memory footprints, enabling low-precision computations, and ultimately achieving cost-effective inference. To combine parameter-efficient adaptation and model compression, we propose AlphaTuning, which consists of post-training quantization of the pre-trained language model followed by fine-tuning only part of the quantized parameters for a target task. Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors. During the adaptation phase, the binary values are frozen for all tasks, while the scaling factors are fine-tuned for the downstream task. We demonstrate that AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full fine-tuning on a variety of downstream tasks while achieving a >10x compression ratio under 4-bit quantization and a >1,000x reduction in the number of trainable parameters.
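
The mechanism described in the abstract can be illustrated with a short sketch: greedy binary-coding quantization factorizes a weight matrix into frozen binary matrices and scaling factors, and only the scaling factors (plus biases) are left trainable during adaptation. This is a minimal sketch assuming PyTorch and per-output-row scaling factors; the helper names `bcq_quantize` and `AlphaTunedLinear` are illustrative and not taken from the paper's released code.

```python
# Minimal sketch of the AlphaTuning idea (assumed implementation, not the paper's code).
import torch
import torch.nn as nn


def bcq_quantize(weight: torch.Tensor, num_bits: int = 4):
    """Greedily factorize `weight` (out_features x in_features) into binary
    matrices B_i in {-1, +1} and per-row scaling factors alpha_i, so that
    weight ~= sum_i alpha_i * B_i."""
    residual = weight.clone()
    binaries, alphas = [], []
    for _ in range(num_bits):
        b = torch.sign(residual)
        b[b == 0] = 1.0
        # Closed-form per-row scale minimizing the L2 residual for this bit:
        # alpha = mean(|residual|) along the input dimension.
        alpha = (residual * b).mean(dim=1, keepdim=True)
        binaries.append(b)
        alphas.append(alpha)
        residual = residual - alpha * b
    return torch.stack(binaries), torch.stack(alphas)  # (q, out, in), (q, out, 1)


class AlphaTunedLinear(nn.Module):
    """Linear layer whose binary codes are frozen buffers and whose scaling
    factors (and bias) are the only trainable parameters."""

    def __init__(self, linear: nn.Linear, num_bits: int = 4):
        super().__init__()
        binaries, alphas = bcq_quantize(linear.weight.data, num_bits)
        self.register_buffer("binaries", binaries)   # frozen across tasks
        self.alphas = nn.Parameter(alphas)            # fine-tuned per task
        self.bias = nn.Parameter(linear.bias.data.clone()) if linear.bias is not None else None

    def forward(self, x):
        # Reconstruct the weight from frozen binaries and trainable scales.
        weight = (self.alphas * self.binaries).sum(dim=0)
        return nn.functional.linear(x, weight, self.bias)
```

In a full setup along these lines, one would replace the linear layers of a pre-trained Transformer (e.g., GPT-2 or OPT) with such quantized layers and pass only the scaling factors and biases to the optimizer, so the binary codes are stored once and shared across all downstream tasks.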

