Quadapter: Adapter for GPT-2 Quantization

11/30/2022
by Minseop Park, et al.

Transformer language models such as GPT-2 are difficult to quantize because outliers in their activations lead to large quantization error. To adapt to this error, one must use quantization-aware training, which entails fine-tuning with a dataset and training pipeline identical to those used for the original model. Pretrained language models, however, often do not grant access to their datasets and training pipelines, forcing us to rely on arbitrary ones for fine-tuning. In that case, quantization-aware training is observed to overfit the model to the fine-tuning data. For quantization without overfitting, we introduce a quantization adapter (Quadapter), a small set of parameters learned to make activations quantization-friendly by scaling them channel-wise; the model parameters themselves remain unchanged. By applying our method to the challenging task of quantizing GPT-2, we demonstrate that it effectively prevents overfitting and improves quantization performance.
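To make the idea concrete, below is a minimal sketch of a channel-wise scaling adapter of the kind the abstract describes, assuming a PyTorch setting. The class name, the symmetric fake quantizer, and the initialization are illustrative assumptions, not the authors' released implementation; the only point it demonstrates is that a learnable per-channel scale reshapes the activation distribution before quantization and is undone afterwards, while the model weights stay frozen.

```python
import torch
import torch.nn as nn

class Quadapter(nn.Module):
    """Illustrative sketch (hypothetical names/details, not the authors' code).

    A learnable per-channel scale makes the activation distribution easier to
    quantize; its inverse is applied after quantization so the layer's output
    is approximately unchanged. The surrounding model weights are not updated.
    """
    def __init__(self, num_channels: int, num_bits: int = 8):
        super().__init__()
        # One scale per activation channel, initialized to the identity.
        self.scale = nn.Parameter(torch.ones(num_channels))
        self.num_bits = num_bits

    def fake_quant(self, x: torch.Tensor) -> torch.Tensor:
        # Symmetric uniform fake quantization (stand-in for the real quantizer).
        qmax = 2 ** (self.num_bits - 1) - 1
        step = x.detach().abs().max().clamp(min=1e-8) / qmax
        x_q = torch.clamp(torch.round(x / step), -qmax, qmax) * step
        # Straight-through estimator: gradients pass as if rounding were identity.
        return x + (x_q - x).detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_channels). Damp outlier channels, quantize, then undo the scale.
        x = x * self.scale
        x = self.fake_quant(x)
        return x / self.scale
```

In use, one such module would sit in front of a quantized layer, and only the adapter's scale parameters would be passed to the optimizer (e.g. `torch.optim.Adam(adapter.parameters())`) while the GPT-2 weights remain frozen, which is what distinguishes this from full quantization-aware fine-tuning.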

Related research

10/08/2022 · AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
There are growing interests in adapting large-scale language models usin...

12/19/2018 · Fast Adjustable Threshold For Uniform Neural Network Quantization
Neural network quantization procedure is the necessary step for porting ...

06/22/2023 · Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Transformer models have been widely adopted in various domains over the ...

10/13/2019 · Overwrite Quantization: Opportunistic Outlier Handling for Neural Network Accelerators
Outliers in weights and activations pose a key challenge for fixed-point...

04/04/2019 · Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets
Several datasets have recently been constructed to expose brittleness in...

11/16/2022 · Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations
Due to the huge amount of parameters, fine-tuning of pretrained language...

06/13/2023 · INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
We introduce a method that dramatically reduces fine-tuning VRAM require...
