Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models

07/12/2023
by James O'Neill, et al.

We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models. We present a new method called self-distilled quantization (SDQ) that minimizes accumulative quantization errors and outperforms baselines. We apply SDQ to multilingual models XLM-R-Base and InfoXLM-Base and demonstrate that both models can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark. Our results also highlight the challenges of quantizing multilingual models, which must generalize to languages they were not fine-tuned on.
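To make the ideas in the abstract concrete, here is a minimal sketch of quantization-aware training combined with self-distillation: weights are fake-quantized to 8-bit integers during the forward pass, and a distillation term pulls the quantized model's predictions toward those of its own full-precision copy. This is an illustrative assumption of the general technique, not the authors' SDQ implementation; the names fake_quantize, QuantLinear, and self_distillation_loss, and the alpha/temperature values, are hypothetical.

```python
# Sketch of 8-bit quantization-aware training with self-distillation (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate symmetric uniform quantization of a weight tensor to `num_bits`,
    using a straight-through estimator so gradients flow through unchanged."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = w.detach().abs().max() / qmax     # per-tensor scale
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()             # straight-through estimator


class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized in the forward pass."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quantize(self.weight), self.bias)


def self_distillation_loss(student_logits, teacher_logits, labels,
                           alpha: float = 0.5, temperature: float = 2.0):
    """Combine the task loss with a KL term that keeps the quantized student
    close to the full-precision teacher's predictions."""
    task = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * task + (1.0 - alpha) * kd


# Usage: the "teacher" is the same fine-tuned model run at full precision.
student = QuantLinear(768, 2)
teacher = nn.Linear(768, 2)
teacher.load_state_dict(student.state_dict())

x, y = torch.randn(4, 768), torch.randint(0, 2, (4,))
loss = self_distillation_loss(student(x), teacher(x).detach(), y)
loss.backward()
```

The straight-through estimator lets gradients bypass the non-differentiable rounding step, while the distillation term limits how far the quantized student drifts from the full-precision behavior; the actual SDQ objective in the paper additionally targets accumulative quantization errors across layers.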
