KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization

01/15/2021
by Jing Jin, et al.

Recently, Transformer-based language models such as BERT have shown tremendous performance improvements across a range of natural language processing tasks. However, these language models are usually computationally expensive and memory intensive during inference, which makes them difficult to deploy on resource-constrained devices. To improve inference performance and reduce model size while maintaining accuracy, we propose a novel quantization method named KDLSQ-BERT that combines knowledge distillation (KD) with learned step size quantization (LSQ) for language model quantization. The main idea of our method is to leverage KD to transfer knowledge from a "teacher" model to a "student" model while LSQ is used to quantize that "student" model during quantization-aware training. Extensive experimental results on the GLUE benchmark and SQuAD demonstrate that our proposed KDLSQ-BERT not only performs effectively at different bit widths (e.g. 2-bit to 8-bit quantization), but also outperforms existing BERT quantization methods, and even achieves performance comparable to the full-precision baseline model while obtaining a 14.9x compression ratio. Our code will be publicly available.
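To make the combination described above concrete, below is a minimal PyTorch sketch of the two ingredients named in the abstract: a learned-step-size (LSQ) fake-quantizer trained with a straight-through estimator, and a soft-label knowledge-distillation loss between teacher and student logits. This is illustrative only, not the authors' released code; names such as LSQQuantizer and distillation_loss are assumptions made for this example.

```python
# Minimal sketch (assumed PyTorch implementation) of LSQ fake-quantization
# plus a soft-label knowledge-distillation loss, as used conceptually in
# KDLSQ-BERT. Names and defaults here are illustrative assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_scale(x, scale):
    # Pass x through unchanged in the forward pass, but scale its gradient
    # by `scale` in the backward pass (the LSQ step-size gradient trick).
    return (x - x * scale).detach() + x * scale


def round_ste(x):
    # Round with a straight-through estimator so gradients flow to x.
    return (x.round() - x).detach() + x


class LSQQuantizer(nn.Module):
    """Fake-quantizes a tensor to `bits` bits with a learnable step size."""

    def __init__(self, bits=8, symmetric=True):
        super().__init__()
        self.qn = -(2 ** (bits - 1)) if symmetric else 0
        self.qp = 2 ** (bits - 1) - 1 if symmetric else 2 ** bits - 1
        self.step = nn.Parameter(torch.tensor(1.0))
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # Common LSQ initialization: 2 * mean(|x|) / sqrt(Qp).
            self.step.data = 2 * x.abs().mean() / math.sqrt(self.qp)
            self.initialized = True
        # Scale the step-size gradient, as recommended in the LSQ paper.
        g = 1.0 / math.sqrt(x.numel() * self.qp)
        s = grad_scale(self.step, g)
        q = torch.clamp(round_ste(x / s), self.qn, self.qp)
        return q * s  # dequantized ("fake-quantized") tensor


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Soft-label KD loss: KL divergence between the teacher's and the
    # student's softened output distributions.
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```

In a quantization-aware training loop, a quantizer like this would wrap the student BERT's weights and activations, while the KD term (the full method also distills intermediate Transformer-layer outputs, not just logits) is added to the training loss, so gradients update both the student weights and the learned step sizes.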

Related research

12/30/2021 · Automatic Mixed-Precision Quantization Search of BERT
Pre-trained language models such as BERT have shown remarkable effective...

05/13/2023 · GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples
Affected by the massive amount of parameters, ViT usually suffers from s...

03/25/2022 · MKQ-BERT: Quantized BERT with 4-bits Weights and Activations
Recently, pre-trained Transformer based language models, such as BERT, h...

06/04/2022 · Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Extreme compression, particularly ultra-low bit precision (binary/ternar...

11/20/2022 · Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Knowledge distillation (KD) has been a ubiquitous method for model compr...

10/29/2022 · Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
Transformer-based architectures like BERT have achieved great success in...

10/14/2020 · An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models
Recently, pre-trained language models like BERT have shown promising per...
