Extreme Compression for Pre-trained Transformers Made Simple and Efficient

06/04/2022
by Xiaoxia Wu, et al.

Extreme compression, particularly ultra-low-bit precision (binary/ternary) quantization, has been proposed to fit large NLP models on resource-constrained devices. However, to preserve accuracy under such aggressive compression schemes, cutting-edge methods usually introduce complicated compression pipelines, e.g., multi-stage, expensive knowledge distillation with extensive hyperparameter tuning. They also tend to pay less attention to smaller transformer models that have already been heavily compressed via knowledge distillation, and they lack a systematic study demonstrating the effectiveness of their methods. In this paper, we perform a comprehensive, systematic study to measure the impact of many key hyperparameters and training strategies from previous works. As a result, we find that previous baselines for ultra-low-bit precision quantization are significantly under-trained. Based on our study, we propose a simple yet effective compression pipeline for extreme compression, named XTC. XTC demonstrates that (1) we can skip the pre-training knowledge distillation to obtain a 5-layer BERT while achieving better performance than previous state-of-the-art methods, e.g., the 6-layer TinyBERT; and (2) extreme quantization combined with layer reduction reduces the model size by 50x, yielding new state-of-the-art results on GLUE tasks.
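To make the kind of ultra-low-bit quantization the abstract refers to concrete, here is a minimal PyTorch sketch of ternary weight quantization, mapping a weight tensor to {-α, 0, +α}. This is a generic illustration of the technique (the TWN-style 0.7 · mean(|W|) threshold heuristic), not XTC's actual quantizer; the `ternarize` helper is an assumed name introduced for this example.

```python
import torch

def ternarize(weight: torch.Tensor) -> torch.Tensor:
    """Map a weight tensor to {-alpha, 0, +alpha} (TWN-style heuristic).

    Illustrative only: the 0.7 * mean(|W|) threshold follows the
    Ternary Weight Networks heuristic, not necessarily XTC's choice.
    """
    delta = 0.7 * weight.abs().mean()                 # pruning threshold
    mask = (weight.abs() > delta).float()             # 1 where a weight survives
    alpha = (weight.abs() * mask).sum() / mask.sum()  # per-tensor scaling factor
    return alpha * torch.sign(weight) * mask

# Example: ternarize one BERT-sized linear layer (hidden size 768).
w = torch.randn(768, 768)
w_q = ternarize(w)
print(w_q.unique())  # three values: -alpha, 0., +alpha
```

Each quantized layer then stores only a 2-bit code per weight plus one floating-point scale, which is where the roughly 50x size reduction (together with layer reduction) comes from.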
