LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression

04/08/2020
by Yihuan Mao et al.

BERT is a cutting-edge language representation model pre-trained on a large corpus, and it achieves superior performance on a variety of natural language understanding tasks. However, a major obstacle to deploying BERT in online services is that it is memory-intensive and incurs unsatisfactory request latency, which raises the need for model compression. Existing solutions leverage knowledge distillation to learn a smaller model that imitates the behavior of BERT. However, knowledge distillation is itself expensive to train, as it requires sufficient training data for the student to imitate the teacher model. In this paper, we address this issue with a hybrid solution named LadaBERT (Lightweight adaptation of BERT through hybrid model compression), which combines the advantages of several model compression methods: weight pruning, matrix factorization, and knowledge distillation. LadaBERT achieves state-of-the-art accuracy on various public datasets while reducing training overhead by an order of magnitude.
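To make the combination concrete, the sketch below is a minimal, hypothetical PyTorch illustration of the three component techniques the abstract names: magnitude-based weight pruning, truncated-SVD matrix factorization, and a standard soft-label distillation loss. It is not LadaBERT's actual pipeline; the function names, the toy 768x768 weight matrix, and hyperparameters such as the sparsity ratio, rank, temperature, and loss weight are assumptions made for illustration only.

```python
# Illustrative sketch only: generic versions of the techniques named in the
# abstract (magnitude pruning, SVD factorization, distillation loss), not the
# paper's actual implementation or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so roughly `sparsity` of them become 0."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)


def svd_factorize(weight: torch.Tensor, rank: int):
    """Approximate a weight matrix with two low-rank factors via truncated SVD."""
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (out_dim, rank), singular values folded in
    b = vh[:rank, :]             # (rank, in_dim)
    return a, b


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft KL term against the teacher plus hard cross-entropy on the labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    # Toy weight matrix standing in for one BERT parameter matrix.
    w = torch.randn(768, 768)
    w_pruned = magnitude_prune(w, sparsity=0.5)
    a, b = svd_factorize(w_pruned, rank=128)
    print("kept params:", a.numel() + b.numel(), "vs original:", w.numel())

    # Distillation step with random logits standing in for teacher/student outputs.
    student = torch.randn(8, 2)
    teacher = torch.randn(8, 2)
    labels = torch.randint(0, 2, (8,))
    print("loss:", distillation_loss(student, teacher, labels).item())
```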

Related research

- LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding (12/14/2020)
- Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning (08/26/2023)
- Weight Squeezing: Reparameterization for Compression and Fast Inference (10/14/2020)
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing (02/07/2020)
- Extreme Compression for Pre-trained Transformers Made Simple and Efficient (06/04/2022)
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation (10/06/2020)
- How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding (09/13/2021)
