What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression

10/16/2021
by Mengnan Du, et al.

Recent work has focused on compressing pre-trained language models (PLMs) such as BERT, with the main goal of preserving downstream task performance in the compressed model. However, the impact of compression on the generalizability and robustness of these models has received little attention. To address this, we study two popular model compression techniques, knowledge distillation and pruning, and show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets, even though they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit to easy samples and generalize poorly to hard ones. We leverage this observation to develop a regularization strategy for model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our mitigation framework improves both the adversarial generalization and the in-distribution task performance of the compressed models.
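The abstract describes the mitigation only at a high level (regularizing compression using sample uncertainty), so the snippet below is a minimal, hypothetical sketch rather than the paper's actual objective. It weights a standard distillation loss by the teacher's normalized predictive entropy, so that easy, low-entropy examples contribute less and hard, high-entropy examples contribute more. The function name, the entropy-based weighting, and the `temperature` and `alpha` hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: uncertainty-weighted knowledge distillation.
# This is NOT the paper's exact objective (the abstract does not specify it);
# it only illustrates one plausible way sample uncertainty could regularize
# compression: up-weight hard (high-entropy) examples so the compressed
# student does not overfit the easy ones.

import torch
import torch.nn.functional as F


def uncertainty_weighted_distillation_loss(
    student_logits: torch.Tensor,   # (batch, num_classes)
    teacher_logits: torch.Tensor,   # (batch, num_classes)
    labels: torch.Tensor,           # (batch,)
    temperature: float = 2.0,       # assumed softening temperature
    alpha: float = 0.5,             # assumed CE / distillation mixing weight
) -> torch.Tensor:
    """Combine cross-entropy and KL distillation, weighting each example by
    the teacher's normalized predictive entropy (a simple uncertainty proxy)."""
    num_classes = teacher_logits.size(-1)

    # Teacher uncertainty: entropy of the softened teacher distribution,
    # normalized to [0, 1] by log(num_classes).
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
        entropy = -(teacher_probs * teacher_probs.clamp_min(1e-12).log()).sum(-1)
        weights = entropy / torch.log(torch.tensor(float(num_classes)))

    # Per-example task loss (cross-entropy) and distillation loss (KL),
    # with the usual T^2 scaling for the softened KL term.
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="none",
    ).sum(-1) * temperature ** 2

    per_example = alpha * ce + (1.0 - alpha) * kl
    # Emphasize uncertain (hard) examples; easy ones contribute less.
    return (weights * per_example).mean()
```

In a training loop, this loss would simply replace the usual distillation loss; the key design choice illustrated here is that the weighting is computed from the teacher alone, so it adds no extra forward passes for the student.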


Related research

09/13/2021  KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation
01/21/2022  Can Model Compression Improve NLP Fairness
09/07/2021  Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression
03/21/2021  ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques
11/30/2020  Extreme Model Compression for On-device Natural Language Understanding
06/12/2023  Deep Model Compression Also Helps Models Capture Ambiguity
11/06/2022  Robust Lottery Tickets for Pre-trained Language Models
