oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

03/30/2023
by Daniel Campos, et al.

In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain models between 3.8 and 24.3 times faster without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings to improve distillation and model initialization, delivering higher accuracy on a broad range of transfer tasks. In creating oBERTa, we explore how the highly optimized RoBERTa differs from BERT when pruned during pre-training and fine-tuning, and find that it is less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERT-base and exceed the performance of Prune OFA Large on the SQuAD v1.1 question answering dataset, despite being 8x and 2x faster at inference, respectively. We release our code, training regimes, and associated models for broad use to encourage further experimentation.
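The abstract mentions combining frozen embeddings with knowledge distillation during compression. The sketch below is a minimal illustration of that general idea using standard Hugging Face / PyTorch APIs; it is not the authors' released training code, and the model name ("roberta-base"), temperature, and loss weighting are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the oBERTa release): freeze the student's
# embeddings and distill logits from a dense teacher during fine-tuning.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

teacher = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
student = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Freeze the student's embedding matrices so training only updates the
# transformer encoder and the task head.
for param in student.roberta.embeddings.parameters():
    param.requires_grad = False

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft-label KL divergence from the teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft

# Inside a training step (inputs holds input_ids/attention_mask, labels the targets):
#     with torch.no_grad():
#         teacher_logits = teacher(**inputs).logits
#     loss = distillation_loss(student(**inputs).logits, teacher_logits, labels)
#     loss.backward()
```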


Related research

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models (03/14/2022)
Pre-trained Transformer-based language models have become a key building...

Sparse*BERT: Sparse Models are Robust (05/25/2022)
Large Language Models have become the core architecture upon which most ...

Can Model Compression Improve NLP Fairness (01/21/2022)
Model compression techniques are receiving increasing attention; however...

Block Pruning For Faster Transformers (09/10/2021)
Pre-training has improved model accuracy for both classification and gen...

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (10/02/2019)
As Transfer Learning from large-scale pre-trained models becomes more pr...

ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques (03/21/2021)
Pre-trained language models of the BERT family have defined the state-of...

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression (09/07/2021)
Recent studies on compression of pretrained language models (e.g., BERT)...
