Block Pruning For Faster Transformers

09/10/2021
by François Lagunas, et al.

Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models. Pruning methods have proven to be an effective way of reducing model size, whereas distillation methods are proven for speeding up inference. We introduce a block pruning approach targeting both small and fast models. Our approach extends structured methods by considering blocks of any size and integrates this structure into the movement pruning paradigm for fine-tuning. We find that this approach learns to prune out full components of the underlying model, such as attention heads. Experiments consider classification and generation tasks, yielding among other results a pruned model that is a 2.4x faster, 74% smaller BERT on SQuAD v1, with a 1% drop on F1, competitive with distilled models in speed and pruned models in size.
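The core idea is to extend movement pruning's learned importance scores from individual weights to whole weight blocks, so that entire rows or attention-head-sized sub-matrices can be removed at once. The following is a minimal PyTorch sketch of that mechanism; the BlockPrunedLinear class, the (rows, cols) block size, the fixed keep ratio, and the straight-through masking are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn


class BlockPrunedLinear(nn.Module):
    """Linear layer whose weight matrix is masked in (rows x cols) blocks.

    One learnable score per block is trained jointly with the weights
    (the movement pruning idea); at forward time the top-scoring fraction
    of blocks is kept and the rest are zeroed.
    """

    def __init__(self, in_features, out_features, block=(32, 32), keep_ratio=0.5):
        super().__init__()
        assert out_features % block[0] == 0 and in_features % block[1] == 0
        self.linear = nn.Linear(in_features, out_features)
        self.block = block
        self.keep_ratio = keep_ratio
        n_blocks = (out_features // block[0], in_features // block[1])
        # One movement-pruning score per weight block.
        self.scores = nn.Parameter(torch.zeros(n_blocks))

    def block_mask(self):
        # Keep the top `keep_ratio` fraction of blocks by score.
        k = max(1, int(self.keep_ratio * self.scores.numel()))
        threshold = self.scores.flatten().topk(k).values.min()
        hard = (self.scores >= threshold).float()
        # Straight-through estimator: hard 0/1 mask in the forward pass,
        # identity gradient in the backward pass, so the scores receive
        # "movement" gradients from the task loss.
        return hard + self.scores - self.scores.detach()

    def forward(self, x):
        mask = self.block_mask()
        # Expand the per-block mask to the full weight shape.
        full_mask = mask.repeat_interleave(self.block[0], dim=0) \
                        .repeat_interleave(self.block[1], dim=1)
        return nn.functional.linear(x, self.linear.weight * full_mask,
                                    self.linear.bias)


# Usage: with block=(64, 768) each block spans 64 full rows, so pruning a
# block removes whole output dimensions, analogous to dropping a component
# such as an attention head.
layer = BlockPrunedLinear(768, 768, block=(64, 768), keep_ratio=0.5)
out = layer(torch.randn(4, 768))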

Related research

05/15/2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Magnitude pruning is a widely used strategy for reducing model size in p...

04/27/2020
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning
While pre-training and fine-tuning, e.g., BERT <cit.>, GPT-2 <cit.>, hav...

03/30/2023
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
In this paper, we introduce the range of oBERTa language models, an easy...

04/01/2022
Structured Pruning Learns Compact and Accurate Models
The growing size of neural language models has led to increased attentio...

05/25/2022
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Model compression by way of parameter pruning, quantization, or distilla...

09/26/2021
On the Prunability of Attention Heads in Multilingual BERT
Large multilingual models, such as mBERT, have shown promise in crosslin...

06/02/2020
Shapley Value as Principled Metric for Structured Network Pruning
Structured pruning is a well-known technique to reduce the storage size ...
