
Movement Pruning: Adaptive Sparsity by Fine-Tuning

by   Victor Sanh, et al.
Hugging Face, Inc.
Cornell University

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations for the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning yields significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with as little as 3% of the model parameters remaining.
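The core distinction between the two criteria can be illustrated with a minimal NumPy sketch (an assumption-laden toy, not the paper's implementation: the actual method learns importance scores jointly during fine-tuning via a straight-through estimator, whereas here a single gradient step stands in for the accumulated movement score S = -Σ (∂L/∂W) · W):

```python
import numpy as np

def magnitude_mask(w, sparsity):
    """Zeroth-order (magnitude) pruning: keep the largest-|w| weights."""
    k = max(1, int(round(w.size * (1.0 - sparsity))))
    thresh = np.sort(np.abs(w), axis=None)[-k]
    return (np.abs(w) >= thresh).astype(w.dtype)

def movement_mask(score, sparsity):
    """First-order (movement) pruning: keep weights with the highest
    accumulated score S = -sum_t grad_t * w_t, i.e. weights that are
    moving AWAY from zero during fine-tuning."""
    k = max(1, int(round(score.size * (1.0 - sparsity))))
    thresh = np.sort(score, axis=None)[-k]
    return (score >= thresh).astype(score.dtype)

# Toy weights after pretraining and one step's gradients during fine-tuning.
w = np.array([0.5, 0.4, -0.01, 0.02])
grad = np.array([0.9, -0.8, 0.9, -0.7])

# Movement importance for a single step: -grad * w is positive exactly
# when the gradient update pushes the weight away from zero.
score = -grad * w

print(magnitude_mask(w, 0.5))       # keeps the two largest |w|
print(movement_mask(score, 0.5))    # keeps the two fastest-growing weights
```

Note how the large pretrained weight `w[0]` is kept by magnitude but dropped by movement (fine-tuning is shrinking it), while the tiny-but-growing `w[3]` survives only under the movement criterion; this adaptivity to the fine-tuning signal is what the abstract credits for the high-sparsity gains.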




Pruning Pre-trained Language Models Without Fine-Tuning

To overcome the overparameterized problem in Pre-trained Language Models...

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

The sheer size of modern neural networks makes model serving a serious c...

Block Pruning For Faster Transformers

Pre-training has improved model accuracy for both classification and gen...

Towards Compute-Optimal Transfer Learning

The field of transfer learning is undergoing a significant shift with th...

Structured Pruning of Large Language Models

Large language models have recently achieved state of the art performanc...

A Simple and Effective Pruning Approach for Large Language Models

As their size increases, Large Language Models (LLMs) are natural candi...

Parameter-Efficient Transfer Learning with Diff Pruning

While task-specific finetuning of pretrained networks has led to signifi...