
Structured Pruning of Large Language Models

10/10/2019
by Ziheng Wang, et al.
MIT
ASAPP Inc.

Large language models have recently achieved state-of-the-art performance across a wide variety of natural language tasks. Meanwhile, the size of these models and their latency have significantly increased, which makes their usage costly and raises an interesting question: do language models need to be large? We study this question through the lens of model compression. We present a novel structured pruning approach based on low-rank factorization and augmented Lagrangian L0-norm regularization. Our structured approach achieves significant inference speedups while matching or outperforming our unstructured pruning baseline at various sparsity levels. We apply our method to state-of-the-art models on the enwik8 dataset and obtain a 1.19 perplexity score with just 5M parameters, vastly outperforming a model of the same size trained from scratch. We also demonstrate that our method can be applied to language model fine-tuning by pruning the BERT model on several downstream classification benchmarks.
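The abstract describes combining low-rank factorization with L0 regularization: each weight matrix is factorized, and rank components are gated so that pruning a gate removes an entire component (rather than individual weights), which is what makes the sparsity structured and speeds up inference. Below is a minimal PyTorch sketch of that idea, assuming hard-concrete gates and an augmented-Lagrangian-style penalty toward a target parameter budget. The names (HardConcreteGate, FactorizedLinear, sparsity_penalty) and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class HardConcreteGate(nn.Module):
    """One stochastic gate per rank component (hard-concrete / L0 relaxation). Sketch only."""

    def __init__(self, size, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(size))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha / self.beta)
        # Stretch to (gamma, zeta) and clip to [0, 1]; gates can become exactly zero.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self):
        # Expected number of non-zero gates (a differentiable sparsity measure).
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()


class FactorizedLinear(nn.Module):
    """Linear layer with W ≈ P diag(z) Q; pruning a gate z_k drops a whole rank-1 component."""

    def __init__(self, in_dim, out_dim, rank):
        super().__init__()
        self.P = nn.Parameter(torch.randn(out_dim, rank) * 0.02)
        self.Q = nn.Parameter(torch.randn(rank, in_dim) * 0.02)
        self.gate = HardConcreteGate(rank)

    def forward(self, x):
        z = self.gate()                # shape: (rank,)
        h = F.linear(x, self.Q) * z    # project down, then gate each rank component
        return F.linear(h, self.P)     # project back up


def sparsity_penalty(layers, target_params, lam, mu):
    """Augmented-Lagrangian-style penalty pushing the expected parameter count to a target."""
    expected = sum(
        l.gate.expected_l0() * (l.P.shape[0] + l.Q.shape[1]) for l in layers
    )
    gap = expected - target_params
    return lam * gap + 0.5 * mu * gap ** 2
```

In a training loop, this penalty would be added to the task loss, with the multipliers (lam, mu here) adjusted over training so the expected parameter count converges to the desired budget; at the end, rank components whose gates are zero can be dropped, shrinking P and Q and yielding a genuinely smaller, faster model.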

Related Research
03/14/2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Pre-trained Transformer-based language models have become a key building...
05/15/2020

Movement Pruning: Adaptive Sparsity by Fine-Tuning

Magnitude pruning is a widely used strategy for reducing model size in p...
09/18/2021

Structured Pattern Pruning Using Regularization

Iterative Magnitude Pruning (IMP) is a network pruning method that repea...
10/12/2022

GMP*: Well-Tuned Global Magnitude Pruning Can Outperform Most BERT-Pruning Methods

We revisit the performance of the classic gradual magnitude pruning (GMP...
05/30/2022

Parameter Efficient Diff Pruning for Bias Mitigation

In recent years language models have achieved state of the art performan...
02/14/2021

Error-driven Pruning of Language Models for Virtual Assistants

Language models (LMs) for virtual assistants (VAs) are typically trained...
10/18/2021

BERMo: What can BERT learn from ELMo?

We propose BERMo, an architectural modification to BERT, which makes pre...