What Matters In The Structured Pruning of Generative Language Models?

02/07/2023
by Michael Santacroce et al.

Auto-regressive large language models such as GPT-3 require enormous computational resources to use. Traditionally, structured pruning methods are employed to reduce resource usage. However, their application to, and efficacy for, generative language models remain heavily under-explored. In this paper we conduct a comprehensive evaluation of common structured pruning methods, including magnitude, random, and movement pruning, on the feed-forward layers of GPT-type models. Unexpectedly, random pruning yields performance comparable to the best established methods across multiple natural language generation tasks. To understand these results, we provide a framework for measuring the neuron-level redundancy of models pruned by different methods, and discover that established structured pruning methods do not take into account the distinctiveness of neurons, leaving behind excess redundancy. In view of this, we introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models. We then discuss the effects of our techniques on different redundancy metrics to explain the improved performance.
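For concreteness, below is a minimal PyTorch sketch (not the authors' code) of what neuron-level structured pruning of a feed-forward block looks like under the magnitude and random criteria mentioned in the abstract, together with a simple cosine-similarity proxy for neuron-level redundancy. The function names, the combined fan-in/fan-out magnitude score, and the redundancy proxy are illustrative assumptions, not the paper's exact definitions.

```python
# Sketch: structured (neuron-level) pruning of a transformer FFN block, assuming
#   h = act(x @ W_in.T + b_in);  y = h @ W_out.T + b_out
# Hidden neuron i corresponds to row i of W_in and column i of W_out.
import torch

def prune_ffn_neurons(W_in, b_in, W_out, keep_ratio=0.5, method="magnitude"):
    """Remove a fraction of hidden neurons and return the smaller FFN weights."""
    d_ff = W_in.shape[0]
    n_keep = max(1, int(keep_ratio * d_ff))
    if method == "magnitude":
        # Score each neuron by the product of its fan-in and fan-out L2 norms.
        scores = W_in.norm(dim=1) * W_out.norm(dim=0)
        keep = scores.topk(n_keep).indices
    elif method == "random":
        keep = torch.randperm(d_ff)[:n_keep]
    else:
        raise ValueError(f"unknown method: {method}")
    keep, _ = keep.sort()
    return W_in[keep], b_in[keep], W_out[:, keep]

def mean_max_cosine(W_in):
    """A crude redundancy proxy: for each kept neuron, its highest cosine
    similarity to any other kept neuron, averaged over neurons."""
    W = torch.nn.functional.normalize(W_in, dim=1)
    sim = W @ W.T
    sim.fill_diagonal_(-1.0)  # ignore self-similarity
    return sim.max(dim=1).values.mean()

# Usage: prune a toy FFN (d_model=8, d_ff=32) and compare the redundancy proxy.
d_model, d_ff = 8, 32
W_in, b_in, W_out = torch.randn(d_ff, d_model), torch.randn(d_ff), torch.randn(d_model, d_ff)
for method in ("magnitude", "random"):
    Wi, bi, Wo = prune_ffn_neurons(W_in, b_in, W_out, keep_ratio=0.25, method=method)
    print(method, tuple(Wi.shape), float(mean_max_cosine(Wi)))
```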

Related research

03/30/2022 · TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
Pre-trained language models have been prevailed in natural language proc...

05/09/2022 · Attribution-based Task-specific Pruning for Multi-task Language Models
Multi-task language models show outstanding performance for various natu...

06/20/2023 · LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
Transformer models have achieved remarkable results in various natural l...

09/18/2021 · Structured Pattern Pruning Using Regularization
Iterative Magnitude Pruning (IMP) is a network pruning method that repea...

02/14/2021 · Error-driven Pruning of Language Models for Virtual Assistants
Language models (LMs) for virtual assistants (VAs) are typically trained...

12/15/2022 · Gradient-based Intra-attention Pruning on Pre-trained Language Models
Pre-trained language models achieve superior performance, but they are c...

10/05/2020 · Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Traditional (unstructured) pruning methods for a Transformer model focus...
