Knowledge-preserving Pruning for Pre-trained Language Models without Retraining

08/07/2023
by Seungcheol Park, et al.

Given a pre-trained language model, how can we efficiently compress it without retraining? Retraining-free structured pruning algorithms are crucial for compressing pre-trained language models because of their significantly reduced pruning cost and their ability to prune large language models. However, existing retraining-free algorithms suffer severe accuracy degradation because they fail to preserve the useful knowledge of the pre-trained model. In this paper, we propose K-pruning (Knowledge-preserving pruning), an accurate retraining-free structured pruning algorithm for pre-trained language models. K-pruning identifies and prunes attention heads and neurons deemed superfluous based on the amount of knowledge they carry. To preserve the knowledge of the pre-trained model, K-pruning applies an iterative process of pruning followed by knowledge reconstruction for each sub-layer. Consequently, K-pruning shows up to 58.02%p higher F1 score than existing retraining-free pruning algorithms under a high compression rate of 80%.
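The abstract describes two steps per sub-layer: scoring units (heads or neurons) by the knowledge they carry, then pruning and reconstructing the sub-layer so its outputs are preserved. The snippet below is a minimal NumPy sketch of that idea for a single feed-forward sub-layer, not the authors' code: the ReLU activation, the contribution-norm importance score, and the least-squares reconstruction of the output projection are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of prune-then-reconstruct for one feed-forward sub-layer.
# Assumptions (not from the paper): ReLU activation, contribution-norm as the
# "knowledge" score, least-squares refit of the output projection.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_tokens = 64, 256, 512
W_in = rng.normal(scale=0.02, size=(d_model, d_hidden))   # input projection
W_out = rng.normal(scale=0.02, size=(d_hidden, d_model))  # output projection
X = rng.normal(size=(n_tokens, d_model))                  # sample hidden states

H = np.maximum(X @ W_in, 0.0)      # hidden activations of the sub-layer
Y_orig = H @ W_out                 # outputs of the unpruned sub-layer

# 1) Score each neuron by the norm of its contribution to the output,
#    a simple proxy for how much "knowledge" it holds.
scores = np.array([np.linalg.norm(np.outer(H[:, j], W_out[j]))
                   for j in range(d_hidden)])

# 2) Keep only the top 20% of neurons (80% compression of this sub-layer).
keep = np.argsort(scores)[-int(0.2 * d_hidden):]
H_kept = H[:, keep]

# 3) Knowledge reconstruction: refit the remaining output-projection rows by
#    least squares so the pruned sub-layer reproduces the original outputs
#    on the sample data.
W_out_new, *_ = np.linalg.lstsq(H_kept, Y_orig, rcond=None)

err_pruned = np.linalg.norm(Y_orig - H_kept @ W_out[keep]) / np.linalg.norm(Y_orig)
err_recon = np.linalg.norm(Y_orig - H_kept @ W_out_new) / np.linalg.norm(Y_orig)
print(f"relative error, pruned only:           {err_pruned:.3f}")
print(f"relative error, after reconstruction:  {err_recon:.3f}")
```

In this sketch, reconstruction noticeably reduces the output error of the pruned sub-layer compared to pruning alone, which is the intuition behind alternating pruning with knowledge reconstruction rather than pruning all at once.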


Related research

04/06/2022 · Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
Structured pruning has been extensively studied on monolingual pre-train...

09/18/2021 · Structured Pattern Pruning Using Regularization
Iterative Magnitude Pruning (IMP) is a network pruning method that repea...

10/04/2021 · A Novel Metric for Evaluating Semantics Preservation
In this paper, we leverage pre-trained language models (PLMs) to precise...

02/14/2021 · Error-driven Pruning of Language Models for Virtual Assistants
Language models (LMs) for virtual assistants (VAs) are typically trained...

06/02/2023 · Task-Agnostic Structured Pruning of Speech Representation Models
Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM h...

05/21/2023 · Pruning Pre-trained Language Models with Principled Importance and Self-regularization
Iterative pruning is one of the most effective compression methods for p...

12/15/2022 · Gradient-based Intra-attention Pruning on Pre-trained Language Models
Pre-trained language models achieve superior performance, but they are c...
