Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning

09/15/2023
by   Miles Williams, et al.

The extensive memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings, such as cloud environments or on-device applications. PLMs use embedding matrices to represent extensive vocabularies, which form a large proportion of the model parameters. While previous work on parameter-efficient PLM development has considered pruning parameters within the transformer layers, pruning the embedding matrix as part of fine-tuning or inference has yet to be explored. We first demonstrate that a significant proportion of the vocabulary remains unused in these scenarios. We then propose a simple yet effective approach that leverages this finding to minimize the memory footprint of the embedding matrix. We show that this approach provides substantial reductions in memory usage across a wide range of models and tasks. Notably, our approach maintains equivalent downstream task performance while allowing a more efficient use of compute resources.
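
The abstract does not give the exact procedure, but the core idea (profile which token ids a task corpus actually uses, then keep only those rows of the embedding matrix) can be illustrated with a minimal sketch in PyTorch and Hugging Face Transformers. The model choice, the placeholder corpus, and the id-remapping helper below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of dynamic embedding pruning.
# The corpus, model name, and `encode` helper are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumed example PLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Task corpus whose vocabulary we profile (placeholder text).
corpus = ["example fine-tuning sentence", "another task sentence"]

# 1) Record which token ids actually occur in the task data.
used_ids = set()
for text in corpus:
    used_ids.update(tokenizer(text)["input_ids"])
used_ids = sorted(used_ids)  # special tokens are included by the tokenizer

# 2) Keep only the corresponding rows of the embedding matrix.
old_emb = model.get_input_embeddings()  # nn.Embedding of shape [V, d]
keep = torch.tensor(used_ids, dtype=torch.long)
new_emb = torch.nn.Embedding(len(used_ids), old_emb.embedding_dim)
new_emb.weight.data.copy_(old_emb.weight.data[keep])
model.set_input_embeddings(new_emb)

# 3) Remap ids from the full vocabulary into the pruned matrix.
id_map = {old: new for new, old in enumerate(used_ids)}

def encode(text: str) -> torch.Tensor:
    ids = [id_map[i] for i in tokenizer(text)["input_ids"]]
    return torch.tensor([ids])

with torch.no_grad():
    out = model(input_ids=encode(corpus[0]))
print(out.last_hidden_state.shape)
```

Since the pruned matrix stores only the rows for tokens that can ever appear, the memory saved scales with the fraction of the vocabulary the task leaves unused.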

Related research

10/12/2022
Pruning Pre-trained Language Models Without Fine-Tuning
To overcome the overparameterized problem in Pre-trained Language Models...

10/14/2022
HashFormers: Towards Vocabulary-independent Pre-trained Transformers
Transformer-based pre-trained language models are vocabulary-dependent, ...

10/24/2020
Rethinking embedding coupling in pre-trained language models
We re-evaluate the standard practice of sharing weights between input an...

03/27/2023
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture
In this paper, we propose a highly parameter-efficient approach to scali...

04/06/2022
Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
Structured pruning has been extensively studied on monolingual pre-train...

05/17/2023
G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks
It has become a popular paradigm to transfer the knowledge of large-scal...

05/21/2023
Pruning Pre-trained Language Models with Principled Importance and Self-regularization
Iterative pruning is one of the most effective compression methods for p...
