PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

06/25/2022
by Qingru Zhang, et al.

Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks. However, these models contain an enormous number of parameters, which restricts their deployment in real-world applications. To reduce the model size, researchers prune these models based on the weights' importance scores. However, such scores are usually estimated on mini-batches during training, which incurs large variability and uncertainty due to mini-batch sampling and complicated training dynamics. As a result, commonly used pruning methods may remove crucial weights because of this uncertainty, which makes training unstable and hurts generalization. To resolve this issue, we propose PLATON, which captures the uncertainty of importance scores with an upper confidence bound (UCB) on the importance estimates. In particular, PLATON tends to retain weights with low importance scores but high uncertainty and explore their capacity. We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering, and image classification to validate the effectiveness of PLATON. Results demonstrate that PLATON yields notable improvements across different sparsity levels. Our code is publicly available at https://github.com/QingruZhang/PLATON.
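The core idea can be illustrated with a minimal sketch: estimate a per-weight importance on each mini-batch, smooth it with an exponential moving average, track how far the noisy estimate deviates from that average as an uncertainty term, and prune the weights whose upper confidence bound (smoothed importance plus uncertainty) is lowest. The function names, the hyperparameters `beta1`/`beta2`, and the additive UCB combination below are illustrative assumptions, not the paper's exact formulation or released code.

```python
import torch

def ucb_importance(weight, grad, state, beta1=0.85, beta2=0.95):
    """Illustrative UCB-style importance score for weight pruning.

    `state` holds per-tensor running statistics:
      - 'imp_ema': exponential moving average of the instantaneous importance
      - 'unc_ema': exponential moving average of its deviation (uncertainty)
    """
    # Instantaneous importance estimated on the current mini-batch
    # (magnitude-times-gradient sensitivity).
    imp = (weight * grad).abs()

    # Smooth the importance to reduce mini-batch noise.
    imp_ema = state.get('imp_ema', torch.zeros_like(imp))
    imp_ema = beta1 * imp_ema + (1 - beta1) * imp

    # Track how far the noisy estimate strays from its running mean.
    unc = (imp - imp_ema).abs()
    unc_ema = state.get('unc_ema', torch.zeros_like(unc))
    unc_ema = beta2 * unc_ema + (1 - beta2) * unc

    state['imp_ema'], state['unc_ema'] = imp_ema, unc_ema

    # Upper confidence bound: weights with low mean importance but high
    # uncertainty receive a higher score and are therefore retained longer.
    return imp_ema + unc_ema

def prune_by_ucb(weight, score, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest UCB score."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = torch.kthvalue(score.flatten(), k).values
    mask = (score > threshold).to(weight.dtype)
    return weight * mask
```

In practice the smoothing coefficients, the rule for combining smoothed importance with uncertainty, and the sparsity schedule should be taken from the released code at https://github.com/QingruZhang/PLATON rather than from this sketch.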


