Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

06/18/2023
by   Ajay Jaiswal, et al.
0

Large pre-trained transformers have been receiving explosive attention in the past few years, due to their wide adaptability for numerous downstream applications via fine-tuning, but their exponentially increasing parameter counts are becoming a primary hurdle to even just fine-tune them without industry-standard hardware. Recently, Lottery Ticket Hypothesis (LTH) and its variants, have been exploited to prune these large pre-trained models generating subnetworks that can achieve similar performance as their dense counterparts, but LTH pragmatism is enormously inhibited by repetitive full training and pruning routine of iterative magnitude pruning (IMP) which worsens with increasing model size. Motivated by the recent observations of model soups, which suggest that fine-tuned weights of multiple models can be merged to a better minima, we propose Instant Soup Pruning (ISP) to generate lottery ticket quality subnetworks, using a fraction of the original IMP cost by replacing the expensive intermediate pruning stages of IMP with computationally efficient weak mask generation and aggregation routine. More specifically, during the mask generation stage, ISP takes a small handful of iterations using varying training protocols and data subsets to generate many weak and noisy subnetworks, and superpose them to average out the noise creating a high-quality denoised subnetwork. Our extensive experiments and ablation on two popular large-scale pre-trained models: CLIP (unexplored in pruning till date) and BERT across multiple benchmark vision and language datasets validate the effectiveness of ISP compared to several state-of-the-art pruning methods. Codes are available at: <https://github.com/VITA-Group/instant_soup>

READ FULL TEXT
research
10/30/2021

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Gigantic pre-trained models have become central to natural language proc...
research
04/24/2022

Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Recent studies on the lottery ticket hypothesis (LTH) show that pre-trai...
research
06/06/2023

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

Large pre-trained transformers are show-stealer in modern-day deep learn...
research
01/27/2023

Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks

As a few large-scale pre-trained models become the major choices of vari...
research
05/28/2023

Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

Large pre-trained models (LPMs), such as LLaMA and ViT-G, have shown exc...
research
05/18/2023

Structural Pruning for Diffusion Models

Generative modeling has recently undergone remarkable advancements, prim...
research
01/23/2020

Filter Sketch for Network Pruning

In this paper, we propose a novel network pruning approach by informatio...

Please sign up or login with your details

Forgot password? Click here to reset