Lottery Jackpots Exist in Pre-trained Models

04/18/2021
by   Yuxin Zhang, et al.

Network pruning is an effective approach to reduce network complexity without performance compromise. Existing studies achieve the sparsity of neural networks via time-consuming weight tuning or complex search on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing and sparse sub-networks without the involvement of weight tuning, termed "lottery jackpots", exist in pre-trained models with unexpanded width. For example, we obtain a lottery jackpot that retains only 10% of the parameters of VGGNet-19 without any modifications to the pre-trained weights. Furthermore, we observe that the sparse masks derived from many existing pruning criteria have a high overlap with the searched mask of our lottery jackpot, among which the magnitude-based pruning results in the mask most similar to ours. Based on this insight, we initialize our sparse mask using magnitude-based pruning, resulting in at least a 3x cost reduction on the lottery jackpot search while achieving comparable or even better performance. Specifically, our magnitude-based lottery jackpot removes 90% of the weights while easily obtaining more than 70% accuracy on ImageNet.
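The search described above can be pictured as learning only a binary mask over frozen pre-trained weights, with the mask initialized from magnitude pruning. Below is a minimal PyTorch sketch of that idea; the score-based mask search with a straight-through estimator, and names such as MaskedLinear and sparsity, are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Linear layer whose pre-trained weight is frozen; only mask scores are trained."""

    def __init__(self, pretrained: nn.Linear, sparsity: float = 0.9):
        super().__init__()
        # Freeze the pre-trained weight: no weight tuning is involved.
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
                     if pretrained.bias is not None else None)
        self.sparsity = sparsity
        # Magnitude-based initialization of the mask scores (assumption: scores
        # start at |w|, so the initial top-k mask equals magnitude pruning).
        self.scores = nn.Parameter(self.weight.abs().clone())

    def mask(self) -> torch.Tensor:
        # Keep the top-(1 - sparsity) fraction of scores as a binary mask.
        k = int(self.scores.numel() * (1.0 - self.sparsity))
        threshold = torch.topk(self.scores.flatten(), k).values.min()
        hard = (self.scores >= threshold).float()
        # Straight-through estimator: forward uses the hard mask,
        # backward passes gradients to the continuous scores.
        return hard + self.scores - self.scores.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight * self.mask(), self.bias)


if __name__ == "__main__":
    dense = nn.Linear(128, 10)                     # stands in for a pre-trained layer
    layer = MaskedLinear(dense, sparsity=0.9)
    opt = torch.optim.SGD([layer.scores], lr=0.1)  # only mask scores are optimized

    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(layer(x), y)
    loss.backward()
    opt.step()
    print("trainable params:", sum(p.numel() for p in layer.parameters() if p.requires_grad))
```

Only layer.scores receives gradients, so the pre-trained weights are never tuned; the search merely decides which of them survive in the sparse sub-network.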

