Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training

06/21/2023
by Aleksandra I. Nowak, et al.

Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their effect on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at https://github.com/alooow/fantastic_weights_paper
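For intuition, the sketch below shows one prune-and-grow update of the kind DST methods apply repeatedly during training, using magnitude-based pruning (the criterion the abstract highlights) paired with random regrowth. The function name, the regrowth rule, and all hyperparameters are illustrative assumptions for this sketch, not the paper's exact implementation; see the linked repository for the actual code.

```python
# Minimal sketch of one DST prune-grow step (illustrative assumptions only):
# magnitude-based pruning of active weights followed by random regrowth,
# keeping the overall number of active connections constant.
import numpy as np


def prune_grow_step(weights, mask, prune_fraction=0.3, rng=None):
    """Prune the smallest-magnitude active weights, then regrow the same
    number of connections at random inactive positions."""
    rng = rng if rng is not None else np.random.default_rng(0)

    active = np.flatnonzero(mask)
    n_prune = int(prune_fraction * active.size)

    # Magnitude-based pruning: drop the active weights with smallest |w|.
    magnitudes = np.abs(weights.flat[active])
    to_prune = active[np.argsort(magnitudes)[:n_prune]]
    mask.flat[to_prune] = 0
    weights.flat[to_prune] = 0.0

    # Random growth (one possible growing criterion): re-enable the same
    # number of currently inactive connections, initialized to zero.
    inactive = np.flatnonzero(mask.flat[:] == 0)
    to_grow = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[to_grow] = 1
    weights.flat[to_grow] = 0.0

    return weights, mask


# Example: density stays fixed across the update.
rng = np.random.default_rng(42)
w = rng.normal(size=(64, 64))
m = (rng.random((64, 64)) < 0.1).astype(np.int8)  # ~10% density
before = m.sum()
w, m = prune_grow_step(w, m, prune_fraction=0.3, rng=rng)
assert m.sum() == before
```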


