Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training

by Aleksandra I. Nowak, et al.

Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their effect on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at
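To make the prune-and-grow cycle described above concrete, the following is a minimal sketch of a single topology update that uses magnitude-based pruning, the criterion the abstract reports as performing best at low density. It is not the authors' implementation. The function name `prune_and_grow`, the `prune_fraction` parameter, the choice of random growth, and the zero initialization of regrown weights are all assumptions made for illustration only.

```python
import numpy as np


def prune_and_grow(weights, mask, prune_fraction, rng):
    """One illustrative DST topology update (hypothetical helper).

    Pruning criterion: smallest absolute value among active weights.
    Growing criterion: uniform random choice among inactive positions
    (an assumption; the abstract does not prescribe a growth rule).
    The number of active connections is left unchanged.
    """
    shape = weights.shape
    w = weights.astype(float).ravel().copy()
    m = mask.astype(bool).ravel().copy()

    active = np.flatnonzero(m)
    # Growth candidates are fixed before pruning, so a weight that is
    # pruned in this step cannot be regrown in the same step.
    inactive = np.flatnonzero(~m)

    n_swap = min(int(prune_fraction * active.size), inactive.size)
    if n_swap > 0:
        # Prune: deactivate the n_swap active weights of smallest magnitude.
        order = np.argsort(np.abs(w[active]))
        pruned = active[order[:n_swap]]
        m[pruned] = False
        w[pruned] = 0.0

        # Grow: activate n_swap previously inactive positions at random,
        # starting them at zero (assumed initialization).
        grown = rng.choice(inactive, size=n_swap, replace=False)
        m[grown] = True
        w[grown] = 0.0

    return w.reshape(shape), m.reshape(shape)


rng = np.random.default_rng(0)
mask = np.zeros(16, dtype=bool)
mask[:8] = True
mask = mask.reshape(4, 4)
weights = np.arange(1, 17, dtype=float).reshape(4, 4) * mask

new_weights, new_mask = prune_and_grow(weights, mask, prune_fraction=0.25, rng=rng)
```

In a training loop, an update like this would typically be applied every fixed number of optimizer steps, with the mask multiplied into the weights in between. Swapping the `np.abs(w[active])` score for another saliency measure is the point at which a different pruning criterion would plug in.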



Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Works on lottery ticket hypothesis (LTH) and single-shot network pruning...

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

Random pruning is arguably the most naive way to attain sparsity in neur...

Magnitude Attention-based Dynamic Pruning

Existing pruning methods utilize the importance of each weight based on ...

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

We present a novel network pruning algorithm called Dynamic Sparse Train...

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

Neural models are known to be over-parameterized, and recent work has sh...

Influence Function Based Second-Order Channel Pruning-Evaluating True Loss Changes For Pruning Is Possible Without Retraining

A challenge of channel pruning is designing efficient and effective crit...

Does `Deep Learning on a Data Diet' reproduce? Overall yes, but GraNd at Initialization does not

The paper 'Deep Learning on a Data Diet' by Paul et al. (2021) introduce...
