Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

10/07/2020
by Utku Evci, et al.

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and they also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why does training unstructured sparse networks from random initialization perform poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training compared to traditional sparse training methods. Finally, we show that LTs do not improve gradient flow; rather, their success lies in re-learning the pruning solution they are derived from, though this comes at the cost of learning novel solutions.
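The modified initialization is not spelled out in this abstract; the sketch below is one plausible reading of a sparsity-aware scheme, in which each unit's weights are drawn with variance set by its sparse fan-in (the number of surviving incoming connections) rather than the dense fan-in, together with a simple gradient-norm probe of gradient flow at initialization. The PyTorch framing and the names `sparse_kaiming_init_` and `gradient_norm` are illustrative assumptions, not the authors' code.

```python
# Minimal sketch, assuming PyTorch; illustrative only, not the authors' released code.
import torch
import torch.nn as nn


def sparse_kaiming_init_(weight: torch.Tensor, mask: torch.Tensor) -> None:
    """Re-initialize `weight` in place, scaling each output unit's variance
    by its sparse fan-in (number of retained inputs) instead of the dense fan-in."""
    with torch.no_grad():
        fan_in = mask.sum(dim=1).clamp(min=1.0)                # retained inputs per output unit
        std = torch.sqrt(2.0 / fan_in).unsqueeze(1)            # He-style std, per output unit
        weight.copy_(torch.randn_like(weight) * std * mask)    # pruned weights stay at zero


def gradient_norm(model: nn.Module) -> float:
    """Global gradient L2 norm, used here as a crude proxy for gradient flow."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.pow(2).sum().item()
    return total ** 0.5


# Usage: a randomly pruned linear layer at roughly 90% sparsity.
layer = nn.Linear(512, 256)
mask = (torch.rand_like(layer.weight) < 0.1).float()           # keep ~10% of weights
sparse_kaiming_init_(layer.weight, mask)

x = torch.randn(32, 512)
layer(x).pow(2).mean().backward()                              # dummy loss, just to get gradients
print(f"gradient norm at initialization: {gradient_norm(layer):.4f}")
# A full sparse-training loop would also re-apply `mask` to the weights after each optimizer step.
```

The design choice mirrors standard Kaiming initialization, which scales the variance by 2/fan_in in a dense layer; for unstructured sparsity, the effective fan-in of each unit is simply the count of its nonzero connections.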


Related research

02/02/2021 · Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization
Training sparse networks to converge to the same performance as dense ne...

03/25/2020 · Data Parallelism in Training Sparse Neural Networks
Network pruning is an effective methodology to compress large neural net...

09/24/2018 · Dense neural networks as sparse graphs and the lightning initialization
Even though dense networks have lost importance today, they are still us...

01/01/2023 · Theoretical Characterization of How Neural Network Pruning Affects its Generalization
It has been observed in practice that applying pruning-at-initialization...

06/25/2019 · The Difficulty of Training Sparse Neural Networks
We investigate the difficulties of training sparse neural networks and m...

11/30/2020 · Deconstructing the Structure of Sparse Neural Networks
Although sparse neural networks have been studied extensively, the focus...

09/06/2022 · What to Prune and What Not to Prune at Initialization
Post-training dropout based approaches achieve high sparsity and are wel...
