Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization

02/02/2021
by Kale-ab Tessera, et al.

Training sparse networks to converge to the same performance as dense neural architectures has proven to be elusive. Recent work suggests that initialization is the key. However, while this direction of research has had some success, focusing on initialization alone appears to be inadequate. In this paper, we take a broader view of training sparse networks and consider the role of regularization, optimization, and architecture choices in sparse models. We propose a simple experimental framework, Same Capacity Sparse vs Dense Comparison (SC-SDC), that allows for fair comparison of sparse and dense networks. Furthermore, we propose a new measure of gradient flow, Effective Gradient Flow (EGF), that correlates better with performance in sparse networks. Using top-line metrics, SC-SDC and EGF, we show that default choices of optimizers, activation functions and regularizers used for dense networks can disadvantage sparse networks. Based on these findings, we show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime. Our work suggests that initialization is only one piece of the puzzle and taking a wider view of tailoring optimization to sparse networks yields promising results.
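The abstract does not spell out how EGF is computed, but the general idea of measuring gradient flow through the active (unpruned) weights of a sparse model can be illustrated with a small sketch. The snippet below is an assumption-laden proxy, not the paper's definition: it uses a hypothetical MaskedLinear layer, a random pruning mask, and the mean absolute gradient over unmasked weights as the flow statistic.

# Minimal sketch (not the paper's exact EGF definition): approximate a
# gradient-flow measure by averaging gradient magnitudes over the active
# (unmasked) weights of a pruned network. Layer names, the mask convention,
# and the mean-absolute-gradient statistic are illustrative assumptions.
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Linear layer whose weights are element-wise masked (0 = pruned)."""
    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__(in_features, out_features)
        mask = (torch.rand_like(self.weight) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

def gradient_flow(model):
    """Mean absolute gradient over unpruned weights (a proxy, not EGF itself)."""
    total, count = 0.0, 0
    for m in model.modules():
        if isinstance(m, MaskedLinear) and m.weight.grad is not None:
            g = (m.weight.grad * m.mask).abs()
            total += g.sum().item()
            count += int(m.mask.sum().item())
    return total / max(count, 1)

# Usage: one forward/backward pass on random data, then inspect the statistic.
model = nn.Sequential(MaskedLinear(32, 64), nn.ReLU(), MaskedLinear(64, 10))
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
print(f"gradient flow over active weights: {gradient_flow(model):.6f}")

Tracking such a statistic during training is one way to compare how different optimizers, activation functions, or regularizers affect sparse versus dense networks of the same capacity, which is the kind of comparison the SC-SDC framework is designed to make fair.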

