Sparse Networks from Scratch: Faster Training without Losing Performance

07/10/2019
by   Tim Dettmers, et al.
2

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving performance levels competitive with dense networks. We accomplish this by developing sparse momentum, an algorithm which uses exponentially smoothed gradients (momentum) to identify layers and weights which reduce the error efficiently. Sparse momentum redistributes pruned weights across layers according to the mean momentum magnitude of each layer. Within a layer, sparse momentum grows weights according to the momentum magnitude of zero-valued weights. We demonstrate state-of-the-art sparse performance on MNIST, CIFAR-10, and ImageNet, decreasing the mean error by a relative 8 show that our algorithm can reliably find the equivalent of winning lottery tickets from random initialization: Our algorithm finds sparse configurations with 20 counterparts. Sparse momentum also decreases the training time: It requires a single training run -- no re-training is required -- and increases training speed up to 11.85x. In our analysis, we show that our sparse networks might be able to reach dense performance levels by learning more general features which are useful to a broader range of classes than dense networks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/15/2016

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Modern deep neural networks have a large number of parameters, making th...
research
10/25/2022

Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training

Training a sparse neural network from scratch requires optimizing connec...
research
02/04/2021

Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

In this paper, we introduce a new perspective on training deep neural ne...
research
03/27/2019

How Can We Be So Dense? The Benefits of Using Highly Sparse Representations

Most artificial networks today rely on dense representations, whereas bi...
research
05/03/2019

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed ...
research
03/17/2019

Evolving and Understanding Sparse Deep Neural Networks using Cosine Similarity

Training sparse neural networks with adaptive connectivity is an active ...

Please sign up or login with your details

Forgot password? Click here to reset