
The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks

by Jonathan Frankle, et al.

Neural network compression techniques are able to reduce the parameter counts of trained networks by over 90%--decreasing storage requirements and improving inference performance--without compromising accuracy. However, contemporary experience is that it is difficult to train small architectures from scratch, which would similarly improve training performance. We articulate a new conjecture to explain why it is easier to train large networks: the "lottery ticket hypothesis." It states that large networks that train successfully contain subnetworks that--when trained in isolation--converge in a comparable number of iterations to comparable accuracy. These subnetworks, which we term "winning tickets," have won the initialization lottery: their connections have initial weights that make training particularly effective. We find that a standard technique for pruning unnecessary network weights naturally uncovers a subnetwork which, at the start of training, comprised a winning ticket. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis. We consistently find winning tickets that are less than 20% of the size of several fully-connected, convolutional, and residual architectures for MNIST and CIFAR10. Furthermore, winning tickets at moderate levels of pruning (20-50% of the original network size) converge up to 6.7x faster than the original network and exhibit higher test accuracy.
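
The algorithm the abstract refers to can be read as iterative magnitude pruning with a rewind to the original initialization: train the full network, prune the smallest-magnitude weights, reset the survivors to their initial values, and repeat. The sketch below illustrates that loop in Python with PyTorch; the toy model, synthetic data, per-round pruning fraction, and the helper names train and prune_by_magnitude are illustrative assumptions, not the paper's experimental setup.

    # Minimal sketch of winning-ticket search: iterative magnitude pruning
    # with a reset to the original initialization (theta_0). The model,
    # data, and hyperparameters below are illustrative assumptions.
    import copy
    import torch
    import torch.nn as nn

    def train(model, masks, data, targets, steps=200, lr=0.1):
        """Train while keeping pruned weights frozen at zero."""
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(model(data), targets).backward()
            opt.step()
            with torch.no_grad():  # re-apply masks so pruned weights stay zero
                for name, p in model.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])

    def prune_by_magnitude(model, masks, fraction=0.2):
        """Zero out the smallest-magnitude fraction of surviving weights."""
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            alive = p.data.abs()[masks[name].bool()]   # surviving weights only
            k = int(fraction * alive.numel())
            threshold = alive.kthvalue(k).values       # k-th smallest magnitude
            masks[name] = (p.data.abs() > threshold).float() * masks[name]
        return masks

    torch.manual_seed(0)
    data, targets = torch.randn(512, 20), torch.randint(0, 2, (512,))
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    init_state = copy.deepcopy(model.state_dict())     # theta_0
    masks = {n: torch.ones_like(p)                     # prune weight matrices only
             for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(5):                                 # iterative pruning rounds
        train(model, masks, data, targets)
        masks = prune_by_magnitude(model, masks)
        model.load_state_dict(init_state)              # rewind survivors to theta_0
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
    # model now holds the masked initialization m * theta_0: a candidate
    # winning ticket that can be trained in isolation and compared against
    # the full network's convergence speed and accuracy.

After five rounds at a 20% per-round rate, roughly 33% of the weights survive (0.8^5 of the original count); whether that lands in the abstract's 20-50% regime depends on the rate and round count chosen.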


Related Research

The Lottery Ticket Hypothesis: Training Pruned Neural Networks

Recent work on neural network pruning indicates that, at training time, ...

Robust Learning of Parsimonious Deep Neural Networks

We propose a simultaneous learning and pruning algorithm capable of iden...

Principal Component Networks: Parameter Reduction Early in Training

Recent works show that overparameterized networks contain small subnetwo...

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed ...

Signing the Supermask: Keep, Hide, Invert

The exponential growth in numbers of parameters of neural networks over ...

On the Compression of Natural Language Models

Deep neural networks are effective feature extractors but they are prohi...

Rare Gems: Finding Lottery Tickets at Initialization

It has been widely observed that large neural networks can be pruned to ...

Code Repositories


Lottery Ticket Hypothesis in Chainer
