Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

05/03/2019
by Hattie Zhou, et al.

The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keeping the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights. The performance of these networks often exceeds the performance of the non-sparse base model, but for reasons that were not well understood. In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied significantly without impacting the overall results. Ablating these factors leads to new insights for why LT networks perform as well as they do. We show why setting weights to zero is important, how signs are all you need to make the re-initialized network train, and why masking behaves like training. Finally, we discover the existence of Supermasks, or masks that can be applied to an untrained, randomly initialized network to produce a model with performance far better than chance (86% on MNIST, 41% on CIFAR-10).
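To make the three ingredients concrete, here is a minimal NumPy sketch of magnitude-based masking, weight rewinding, sign-only reinitialization, and Supermask application, assuming a single weight matrix with simulated "trained" weights; the function name large_final_mask and the pruning fraction are illustrative choices, not taken from the paper.

import numpy as np

# Hypothetical sketch for one weight matrix; the "trained" weights are
# simulated rather than produced by actual training.
rng = np.random.default_rng(0)
w_init = rng.normal(scale=0.1, size=(784, 300))               # weights at initialization
w_final = w_init + rng.normal(scale=0.05, size=w_init.shape)  # stand-in for post-training weights

def large_final_mask(w_final, prune_fraction=0.8):
    """Mask criterion: keep the largest-magnitude weights after training."""
    threshold = np.quantile(np.abs(w_final), prune_fraction)
    return (np.abs(w_final) >= threshold).astype(w_final.dtype)

mask = large_final_mask(w_final)

# 1) Classic LT "winning ticket": pruned weights are set to zero and kept
#    weights are rewound to their original initial values, then retrained.
w_ticket = mask * w_init

# 2) Sign-only reinitialization: keeping only the SIGNS of the initial
#    weights (magnitudes replaced, here by a constant) still trains well.
w_sign_only = mask * np.sign(w_init) * 0.1

# 3) Supermask: the same kind of mask applied to the untrained random
#    initialization, evaluated with NO further training.
w_supermask = mask * w_init

The only difference between the winning ticket and the Supermask in this sketch is what happens next: the ticket network is retrained from w_ticket, while the Supermask network is evaluated as-is, without any training.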

Related research

03/09/2018 | The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks
Neural network compression techniques are able to reduce the parameter c...

03/08/2022 | Dual Lottery Ticket Hypothesis
Fully exploiting the learning capacity of neural networks requires overp...

03/28/2023 | Randomly Initialized Subnetworks with Iterative Weight Recycling
The Multi-Prize Lottery Ticket Hypothesis posits that randomly initializ...

10/26/2021 | Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial at...

07/10/2019 | Sparse Networks from Scratch: Faster Training without Losing Performance
We demonstrate the possibility of what we call sparse learning: accelera...

10/06/2022 | Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
Modern deep learning involves training costly, highly overparameterized ...

01/31/2022 | Signing the Supermask: Keep, Hide, Invert
The exponential growth in numbers of parameters of neural networks over ...
