Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

10/06/2022
by Mansheej Paul, et al.

Modern deep learning involves training costly, highly overparameterized networks, motivating the search for sparser networks that can still be trained to the same accuracy as the full network (i.e., matching). Iterative magnitude pruning (IMP) is a state-of-the-art algorithm that can find such highly sparse matching subnetworks, known as winning tickets. IMP operates by iterative cycles of training, masking the smallest-magnitude weights, rewinding to an early point in training, and repeating. Despite its simplicity, the principles underlying when and how IMP finds winning tickets remain elusive. In particular, what useful information does an IMP mask found at the end of training convey to a rewound network near the beginning of training? How does SGD allow the network to extract this information? And why is iterative pruning needed? We develop answers in terms of the geometry of the error landscape. First, we find that, at higher sparsities, pairs of pruned networks at successive pruning iterations are connected by a linear path with zero error barrier if and only if they are matching. This indicates that masks found at the end of training convey the identity of an axial subspace that intersects a desired linearly connected mode of a matching sublevel set. Second, we show that SGD can exploit this information due to a strong form of robustness: it can return to this mode despite strong perturbations early in training. Third, we show how the flatness of the error landscape at the end of training determines a limit on the fraction of weights that can be pruned at each iteration of IMP. Finally, we show that the role of retraining in IMP is to find a network with new small weights to prune. Overall, these results make progress toward demystifying the existence of winning tickets by revealing the fundamental role of error landscape geometry.
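To make the train-mask-rewind cycle above concrete, here is a minimal PyTorch sketch of one plausible IMP loop with weight rewinding. The `train_fn` helper, the `rewind_state` checkpoint, and the per-round pruning fraction `prune_frac` are assumptions chosen for illustration; this sketches the generic algorithm, not the authors' implementation.

```python
import torch

def imp(model, train_fn, rewind_state, prune_frac=0.2, n_rounds=10):
    """Sketch of iterative magnitude pruning (IMP) with weight rewinding.

    Assumed interface: train_fn(model, mask) trains the model to the end
    of training while keeping masked weights at zero; rewind_state is a
    state_dict checkpoint saved early in training (the rewind point).
    """
    # All-ones mask: every weight starts out trainable.
    mask = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(n_rounds):
        # 1. Train to completion with the current mask applied.
        train_fn(model, mask)

        # 2. Prune the smallest-magnitude weights that still survive.
        surviving = torch.cat([
            p.detach()[mask[n].bool()].abs().flatten()
            for n, p in model.named_parameters()
        ])
        threshold = torch.quantile(surviving, prune_frac)
        for n, p in model.named_parameters():
            mask[n] = mask[n] * (p.detach().abs() > threshold).float()

        # 3. Rewind surviving weights to their early-training values.
        model.load_state_dict(rewind_state)
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(mask[n])

    return model, mask
```

Rewinding to an early-training checkpoint rather than to initialization follows the weight-rewinding variant of IMP the abstract describes, and the choice of `prune_frac` is exactly where the paper's flatness-based limit on per-iteration pruning would bite.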

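The first result rests on measuring error barriers along linear paths in weight space. A sketch of that measurement, assuming a user-supplied `eval_error` helper, might look as follows; the barrier convention here (peak error minus the mean of the endpoint errors) is a common choice in the linear mode connectivity literature and may differ in detail from the paper's definition.

```python
import copy
import torch

def error_barrier(model_a, model_b, eval_error, n_points=11):
    """Estimate the error barrier on the linear path between two networks.

    Assumed interface: eval_error(model) returns error on a held-out set;
    model_a and model_b share the same architecture (and pruning mask).
    """
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)

    errors = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        # Interpolate each floating-point tensor: (1 - alpha) * w_a + alpha * w_b;
        # integer buffers (e.g., BatchNorm counters) are taken from model_a.
        interp = {
            k: (1 - alpha) * sd_a[k] + alpha * sd_b[k]
            if sd_a[k].is_floating_point() else sd_a[k]
            for k in sd_a
        }
        probe.load_state_dict(interp)
        errors.append(eval_error(probe))

    # Peak error along the path minus the mean of the endpoint errors;
    # a value near zero indicates the networks are linearly mode connected.
    return max(errors) - 0.5 * (errors[0] + errors[-1])
```

Applied to pairs of pruned networks from successive IMP iterations, a barrier near zero would indicate that the two networks lie in the same linearly connected mode, which the abstract reports co-occurs with matching at higher sparsities.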

research · 12/11/2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis
We introduce "instability analysis," a framework for assessing whether t...

research · 06/02/2022
Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks
A striking observation about iterative magnitude pruning (IMP; Frankle e...

research · 06/13/2021
Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win
The lottery ticket hypothesis states that sparse subnetworks exist in ra...

research · 05/24/2023
SWAMP: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Given the ever-increasing size of modern neural networks, the significan...

research · 04/30/2021
Studying the Consistency and Composability of Lottery Ticket Pruning Masks
Magnitude pruning is a common, effective technique to identify sparse su...

research · 05/04/2021
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
The lottery ticket hypothesis questions the role of overparameterization...

research · 05/03/2019
Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed ...
