Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

06/02/2022
by Mansheej Paul, et al.

A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e., at random initialization. In this work, we seek to understand how this early phase of pre-training leads to a good initialization for IMP, both through the lens of the data distribution and through the loss landscape geometry. Empirically, we observe that, holding the number of pre-training iterations constant, training on a small fraction of (randomly chosen) data suffices to obtain an equally good initialization for IMP. We additionally observe that by pre-training only on "easy" training data, we can decrease the number of steps necessary to find a good initialization for IMP, compared to training on the full dataset or a randomly chosen subset. Finally, we identify novel properties of the loss landscape of dense networks that are predictive of IMP performance, showing in particular that having more examples be linearly mode connected in the dense network correlates well with good initializations for IMP. Combined, these results provide new insight into the role played by the early phase of training in IMP.
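
For readers who want the mechanics, here is a minimal PyTorch-style sketch of IMP with weight rewinding. It is an illustration under assumptions, not the authors' implementation: the `train(model, mask, num_steps)` helper is hypothetical, and pruning 20% of surviving weights per round is a conventional choice from the lottery ticket literature rather than a detail taken from this paper.

```python
import copy
import torch

def imp_with_rewinding(model, train, rewind_step, total_steps,
                       prune_fraction=0.2, rounds=10):
    """Iterative magnitude pruning with weight rewinding (sketch).

    `train(model, mask, num_steps)` is an assumed helper that runs SGD for
    `num_steps` while keeping weights whose mask entry is 0 frozen at zero.
    """
    # All weights start unpruned.
    mask = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    # Dense pre-training for a few hundred steps, then snapshot the weights.
    # The observation above is that rewinding to this snapshot, rather than
    # to step 0, is what lets the sparse subnetwork train to the dense
    # network's accuracy.
    train(model, mask, rewind_step)
    rewind_state = copy.deepcopy(model.state_dict())

    for _ in range(rounds):
        # Train the current subnetwork to completion.
        train(model, mask, total_steps - rewind_step)

        # Globally prune the smallest-magnitude surviving weights.
        magnitudes = torch.cat(
            [(p.abs() * mask[n]).flatten() for n, p in model.named_parameters()]
        )
        num_to_prune = int(prune_fraction * int((magnitudes > 0).sum()))
        threshold = torch.kthvalue(magnitudes[magnitudes > 0], num_to_prune).values
        for n, p in model.named_parameters():
            mask[n] = mask[n] * (p.abs() > threshold).float()

        # Rewind the surviving weights to the early-training snapshot.
        model.load_state_dict(rewind_state)

    return model, mask
```

The loss landscape diagnostic can be sketched in the same spirit. One assumed protocol, in line with the linear mode connectivity literature: train two copies of the dense network from the same rewind point under different SGD randomness, interpolate linearly between their weights, and count the examples that remain correctly classified along the whole path. The function name and the 11-point interpolation grid below are illustrative choices, not taken from the paper.

```python
def fraction_linearly_mode_connected(model_a, model_b, make_model,
                                     inputs, labels, num_alphas=11):
    """Fraction of examples classified correctly at every point on the
    linear path between two trained copies of a network (sketch; assumes
    all state_dict entries are float tensors)."""
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    connected = torch.ones(len(labels), dtype=torch.bool)
    for alpha in torch.linspace(0.0, 1.0, num_alphas):
        # Build the interpolated network (1 - alpha) * A + alpha * B.
        interp = make_model()
        interp.load_state_dict(
            {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}
        )
        interp.eval()
        with torch.no_grad():
            preds = interp(inputs).argmax(dim=1)
        # An example counts as connected only if it is correct at every alpha.
        connected &= preds == labels
    return connected.float().mean().item()
```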

Related research

06/13/2021 · Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win
The lottery ticket hypothesis states that sparse subnetworks exist in ra...

07/31/2021 · Provably Efficient Lottery Ticket Discovery
The lottery ticket hypothesis (LTH) claims that randomly-initialized, de...

10/06/2022 · Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
Modern deep learning involves training costly, highly overparameterized ...

02/24/2020 · The Early Phase of Neural Network Training
Recent studies have shown that many important aspects of neural network ...

05/09/2022 · EigenNoise: A Contrastive Prior to Warm-Start Representations
In this work, we present a naive initialization scheme for word vectors ...

11/30/2020 · Deconstructing the Structure of Sparse Neural Networks
Although sparse neural networks have been studied extensively, the focus...

02/24/2022 · Rare Gems: Finding Lottery Tickets at Initialization
It has been widely observed that large neural networks can be pruned to ...
