The Early Phase of Neural Network Training

02/24/2020
by Jonathan Frankle, et al.

Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. We find that, within this framework, deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations. Despite this behavior, pre-training with blurred inputs or an auxiliary self-supervised task can approximate the changes in supervised networks, suggesting that these changes are not inherently label-dependent, though labels significantly accelerate this process. Together, these results help to elucidate the network changes occurring during this pivotal initial period of learning.
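
To make the sign-preserving reinitialization probe concrete, the sketch below redraws every weight from a fresh random initialization and then reapplies the signs the weights had after early training, so only the magnitudes are randomized. This is a minimal illustration assuming PyTorch; the function name reinit_keep_signs, the choice of Kaiming initialization, and the toy model are assumptions for illustration, not the paper's released code.

    # Minimal sketch of a sign-preserving reinitialization probe (assumes PyTorch).
    # reinit_keep_signs and the Kaiming init are illustrative assumptions,
    # not taken from the paper's code release.
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def reinit_keep_signs(model: nn.Module) -> None:
        """Replace each weight's magnitude with a freshly drawn random value
        while keeping the sign it had after early training."""
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                signs = torch.sign(module.weight)       # -1, 0, or +1 per weight
                fresh = torch.empty_like(module.weight)
                nn.init.kaiming_normal_(fresh)          # standard init; the choice is an assumption
                module.weight.copy_(fresh.abs() * signs)  # random magnitude, trained sign

    # Usage: train briefly, apply the probe, then resume training.
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    # ... a few hundred training iterations here ...
    reinit_keep_signs(model)

Comparing accuracy after resuming training against an unperturbed run measures how robust the early-phase network is to losing its learned weight magnitudes, which is the sense in which the abstract reports networks are not robust to this perturbation.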


