When Does Re-initialization Work?

06/20/2022
by Sheheryar Zaidi, et al.

Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay, and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.
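For readers who want a concrete picture of what "re-initializing a neural network during training" can look like, below is a minimal PyTorch sketch. The split into kept and re-initialized layers, the fixed re-initialization schedule, the synthetic data, and the optimizer reset are illustrative assumptions, not the protocol studied in the paper.

```python
# Minimal sketch of periodic re-initialization during training.
# The layer split, schedule, and optimizer reset are assumptions for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier: an early block that is kept across re-initializations,
# followed by later layers that are periodically reset.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # kept
    nn.Linear(64, 64), nn.ReLU(),   # re-initialized
    nn.Linear(64, 10),              # re-initialized
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Synthetic data standing in for an image-classification benchmark.
x = torch.randn(512, 32)
y = torch.randint(0, 10, (512,))

def reinit_later_layers(model):
    # Reset every module after the first Linear+ReLU block; ReLU has no
    # parameters, so only the Linear layers are actually re-initialized.
    for module in list(model)[2:]:
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()

epochs, reinit_every = 30, 10
for epoch in range(epochs):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Re-initialize the later layers on a fixed schedule (hypothetical choice),
    # skipping the final epoch so training ends with trained weights.
    if (epoch + 1) % reinit_every == 0 and epoch + 1 < epochs:
        reinit_later_layers(model)
        # Resetting optimizer state alongside the weights is one common option.
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                    weight_decay=1e-4)
        print(f"epoch {epoch + 1}: re-initialized later layers, "
              f"loss={loss.item():.3f}")
```

Which layers are reset, how often, and whether optimizer state is reset alongside them are the main knobs that distinguish re-initialization schemes; the choices above are only one plausible configuration.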

Related research

03/23/2021: How to decay your learning rate
Complex learning rate schedules have become an integral part of deep lea...

03/09/2020: Correlated Initialization for Correlated Data
Spatial data exhibits the property that nearby points are correlated. Th...

05/03/2021: Initialization and Regularization of Factorized Neural Layers
Factorized layers–operations parameterized by products of two or more ma...

10/31/2020: Meta-Learning with Adaptive Hyperparameters
Despite its popularity, several recent works question the effectiveness ...

12/04/2018: Parameter Re-Initialization through Cyclical Batch Size Schedules
Optimal parameter initialization remains a crucial problem for neural ne...

01/05/2020: Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters
In this paper, we investigate the empirical impact of orthogonality regu...

06/12/2021: Go Small and Similar: A Simple Output Decay Brings Better Performance
Regularization and data augmentation methods have been widely used and b...
