Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes

09/08/2022
by Maxim Kodryan, et al.

A fundamental property of deep learning normalization techniques, such as batch normalization, is that they make the pre-normalization parameters scale-invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented as spherical optimization with a varying effective learning rate (ELR), which has been studied previously. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail, both theoretically on a toy example and through a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on the training of both regular and scale-invariant neural networks. Finally, we demonstrate how the discovered regimes manifest in conventional training of normalized networks and how they can be leveraged to reach better optima.
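
The setting described in the abstract, gradient descent on a scale-invariant loss restricted to the unit sphere with a fixed ELR, can be sketched in a few lines. The snippet below is an illustrative sketch and not the authors' code: the quadratic toy loss, the step size elr=0.1, and the helper names spherical_sgd_step and loss_and_grad are assumptions chosen for illustration. The key mechanics are that the gradient is projected onto the tangent space of the sphere and the iterate is renormalized after each step, so the ELR is the only step-size knob; varying it is what moves the dynamics between the convergence, chaotic equilibrium, and divergence regimes.

```python
# Illustrative sketch only (not the paper's code): fixed-ELR gradient descent
# on the unit sphere for a toy scale-invariant loss.
import numpy as np


def spherical_sgd_step(w, grad, elr):
    """One projected gradient step with a fixed effective learning rate (ELR):
    drop the radial gradient component, step, and renormalize onto the sphere."""
    w = w / np.linalg.norm(w)                   # keep the iterate on the unit sphere
    tangent_grad = grad - np.dot(grad, w) * w   # project gradient onto the tangent space
    w_new = w - elr * tangent_grad              # fixed-ELR step
    return w_new / np.linalg.norm(w_new)        # retraction: renormalize


def loss_and_grad(w, X, y):
    """Toy scale-invariant loss: depends on w only through its direction w / ||w||."""
    n = np.linalg.norm(w)
    u = w / n
    residual = X @ u - y
    loss = 0.5 * np.mean(residual ** 2)
    grad_u = X.T @ residual / len(y)               # gradient w.r.t. the direction u
    grad_w = (grad_u - np.dot(grad_u, u) * u) / n  # chain rule through u = w / ||w||
    return loss, grad_w


rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w = rng.normal(size=5)
for step in range(200):
    loss, grad = loss_and_grad(w, X, y)
    # Small ELR: convergence regime; much larger values would instead land in
    # the chaotic-equilibrium or divergence regimes described in the abstract.
    w = spherical_sgd_step(w, grad, elr=0.1)
```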
