Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay

06/15/2020
by   Ruosi Wan, et al.

We comprehensively reveal the learning dynamics of deep neural networks (DNNs) trained with batch normalization (BN) and weight decay (WD), which we name Spherical Motion Dynamics (SMD). Our theorem on SMD rests on the scale-invariance of the weights induced by BN and on the regularization effect of WD. SMD shows that the optimization trajectory of the weights resembles a spherical motion, and we propose a new indicator, the angular update, to measure the update efficiency of a DNN trained with BN and WD. We rigorously prove that the angular update is determined solely by pre-defined hyper-parameters (i.e., the learning rate, the WD parameter, and the momentum coefficient), and we derive their quantitative relationship. Most importantly, the quantitative results of SMD perfectly match empirical observations in complex, large-scale computer vision tasks such as ImageNet and COCO under standard training schemes. SMD also yields reasonable interpretations, from an entirely new perspective, of several phenomena surrounding BN, including the avoidance of vanishing and exploding gradients, the absence of a risk of being trapped in sharp minima, and the sudden drop of the loss when the learning rate is shrunk. Finally, to demonstrate the practical significance of SMD, we discuss its connection to a commonly used learning-rate tuning scheme: the Linear Scaling Principle.
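As an illustration of the angular-update claim, below is a minimal sketch (not the authors' code) of how the per-step angle between consecutive weight snapshots can be measured and compared against the closed form that the equilibrium argument yields for heavy-ball SGD with the update v <- momentum*v + grad + wd*w, namely sqrt(2*lr*wd/(1+momentum)), which reduces to sqrt(2*lr*wd) without momentum. The helper names and the usage lines are hypothetical, and the closed form is our reading of the equilibrium derivation for this particular update rule, not a quote of the paper's theorem.

```python
import torch

def angular_update(w_prev: torch.Tensor, w_curr: torch.Tensor) -> float:
    """Angle (radians) between consecutive snapshots of a scale-invariant weight."""
    cos = torch.nn.functional.cosine_similarity(
        w_prev.flatten(), w_curr.flatten(), dim=0
    )
    return torch.acos(cos.clamp(-1.0, 1.0)).item()

def theoretical_angular_update(lr: float, wd: float, momentum: float = 0.0) -> float:
    # Equilibrium value for heavy-ball SGD (v <- momentum*v + grad + wd*w);
    # reduces to sqrt(2*lr*wd) when momentum == 0. This closed form is an
    # assumption based on the equilibrium derivation for this update rule.
    return (2.0 * lr * wd / (1.0 + momentum)) ** 0.5

# Hypothetical usage inside a training loop: snapshot a BN-governed conv
# weight before an optimizer step, then compare measured vs. predicted angle.
# w_before = conv.weight.detach().clone()
# optimizer.step()
# print(angular_update(w_before, conv.weight.detach()),
#       theoretical_angular_update(lr=0.1, wd=1e-4, momentum=0.9))
```

In practice, such a measurement only makes sense for weights that are actually scale-invariant (e.g., a convolution kernel immediately followed by BN), since scale-invariance is what keeps the gradient orthogonal to the weight and drives the spherical motion.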


