Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study

11/13/2020
by Cheng Chen et al.

Modern deep neural network (DNN) training relies on various techniques, e.g., nonlinear activation functions, batch normalization, and skip-connections. Despite their effectiveness, it remains unclear how these techniques accelerate DNN training in practice. In this paper, we provide an empirical study of the regularization effect of these training techniques on DNN optimization. Specifically, we find that the optimization trajectories of successful DNN trainings consistently obey a certain regularity principle that regularizes the model update direction to be aligned with the trajectory direction. Theoretically, we show that this regularity principle yields a convergence guarantee in nonconvex optimization, with a convergence rate that depends on a regularization parameter. Empirically, we find that DNN trainings that apply the training techniques converge quickly and obey the regularity principle with a large regularization parameter, implying that the model updates are well aligned with the trajectory. In contrast, DNN trainings without the training techniques converge slowly and obey the regularity principle with a small regularization parameter, implying that the model updates are poorly aligned with the trajectory. Therefore, the training techniques regularize the model update direction via the regularity principle and thereby facilitate convergence.
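To make the alignment idea concrete, here is a minimal sketch of how one might track how well each model update aligns with the trajectory direction during training. This is not the authors' code, and the paper's regularity principle is stated via a regularization parameter rather than the cosine similarity used here; the toy model, data, and alignment measure below are illustrative assumptions only.

```python
# Hypothetical sketch: measure cosine alignment between each SGD update
# and the overall optimization trajectory (a rough proxy for the paper's
# regularity principle, not its exact definition).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy regression setup standing in for a real DNN training run.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
X, y = torch.randn(256, 10), torch.randn(256, 1)

def flat_params(m):
    """Concatenate all model parameters into one flat vector."""
    return torch.cat([p.detach().reshape(-1) for p in m.parameters()])

x0 = flat_params(model)   # initial point of the optimization trajectory
prev = x0.clone()

for step in range(100):
    loss = F.mse_loss(model(X), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    cur = flat_params(model)
    update = cur - prev    # model update direction at this step
    traj = cur - x0        # trajectory direction accumulated so far
    if step > 0:           # at step 0 the two directions coincide trivially
        align = F.cosine_similarity(update, traj, dim=0)
        if step % 20 == 0:
            print(f"step {step:3d}  loss {loss.item():.4f}  "
                  f"alignment {align.item():+.3f}")
    prev = cur
```

Under the paper's finding, one would expect runs that use techniques such as batch normalization or skip-connections to keep this alignment consistently high, while runs without them would show weaker alignment and slower loss decrease.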
