1 Introduction
Deep Neural Networks (DNNs) have been extensively used in various domains [24]. Their success depends heavily on improved training techniques [15, 21, 14], e.g., careful weight initialization [15, 11, 37, 13], normalization of internal representations [21, 42], and well-designed optimization methods [45, 22]. These techniques are believed to be closely connected to the curvature of the loss [37, 36, 23], and analyzing this curvature is essential for understanding many learning behaviors of DNNs.
In the optimization community, conditioning analysis uncovers the landscape of an optimization objective by exploring the spectrum of its curvature matrix. It has been well explored for linear models, both in regression [26] and classification [40], where the convergence of the optimization problem is controlled by the maximum eigenvalue of the curvature matrix [26, 25], and the learning time of the model is lower-bounded by the condition number of the curvature matrix [26, 25]. However, in the context of deep learning, conditioning analysis faces several barriers: 1) the model is over-parameterized, and it is unclear whether the directions corresponding to small/zero eigenvalues contribute to the optimization progress [35, 32]; 2) the memory and computational costs are extremely expensive [35, 10].
This paper aims to bridge the theoretical analysis in the optimization community and the empirical techniques in training DNNs, for a better understanding of the learning dynamics of DNNs. We propose a layer-wise conditioning analysis, in which we analyze the optimization landscape with respect to each layer independently by exploring the spectra of the layers' curvature matrices. The theoretical insight behind our layer-wise conditioning analysis is based on the recent success of second-order curvature approximation techniques in DNNs [30, 29, 1, 39, 3]. We show that, under mild assumptions, the maximum eigenvalue and the condition number of the block-wise Fisher Information Matrix (FIM) can be characterized by the spectra of the covariance matrices of the layer input and output-gradient, which makes evaluating optimization behavior practical for DNNs. Another theoretical basis is the recently proposed proximal back-propagation [6, 9, 46], in which the original optimization problem can be approximately decomposed into multiple independent sub-problems with respect to each layer [46]. We provide the connection between our analysis and proximal back-propagation.
Based on our layer-wise conditioning analysis, we show that Batch Normalization (BN) [21] can adjust the magnitude of the layer activations/gradients, and thus stabilizes training. However, this kind of stabilization can drive certain layers into a state, referred to as weight domination, in which the gradient update is feeble; this sometimes has detrimental effects on learning (see Section 4.1 for details). We also experimentally observe that BN can improve the layer-wise conditioning of the optimization problem. Furthermore, we find that the unnormalized network has many small eigenvalues in its layer curvature matrices, which is mainly caused by so-called dying neurons (Section 4.2), while this behavior is almost absent in the batch normalized network.
We further analyze an often-overlooked difficulty in training very deep residual networks [14]. Using our layer-wise conditioning analysis, we show that the difficulty mainly arises from the ill-conditioned behavior of the last linear layer. We solve this problem by adding only one BN layer before the last linear layer, which achieves improved performance over the original residual networks (Section 5).
2 Preliminaries
Optimization Objective
Consider a true data distribution and a sampled training set of size . We focus on a supervised learning task aiming to learn the conditional distribution using the model , where is represented as a function parameterized by . From an optimization perspective, we aim to minimize the empirical risk, averaged over the sample loss on the training set : .
Gradient Descent
In general, the gradient descent (GD) update seeks to iteratively reduce the loss by , where is the learning rate. Moreover, for large-scale learning, stochastic gradient descent (SGD) is extensively used to approximate the gradients with a mini-batch gradient. In theory, the convergence behavior (e.g., the number of iterations required for convergence to a stationary point) depends on the Lipschitz constant of the gradient function of (the loss's gradient function is assumed Lipschitz continuous with Lipschitz constant , i.e., for all and ), which characterizes the global smoothness of the optimization landscape. In practice, the Lipschitz constant is either unknown for complicated functions or too conservative to characterize the convergence behavior [5]. Researchers thus turn to local smoothness, characterized by the Hessian matrix under the condition that is twice differentiable.
Approximate Curvature Matrices
The Hessian describes the local curvature of the optimization landscape. Such curvature information intuitively guides the design of second-order optimization algorithms [35, 5], where the update direction is adjusted by multiplying by the inverse of a preconditioning matrix as: . The preconditioning matrix is a positive definite matrix that approximates the Hessian and is expected to capture the positive curvature of the Hessian. The second moment matrix of the sample gradients is usually used as the preconditioning matrix [34, 27]. Besides, Pascanu and Bengio [33] show that the Fisher Information Matrix (FIM) can be viewed as a preconditioning matrix when performing the natural gradient descent algorithm [33]. For more analysis of the connections among these matrices, please refer to [28, 5]. In this paper, we refer to the analysis of the spectrum of the (approximate) curvature matrices as conditioning analysis.
Conditioning Analysis for Linear Models
Consider a linear regression model with a scalar output , and mean square error loss . As shown in [26, 25], the learning dynamics on such a quadratic surface are fully controlled by the spectrum of the Hessian matrix . Two statistics are essential for evaluating the convergence behavior of the optimization problem. One is the maximum eigenvalue of the curvature matrix, λ_max, and the other is the condition number of the curvature matrix, denoted by κ = λ_max/λ_min, where λ_min is the minimum nonzero eigenvalue of the curvature matrix. Specifically, λ_max controls the upper bound and the optimal value of the learning rate (e.g., the optimal learning rate is and training will diverge if ). κ controls the number of iterations required for convergence (e.g., the lower bound on the iterations is [26]). If the curvature matrix is an identity matrix, which can be achieved by whitening the input, GD can converge within only one iteration. It is easy to extend the solution of linear regression from a scalar output to a vectorial output
. In this case, the Hessian is represented as

  H = E[xxᵀ] ⊗ I,   (1)

where ⊗ indicates the Kronecker product [12] and I denotes the identity matrix. For the linear classification model with cross-entropy loss, the Hessian is approximated by [40]:

  H ≈ E[xxᵀ] ⊗ A.   (2)

Here, A is defined by A = (1/C)(I − (1/C)1), where C is the number of categories and 1 denotes a matrix whose entries are all one. Eqn. 2 assumes that the Hessian does not change significantly from the initial region to the optimal region [40].
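As a concrete illustration of the two statistics above, the following sketch (our own toy example; names such as `gd_loss` are ours) fits a linear regression with gradient descent and checks the classic stability threshold 2/λ_max for quadratic objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d)) * np.linspace(1.0, 5.0, d)  # badly scaled features
y = X @ rng.normal(size=d)

H = X.T @ X / n                          # Hessian of the quadratic MSE loss
eigs = np.linalg.eigvalsh(H)
lam_max, lam_min = eigs[-1], eigs[0]
kappa = lam_max / lam_min                # condition number

def gd_loss(lr, steps=200):
    """Final loss of gradient descent on 0.5 * ||Xw - y||^2 / n."""
    w = np.zeros(d)
    g_const = X.T @ y / n
    for _ in range(steps):
        w -= lr * (H @ w - g_const)      # exact gradient of the quadratic
    return 0.5 * np.mean((X @ w - y) ** 2)

# Stable just below the threshold 2/lam_max, divergent just above it.
loss_stable = gd_loss(1.9 / lam_max)
loss_diverged = gd_loss(2.2 / lam_max)
```

Note that the larger κ is, the slower the slowest direction contracts, which is exactly the iteration lower bound discussed above.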
3 Conditioning Analysis for DNNs
Considering a Multilayer Perceptron (MLP), the network can be represented as a layer-wise composition of linear and nonlinear transformations, as follows:

  hᵏ = φ(Wᵏ hᵏ⁻¹),  k = 1, …, K,   (3)

where h⁰ = x is the input, φ is the element-wise nonlinearity, and {Wᵏ} are the learnable parameters. To simplify notation, we set hᴷ as the output of the network.
Conditioning analysis on the full curvature matrix of a DNN is difficult due to the prohibitive memory and computational costs [10, 32]. We thus seek to analyze an approximation of the curvature matrix. One successful example in second-order optimization over DNNs is approximating the Fisher Information Matrix (FIM) by Kronecker products (KFAC) [29, 1, 39, 3]. The KFAC approach makes two assumptions: 1) the weight-gradients in different layers are uncorrelated; 2) the input and output-gradient in each layer are approximately independent. The full FIM can then be represented as a block diagonal matrix, where each block is a sub-FIM (the FIM with respect to the parameters of one layer), computed as:
  F = E[xxᵀ] ⊗ E[ggᵀ].   (4)
Here, x denotes the layer input and g denotes the layer output-gradient. We note that Eqn. 4 is similar to Eqns. 1 and 2: all of them depend on the covariance matrix of the (layer) input. The main difference is that in Eqn. 4, the covariance of the output-gradient is considered and its value changes across different optimization regions, while in Eqns. 1 and 2, the covariance of the output-gradient is constant.
Based on this observation, we propose layer-wise conditioning analysis: we analyze the spectrum of each sub-FIM independently. We expect the spectra of the sub-FIMs to reveal that of the full FIM, at least for the purpose of analyzing the learning dynamics of DNNs. Specifically, we analyze the maximum eigenvalue λ_max and the condition number κ. (Since DNNs are usually over-parameterized, we evaluate the more general condition number with respect to a percentage: κ_p = λ₁/λ_⌈pm⌉, where λ_i is the i-th eigenvalue in descending order and m is the number of eigenvalues; e.g., κ with p = 100% is the original definition of the condition number.) Based on the conclusions of the conditioning analysis for linear models in Section 2, there are two remarkable properties, which can be used to uncover the landscape of the optimization problem implicitly:

Property 1: λ_max indicates the magnitude of the weight-gradient in each layer, which shows the steepness of the landscape with respect to different layers.

Property 2: κ indicates how easy it is to optimize the corresponding layer.
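The generalized condition number with respect to a percentage can be read as code as follows (a sketch under one possible indexing convention, κ_p = λ₁/λ_⌈p·m⌉; the paper's exact convention may differ):

```python
import numpy as np

def generalized_condition_number(eigvals, p=1.0):
    """kappa_p = lambda_1 / lambda_{ceil(p*m)} over eigenvalues sorted in
    descending order; p=1.0 recovers the usual lambda_max / lambda_min."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    m = lam.size
    idx = max(int(np.ceil(p * m)) - 1, 0)  # 1-based ceil(p*m) -> 0-based index
    return lam[0] / lam[idx]
```

With p below 100%, the tiny trailing eigenvalues of an over-parameterized model are ignored, which is the point of the generalization.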
Note that the assumptions required for our analysis are weaker than those of the KFAC approximation, since we only care about whether the spectrum of the full FIM matches that of the approximation. We conduct experiments to analyze the training dynamics of unnormalized ('Plain') and batch normalized [21] ('BN') networks by analyzing the spectra of the full FIM and the sub-FIMs. We find that the conclusions drawn by analyzing the full FIM can also be derived by analyzing the sub-FIMs. Please refer to Appendix B for details. Furthermore, we argue that layer-wise conditioning analysis can uncover more information about the training dynamics of DNNs than analysis using the full curvature matrix; e.g., we can diagnose and locate gradient vanishing/explosion in specific layers. We will elaborate on this in the subsequent sections.
Efficient Computation
We denote the covariance matrix of the layer input by Σ_x and the covariance matrix of the layer output-gradient by Σ_g. The condition number and maximum eigenvalue of the sub-FIM can be exactly derived from the spectra of Σ_x and Σ_g, as shown in the following proposition.
Proposition 1.
Given the sub-FIM F = Σ_x ⊗ Σ_g, with Σ_x and Σ_g positive semi-definite, we have: 1) λ_max(F) = λ_max(Σ_x) · λ_max(Σ_g); 2) κ(F) = κ(Σ_x) · κ(Σ_g).
The proof is shown in Appendix A.1. Proposition 1 provides an efficient way to calculate the maximum eigenvalue and condition number of a sub-FIM by computing those of Σ_x and Σ_g. In practice, we use the empirical distribution to approximate the expected distribution when calculating Σ_x and Σ_g, since this is very efficient and can be performed with only one forward and backward pass. Such an approximation has been shown to be effective and efficient in FIM approximation methods [29, 1].
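Proposition 1 rests on the fact that the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues. A quick numerical check (our own sketch; the factor order in the Kronecker product does not affect the spectrum):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_cov(d):
    A = rng.normal(size=(d, d))
    return A @ A.T / d + 0.1 * np.eye(d)   # positive definite "covariance"

Sigma_x, Sigma_g = random_cov(4), random_cov(3)
F = np.kron(Sigma_g, Sigma_x)              # Kronecker-factored sub-FIM

lam_F = np.linalg.eigvalsh(F)              # ascending order
lam_x = np.linalg.eigvalsh(Sigma_x)
lam_g = np.linalg.eigvalsh(Sigma_g)

lam_max_product = lam_x[-1] * lam_g[-1]                          # = lambda_max(F)
kappa_product = (lam_x[-1] / lam_x[0]) * (lam_g[-1] / lam_g[0])  # = kappa(F)
```

Computing the two small eigendecompositions instead of the one on the full Kronecker product is exactly what makes the analysis cheap.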
3.1 Connection to Proximal Backpropagation
Carreira-Perpinan and Wang [6] propose the method of auxiliary coordinates (MAC) to redefine the optimization objective with equality constraints imposed on each neuron. They solve the constrained optimization by adding a quadratic penalty, and the optimization objective is defined as follows:
(5) 
where is a function with respect to each layer. It has been shown in [6] that the solution of minimizing converges to the solution of minimizing as , under mild conditions. Furthermore, the proximal propagation proposed in [9] reformulates each sub-problem independently in a backward order, minimizing each layer's objective given the target signal from the upper layer, as follows:
(6) 
It has been shown that the gradient updates with respect to can be equivalent to the gradient updates with respect to Eqn. 6, given an appropriate step size. Please refer to [9, 46] for more details.
Interestingly, if we view the auxiliary variable as the pre-activation of a certain layer, the sub-optimization problem in each layer is formulated as:
(7) 
It is clear that the sub-optimization problems with respect to (note that the target signal is also to be optimized in Eqn. 7, to provide the target signal to the lower layer) are actually linear classification (for k = K) or regression (for k < K) models. Their conditioning analysis is well characterized in Section 2.
This connection suggests that: 1) the quality (conditioning) of the full optimization problem is well correlated with that of its sub-optimization problems shown in Eqn. 7, whose local curvature matrices can be explored directly; 2) we can diagnose the ill-conditioned behaviors of a DNN by inspecting its spectrum with respect to a certain layer. We will elaborate on this in Sections 4 and 5.
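To make the connection tangible: for fixed targets, the inner-layer subproblem of Eqn. 7 is an ordinary least-squares regression. A minimal sketch (hypothetical shapes and variable names; the real subproblem also optimizes the incoming activations):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d_in, d_out = 128, 8, 4
z_prev = rng.normal(size=(n, d_in))     # activations from the layer below
z_target = rng.normal(size=(n, d_out))  # target pre-activations from above

# For k < K, minimizing ||z_prev @ W - z_target||^2 over W is linear
# regression, so its conditioning is governed by the spectrum of
# z_prev.T @ z_prev / n, exactly as analyzed in Section 2.
W, *_ = np.linalg.lstsq(z_prev, z_target, rcond=None)
grad_at_solution = z_prev.T @ (z_prev @ W - z_target) / n
```

At the least-squares solution, the gradient of the per-layer objective vanishes, which is what conditioning analysis of the quadratic predicts.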
4 Exploring Batch Normalized Networks
Let x denote the input for a given neuron in one layer of a DNN. Batch normalization (BN) [21] standardizes the neuron over the mini-batch data by:

  x̂ = (x − μ)/√(σ² + ε),   (8)

where μ and σ² are the mini-batch mean and variance, respectively, and ε is a small constant for numerical stability. The learnable parameters γ and β are used to recover the representation capacity. BN is a ubiquitously employed technique in various architectures [21, 14, 44, 18] due to its ability to stabilize and accelerate training. Here, we explore how BN stabilizes and accelerates training based on our layer-wise conditioning analysis.
4.1 Stabilizing Training
From the perspective of a practitioner, two phenomena relate to the instability of training a neural network: 1) the training loss first increases significantly and then diverges; 2) the training loss hardly changes compared to the initial condition (e.g., random guessing for classification). The former is mainly caused by weights with overly large updates (e.g., exploded gradients or optimization with a large learning rate). The latter is caused by weights with negligible updates (vanished gradients or optimization with a small learning rate). In the following theorem, we show that an unnormalized rectifier neural network is very likely to encounter both phenomena.
Theorem 1.
Given a rectifier neural network (Eqn. 3) with nonlinearity ( ), if the weight in each layer is scaled by ( and ), we have the scaled layer input: . Under the assumption that , we have the output-gradient: , and weight-gradient: , for all .
The proof is shown in Appendix A.2. From Theorem 1, we observe that the scaling factor of the weight in one layer will affect the weight-gradients of all other layers. Specifically, if all ( ), the weight-gradient will increase (decrease) exponentially across layers in one iteration. Moreover, such an exponentially increased weight-gradient will be sustained and amplified in subsequent iterations, due to the increased magnitude of the weights caused by the updates. This is why an unnormalized neural network diverges once the training loss increases over a few consecutive iterations.
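Theorem 1's exponential growth/decay is easy to reproduce numerically. The sketch below (our own toy setup, not the paper's experiment) propagates a batch through a 20-layer random ReLU MLP whose He-initialized weights are additionally scaled by a factor a:

```python
import numpy as np

rng = np.random.default_rng(4)
depth, width = 20, 256

def final_activation_norm(a):
    """Norm of the last layer's activations when every He-initialized
    weight matrix is scaled by a (Theorem 1's scaling factor)."""
    h = rng.normal(size=(64, width))
    for _ in range(depth):
        W = a * rng.normal(size=(width, width)) * np.sqrt(2.0 / width)
        h = np.maximum(h @ W, 0.0)          # ReLU layer, as in Eqn. 3
    return np.linalg.norm(h)

norm_stable = final_activation_norm(1.0)   # roughly preserved magnitude
norm_shrunk = final_activation_norm(0.5)   # decays roughly like 0.5^depth
norm_grown = final_activation_norm(2.0)    # grows roughly like 2^depth
```

A per-layer factor of 0.5 or 2 compounds to a factor of roughly a million over 20 layers, which matches the exponential behavior the theorem describes.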
We further show that such instability can be relieved by batch normalization, based on the following theorem.
Theorem 2.
Under the same conditions as in Theorem 1, for the normalized network with and , we have: , , , for all .
The proof is shown in Appendix A.3. From Theorem 2, the scaling factor of the weight will not affect other layers' activations/gradients. The magnitude of the weight-gradient is inversely proportional to the scaling factor. Such a mechanism stabilizes weight growth/reduction, as shown in [21, 43]. Note that these stabilizing behaviors (Theorem 2) also apply to other activation normalization methods [2, 41, 16]. We note that the scale-invariance of BN in stabilizing training has been analyzed in previous work [2]. Different from those analyses, which focus on the normalization layer itself, we provide an explicit formulation of the weight-gradients and output-gradients in a network, which is more important for characterizing the learning dynamics of DNNs.
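Theorem 2's scale-invariance can be checked directly: standardizing over the mini-batch cancels any scaling of the preceding weights (up to the ε term). A minimal sketch, assuming a BN without the learnable affine parameters:

```python
import numpy as np

def batch_norm(z, eps=1e-5):
    """Standardize each neuron over the mini-batch (Eqn. 8, no affine part)."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(3)
x = rng.normal(size=(256, 32))
W = rng.normal(size=(32, 16))
a = 10.0                                   # weight scaling factor

out = batch_norm(x @ W)
out_scaled = batch_norm(x @ (a * W))       # scaling is washed out by BN
```

Because the two outputs coincide, the layers downstream of BN see the same activations regardless of how the weights are scaled, which is the stabilization mechanism the theorem formalizes.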
Empirical Analysis
We further conduct experiments to show how the activations/gradients are affected by initialization in unnormalized DNNs (indicated as 'Plain') and batch normalized DNNs (indicated as 'BN'). We observe each layer's λ_max(Σ_x) and λ_max(Σ_g), because: 1) λ_max(Σ_x) indicates the magnitude of the layer input; 2) λ_max(Σ_g) indicates the magnitude of the layer output-gradient; 3) the magnitude of the weight-gradient, which relates to the upper bound of the learning rate, can be derived from λ_max(Σ_x) and λ_max(Σ_g). We train a 20-layer MLP, with 256 neurons in each layer, for MNIST classification. The nonlinearity is ReLU. We use full gradient descent (we also perform SGD with a batch size of 1024, and further perform experiments on convolutional neural networks for CIFAR-10 classification; the results are shown in Appendix C.1, in which we have the same observations as for full gradient descent), and show the best performance among learning rates in . We observe that the magnitude of the layer input (output-gradient) of 'Plain' under random initialization [25] decreases exponentially during the forward (backward) pass (Figure 2 (a)). The main reason for this is that the weights have small magnitudes, per Theorem 1. This problem can be relieved by He initialization [13], under which the magnitude of the input/output-gradient is stable across layers (Figure 2 (d)). We observe that BN preserves the magnitude of the input/output-gradient across layers for both initialization methods, as shown in Figure 2 (a) and (d).
Weight Domination
It has been shown in [2] that the scale-invariant property of BN has an implicit early-stopping effect on the weight matrices, helping to stabilize learning towards convergence. Here, we show that such 'early stopping' is layer-wise, and in certain layers it has detrimental effects on learning, since it creates the false impression of a local minimum. For illustration, we provide a rough definition, called weight domination, with respect to a certain layer.
Definition 4.1.
Let W_k and G_k denote the weight matrix and its gradient in layer k. If σ_max(G_k) ≪ σ_max(W_k), where σ_max(·) indicates the maximum singular value of a matrix, we say that layer k is in a state of weight domination.

Weight domination implies a smoother gradient with respect to this layer. This is a desirable property for a linear model (where the distribution of the input is fixed), and is what optimization algorithms target. However, weight domination is not always desirable for a certain layer of a neural network, since such a state can be caused by an increased magnitude of the weight matrix or a decreased magnitude of the layer input (the non-convex optimization in Eqn. 7), rather than by the optimization objective itself. Although BN ensures a stable distribution of the layer input, a network with BN may still significantly increase the magnitude of the weights in certain layers. We experimentally observe this phenomenon, as shown in Appendix C.2. A similar phenomenon is also observed in [43], where BN results in large updates of the corresponding weights.
Weight domination sometimes harms the learning of the network, because this state limits the representation ability of the corresponding layer. To investigate this, we conduct experiments on a five-layer MLP and show the results in Figure 3. We observe that a network with certain layers in the state of weight domination can still decrease the loss, but it has degraded performance. We also conduct experiments on Convolutional Neural Networks (CNNs) on the CIFAR-10 dataset, shown in Appendix C.2.
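Under our reading of Definition 4.1, weight domination can be monitored during training with a single spectral-norm ratio per layer. A sketch (the threshold for "feeble" is a judgment call, not specified by the definition):

```python
import numpy as np

def weight_domination_ratio(W, grad_W):
    """sigma_max(gradient) / sigma_max(weight); a value much smaller than 1
    suggests the layer is in the weight-domination state of Definition 4.1."""
    return np.linalg.norm(grad_W, ord=2) / np.linalg.norm(W, ord=2)
```

In practice one would log this ratio for every layer at each epoch and flag layers whose ratio collapses while the others' do not.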
4.2 Improved Conditioning
One motivation for BN is that whitening the input can improve the conditioning of the optimization problem [21] (e.g., for a linear model, the Hessian will be an identity matrix if the input is whitened, based on Eqn. 1), and thus accelerate training [8, 20]. However, this motivation has hardly been further validated, either theoretically or empirically [8]. Furthermore, it holds only when BN is placed before the linear layer, while, in practice, BN is typically placed after the linear layer, as recommended in [21]. In this section, we empirically explore this motivation based on our layer-wise conditioning analysis in the scenario of training DNNs.
We first experimentally observe that BN not only improves the conditioning of the layer input's covariance matrix, but also improves the conditioning of the output-gradient's covariance matrix, as shown in Figure 4. It has been shown that centered data is more likely to be well-conditioned [26, 38, 31, 19]. This suggests that placing BN after the linear layer can improve the conditioning of the output-gradient, because centering the activations, with the gradient back-propagating through such a transformation [21], also centers the gradient.
We further observe that the unnormalized network ('Plain') has many small eigenvalues. For further exploration, we monitor the output of each neuron in each layer, and find that 'Plain' has some neurons that are never activated (zero output of the ReLU) for any training example; we refer to this kind of neuron as a dying neuron. We also observe that 'Plain' has some neurons that are activated for every training example; we refer to these as full neurons. This observation is most pronounced in the initial iterations. The number of dying/full neurons increases with depth (Figure 5). We conjecture that the dying neurons cause 'Plain' to have many small/zero eigenvalues. In contrast, batch normalized networks have no dying/full neurons, because the centering operation ensures that roughly half of the examples activate each neuron. This further suggests that placing BN before the nonlinear activation can improve the conditioning.
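Counting dying and full neurons only requires the post-ReLU activations of each layer over the training set. A minimal sketch (our own helper; `h` is assumed to hold one layer's ReLU outputs, one example per row):

```python
import numpy as np

def count_dying_full(h):
    """Return (#dying, #full) neurons from post-ReLU activations h of
    shape (num_examples, num_neurons): dying = never activated on any
    example, full = activated on every example."""
    active = h > 0
    dying = int((~active).all(axis=0).sum())
    full = int(active.all(axis=0).sum())
    return dying, full
```

A dying neuron contributes a zero row/column to the layer's input covariance for the next layer, which is one way the small/zero eigenvalues arise.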
5 Training Very Deep Residual Network
Residual networks [14] have significantly relieved the difficulty of training deep networks through their introduction of the residual connection, which makes training networks with hundreds or even thousands of layers possible. However, residual networks also suffer from degraded performance when the model is extremely deep (e.g., the 1202-layer residual network performs worse than the 110-layer one), as shown in [14]. He et al. [14] argue that this comes from over-fitting, not optimization difficulty. Here, we show that a very deep residual network may also suffer from optimization difficulty.
We first perform experiments on the CIFAR-10 dataset with residual networks, following the same experimental setup as in [14], except that we run the experiments on one GPU. We vary the network depth and show the training loss in Figure 6 (a). We observe that the residual networks have an increased loss in the initial iterations, which is amplified for deeper networks. Later, training gets stuck in a state where the network makes random guesses (the loss stays at the random-guess level). Although the network can escape such a state with enough iterations, it suffers from degraded training performance, especially for very deep networks.
Analysis of Learning Dynamics
To explore why residual networks exhibit such mysterious behavior, we perform the layer-wise conditioning analysis on the last linear layer (before the cross-entropy loss). We monitor the maximum eigenvalue of the input covariance matrix, the maximum eigenvalue of the second moment matrix of the weight-gradient, and the norm of the weight.
We observe that the initially increased loss is mainly caused by the large magnitude of the layer input (this large magnitude is caused mainly by the addition of multiple residual connections from the previous layers with ReLU outputs) (Figure 6 (b)), which results in a large magnitude of the weight-gradient (Figure 6 (d)). The increased gradient magnitude further facilitates the increase of the loss. However, the learning objective is to decrease the loss, and it is thus expected to decrease the magnitude of either the weight or the layer input (based on Eqn. 7) in this case. Apparently, the weight is harder to adjust, because the landscape of its loss surface is controlled by the layer input, all of whose values are non-negative and of large magnitude. The network thus tries to decrease the layer input, driven by the learning objective. We experimentally find that the learnable parameters of BN take a large number of negative values, which makes the ReLUs (positioned after the residual adding operation) inactive. Such dynamics result in a significant reduction in the magnitude of the layer input. The small input and the large weight drive the last linear layer of the network into the state of weight domination, and the network shows random-guess behavior. Although the residual network can escape such a state with enough iterations, weight domination hinders optimization and results in degraded training performance.
Table 1: Test errors (%, mean ± std) on CIFAR-10 with different depths.

method | depth-56 | depth-110 | depth-230 | depth-1202
ResNet | 7.52 ± 0.30 | 6.89 ± 0.52 | 7.35 ± 0.64 | 9.42 ± 3.10
ResNet | 6.50 ± 0.22 | 6.10 ± 0.09 | 5.94 ± 0.18 | 5.68 ± 0.14

Table 2: Test errors (%, mean ± std) on CIFAR-100 with different depths.

method | depth-56 | depth-110 | depth-230 | depth-1202
ResNet | 29.60 ± 0.41 | 28.3 ± 1.09 | 29.25 ± 0.44 | 30.49 ± 4.44
ResNet | 28.82 ± 0.38 | 27.05 ± 0.23 | 25.80 ± 0.10 | 25.51 ± 0.27
5.1 Proposed Solution
Based on the above analysis, the essential point is to avoid the large magnitude of the last linear layer's input. We propose a simple solution: add one BN layer before the last linear layer to normalize its input. We refer to this residual network as ResNet, and the original one as ResNet. We also provide the analysis of the last linear layer of the revised network, and we show the comparison between the revised and original networks on the 1202-layer model in Figure 7. We observe that the revised network trains stably: it does not encounter the state of weight domination or a large input magnitude in the last linear layer.
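The proposed revision touches only the network head. A sketch of the forward pass (a plain NumPy stand-in for the actual architecture; `features` plays the role of the globally pooled ReLU output feeding the classifier):

```python
import numpy as np

def batch_norm(z, eps=1e-5):
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

def head_original(features, W):
    # original ResNet head: pooled features -> last linear layer
    return features @ W

def head_revised(features, W):
    # revision: one BN layer before the last linear layer bounds the
    # magnitude of the classifier's input
    return batch_norm(features) @ W

rng = np.random.default_rng(6)
features = 50.0 * np.abs(rng.normal(size=(128, 64)))  # large, non-negative
W = rng.normal(size=(64, 10)) / np.sqrt(64.0)
```

With large non-negative inputs (as accumulated by the residual connections), the revised head produces logits of much smaller magnitude, which is the effect the analysis calls for.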
We also try a similar solution in which the input is divided by a constant before the last linear layer, and find that it also benefits training. However, the main disadvantage of this solution is that the value of the constant has to be carefully tuned for networks of different depths. We also try placing one BN layer before the average pooling, which has similar effects to placing it before the last linear layer. We note that Bjorck et al. [4] propose to train a 110-layer residual network with only one BN layer, placed before the average pooling, and show that it achieves good results. We argue that this does not hold for very deep networks: we perform an experiment on the 1202-layer residual network and find that the model always fails to train under various hyperparameters.
Table 3: Top-1 validation errors (%, single model and single-crop) on 18-, 50- and 101-layer residual networks for ImageNet classification.

Method | 18-layer | 50-layer | 101-layer
ResNet | 29.78 | 23.81 | 22.45
ResNet | 29.32 | 23.47 | 21.94
ResNet, as a simple revision of the original ResNet, achieves a significant improvement in performance for very deep residual networks. We show the experimental results on CIFAR and ImageNet classification as follows.
CIFAR Datasets
Figure 8 (a) and (b) show the training losses of the original and revised ResNet, respectively, on the CIFAR-10 dataset. We observe that the original ResNet, at a depth of 1202, exhibits degraded training performance, especially in the initial phase. Note that, as the depth increases, the original ResNet obtains worse training performance in the first 80 epochs (before the learning rate is reduced), which coincides with our previous analysis. The revised ResNet obtains nearly the same training loss for networks of different depths in the first 80 epochs. Moreover, it shows lower training loss with increasing depth. Comparing Figure 8 (b) to (a), we observe that the revised ResNet has better training loss than the original for all depths (e.g., at a depth of 56, the loss of the original ResNet is 0.081, while for the revised one it is 0.043).
Table 1 shows the test errors. We observe that the revised ResNet achieves better test performance with increasing depth, while the original has degraded performance. Compared to the original, the revised ResNet has consistently improved performance across different depths. In particular, it reduces the absolute test error of the original ResNet by , , and at depths of 56, 110, 230 and 1202, respectively. Due to its optimization efficiency, the training performance is likely to improve further if we strengthen the regularization. Thus, we set the weight decay to 0.0002 and double the training iterations, and find that the 1202-layer revised ResNet achieves a test error of . We also train a 2402-layer network. We observe that the original cannot converge, while the revised ResNet achieves a test error of .
We further perform experiments on CIFAR-100, using the same experimental setup as for CIFAR-10. Table 2 shows the test errors. The revised ResNet reduces the absolute test error of the original by , , and at depths of 56, 110, 230 and 1202, respectively.
ImageNet Dataset
We also validate the effectiveness of the revised ResNet on large-scale ImageNet classification with 1000 classes [7]. We use the official 1.28M training images as the training set, and evaluate the top-1 classification error on the validation set of 50k images. We perform experiments on the 18-, 50- and 101-layer networks. We follow the same setup as described in [14], except that: 1) we train over 100 epochs with an extra lowered learning rate at the 90th epoch; 2) we use 1 GPU for the 18-layer network and 4 GPUs for the 50- and 101-layer networks.
We also observe that the revised ResNet has better optimization efficiency than the original (see Appendix D). Table 3 shows the validation errors. For a fairer comparison, we report the best result from each network with dropout of 0.3 (before the last linear layer) and without dropout. We find that the revised ResNet improves performance over the original on all networks.
6 Conclusion
We propose a layer-wise conditioning analysis to characterize the optimization behavior of DNNs. Such an analysis is theoretically derived under mild assumptions that approximately hold in practice. Based on our layer-wise conditioning analysis, we show how batch normalization stabilizes training and improves the conditioning of the optimization problem. We further show that very deep residual networks still suffer from optimization difficulty, which is caused by the ill-conditioned state of the last linear layer. We remedy this by adding only one BN layer before the last linear layer. We expect our method to provide new insights for analyzing and understanding techniques for training DNNs.
References
 [1] Jimmy Ba, Roger Grosse, and James Martens. Distributed second-order optimization using Kronecker-factored approximations. In ICLR, 2017.
 [2] Jimmy Ba, Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. CoRR, abs/1607.06450, 2016.
 [3] Alberto Bernacchia, Mate Lengyel, and Guillaume Hennequin. Exact natural gradient in deep linear networks and its application to the nonlinear case. In NeurIPS. 2018.
 [4] Johan Bjorck, Carla Gomes, and Bart Selman. Understanding batch normalization. In NeurIPS, 2018.

 [5] Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.
 [6] Miguel Carreira-Perpinan and Weiran Wang. Distributed optimization of deeply nested systems. In AISTATS, 2014.
 [7] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, and L. FeiFei. ImageNet: A LargeScale Hierarchical Image Database. In CVPR, 2009.
 [8] Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, and koray kavukcuoglu. Natural neural networks. In NeurIPS, 2015.

 [9] Thomas Frerix, Thomas Möllenhoff, Michael Möller, and Daniel Cremers. Proximal backpropagation. In ICLR, 2018.
 [10] Behrooz Ghorbani, Shankar Krishnan, and Ying Xiao. An investigation into neural net optimization via hessian eigenvalue density. In ICML, 2019.
 [11] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, 2010.
 [12] Roger B. Grosse and James Martens. A Kronecker-factored approximate Fisher matrix for convolution layers. In ICML, 2016.
 [13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015.
 [14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
 [15] G E Hinton and R R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504–507, 2006.
 [16] Elad Hoffer, Ron Banner, Itay Golan, and Daniel Soudry. Norm matters: efficient and accurate normalization schemes in deep networks. NeurIPS, 2018.
 [17] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
 [18] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
 [19] Lei Huang, Xianglong Liu, Yang Liu, Bo Lang, and Dacheng Tao. Centered weight normalization in accelerating training of deep neural networks. In ICCV, 2017.
 [20] Lei Huang, Dawei Yang, Bo Lang, and Jia Deng. Decorrelated batch normalization. In CVPR, 2018.
 [21] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
 [22] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
 [23] Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Ming Zhou, Klaus Neymeyr, and Thomas Hofmann. Towards a theoretical understanding of batch normalization. arXiv preprint arXiv:1805.10694, 2018.
 [24] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. Deep learning. Nature, 521:436–444, 2015.
 [25] Yann LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, pages 9–50, 1998.
 [26] Yann LeCun, Ido Kanter, and Sara A. Solla. Second order properties of error surfaces: Learning time and generalization. In NeurIPS, 1990.
 [27] James Martens. Deep learning via Hessian-free optimization. In ICML, pages 735–742. Omnipress, 2010.
 [28] James Martens. New perspectives on the natural gradient method. CoRR, abs/1412.1193, 2014.
 [29] James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In ICML, 2015.
 [30] James Martens, Ilya Sutskever, and Kevin Swersky. Estimating the hessian by backpropagating curvature. In ICML, 2012.

[31] Grégoire Montavon and Klaus-Robert Müller. Deep Boltzmann machines and the centering trick. Volume 7700 of LNCS, 2012.
 [32] Vardan Papyan. The full spectrum of deep net hessians at scale: Dynamics with sample size. CoRR, abs/1811.07062, 2018.
 [33] Razvan Pascanu and Yoshua Bengio. Revisiting natural gradient for deep networks. In ICLR, 2014.
 [34] Nicolas Le Roux, Pierre-Antoine Manzagol, and Yoshua Bengio. Topmoumoute online natural gradient algorithm. In NeurIPS, pages 849–856. Curran Associates, Inc., 2007.
 [35] Levent Sagun, Utku Evci, V. Ugur Güney, Yann N. Dauphin, and Léon Bottou. Empirical analysis of the hessian of overparametrized neural networks. CoRR, abs/1706.04454, 2017.
 [36] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How does batch normalization help optimization? In NeurIPS, 2018.
 [37] Andrew M. Saxe, James L. McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In ICLR, 2014.
 [38] Nicol N. Schraudolph. Accelerated gradient descent by factorcentering decomposition. Technical report, 1998.
 [39] Ke Sun and Frank Nielsen. Relative Fisher information and natural gradient for learning large modular models. In ICML, 2017.
 [40] Simon Wiesler and Hermann Ney. A convergence analysis of log-linear training. In NeurIPS, 2011.
 [41] Shuang Wu, Guoqi Li, Lei Deng, Liu Liu, Yuan Xie, and Luping Shi. L1-norm batch normalization for efficient training of deep neural networks. CoRR, 2018.
 [42] Yuxin Wu and Kaiming He. Group normalization. In ECCV, 2018.
 [43] Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, and Samuel S. Schoenholz. A mean field theory of batch normalization. In ICLR, 2019.
 [44] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016.
 [45] Matthew D. Zeiler. ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701, 2012.
 [46] Huishuai Zhang, Wei Chen, and TieYan Liu. On the local hessian in backpropagation. In NeurIPS. 2018.
Appendix A Proof of Theorems
Here, we provide proofs for the proposition and two theorems in the paper.
a.1 Proof of Proposition 1
Proposition 1. Given $F = A \otimes B$, where $A$ and $B$ are positive semidefinite, we have: 1) $\lambda_{\max}(F) = \lambda_{\max}(A)\,\lambda_{\max}(B)$; 2) $\kappa(F) = \kappa(A)\,\kappa(B)$.
Proof.
The proof is mainly based on the conclusion from Theorem 4.2.12 in [17], which is restated as follows:
Lemma A.1.
Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{m \times m}$. Furthermore, let $\lambda$ be an arbitrary eigenvalue of $A$ and $\mu$ be an arbitrary eigenvalue of $B$. Then $\lambda\mu$ is an eigenvalue of $A \otimes B$. Furthermore, any eigenvalue of $A \otimes B$ arises as such a product of eigenvalues of $A$ and $B$.
Based on their definitions, $A$ and $B$ are positive semidefinite. Therefore, all the eigenvalues of $A$/$B$ are nonnegative. Based on Lemma A.1, we have $\lambda_{\max}(A \otimes B) = \max_{i,j} \lambda_i(A)\lambda_j(B)$. Since all the eigenvalues of $A$ and $B$ are nonnegative, we thus have $\lambda_{\max}(A \otimes B) = \lambda_{\max}(A)\,\lambda_{\max}(B)$. Similarly, we can prove that $\lambda_{\min}(A \otimes B) = \lambda_{\min}(A)\,\lambda_{\min}(B)$. We thus have $\kappa(A \otimes B) = \kappa(A)\,\kappa(B)$.
∎
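Lemma A.1 and Proposition 1 can be verified numerically. The following is a minimal NumPy sketch (the matrix sizes are arbitrary): the spectrum of a Kronecker product of PSD factors is exactly the set of pairwise eigenvalue products, so its maximum eigenvalue and condition number factorize.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    m = rng.standard_normal((n, n + 2))
    return m @ m.T / n  # positive semidefinite (almost surely positive definite)

A, B = random_psd(4), random_psd(5)

eig_a = np.linalg.eigvalsh(A)               # ascending order
eig_b = np.linalg.eigvalsh(B)
eig_f = np.linalg.eigvalsh(np.kron(A, B))

# Lemma A.1: the spectrum of A (x) B is exactly the pairwise products.
products = np.sort(np.outer(eig_a, eig_b).ravel())
assert np.allclose(eig_f, products)

# Proposition 1: for nonnegative spectra, max eigenvalue and condition
# number both factorize over the Kronecker product.
assert np.isclose(eig_f[-1], eig_a[-1] * eig_b[-1])
assert np.isclose(eig_f[-1] / eig_f[0],
                  (eig_a[-1] / eig_a[0]) * (eig_b[-1] / eig_b[0]))
```

Note that nonnegativity of the eigenvalues is what lets the maximum of the pairwise products split into a product of maxima; the factorization fails for indefinite factors.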
a.2 Proof of Theorem 1
Theorem 1. Given a rectifier neural network (Eqn. 3) with nonlinearity ( ), if the weight in each layer is scaled by ( and ), we have the scaled layer input: . Under the assumption that , we have the output-gradient: , and the weight-gradient: , for all .
Proof.
(1) We first prove the claim for the scaled layer input ( ), using mathematical induction. It is easy to validate that and . We assume that and hold, for . When , we have
(9) 
We thus have
(10) 
By induction, we have , for . We also have for .
(2) We then prove the claim for the scaled output-gradient for , again by mathematical induction. Based on backpropagation, we have
(11) 
and
(12) 
Based on the assumption that , we have (we denote if ).
We assume that holds, for . When , we have
(13) 
We also have
(14) 
By induction, we thus have , for .
(3) Based on , and , it is easy to prove that for . ∎
a.3 Proof of Theorem 2
Theorem 2. Under the same conditions as Theorem 1, for the normalized network with and , we have: , , , for all .
Proof.
(1) Following the proof of Theorem 1, it is easy to demonstrate by mathematical induction that , and , for all .
(2) We also use mathematical induction to demonstrate that for all .
We first show the formulation of the gradient backpropagating through each neuron of the BN layer as:
(15) 
where
is the standard deviation and
denotes the expectation over mini-batch examples. We have based on . Since , we have . Therefore, we have from Eqn. 15. Assume that holds, for . When , we have:
(16) 
Following the proof of Theorem 1, it is easy to get . Based on and , we have from Eqn. 15.
By induction, we have , for all .
(3) Based on , and , we have that , for all .
∎
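The per-neuron BN gradient used in the proof above (Eqn. 15) can be checked numerically. The following is a minimal NumPy sketch, assuming the standard BN transform without affine parameters (gamma = 1, beta = 0) and the biased batch variance; it compares the closed-form backward pass against finite differences.

```python
import numpy as np

EPS = 1e-8

def bn_forward(x):
    """Per-column batch normalization, gamma=1, beta=0."""
    mu, var = x.mean(0), x.var(0)  # biased variance, as in BN
    return (x - mu) / np.sqrt(var + EPS)

def bn_backward(x, dy):
    """Gradient of sum(dy * bn_forward(x)) w.r.t. x: the output-gradient
    is centered and decorrelated from x_hat, then divided by sigma."""
    xhat = bn_forward(x)
    sigma = np.sqrt(x.var(0) + EPS)
    return (dy - dy.mean(0) - xhat * (dy * xhat).mean(0)) / sigma

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 3))
dy = rng.standard_normal((16, 3))

# Finite-difference check of the analytic gradient, entry by entry.
num = np.zeros_like(x)
h = 1e-6
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += h
    xm[idx] -= h
    num[idx] = (np.sum(dy * bn_forward(xp)) - np.sum(dy * bn_forward(xm))) / (2 * h)

assert np.allclose(bn_backward(x, dy), num, atol=1e-4)
```

The centering of `dy` in the backward pass is what yields the zero-mean output-gradient property used in the induction step.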
Appendix B Comparison of the Analysis with the Full Curvature Matrix and Sub-curvature Matrices
Here, we conduct experiments to analyze the training dynamics of the unnormalized (‘Plain’) and batch normalized [21] (‘BN’) networks, by analyzing the spectrum of the full curvature matrix and the sub-curvature matrices. Figure A1 shows the results using the Fisher Information Matrix (FIM) on an 8-layer MLP with 24 neurons in each layer. We calculate the maximum eigenvalue and the condition number with respect to different percentages, for both the full FIM and the sub-FIMs. By observing the results from the full FIM (Figure A1 (a)), we find that: 1) the unnormalized network suffers from vanishing gradients (the maximum eigenvalue is around ), while the batch normalized network has an appropriate gradient magnitude (the maximum eigenvalue is around ); 2) ‘BN’ has better conditioning than ‘Plain’, which suggests that batch normalization (BN) can improve the conditioning of the network, as observed in [36, 10]. We obtain similar conclusions from the sub-FIMs (Figure A1 (b) to (i)). This experiment shows that our layerwise conditioning analysis can uncover the same training dynamics of the networks as the full conditioning analysis.
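The practical appeal of the sub-FIMs is that their spectra never require forming the full matrix. The following is a minimal sketch, assuming the Kronecker factorization of a layer's sub-FIM into the input covariance and output-gradient covariance (as in Proposition 1); the data here is synthetic, standing in for one layer's mini-batch activations and gradients.

```python
import numpy as np

def sub_fim_spectrum(x, g):
    """Max eigenvalue and condition number of a layer's sub-FIM via its
    Kronecker factors: F_l ~ cov(input) (x) cov(output-gradient).
    x: (m, d_in) layer inputs; g: (m, d_out) output-gradients."""
    cov_x = x.T @ x / x.shape[0]
    cov_g = g.T @ g / g.shape[0]
    ex, eg = np.linalg.eigvalsh(cov_x), np.linalg.eigvalsh(cov_g)
    lam_max = ex[-1] * eg[-1]
    cond = (ex[-1] / ex[0]) * (eg[-1] / eg[0])
    return lam_max, cond

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 6))
g = 0.01 * rng.standard_normal((1024, 4))  # small gradients, as in 'Plain'

lam_max, cond = sub_fim_spectrum(x, g)

# Agrees with the spectrum of the explicitly formed Kronecker product,
# without ever building the (d_in * d_out)-dimensional matrix.
full = np.kron(x.T @ x / 1024, g.T @ g / 1024)
eig = np.linalg.eigvalsh(full)
assert np.isclose(eig[-1], lam_max)
assert np.isclose(eig[-1] / eig[0], cond)
```

Only the two small factor covariances are ever eigendecomposed, which is what makes the analysis tractable for wide layers where the full FIM block would be prohibitively large.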
We also conduct experiments to analyze ‘Plain’ and ‘BN’ using the second moment matrix of the sample gradients . The results are shown in Figure A2. We make the same observations as with the FIM.
Appendix C More Experiments in Exploring Batch Normalized Networks
In this section, we provide more experimental results relating to exploring batch normalization (BN) [21] by layerwise conditioning analysis, as discussed in Section 4 of the paper. We include experiments that train neural networks with Stochastic Gradient Descent (SGD), as well as experiments relating to the weight domination discussed in the paper.
c.1 Experiments with SGD
Here, we perform experiments on a multilayer perceptron (MLP) for MNIST classification and on convolutional neural networks (CNNs) for CIFAR-10 classification.
c.1.1 MLP for MNIST Classification
We use the same experimental setup as the MNIST experiments described in the paper, except that we use SGD with a batch size of 1024. The results are shown in Figure A3 and Figure A4. We obtain similar results to those obtained using full gradient descent, as described in Section 4 of the paper.
c.1.2 CNN for CIFAR-10 Classification
We perform a layerwise conditioning analysis on the VGG-style and residual network [14] architectures. Note that we view the activation at each spatial location of the feature map as an independent example when calculating the covariance matrices of the convolutional layer input and output-gradient. This is similar to the procedure proposed in BN to normalize convolutional layers [21].
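The reshaping described above can be sketched concretely. The following is a minimal NumPy example, assuming feature maps in the usual (N, C, H, W) layout; each of the N*H*W spatial positions becomes one C-dimensional sample for the covariance estimate.

```python
import numpy as np

def conv_input_covariance(feat):
    """Channel covariance of a conv feature map, treating each spatial
    position of each example as an independent sample, as BN does."""
    n, c, h, w = feat.shape
    samples = feat.transpose(0, 2, 3, 1).reshape(n * h * w, c)  # (N*H*W, C)
    centered = samples - samples.mean(0)
    return centered.T @ centered / samples.shape[0]  # (C, C)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 32, 32))  # illustrative mini-batch
cov = conv_input_covariance(feat)
assert cov.shape == (16, 16)
```

The resulting C x C matrix is small regardless of the spatial resolution, which keeps the analysis cheap for convolutional layers.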
We use the 20-layer residual network described in the paper [14] for CIFAR-10 classification. The VGG-style network is constructed from the 20-layer residual network by removing the residual connections.
We use the same setups as described in [14], except that we do not use weight decay (to simplify the analysis) and we run the experiments on one GPU. Since the unnormalized networks (both the VGG-style and the residual network) do not converge with the large learning rate of 0.1, we run additional experiments with a learning rate of 0.01 for the unnormalized networks and report these results.
Figure A5 and Figure A6 show the results on the VGG-style and residual networks, respectively. We make the same observations as for the MLP on MNIST classification.
c.2 Experiments Relating to Weight Domination
Gradient Explosion of BN
In Section 4.1 of the paper, we mention that for the network with BN there is still a possibility that the magnitude of the weights in certain layers increases significantly. Here, we provide the experimental results.
We conduct experiments on a 100-layer batch normalized MLP with 256 neurons in each layer for MNIST classification. We calculate the maximum eigenvalues of the sub-FIMs and show the results during the first 7 iterations in Figure A7 (a). We observe that the weight-gradient explodes exponentially at initialization (‘Iter0’). After a single step, the first-step gradients dominate the weights due to the gradient explosion in the lower layers, hence the exponential growth in the magnitude of the weights. This increased weight magnitude leads to small weight-gradients (‘Iter1’ to ‘Iter7’), which is caused by BN, as discussed in Section 4.1 of the paper. Therefore, some layers (especially the lower layers) of the network enter the state of weight domination. We make similar observations on the 110-layer VGG-style network for CIFAR-10 classification, as shown in Figure A7 (b).
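The weight-domination state described above is easy to monitor in practice. The following is a hypothetical NumPy sketch (the helper name, magnitudes, and threshold are illustrative): a layer is in this state when an SGD step is negligible relative to the current weight norm, e.g. after its weights have been inflated by an early gradient explosion.

```python
import numpy as np

def weight_domination_ratio(weight, grad, lr):
    """Relative size of an SGD step, ||lr * grad|| / ||weight||. A tiny
    value means the existing weight dominates the update, so the layer
    barely learns (the 'weight domination' state)."""
    return lr * np.linalg.norm(grad) / (np.linalg.norm(weight) + 1e-12)

rng = np.random.default_rng(0)

# Hypothetical snapshot after the first step of a very deep BN network:
# one layer's weight magnitude exploded, so later gradients of ordinary
# scale barely move it, while a normally scaled layer keeps learning.
w_exploded = 1e3 * rng.standard_normal((256, 256))
w_normal = rng.standard_normal((256, 256))
g = rng.standard_normal((256, 256))

r_dominated = weight_domination_ratio(w_exploded, g, lr=0.1)
r_healthy = weight_domination_ratio(w_normal, g, lr=0.1)
assert r_dominated < 1e-3 < r_healthy
```

Tracking this ratio per layer during the first few iterations would flag exactly the lower layers that Figure A7 shows entering weight domination.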
Investigation of Weight Domination
Weight domination sometimes harms the learning of the network, because this state limits the representational ability of the corresponding layer. We conducted experiments on a five-layer MLP and showed the results in Section 4. Here, we also conduct experiments on CNNs for the CIFAR-10 dataset, shown in Figure A8. We observe that a network in which certain layers are in the state of weight domination can still decrease the loss; however, its performance degrades.