Directional convergence and alignment in deep learning

06/11/2020
by Ziwei Ji, et al.

In this paper, we show that although the minimizers of cross-entropy and related classification losses lie at infinity, network weights learned by gradient flow converge in direction, with an immediate corollary that network predictions, training errors, and the margin distribution also converge. This proof holds for deep homogeneous networks – a broad class of networks allowing for ReLU, max-pooling, linear, and convolutional layers – and we additionally provide empirical support not just close to the theory (e.g., AlexNet), but also on non-homogeneous networks (e.g., ResNet). If the network further has locally Lipschitz gradients, we show that these gradients converge in direction and asymptotically align with the gradient flow path, with consequences for margin maximization. Our analysis complements and is distinct from the well-known neural tangent and mean-field theories; in particular, it makes no requirements on network width or initialization, instead merely requiring perfect classification accuracy. The proof proceeds by developing a theory of unbounded nonsmooth Kurdyka-Łojasiewicz inequalities for functions definable in an o-minimal structure, and is also applicable outside deep learning.
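
As a rough illustration of the directional-convergence phenomenon described above (not code from the paper), the sketch below trains the simplest homogeneous model covered by the theory, a bias-free logistic-regression classifier, on linearly separable toy data. Because the loss has no finite minimizer, the weight norm grows without bound, yet the normalized weight direction and the normalized margins stabilize. The data, step size, and iteration counts are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: two Gaussian blobs in the plane, labels in {-1, +1}.
n = 100
X = np.vstack([rng.normal(+2.0, 0.5, size=(n // 2, 2)),
               rng.normal(-2.0, 0.5, size=(n // 2, 2))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

w = 0.01 * rng.normal(size=2)   # no bias term, so the predictor x -> <w, x> is homogeneous
lr = 0.1                        # illustrative step size

def loss_and_grad(w):
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))   # logistic (cross-entropy) loss
    s = np.exp(-np.logaddexp(0.0, margins))       # sigmoid(-margins), computed stably
    grad = -(X * (y * s)[:, None]).mean(axis=0)
    return loss, grad

prev_dir = w / np.linalg.norm(w)
for t in range(1, 200001):
    loss, grad = loss_and_grad(w)
    w -= lr * grad
    if t % 40000 == 0:
        direction = w / np.linalg.norm(w)
        print(f"t={t:6d}  loss={loss:.2e}  ||w||={np.linalg.norm(w):7.2f}  "
              f"min normalized margin={np.min(y * (X @ direction)):.4f}  "
              f"cos(prev dir, dir)={prev_dir @ direction:.8f}")
        prev_dir = direction

In the printout, the loss keeps shrinking and the weight norm keeps growing, while the cosine between successive directions approaches 1 and the smallest normalized margin levels off; this is the same qualitative behavior the paper establishes for deep homogeneous networks under gradient flow.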

Related research

06/13/2019
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Recent works on implicit regularization have shown that gradient descent...

10/26/2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
The generalization mystery of overparametrized deep nets has motivated e...

11/03/2021
Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks
Understanding the properties of neural networks trained via stochastic g...

05/17/2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
With an eye toward understanding complexity control in deep learning, we...

01/28/2022
Training invariances and the low-rank phenomenon: beyond linear networks
The implicit bias induced by the training of neural networks has become ...

10/06/2021
On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime
Finding the optimal configuration of parameters in ResNet is a nonconvex...

08/02/2021
Convergence rates of deep ReLU networks for multiclass classification
For classification problems, trained deep neural networks return probabi...
