Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum

07/09/2019
by Justin Sirignano et al.

We analyze single-layer neural networks with the Xavier initialization in the asymptotic regime of a large number of hidden units and a large number of stochastic gradient descent training steps. Using mean-field analysis, we prove that the neural network converges in distribution to a random ODE with a Gaussian distribution. The limit is completely different from the typical mean-field results for neural networks due to the 1/√(N) normalization factor in the Xavier initialization (versus the 1/N factor in the typical mean-field framework). Although the pre-limit problem of optimizing a neural network is non-convex (and therefore the neural network may converge to a local minimum), the limit equation minimizes a (quadratic) convex objective function and therefore converges to a global minimum. Furthermore, under reasonable assumptions, the matrix in the limiting quadratic objective function is positive definite, and thus the neural network (in the limit) will converge to a global minimum with zero loss on the training set.
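The contrast between the two normalizations can be illustrated numerically. The following is a sketch (not the paper's code, and the network form f(x) = N^(-p) Σᵢ cᵢ tanh(wᵢ·x) with p = 1/2 or 1 is an illustrative assumption): under the 1/√(N) Xavier scaling, the output variance at initialization stays O(1) as N grows (a Gaussian limit by the central limit theorem), whereas under the 1/N mean-field scaling it shrinks like 1/N.

```python
import numpy as np

rng = np.random.default_rng(0)

def net_output(x, N, power):
    """One sample of f(x) = N^(-power) * sum_i c_i * tanh(w_i . x)
    with freshly drawn i.i.d. standard-normal weights."""
    c = rng.standard_normal(N)            # output weights c_i, O(1) variance
    w = rng.standard_normal((N, x.size))  # hidden-layer weights w_i
    return (N ** -power) * (c @ np.tanh(w @ x))

x = rng.standard_normal(10)  # a fixed input
for N in (100, 2500):
    v_xavier = np.var([net_output(x, N, 0.5) for _ in range(500)])  # 1/sqrt(N)
    v_mf = np.var([net_output(x, N, 1.0) for _ in range(500)])      # 1/N
    print(f"N={N}: var with 1/sqrt(N) = {v_xavier:.3f}, "
          f"var with 1/N = {v_mf:.5f}")
```

Across the two values of N, the 1/√(N)-scaled variance is roughly constant while the 1/N-scaled variance drops by about a factor of N, which is why the two normalizations lead to qualitatively different limits.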


Related research

03/11/2019  Mean Field Analysis of Deep Neural Networks
We analyze multi-layer neural networks in the asymptotic regime of simul...

08/28/2018  Mean Field Analysis of Neural Networks: A Central Limit Theorem
Machine learning has revolutionized fields such as image, text, and spee...

02/05/2019  Global convergence of neuron birth-death dynamics
Neural networks with a large number of parameters admit a mean-field des...

11/20/2020  Normalization effects on shallow neural networks and related asymptotic expansions
We consider shallow (single hidden layer) neural networks and characteri...

11/18/2019  Implicit Regularization of Normalization Methods
Normalization methods such as batch normalization are commonly used in o...

05/16/2019  When random initializations help: a study of variational inference for community detection
Variational approximation has been widely used in large-scale Bayesian i...

01/08/2022  Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer
In this paper, we follow Eftekhari's work to give a non-local convergenc...
