
Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum

by Justin Sirignano, et al.
Boston University
University of Illinois at Urbana-Champaign

We analyze single-layer neural networks with the Xavier initialization in the asymptotic regime of a large number of hidden units and a large number of stochastic gradient descent training steps. Using mean-field analysis, we prove that the neural network converges in distribution to a random ODE with a Gaussian distribution. The limit is completely different from the typical mean-field results for neural networks, due to the 1/√(N) normalization factor in the Xavier initialization (versus the 1/N factor in the typical mean-field framework). Although the pre-limit problem of optimizing a neural network is non-convex (so the neural network may converge to a local minimum), the limit equation minimizes a quadratic, convex objective function and therefore converges to a global minimum. Furthermore, under reasonable assumptions, the matrix in the limiting quadratic objective function is positive definite, and thus the neural network (in the limit) converges to a global minimum with zero loss on the training set.
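The distinction between the two normalizations can be sketched numerically. The snippet below (a minimal illustration under assumed notation, not the paper's exact model) evaluates a single-hidden-layer network f(x) = scale · Σᵢ cᵢ σ(wᵢ · x) with randomly initialized weights, once with the Xavier-style 1/√(N) scaling and once with the mean-field 1/N scaling; the latter output is smaller by exactly a factor of √(N), which is why the 1/N limit averages out while the 1/√(N) limit retains O(1) Gaussian fluctuations.

```python
import numpy as np

def forward(x, W, c, scale):
    """Single-hidden-layer network: scale * sum_i c_i * tanh(w_i . x).

    W: (N, d) hidden-layer weights, c: (N,) output weights,
    scale: the normalization factor (1/sqrt(N) or 1/N).
    """
    return scale * (c @ np.tanh(W @ x))

rng = np.random.default_rng(0)
N, d = 10_000, 5                      # N hidden units, d input dimensions
W = rng.normal(size=(N, d))           # unit-variance initialization
c = rng.normal(size=N)
x = rng.normal(size=d)

xavier_out = forward(x, W, c, 1.0 / np.sqrt(N))   # O(1) fluctuations: CLT-type Gaussian limit
mean_field_out = forward(x, W, c, 1.0 / N)        # O(1/sqrt(N)): law-of-large-numbers limit
```

Here `mean_field_out` equals `xavier_out / sqrt(N)`, so for large N the mean-field output concentrates near its mean while the Xavier-scaled output keeps random, Gaussian-sized fluctuations, consistent with the abstract's claim that the two limits are completely different.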




Mean Field Analysis of Deep Neural Networks

We analyze multi-layer neural networks in the asymptotic regime of simul...

Mean Field Analysis of Neural Networks: A Central Limit Theorem

Machine learning has revolutionized fields such as image, text, and spee...

Global convergence of neuron birth-death dynamics

Neural networks with a large number of parameters admit a mean-field des...

Normalization effects on shallow neural networks and related asymptotic expansions

We consider shallow (single hidden layer) neural networks and characteri...

Implicit Regularization of Normalization Methods

Normalization methods such as batch normalization are commonly used in o...

When random initializations help: a study of variational inference for community detection

Variational approximation has been widely used in large-scale Bayesian i...

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

In this paper, we follow Eftekhari's work to give a non-local convergenc...