
A Dynamical Central Limit Theorem for Shallow Neural Networks

by   Zhengdao Chen, et al.

Recent theoretical work has characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic regime called the mean-field limit, as the number of parameters tends towards infinity. At initialization, the randomly sampled parameters lead to a deviation from the mean-field limit that is dictated by the classical Central Limit Theorem (CLT). However, the dynamics of training introduces correlations among the parameters, raising the question of how the fluctuations evolve during training. Here, we analyze the mean-field dynamics as a Wasserstein gradient flow and prove that, in the width-asymptotic limit, the deviations from the mean-field limit scaled by the width remain bounded throughout training. In particular, they eventually vanish in the CLT scaling if the mean-field dynamics converges to a measure that interpolates the training data. This observation has implications for both the approximation rate and the generalization: the upper bound we obtain is given by a Monte-Carlo-type resampling error, which does not depend explicitly on the dimension. This bound motivates a regularization term on the 2-norm of the underlying measure, which is also connected to generalization via the variation-norm function spaces.
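The CLT behavior at initialization described above can be illustrated numerically: for a shallow network in mean-field scaling, f_n(x) = (1/n) Σ a_i σ(w_i x), with i.i.d. parameters of mean zero, the mean-field limit at initialization is the expectation E[a σ(w x)] = 0, and the deviation of f_n(x) from it shrinks like n^{-1/2}, so the width-scaled deviation √n · f_n(x) stays of constant order. The following sketch (a hypothetical toy setup with tanh activation and standard Gaussian parameters, not the paper's exact construction) checks both facts by Monte-Carlo simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 0.7  # a fixed scalar input

def output_samples(n, trials=1000):
    """Samples of the width-n mean-field network f_n(x) = (1/n) sum_i a_i tanh(w_i x)
    at random initialization, one sample per trial."""
    a = rng.standard_normal((trials, n))
    w = rng.standard_normal((trials, n))
    return (a * np.tanh(w * x)).mean(axis=1)

widths = [100, 400, 1600, 6400]
stds = [output_samples(n).std() for n in widths]

# Raw fluctuation around the mean-field limit decays like n^{-1/2} ...
decay = [s1 / s2 for s1, s2 in zip(stds, stds[1:])]   # each ratio ~ 2 (= sqrt(4))

# ... so the CLT-scaled deviation sqrt(n) * (f_n - f_infty) stays O(1).
scaled = [s * np.sqrt(n) for s, n in zip(stds, widths)]
```

Here `scaled` hovers around a width-independent constant (the standard deviation of a·tanh(w·x) under the initialization law), which is exactly the CLT-scale fluctuation whose evolution under training the paper analyzes.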


Global Convergence of Three-layer Neural Networks in the Mean Field Regime

In the mean field regime, neural networks are appropriately scaled so th...

Conservative SPDEs as fluctuating mean field limits of stochastic gradient descent

The convergence of stochastic interacting particle systems in the mean-f...

On Sparsity in Overparametrised Shallow ReLU Networks

The analysis of neural network training beyond their linearization regim...

An analytic theory of shallow networks dynamics for hinge loss classification

Neural networks have been shown to perform incredibly well in classifica...

A duality connecting neural network and cosmological dynamics

We demonstrate that the dynamics of neural networks trained with gradien...

Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training

The mean field (MF) theory of multilayer neural networks centers around ...

Normalization effects on shallow neural networks and related asymptotic expansions

We consider shallow (single hidden layer) neural networks and characteri...