A Dynamical Central Limit Theorem for Shallow Neural Networks

by Zhengdao Chen et al.

Recent theoretical work has characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic regime called the mean-field limit, in which the number of parameters tends to infinity. At initialization, the randomly sampled parameters lead to a deviation from the mean-field limit that is dictated by the classical Central Limit Theorem (CLT). However, the dynamics of training introduces correlations among the parameters, raising the question of how the fluctuations evolve during training. Here, we analyze the mean-field dynamics as a Wasserstein gradient flow and prove that the deviations from the mean-field limit, scaled by the width, remain bounded throughout training in the width-asymptotic limit. In particular, they eventually vanish in the CLT scaling if the mean-field dynamics converges to a measure that interpolates the training data. This observation has implications for both the approximation rate and generalization: the upper bound we obtain is given by a Monte Carlo-type resampling error, which does not depend explicitly on the dimension. This bound motivates a regularization term on the 2-norm of the underlying measure, which is also connected to generalization via the variation-norm function spaces.
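The Monte Carlo-type scaling behind the bound can be illustrated numerically. The sketch below (an illustration of the CLT scaling, not code from the paper; the base measure and activation are arbitrary choices) draws the parameters of a mean-field-scaled shallow network i.i.d. from a fixed measure and checks that the deviation of the finite-width network from its mean-field limit shrinks like 1/sqrt(n) as the width n grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def finite_width_net(n, x):
    # Mean-field scaling: f_n(x) = (1/n) * sum_i a_i * relu(w_i * x),
    # with (a_i, w_i) drawn i.i.d. from a fixed base measure (assumed here:
    # a ~ N(1, 1), w ~ N(0, 1), chosen only to make the limit nonzero).
    a = 1.0 + rng.standard_normal(n)
    w = rng.standard_normal(n)
    return np.mean(a * relu(w * x))

x = 1.0
# Mean-field limit at x = 1: E[a] * E[relu(w)] = 1 / sqrt(2 * pi)
f_star = 1.0 / np.sqrt(2.0 * np.pi)

def mean_abs_error(n, trials=200):
    # Average deviation from the mean-field limit over fresh resamplings
    return np.mean([abs(finite_width_net(n, x) - f_star) for _ in range(trials)])

err_small = mean_abs_error(100)
err_large = mean_abs_error(10_000)
print(err_small / err_large)  # roughly sqrt(10_000 / 100) = 10
```

The ratio of errors tracks the square root of the width ratio, which is exactly the CLT / Monte Carlo resampling rate that the paper's bound is phrased in.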




Related papers:

- Global Optimality of Elman-type RNN in the Mean-Field Regime
- Global Convergence of Three-layer Neural Networks in the Mean Field Regime
- On Sparsity in Overparametrised Shallow ReLU Networks
- An analytic theory of shallow networks dynamics for hinge loss classification
- A duality connecting neural network and cosmological dynamics
- Normalization effects on shallow neural networks and related asymptotic expansions
- Introduction to dynamical mean-field theory of generic random neural networks
