Wide neural networks: From non-Gaussian random fields at initialization to the NTK geometry of training

04/06/2023
by Luis Carvalho et al.

Recent developments in applications of artificial neural networks with over n = 10^14 parameters make it extremely important to study the large-n behavior of such networks. Most works on wide neural networks have focused on the infinite-width limit n → +∞ and have shown that, at initialization, such networks correspond to Gaussian processes. In this work we study their behavior for large but finite n. Our main contributions are the following:

(1) We compute the corrections to Gaussianity in terms of an asymptotic series in n^{-1/2}. The coefficients in this expansion are determined by the statistics of parameter initialization and by the activation function.

(2) We control the evolution of the outputs of finite-width-n networks during training by computing their deviations from the limiting infinite-width case, in which the network evolves through a linear flow. This improves previous estimates and yields sharper decay rates for the finite-width NTK in terms of n, valid during the entire training procedure. As a corollary, we also prove that, with arbitrarily high probability, the training of sufficiently wide neural networks converges to a global minimum of the corresponding quadratic loss function.

(3) We estimate how the deviations from Gaussianity evolve with training, in terms of n. In particular, using a certain metric on the space of measures, we find that, along training, the resulting measure stays within n^{-1/2} (log n)^{1+} of the time-dependent Gaussian process corresponding to the infinite-width network (which is given explicitly by precomposing the initial Gaussian process with the linear flow corresponding to training in the infinite-width limit).
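For orientation, the "linear flow" in contribution (2) refers to the well-known infinite-width NTK training dynamics under gradient flow on the quadratic loss. One standard way to write these dynamics, with symbols chosen here for illustration (the limiting NTK Θ_∞, training inputs 𝒳, targets y, learning rate η) rather than taken from the paper, is:

```latex
% Gradient flow on the quadratic loss 1/2 * sum_{x' in X} (f_t(x') - y(x'))^2
% in the infinite-width limit, where the NTK \Theta_\infty is constant in time,
% so the outputs evolve through a linear flow (symbols are illustrative).
\[
  \partial_t f_t(x) = -\,\eta \sum_{x' \in \mathcal{X}} \Theta_\infty(x, x')\,\bigl(f_t(x') - y(x')\bigr),
  \qquad
  f_t(\mathcal{X}) = y + e^{-\eta\,\Theta_\infty t}\bigl(f_0(\mathcal{X}) - y\bigr).
\]
```

Contribution (1) can be probed numerically with a minimal sketch like the one below (all names, architecture, and parameter choices here are illustrative assumptions, not taken from the paper): at initialization, the scalar output of a width-n one-hidden-layer network is close to Gaussian, and a simple measure of non-Gaussianity such as the excess kurtosis shrinks as the width grows.

```python
# Minimal sketch (illustrative, not from the paper): track how the output
# distribution of a randomly initialized one-hidden-layer network,
#   f(x) = n^{-1/2} * sum_i v_i * relu(w_i . x),
# approaches a Gaussian as the width n grows.
import numpy as np

def sample_outputs(n, x, num_samples, rng, chunk=10_000):
    """Samples of f(x) over independent standard-Gaussian initializations."""
    out = np.empty(num_samples)
    done = 0
    while done < num_samples:
        m = min(chunk, num_samples - done)
        W = rng.standard_normal((m, n, x.shape[0]))   # hidden-layer weights
        v = rng.standard_normal((m, n))               # output-layer weights
        hidden = np.maximum(W @ x, 0.0)               # ReLU activations
        out[done:done + m] = (v * hidden).sum(axis=1) / np.sqrt(n)
        done += m
    return out

def excess_kurtosis(samples):
    """Fourth standardized cumulant; vanishes for an exact Gaussian."""
    z = (samples - samples.mean()) / samples.std()
    return float((z ** 4).mean() - 3.0)

rng = np.random.default_rng(0)
x = np.array([0.7, -1.3])                             # a fixed test input
for n in (16, 64, 256, 1024):
    k = excess_kurtosis(sample_outputs(n, x, 100_000, rng))
    print(f"width n = {n:5d}   excess kurtosis ~ {k:+.4f}")
# The excess kurtosis decays towards zero as the width grows; for symmetric
# initializations the odd corrections cancel at this observable, so the first
# visible correction appears two steps down the n^{-1/2} series (order 1/n).
# At the largest widths the estimate is limited by Monte Carlo noise.
```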


