Finite Versus Infinite Neural Networks: an Empirical Study

by Jaehoon Lee, et al.

We perform a careful, thorough, and large-scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite-width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite-width networks; diagonal regularization of kernels acts similarly to early stopping; floating-point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layer-wise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices, we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.
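To make the "diagonal regularization" finding concrete, the following is a minimal sketch of kernel regression with a regularized diagonal, not the authors' implementation. Several things here are assumptions for illustration: the RBF kernel is a stand-in for an NNGP or NT kernel, the helper names `rbf_kernel` and `kernel_predict` are hypothetical, and the trace-scaled form `eps * trace(K) / n` is one common way to make the regularizer independent of kernel scale.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0):
    """Stand-in kernel (assumption): RBF, used here in place of an NNGP/NT kernel."""
    sq_dists = (
        np.sum(x1**2, axis=1)[:, None]
        + np.sum(x2**2, axis=1)[None, :]
        - 2.0 * x1 @ x2.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * length_scale**2))


def kernel_predict(k_train_train, k_test_train, y_train, eps=1e-3):
    """Kernel regression with a diagonal regularizer.

    The regularizer is eps * trace(K) / n (an assumed, scale-invariant form),
    added to the diagonal of the train-train kernel before solving.
    """
    n = k_train_train.shape[0]
    reg = eps * np.trace(k_train_train) / n
    alpha = np.linalg.solve(k_train_train + reg * np.eye(n), y_train)
    return k_test_train @ alpha


# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(50, 3))
y_train = np.sin(x_train[:, :1])
x_test = rng.normal(size=(10, 3))

k_tt = rbf_kernel(x_train, x_train)
k_st = rbf_kernel(x_test, x_train)

test_preds = kernel_predict(k_tt, k_st, y_train, eps=1e-3)

# Training-set fit under weak and strong regularization.
fit_weak = kernel_predict(k_tt, k_tt, y_train, eps=1e-6)
fit_strong = kernel_predict(k_tt, k_tt, y_train, eps=1e2)
resid_weak = np.linalg.norm(fit_weak - y_train)
resid_strong = np.linalg.norm(fit_strong - y_train)
```

In this sketch, a larger `eps` leaves a larger training residual, which is the sense in which the regularizer resembles stopping training early: the training targets are fit less exactly. The values of `eps` used here are arbitrary and would need to be tuned on held-out data.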


On the infinite width limit of neural networks with a standard parameterization

There are currently two parameterizations used to derive fixed kernels c...

On neural network kernels and the storage capacity problem

In this short note, we reify the connection between work on the storage ...

Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks

Feature learning, or the ability of deep neural networks to automaticall...

Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks

Understanding the fundamental principles behind the massive success of n...

Asymptotics of representation learning in finite Bayesian neural networks

Recent works have suggested that finite Bayesian neural networks may out...

Kernel Regression with Infinite-Width Neural Networks on Millions of Examples

Neural kernels have drastically increased performance on diverse and non...

Nonperturbative renormalization for the neural network-QFT correspondence

In a recent work arXiv:2008.08601, Halverson, Maiti and Stoner proposed ...
