
Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition
Classical learning theory suggests that the optimal generalization perfo...

Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit
Modern deep learning models have achieved great success in predictive ac...

Temperature check: theory and practice for training models with softmax-cross-entropy losses
The softmax function combined with a cross-entropy loss is a principled ...

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Modern deep learning models employ considerably more parameters than req...

Finite Versus Infinite Neural Networks: an Empirical Study
We perform a careful, thorough, and large scale empirical study of the c...

Exact posterior distributions of wide Bayesian neural networks
Recent work has shown that the prior over functions induced by a deep Ba...

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
The selection of initial parameter values for gradient-based optimizatio...

Disentangling trainability and generalization in deep learning
A fundamental goal in deep learning is the characterization of trainabil...

A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning
One of the distinguishing characteristics of modern deep learning system...

A Mean Field Theory of Batch Normalization
We develop a mean field theory for batch normalization in fully-connecte...

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
A longstanding goal in deep learning research has been to precisely char...

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs
Training recurrent neural networks (RNNs) on long sequence tasks is plag...

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes
There is a previously identified equivalence between wide fully connecte...

Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
Recurrent neural networks have gained widespread use in modeling sequenc...

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
In recent years, state-of-the-art methods in computer vision have utiliz...

The Emergence of Spectral Universality in Deep Networks
Recent work has shown that tight concentration of the entire spectrum of...

Sensitivity and Generalization in Neural Networks: an Empirical Study
In practice it is often found that large over-parameterized neural netwo...

Estimating the Spectral Density of Large Implicit Matrices
Many important problems are characterized by the eigenvalues of a large ...

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
It is well known that the initialization of weights in deep neural netwo...

Deep Neural Networks as Gaussian Processes
A deep fully-connected neural network with an i.i.d. prior over its para...

A Correspondence Between Random Neural Networks and Statistical Field Theory
A number of recent papers have provided evidence that practical design q...
Jeffrey Pennington