Jeffrey Pennington

research

∙ 10/10/2022

Second-order regression models exhibit progressive sharpening to the edge of stability

Recent studies of gradient descent with large step sizes have shown that...

0 Atish Agarwala, et al. ∙

research

∙ 07/11/2022

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

Although learning in high dimensions is commonly believed to suffer from...

11 Lechao Xiao, et al. ∙

research

∙ 06/15/2022

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

We introduce repriorisation, a data-dependent reparameterisation which t...

0 Jiri Hron, et al. ∙

research

∙ 06/15/2022

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

Stochastic gradient descent (SGD) is a pillar of modern machine learning...

0 Courtney Paquette, et al. ∙

research

∙ 05/30/2022

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

As modern machine learning models continue to advance the computational ...

0 Lechao Xiao, et al. ∙

research

∙ 05/14/2022

Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties

We develop a stochastic differential equation, called homogenized SGD, f...

0 Courtney Paquette, et al. ∙

research

∙ 11/16/2021

Covariate Shift in High-Dimensional Random Feature Regression

A significant obstacle in the development of robust machine learning mod...

0 Nilesh Tripuraneni, et al. ∙

research

∙ 11/04/2020

Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition

Classical learning theory suggests that the optimal generalization perfo...

0 Ben Adlam, et al. ∙

research

∙ 10/14/2020

Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit

Modern deep learning models have achieved great success in predictive ac...

7 Ben Adlam, et al. ∙

research

∙ 10/14/2020

Temperature check: theory and practice for training models with softmax-cross-entropy losses

The softmax function combined with a cross-entropy loss is a principled ...

10 Atish Agarwala, et al. ∙

research

∙ 08/15/2020

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Modern deep learning models employ considerably more parameters than req...

0 Ben Adlam, et al. ∙

research

∙ 07/31/2020

Finite Versus Infinite Neural Networks: an Empirical Study

We perform a careful, thorough, and large scale empirical study of the c...

49 Jaehoon Lee, et al. ∙

research

∙ 06/18/2020

Exact posterior distributions of wide Bayesian neural networks

Recent work has shown that the prior over functions induced by a deep Ba...

0 Jiri Hron, et al. ∙

research

∙ 01/16/2020

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

The selection of initial parameter values for gradient-based optimizatio...

0 Wei Hu, et al. ∙

research

∙ 12/30/2019

Disentangling trainability and generalization in deep learning

A fundamental goal in deep learning is the characterization of trainabil...

54 Lechao Xiao, et al. ∙

research

∙ 12/02/2019

A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning

One of the distinguishing characteristics of modern deep learning system...

0 Ben Adlam, et al. ∙

research

∙ 02/21/2019

A Mean Field Theory of Batch Normalization

We develop a mean field theory for batch normalization in fully-connecte...

0 Greg Yang, et al. ∙

research

∙ 02/18/2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

A longstanding goal in deep learning research has been to precisely char...

0 Jaehoon Lee, et al. ∙

research

∙ 01/25/2019

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Training recurrent neural networks (RNNs) on long sequence tasks is plag...

12 Dar Gilboa, et al. ∙

research

∙ 10/11/2018

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

There is a previously identified equivalence between wide fully connecte...

18 Roman Novak, et al. ∙

research

∙ 06/14/2018

Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks

Recurrent neural networks have gained widespread use in modeling sequenc...

0 Minmin Chen, et al. ∙

research

∙ 06/14/2018

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

In recent years, state-of-the-art methods in computer vision have utiliz...

0 Lechao Xiao, et al. ∙

research

∙ 02/27/2018

The Emergence of Spectral Universality in Deep Networks

Recent work has shown that tight concentration of the entire spectrum of...

0 Jeffrey Pennington, et al. ∙

research

∙ 02/23/2018

Sensitivity and Generalization in Neural Networks: an Empirical Study

In practice it is often found that large over-parameterized neural netwo...

0 Roman Novak, et al. ∙

research

∙ 02/09/2018

Estimating the Spectral Density of Large Implicit Matrices

Many important problems are characterized by the eigenvalues of a large ...

0 Ryan P. Adams, et al. ∙

research

∙ 11/13/2017

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

It is well known that the initialization of weights in deep neural netwo...

0 Jeffrey Pennington, et al. ∙

research

∙ 11/01/2017

Deep Neural Networks as Gaussian Processes

A deep fully-connected neural network with an i.i.d. prior over its para...

0 Jaehoon Lee, et al. ∙

research

∙ 10/18/2017

A Correspondence Between Random Neural Networks and Statistical Field Theory

A number of recent papers have provided evidence that practical design q...

0 Samuel S. Schoenholz, et al. ∙
