Jascha Sohl-Dickstein

04/21/2023

Noise-Reuse in Online Evolution Strategies

Online evolution strategies have become an attractive alternative to aut...

Oscar Li, et al.

02/22/2023

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

Since their introduction, diffusion models have quickly become the preva...

Yilun Du, et al.

12/08/2022

General-Purpose In-Context Learning by Meta-Learning Transformers

Modern machine learning requires system designers to specify aspects of ...

Louis Kirsch, et al.

11/17/2022

VeLO: Training Versatile Learned Optimizers by Scaling Up

While deep learning models have replaced hand-designed features across m...

Luke Metz, et al.

09/22/2022

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Learned optimizers – neural networks that are trained to act as optimize...

James Harrison, et al.

07/21/2022

Language Model Cascades

Prompted models have demonstrated impressive few-shot learning abilities...

David Dohan, et al.

06/17/2022

Fast Finite Width Neural Tangent Kernel

The Neural Tangent Kernel (NTK), defined as Θ_θ^f(x_1, x_2) = [∂ f(θ, x_...

Roman Novak, et al.

06/15/2022

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

We introduce repriorisation, a data-dependent reparameterisation which t...

Jiri Hron, et al.

03/22/2022

Practical tradeoffs between memory, compute, and performance in learned optimizers

Optimization plays a costly and crucial role in developing machine learn...

Luke Metz, et al.

12/27/2021

Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

Unrolled computation graphs arise in many scenarios, including training ...

Paul Vicol, et al.

01/14/2021

Training Learned Optimizers with Randomly Initialized Learned Optimizers

Learned optimizers are increasingly effective, with performance exceedin...

Luke Metz, et al.

12/07/2020

Parallel Training of Deep Networks with Local Updates

Deep learning models trained on large data sets have been widely success...

Michael (Misha) Laskin, et al.

11/11/2020

Towards NNGP-guided Neural Architecture Search

The predictions of wide Bayesian neural networks are described by a Gaus...

Daniel S. Park, et al.

11/04/2020

Reverse engineering learned optimizers reveals known and novel mechanisms

Learned optimizers are algorithms that can themselves be trained to solv...

Niru Maheswaranathan, et al.

10/21/2020

Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence

We perform an extensive empirical study of the statistical properties of...

Vinay Rao, et al.

09/23/2020

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

Much as replacing hand-designed features with learned functions has revo...

Luke Metz, et al.

08/17/2020

Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible

Machine learning is predicated on the concept of generalization: a model...

Neha S. Wadia, et al.

07/31/2020

Finite Versus Infinite Neural Networks: an Empirical Study

We perform a careful, thorough, and large scale empirical study of the c...

Jaehoon Lee, et al.

07/17/2020

A new method for parameter estimation in probabilistic models: Minimum probability flow

Fitting probabilistic models to data is often difficult, due to the gene...

Jascha Sohl-Dickstein, et al.

06/18/2020

Exact posterior distributions of wide Bayesian neural networks

Recent work has shown that the prior over functions induced by a deep Ba...

Jiri Hron, et al.

06/18/2020

Infinite attention: NNGP and NTK for deep attention networks

There is a growing amount of literature on the relationship between wide...

Jiri Hron, et al.

05/13/2020

Two equalities expressing the determinant of a matrix in terms of expectations over matrix-vector products

We introduce two equations expressing the inverse determinant of a full ...

Jascha Sohl-Dickstein, et al.

03/12/2020

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

We show that the sum of the implicit generator log-density log p_g of a ...

Tong Che, et al.

03/04/2020

The large learning rate phase of deep learning: the catapult mechanism

The choice of initial learning rate can have a profound effect on the pe...

Aitor Lewkowycz, et al.

02/27/2020

Using a thousand optimization tasks to learn hyperparameter search strategies

We present TaskSet, a dataset of tasks for use in training and evaluatin...

Luke Metz, et al.

01/21/2020

On the infinite width limit of neural networks with a standard parameterization

There are currently two parameterizations used to derive fixed kernels c...

Jascha Sohl-Dickstein, et al.

12/05/2019

Neural Tangents: Fast and Easy Infinite Neural Networks in Python

Neural Tangents is a library designed to enable research into infinite-w...

Roman Novak, et al.

09/10/2019

Neural reparameterization improves structural optimization

Structural optimization is a popular method for designing objects such a...

Stephan Hoyer, et al.

06/08/2019

Using learned optimizers to make models robust to input noise

State-of-the-art vision models can achieve superhuman performance on ima...

Luke Metz, et al.

05/09/2019

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

We investigate how the final parameters found by stochastic gradient des...

Daniel S. Park, et al.

03/18/2019

A RAD approach to deep mixture models

Flow based models such as Real NVP are an extremely powerful approach to...

Laurent Dinh, et al.

02/21/2019

A Mean Field Theory of Batch Normalization

We develop a mean field theory for batch normalization in fully-connecte...

Greg Yang, et al.

02/18/2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

A longstanding goal in deep learning research has been to precisely char...

Jaehoon Lee, et al.

01/12/2019

Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit

Recent work has noted that all bad local minima can be removed from neur...

Jascha Sohl-Dickstein, et al.

11/08/2018

Measuring the Effects of Data Parallelism on Neural Network Training

Recent hardware developments have made unprecedented amounts of data par...

Christopher J. Shallue, et al.

10/24/2018

Learned optimizers that outperform SGD on wall-clock and test loss

Deep learning has shown that learned functions can dramatically outperfo...

Luke Metz, et al.

10/24/2018

Learned optimizers that outperform SGD on wall-clock and validation loss

Deep learning has shown that learned functions can dramatically outperfo...

Luke Metz, et al.

10/11/2018

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

There is a previously identified equivalence between wide fully connecte...

Roman Novak, et al.

06/28/2018

Adversarial Reprogramming of Neural Networks

Deep neural networks are susceptible to adversarial attacks. In computer...

Gamaleldin F. Elsayed, et al.

06/26/2018

Guided evolutionary strategies: escaping the curse of dimensionality in random search

Many applications in machine learning require optimizing a function whos...

Niru Maheswaranathan, et al.

06/25/2018

Stochastic natural gradient descent draws posterior samples in function space

Natural gradient descent (NGD) minimises the cost function on a Riemanni...

Samuel L. Smith, et al.

06/22/2018

PCA of high dimensional random walks with comparison to neural network training

One technique to visualize the training of neural networks is to perform...

Joseph M. Antognini, et al.

06/14/2018

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

In recent years, state-of-the-art methods in computer vision have utiliz...

Lechao Xiao, et al.

03/31/2018

Learning Unsupervised Learning Rules

A major goal of unsupervised learning is to discover data representation...

Luke Metz, et al.

02/23/2018

Sensitivity and Generalization in Neural Networks: an Empirical Study

In practice it is often found that large over-parameterized neural netwo...

Roman Novak, et al.

02/22/2018

Adversarial Examples that Fool both Human and Computer Vision

Machine learning models are vulnerable to adversarial examples: small ch...

Gamaleldin F. Elsayed, et al.

11/25/2017

Generalizing Hamiltonian Monte Carlo with Neural Networks

We present a general-purpose method to train Markov chain Monte Carlo ke...

Daniel Lévy, et al.

11/01/2017

Deep Neural Networks as Gaussian Processes

A deep fully-connected neural network with an i.i.d. prior over its para...

Jaehoon Lee, et al.

10/18/2017

A Correspondence Between Random Neural Networks and Statistical Field Theory

A number of recent papers have provided evidence that practical design q...

Samuel S. Schoenholz, et al.

06/19/2017

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability

We propose a new technique, Singular Vector Canonical Correlation Analys...

Jojo Yun, et al.
