
Training Learned Optimizers with Randomly Initialized Learned Optimizers
Learned optimizers are increasingly effective, with performance exceedin...

Parallel Training of Deep Networks with Local Updates
Deep learning models trained on large data sets have been widely success...

Towards NNGP-guided Neural Architecture Search
The predictions of wide Bayesian neural networks are described by a Gaus...

Reverse engineering learned optimizers reveals known and novel mechanisms
Learned optimizers are algorithms that can themselves be trained to solv...

Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence
We perform an extensive empirical study of the statistical properties of...

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
Much as replacing hand-designed features with learned functions has revo...

Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible
Machine learning is predicated on the concept of generalization: a model...

Finite Versus Infinite Neural Networks: an Empirical Study
We perform a careful, thorough, and large scale empirical study of the c...

A new method for parameter estimation in probabilistic models: Minimum probability flow
Fitting probabilistic models to data is often difficult, due to the gene...

Exact posterior distributions of wide Bayesian neural networks
Recent work has shown that the prior over functions induced by a deep Ba...

Infinite attention: NNGP and NTK for deep attention networks
There is a growing amount of literature on the relationship between wide...

Two equalities expressing the determinant of a matrix in terms of expectations over matrix-vector products
We introduce two equations expressing the inverse determinant of a full ...

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling
We show that the sum of the implicit generator log-density log p_g of a ...

The large learning rate phase of deep learning: the catapult mechanism
The choice of initial learning rate can have a profound effect on the pe...

Using a thousand optimization tasks to learn hyperparameter search strategies
We present TaskSet, a dataset of tasks for use in training and evaluatin...

On the infinite width limit of neural networks with a standard parameterization
There are currently two parameterizations used to derive fixed kernels c...

Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Neural Tangents is a library designed to enable research into infinite-w...

Neural reparameterization improves structural optimization
Structural optimization is a popular method for designing objects such a...

Using learned optimizers to make models robust to input noise
State-of-the-art vision models can achieve superhuman performance on ima...

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
We investigate how the final parameters found by stochastic gradient des...

A RAD approach to deep mixture models
Flow based models such as Real NVP are an extremely powerful approach to...

A Mean Field Theory of Batch Normalization
We develop a mean field theory for batch normalization in fully-connecte...

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
A longstanding goal in deep learning research has been to precisely char...

Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit
Recent work has noted that all bad local minima can be removed from neur...

Measuring the Effects of Data Parallelism on Neural Network Training
Recent hardware developments have made unprecedented amounts of data par...

Learned optimizers that outperform SGD on wall-clock and test loss
Deep learning has shown that learned functions can dramatically outperfo...

Learned optimizers that outperform SGD on wall-clock and validation loss
Deep learning has shown that learned functions can dramatically outperfo...

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes
There is a previously identified equivalence between wide fully connecte...

Adversarial Reprogramming of Neural Networks
Deep neural networks are susceptible to adversarial attacks. In computer...

Guided evolutionary strategies: escaping the curse of dimensionality in random search
Many applications in machine learning require optimizing a function whos...

Stochastic natural gradient descent draws posterior samples in function space
Natural gradient descent (NGD) minimises the cost function on a Riemanni...

PCA of high dimensional random walks with comparison to neural network training
One technique to visualize the training of neural networks is to perform...

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
In recent years, state-of-the-art methods in computer vision have utiliz...

Learning Unsupervised Learning Rules
A major goal of unsupervised learning is to discover data representation...

Sensitivity and Generalization in Neural Networks: an Empirical Study
In practice it is often found that large overparameterized neural netwo...

Adversarial Examples that Fool both Human and Computer Vision
Machine learning models are vulnerable to adversarial examples: small ch...

Generalizing Hamiltonian Monte Carlo with Neural Networks
We present a general-purpose method to train Markov chain Monte Carlo ke...

Deep Neural Networks as Gaussian Processes
A deep fully-connected neural network with an i.i.d. prior over its para...

A Correspondence Between Random Neural Networks and Statistical Field Theory
A number of recent papers have provided evidence that practical design q...

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
We propose a new technique, Singular Vector Canonical Correlation Analys...

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
Learning in models with discrete latent variables is challenging due to ...

Learned Optimizers that Scale and Generalize
Learning to learn has emerged as an important direction for achieving ar...

Capacity and Trainability in Recurrent Neural Networks
Two potential bottlenecks on the expressiveness of recurrent neural netw...

Input Switched Affine Networks: An RNN Architecture Designed for Interpretability
There exist many problem domains where the interpretability of neural ne...

Survey of Expressivity in Deep Neural Networks
We survey results on neural network expressivity described in "On the Ex...

Unrolled Generative Adversarial Networks
We introduce a method to stabilize Generative Adversarial Networks (GANs...

Deep Information Propagation
We study the behavior of untrained neural networks whose weights and bia...

Exponential expressivity in deep neural networks through transient chaos
We combine Riemannian geometry with the mean field theory of high dimens...

On the Expressive Power of Deep Neural Networks
We propose a new approach to the problem of neural network expressivity,...

Density estimation using Real NVP
Unsupervised learning of probabilistic models is a central yet challengi...
Jascha Sohl-Dickstein
Staff Research Scientist in the Brain group at Google, Academic Resident at Khan Academy, Visiting scholar in Surya Ganguli's lab at Stanford University, PhD in 2012 in the Redwood Center for Theoretical Neuroscience at UC Berkeley, in Bruno Olshausen's lab.