
- Training Learned Optimizers with Randomly Initialized Learned Optimizers
  Learned optimizers are increasingly effective, with performance exceedin...
- Parallel Training of Deep Networks with Local Updates
  Deep learning models trained on large data sets have been widely success...
- Towards NNGP-guided Neural Architecture Search
  The predictions of wide Bayesian neural networks are described by a Gaus...
- Reverse engineering learned optimizers reveals known and novel mechanisms
  Learned optimizers are algorithms that can themselves be trained to solv...
- Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence
  We perform an extensive empirical study of the statistical properties of...
- Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
  Much as replacing hand-designed features with learned functions has revo...
- Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible
  Machine learning is predicated on the concept of generalization: a model...
- Finite Versus Infinite Neural Networks: an Empirical Study
  We perform a careful, thorough, and large scale empirical study of the c...
- A new method for parameter estimation in probabilistic models: Minimum probability flow
  Fitting probabilistic models to data is often difficult, due to the gene...
- Exact posterior distributions of wide Bayesian neural networks
  Recent work has shown that the prior over functions induced by a deep Ba...
- Infinite attention: NNGP and NTK for deep attention networks
  There is a growing amount of literature on the relationship between wide...
- Two equalities expressing the determinant of a matrix in terms of expectations over matrix-vector products
  We introduce two equations expressing the inverse determinant of a full ...
- Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling
  We show that the sum of the implicit generator log-density log p_g of a ...
- The large learning rate phase of deep learning: the catapult mechanism
  The choice of initial learning rate can have a profound effect on the pe...
- Using a thousand optimization tasks to learn hyperparameter search strategies
  We present TaskSet, a dataset of tasks for use in training and evaluatin...
- On the infinite width limit of neural networks with a standard parameterization
  There are currently two parameterizations used to derive fixed kernels c...
- Neural Tangents: Fast and Easy Infinite Neural Networks in Python
  Neural Tangents is a library designed to enable research into infinite-w...
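  As a flavor of what the library enables, here is a minimal usage sketch based on its publicly documented stax API; the architecture, data shapes, and names below are illustrative placeholders rather than anything taken from the paper.

  ```python
  import jax.random as random
  import neural_tangents as nt
  from neural_tangents import stax

  # A single definition yields a finite-width network (init_fn, apply_fn)
  # and the analytic kernels of its infinite-width limit (kernel_fn).
  init_fn, apply_fn, kernel_fn = stax.serial(
      stax.Dense(512), stax.Relu(),
      stax.Dense(512), stax.Relu(),
      stax.Dense(1),
  )

  k1, k2, k3 = random.split(random.PRNGKey(0), 3)
  x_train = random.normal(k1, (20, 32))  # placeholder data
  y_train = random.normal(k2, (20, 1))
  x_test = random.normal(k3, (5, 32))

  # Analytic NNGP and NTK kernels between test and train inputs.
  kernels = kernel_fn(x_test, x_train, ('nngp', 'ntk'))

  # Closed-form predictions of the infinite-width network trained to
  # convergence by gradient descent on mean squared error.
  predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
  mean, cov = predict_fn(x_test=x_test, get='nngp', compute_cov=True)
  ```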
- Neural reparameterization improves structural optimization
  Structural optimization is a popular method for designing objects such a...
- Using learned optimizers to make models robust to input noise
  State-of-the art vision models can achieve superhuman performance on ima...
- The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
  We investigate how the final parameters found by stochastic gradient des...
- A RAD approach to deep mixture models
  Flow based models such as Real NVP are an extremely powerful approach to...
- A Mean Field Theory of Batch Normalization
  We develop a mean field theory for batch normalization in fully-connecte...
- Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
  A longstanding goal in deep learning research has been to precisely char...
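  At the center of this result is the network's first-order Taylor expansion in its parameters around initialization; sketched here in standard form (notation generic, not quoted from the paper):

  $$f^{\mathrm{lin}}_t(x) = f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)\,(\theta_t - \theta_0),$$

  where, for sufficiently wide networks, gradient descent keeps the full network's outputs close to those of this linear model, whose training dynamics under squared loss are governed by the neural tangent kernel evaluated at initialization.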
- Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit
  Recent work has noted that all bad local minima can be removed from neur...
- Measuring the Effects of Data Parallelism on Neural Network Training
  Recent hardware developments have made unprecedented amounts of data par...
- Learned optimizers that outperform SGD on wall-clock and test loss
  Deep learning has shown that learned functions can dramatically outperfo...
- Learned optimizers that outperform SGD on wall-clock and validation loss
  Deep learning has shown that learned functions can dramatically outperfo...
- Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes
  There is a previously identified equivalence between wide fully connecte...
- Adversarial Reprogramming of Neural Networks
  Deep neural networks are susceptible to adversarial attacks. In computer...
- Guided evolutionary strategies: escaping the curse of dimensionality in random search
  Many applications in machine learning require optimizing a function whos...
- Stochastic natural gradient descent draws posterior samples in function space
  Natural gradient descent (NGD) minimises the cost function on a Riemanni...
- PCA of high dimensional random walks with comparison to neural network training
  One technique to visualize the training of neural networks is to perform...
- Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
  In recent years, state-of-the-art methods in computer vision have utiliz...
- Learning Unsupervised Learning Rules
  A major goal of unsupervised learning is to discover data representation...
- Sensitivity and Generalization in Neural Networks: an Empirical Study
  In practice it is often found that large over-parameterized neural netwo...
- Adversarial Examples that Fool both Human and Computer Vision
  Machine learning models are vulnerable to adversarial examples: small ch...
- Generalizing Hamiltonian Monte Carlo with Neural Networks
  We present a general-purpose method to train Markov chain Monte Carlo ke...
- Deep Neural Networks as Gaussian Processes
  A deep fully-connected neural network with an i.i.d. prior over its para...
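  The correspondence rests on a layer-wise kernel recursion; a sketch of its standard form, with weight and bias variances $\sigma_w^2$, $\sigma_b^2$ and pointwise nonlinearity $\phi$ (notation generic, not quoted from the paper):

  $$K^{0}(x, x') = \sigma_b^2 + \sigma_w^2\,\frac{x^\top x'}{d_{\mathrm{in}}}, \qquad K^{\ell}(x, x') = \sigma_b^2 + \sigma_w^2\,\mathbb{E}\big[\phi(u)\,\phi(v)\big],$$

  where $(u, v)$ is a centered bivariate Gaussian whose covariance is given by $K^{\ell-1}$ evaluated at $(x, x)$, $(x, x')$, and $(x', x')$; the final-layer kernel defines the Gaussian process used for exact Bayesian prediction.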
- A Correspondence Between Random Neural Networks and Statistical Field Theory
  A number of recent papers have provided evidence that practical design q...
- SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
  We propose a new technique, Singular Vector Canonical Correlation Analys...
- REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
  Learning in models with discrete latent variables is challenging due to ...
- Learned Optimizers that Scale and Generalize
  Learning to learn has emerged as an important direction for achieving ar...
- Capacity and Trainability in Recurrent Neural Networks
  Two potential bottlenecks on the expressiveness of recurrent neural netw...
- Input Switched Affine Networks: An RNN Architecture Designed for Interpretability
  There exist many problem domains where the interpretability of neural ne...
- Survey of Expressivity in Deep Neural Networks
  We survey results on neural network expressivity described in "On the Ex...
- Unrolled Generative Adversarial Networks
  We introduce a method to stabilize Generative Adversarial Networks (GANs...
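  A sketch of the unrolling construction in generic notation (with GAN objective $f(\theta_G, \theta_D)$ and discriminator step size $\eta$; this is a paraphrase, not the paper's exact statement):

  $$\theta_D^{0} = \theta_D, \qquad \theta_D^{k+1} = \theta_D^{k} + \eta\,\frac{\partial f(\theta_G, \theta_D^{k})}{\partial \theta_D^{k}}, \qquad f_K(\theta_G, \theta_D) = f\big(\theta_G, \theta_D^{K}(\theta_G, \theta_D)\big).$$

  The generator is trained on $f_K$, differentiating through the $K$ unrolled discriminator updates, while the discriminator itself is updated as usual (the $K = 0$ case).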
- Deep Information Propagation
  We study the behavior of untrained neural networks whose weights and bia...
- Exponential expressivity in deep neural networks through transient chaos
  We combine Riemannian geometry with the mean field theory of high dimens...
- On the Expressive Power of Deep Neural Networks
  We propose a new approach to the problem of neural network expressivity,...
- Density estimation using Real NVP
  Unsupervised learning of probabilistic models is a central yet challengi...
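  The model is built from affine coupling layers; a sketch of the standard transform, where the input is split as $x = (x_{1:d}, x_{d+1:D})$ and $s$, $t$ are unconstrained neural networks:

  $$y_{1:d} = x_{1:d}, \qquad y_{d+1:D} = x_{d+1:D} \odot \exp\!\big(s(x_{1:d})\big) + t(x_{1:d}),$$

  whose Jacobian is triangular, so $\log\big|\det\,\partial y/\partial x\big| = \sum_j s(x_{1:d})_j$ is cheap to evaluate and the layer can be inverted without inverting $s$ or $t$.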