
Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
Natural gradient descent has proven effective at mitigating the effects ...
05/27/2019 ∙ by Guodong Zhang, et al.
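
As background for the natural-gradient papers in this list: natural gradient descent preconditions the gradient by the inverse Fisher information matrix. A minimal numpy sketch, assuming an explicit damped Fisher matrix (only tractable for tiny models; this is generic background, not this paper's analysis):

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    """One natural gradient update: theta <- theta - lr * F^{-1} grad.

    Forming the full Fisher matrix F is infeasible for real networks,
    which is why structured approximations (e.g. Kronecker-factored ones)
    are used in practice; here F is given explicitly for illustration.
    """
    F = fisher + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# On a quadratic loss 0.5 * theta^T F theta (gradient F @ theta), a full
# natural-gradient step with lr=1 jumps straight to the minimum:
F = np.array([[4.0, 0.0], [0.0, 1.0]])
theta0 = np.array([1.0, 1.0])
theta = natural_gradient_step(theta0, F @ theta0, F, lr=1.0, damping=0.0)
```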

Functional Variational Bayesian Neural Networks
Variational Bayesian neural networks (BNNs) perform variational inferenc...
03/14/2019 ∙ by Shengyang Sun, et al.

Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse
Posterior collapse in Variational Autoencoders (VAEs) arises when the va...
11/06/2019 ∙ by James Lucas, et al.

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Hyperparameter optimization can be formulated as a bilevel optimization ...
03/07/2019 ∙ by Matthew MacKay, et al.

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Reducing the test time resource requirements of a neural network while p...
05/15/2019 ∙ by Chaoqi Wang, et al.

Sorting out Lipschitz function approximation
Training neural networks subject to a Lipschitz constraint is useful for...
11/13/2018 ∙ by Cem Anil, et al.
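
For context: this paper studies Lipschitz-constrained networks and, as I recall, proposes a sorting-based activation in place of pointwise ones like ReLU. A numpy sketch of a GroupSort-style activation (group size is a free parameter; size 2 recovers MaxMin):

```python
import numpy as np

def groupsort(x, group_size=2):
    """Sort entries within consecutive groups of size `group_size`.

    As a permutation of coordinates, this activation is 1-Lipschitz and
    gradient-norm-preserving, unlike ReLU, which can zero out gradients.
    """
    *batch, d = x.shape
    assert d % group_size == 0
    grouped = x.reshape(*batch, d // group_size, group_size)
    return np.sort(grouped, axis=-1).reshape(*batch, d)

# Groups of 2: [3, 1] -> [1, 3] and [-2, 5] -> [-2, 5]
out = groupsort(np.array([[3.0, 1.0, -2.0, 5.0]]))
```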

A Coordinate-Free Construction of Scalable Natural Gradient
Most neural networks are trained using first-order optimization methods,...
08/30/2018 ∙ by Kevin Luk, et al.

Reversible Recurrent Neural Networks
Recurrent neural networks (RNNs) provide state-of-the-art performance in...
10/25/2018 ∙ by Matthew MacKay, et al.

Adversarial Distillation of Bayesian Neural Network Posteriors
Bayesian neural networks (BNNs) allow us to reason about uncertainty in ...
06/27/2018 ∙ by Kuan-Chieh Wang, et al.

Eigenvalue Corrected Noisy Natural Gradient
Variational Bayesian neural networks combine the flexibility of deep lea...
11/30/2018 ∙ by Juhan Bae, et al.

Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks
Lipschitz constraints under the L2 norm on deep neural networks are useful f...
11/03/2019 ∙ by Qiyang Li, et al.

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Increasing the batch size is a popular way to speed up neural network tr...
07/09/2019 ∙ by Guodong Zhang, et al.

Differentiable Compositional Kernel Learning for Gaussian Processes
The generalization properties of Gaussian processes depend heavily on th...
06/12/2018 ∙ by Shengyang Sun, et al.

Three Mechanisms of Weight Decay Regularization
Weight decay is one of the standard tricks in the neural network toolbox...
10/29/2018 ∙ by Guodong Zhang, et al.

Learning Wake-Sleep Recurrent Attention Models
Despite their success, convolutional neural networks are computationally...
09/22/2015 ∙ by Jimmy Ba, et al.

On the Quantitative Analysis of Decoder-Based Generative Models
The past several years have seen remarkable progress in generative model...
11/14/2016 ∙ by Yuhuai Wu, et al.

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
In this work, we propose to apply trust region optimization to deep rein...
08/17/2017 ∙ by Yuhuai Wu, et al.

A Kronecker-factored approximate Fisher matrix for convolution layers
Second-order optimization methods such as natural gradient descent have ...
02/03/2016 ∙ by Roger Grosse, et al.

Statistical Inference, Learning and Models in Big Data
The need for new methods to deal with big data is a common theme in most...
09/09/2015 ∙ by Beate Franke, et al.

Importance Weighted Autoencoders
The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently ...
09/01/2015 ∙ by Yuri Burda, et al.
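
The core object in this paper is a multi-sample lower bound: average k importance weights inside the log rather than outside. A minimal sketch, computed stably with the log-sum-exp trick (the function name is illustrative):

```python
import numpy as np

def iwae_bound(log_weights):
    """Importance-weighted bound: log of the mean importance weight.

    log_weights[i] = log p(x, z_i) - log q(z_i | x) for samples z_i ~ q.
    With one sample this is the standard ELBO; more samples tighten it.
    Subtracting the max before exponentiating avoids overflow.
    """
    k = len(log_weights)
    m = np.max(log_weights)
    return m + np.log(np.sum(np.exp(log_weights - m))) - np.log(k)

# With identical weights, the bound equals that common log-weight:
b = iwae_bound(np.array([-2.0, -2.0, -2.0, -2.0]))
```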

Optimizing Neural Networks with Kronecker-factored Approximate Curvature
We propose an efficient method for approximating natural gradient descen...
03/19/2015 ∙ by James Martens, et al.
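
The key trick in K-FAC is approximating each layer's Fisher block as a Kronecker product of two small matrices, so inversion is cheap. A hedged numpy sketch for one dense layer, assuming the two factors have already been estimated (damping value illustrative):

```python
import numpy as np

def kfac_precondition(grad_W, A, G, damping=1e-3):
    """Precondition a dense layer's gradient using Kronecker factors.

    The layer's Fisher block is approximated as kron(A, G), where A averages
    outer products of the layer's inputs and G averages outer products of
    the pre-activation gradients. Since
        kron(A, G)^{-1} vec(grad_W) = vec(G^{-1} grad_W A^{-1}),
    only the two small factors ever need to be inverted.
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    return np.linalg.solve(G_d, grad_W) @ np.linalg.inv(A_d)

# Sanity check with A = I and G = 2I: the gradient is simply halved.
out = kfac_precondition(np.ones((2, 2)), np.eye(2), 2.0 * np.eye(2), damping=0.0)
```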

Automatic Construction and Natural-Language Description of Nonparametric Regression Models
This paper presents the beginnings of an automatic statistician, focusin...
02/18/2014 ∙ by James Robert Lloyd, et al.

Structure Discovery in Nonparametric Regression through Compositional Kernel Search
Despite its importance, choosing the structural form of the kernel in no...
02/20/2013 ∙ by David Duvenaud, et al.

Exploiting compositionality to explore a large space of model structures
The recent proliferation of richly structured probabilistic models raise...
10/16/2012 ∙ by Roger Grosse, et al.

Shift-Invariant Sparse Coding for Audio Classification
Sparse coding is an unsupervised learning algorithm that learns a succin...
06/20/2012 ∙ by Roger Grosse, et al.

Noisy Natural Gradient as Variational Inference
Combining the flexibility of deep learning with Bayesian uncertainty est...
12/06/2017 ∙ by Guodong Zhang, et al.

Isolating Sources of Disentanglement in Variational Autoencoders
We decompose the evidence lower bound to show the existence of a term me...
02/14/2018 ∙ by Tian Qi Chen, et al.

Understanding Short-Horizon Bias in Stochastic Meta-Optimization
Careful tuning of the learning rate, or even schedules thereof, can be c...
03/06/2018 ∙ by Yuhuai Wu, et al.

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Stochastic neural net weights are used in a variety of contexts, includi...
03/12/2018 ∙ by Yeming Wen, et al.

Aggregated Momentum: Stability Through Passive Damping
Momentum is a simple and widely used trick which allows gradient-based o...
04/01/2018 ∙ by James Lucas, et al.
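
The idea in this paper, as the abstract suggests, is to combine several momentum buffers with different damping coefficients so they stabilize one another. A hedged numpy sketch of one such update (the exact update rule and coefficient values here are my assumptions, not taken from the paper):

```python
import numpy as np

def aggmo_step(theta, grad, velocities, betas=(0.0, 0.9, 0.99), lr=0.1):
    """One aggregated-momentum update (sketch; hyperparameters illustrative).

    Each velocity buffer has its own damping coefficient beta and is updated
    from the same gradient; their average drives the parameter step. The
    low-beta buffers passively damp oscillations of the high-beta ones.
    """
    for i, beta in enumerate(betas):
        velocities[i] = beta * velocities[i] - grad
    return theta + lr * np.mean(velocities, axis=0)

theta = np.array([1.0])
vs = [np.zeros(1), np.zeros(1), np.zeros(1)]
theta = aggmo_step(theta, theta.copy(), vs, lr=0.1)  # gradient of 0.5*theta^2
```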
Roger Grosse
Assistant Professor of Computer Science at the University of Toronto, focusing on machine learning.