
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving
In learning-assisted theorem proving, one of the most critical challenges...

When Does Preconditioning Help or Hurt Generalization?
While second-order optimizers such as natural gradient descent (NGD) oft...

Understanding and mitigating exploding inverses in invertible neural networks
Invertible neural networks (INNs) have been used to design generative mo...

Picking Winning Tickets Before Training by Preserving Gradient Flow
Overparameterization has been shown to benefit both the optimization and...

Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse
Posterior collapse in Variational Autoencoders (VAEs) arises when the va...

Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks
Lipschitz constraints under L2 norm on deep neural networks are useful f...

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Increasing the batch size is a popular way to speed up neural network tr...

Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
Natural gradient descent has proven effective at mitigating the effects ...

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Reducing the test time resource requirements of a neural network while p...

Functional Variational Bayesian Neural Networks
Variational Bayesian neural networks (BNNs) perform variational inferenc...

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Hyperparameter optimization can be formulated as a bilevel optimization ...

Eigenvalue Corrected Noisy Natural Gradient
Variational Bayesian neural networks combine the flexibility of deep lea...

Sorting out Lipschitz function approximation
Training neural networks subject to a Lipschitz constraint is useful for...

Three Mechanisms of Weight Decay Regularization
Weight decay is one of the standard tricks in the neural network toolbox...

Reversible Recurrent Neural Networks
Recurrent neural networks (RNNs) provide state-of-the-art performance in...

A Coordinate-Free Construction of Scalable Natural Gradient
Most neural networks are trained using first-order optimization methods,...

Adversarial Distillation of Bayesian Neural Network Posteriors
Bayesian neural networks (BNNs) allow us to reason about uncertainty in ...

Differentiable Compositional Kernel Learning for Gaussian Processes
The generalization properties of Gaussian processes depend heavily on th...

Aggregated Momentum: Stability Through Passive Damping
Momentum is a simple and widely used trick which allows gradient-based o...
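Although the abstract is cut off here, the aggregated-momentum update itself is simple: keep several velocity buffers, each with its own damping coefficient, and move the parameters along their average. A minimal numpy sketch under that reading (the function name and the toy quadratic are illustrative, not from the paper):

```python
import numpy as np

def aggmo_step(theta, velocities, grad, betas, lr):
    """One aggregated-momentum step: each velocity buffer uses its own
    damping coefficient beta; the parameter moves along their average."""
    velocities = [beta * v - grad for beta, v in zip(betas, velocities)]
    theta = theta + (lr / len(betas)) * sum(velocities)
    return theta, velocities

# Toy usage: minimize f(theta) = theta**2 / 2, whose gradient is theta.
theta = np.array(1.0)
vs = [np.zeros(()), np.zeros(())]
for _ in range(200):
    theta, vs = aggmo_step(theta, vs, theta, betas=[0.0, 0.9], lr=0.1)
```

Mixing a large beta (fast progress) with a small one (passive damping) is the paper's titular stability mechanism.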

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Stochastic neural net weights are used in a variety of contexts, includi...
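The core Flipout trick — sharing one sampled weight perturbation across a mini-batch while decorrelating examples with random rank-one sign matrices — can be sketched as follows (shapes, names, and the Gaussian parameterization here are my own illustrative choices):

```python
import numpy as np

def flipout_linear(x, W_mean, W_std, rng):
    """Linear layer with Flipout-style weight perturbations: one shared
    Gaussian perturbation per mini-batch, decorrelated across examples
    by independent Rademacher sign vectors r (outputs) and s (inputs)."""
    batch, d_in = x.shape
    d_out = W_mean.shape[1]
    delta = W_std * rng.standard_normal((d_in, d_out))  # shared sample
    r = rng.choice([-1.0, 1.0], size=(batch, d_out))
    s = rng.choice([-1.0, 1.0], size=(batch, d_in))
    # Example n effectively sees the perturbation delta flipped by s_n, r_n.
    return x @ W_mean + ((x * s) @ delta) * r

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
out = flipout_linear(x, W, W_std=0.0, rng=rng)  # zero std: deterministic
```

Because the sign flips are cheap elementwise products, this gets per-example perturbation variance reduction at roughly the cost of two matrix multiplies.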

Understanding Short-Horizon Bias in Stochastic Meta-Optimization
Careful tuning of the learning rate, or even schedules thereof, can be c...

Isolating Sources of Disentanglement in Variational Autoencoders
We decompose the evidence lower bound to show the existence of a term me...

Noisy Natural Gradient as Variational Inference
Combining the flexibility of deep learning with Bayesian uncertainty est...

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
In this work, we propose to apply trust region optimization to deep rein...

On the Quantitative Analysis of Decoder-Based Generative Models
The past several years have seen remarkable progress in generative model...

A Kronecker-factored approximate Fisher matrix for convolution layers
Second-order optimization methods such as natural gradient descent have ...

Learning Wake-Sleep Recurrent Attention Models
Despite their success, convolutional neural networks are computationally...

Statistical Inference, Learning and Models in Big Data
The need for new methods to deal with big data is a common theme in most...

Importance Weighted Autoencoders
The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently ...
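For reference, the importance-weighted bound this paper introduces — the log of the average of k importance weights — is a one-liner. A hedged numpy sketch (the function name is mine) using a log-sum-exp shift for stability:

```python
import numpy as np

def iwae_bound(log_w):
    """Importance-weighted bound from k log importance weights per example.

    log_w: array of shape (batch, k) holding log p(x, z_i) - log q(z_i | x)
    for k samples z_i ~ q.  Returns log((1/k) * sum_i w_i) per example,
    computed with a log-sum-exp shift for numerical stability.
    """
    k = log_w.shape[1]
    m = log_w.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_w - m).sum(axis=1)) - np.log(k)

# With k = 1 this reduces to the ordinary ELBO (the single log weight);
# for fixed weights it is simply the log of their average.
elbo = iwae_bound(np.log(np.array([[0.5]])))
bound = iwae_bound(np.log(np.array([[0.25, 0.75]])))
```

Averaging inside the log, rather than outside as in the standard ELBO, is what tightens the bound as k grows.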

Optimizing Neural Networks with Kronecker-factored Approximate Curvature
We propose an efficient method for approximating natural gradient descen...
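The central approximation in K-FAC — factoring each layer's Fisher block as a Kronecker product of the input-activation and pre-activation-gradient second moments — can be sketched like this (the damping term and all names are my own illustrative choices, not the paper's full algorithm):

```python
import numpy as np

def kfac_precondition(grad_W, A, G, damping=1e-3):
    """Approximate natural-gradient direction for one layer's weights.

    The layer's Fisher block is approximated as A ⊗ G, where A is the
    second moment of input activations (d_in x d_in) and G that of the
    pre-activation gradients (d_out x d_out).  Then applying
    (A ⊗ G)^{-1} to vec(grad_W) corresponds to G^{-1} grad_W A^{-1},
    so only two small matrices are ever inverted.
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    half = np.linalg.solve(G_d, grad_W)       # G^{-1} grad_W  (left factor)
    return np.linalg.solve(A_d.T, half.T).T   # apply A^{-1} on the right

# With identity factors and no damping this reduces to plain gradient descent.
g = np.arange(6.0).reshape(2, 3)
same = kfac_precondition(g, np.eye(3), np.eye(2), damping=0.0)
```

The Kronecker structure is what makes the method scalable: the cost is cubic in the layer widths rather than in the total parameter count.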

Automatic Construction and Natural-Language Description of Nonparametric Regression Models
This paper presents the beginnings of an automatic statistician, focusin...

Structure Discovery in Nonparametric Regression through Compositional Kernel Search
Despite its importance, choosing the structural form of the kernel in no...

Exploiting compositionality to explore a large space of model structures
The recent proliferation of richly structured probabilistic models raise...

Shift-Invariance Sparse Coding for Audio Classification
Sparse coding is an unsupervised learning algorithm that learns a succin...
Roger Grosse
Assistant Professor of Computer Science at the University of Toronto, focusing on machine learning.