Roger Grosse

research

∙ 08/07/2023

Studying Large Language Model Generalization with Influence Functions

When trying to gain better visibility into a machine learning model in o...

0 Roger Grosse, et al. ∙

research

∙ 02/07/2023

Efficient Parametric Approximations of Neural Network Function Space Distance

It is often useful to compactly summarize important properties of model ...

20 Nikita Dhawan, et al. ∙

research

∙ 12/28/2022

On Implicit Bias in Overparameterized Bilevel Optimization

Many problems in machine learning involve bilevel optimization (BLO), in...

0 Paul Vicol, et al. ∙

research

∙ 12/07/2022

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

Variational autoencoders (VAEs) are powerful tools for learning latent r...

19 Juhan Bae, et al. ∙

research

∙ 11/26/2022

Similarity-based Cooperation

As machine learning agents act more autonomously in the world, they will...

0 Caspar Oesterheld, et al. ∙

research

∙ 11/18/2022

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

Designing networks capable of attaining better performance with an incre...

0 Cem Anil, et al. ∙

research

∙ 09/21/2022

Toy Models of Superposition

Neural networks often pack many unrelated concepts into a single neuron ...

12 Nelson Elhage, et al. ∙

research

∙ 09/12/2022

If Influence Functions are the Answer, Then What is the Question?

Influence functions efficiently estimate the effect of removing a single...

2 Juhan Bae, et al. ∙

research

∙ 02/28/2022

Amortized Proximal Optimization

We propose a framework for online meta-optimization of parameters that g...

10 Juhan Bae, et al. ∙

research

∙ 08/27/2021

Learning to Give Checkable Answers with Prover-Verifier Games

Our ability to know when to trust the decisions made by machine learning...

0 Cem Anil, et al. ∙

research

∙ 07/21/2021

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

Annealed importance sampling (AIS) and related algorithms are highly eff...

14 Guodong Zhang, et al. ∙

research

∙ 06/10/2021

Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition

We introduce a new scalable variational Gaussian process approximation w...

0 Shengyang Sun, et al. ∙

research

∙ 04/22/2021

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

Linear interpolation between initial neural network parameters and conve...

5 James Lucas, et al. ∙

research

∙ 02/18/2021

Don't Fix What ain't Broke: Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization

Minimax optimization has recently gained a lot of attention as adversari...

0 Guodong Zhang, et al. ∙

research

∙ 01/15/2021

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

While designing inductive bias in neural architectures has been widely s...

58 Yuhuai Wu, et al. ∙

research

∙ 11/06/2020

Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?

While uncertainty estimation is a well-studied topic in deep learning, m...

0 Chaoqi Wang, et al. ∙

research

∙ 10/26/2020

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians

Hyperparameter optimization of neural networks can be elegantly formulat...

1 Juhan Bae, et al. ∙

research

∙ 09/23/2020

A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints

The theory of integral quadratic constraints (IQCs) allows the certifica...

0 Guodong Zhang, et al. ∙

research

∙ 08/15/2020

Evaluating Lossy Compression Rates of Deep Generative Models

The field of deep generative modeling has succeeded in producing astonis...

10 Sicong Huang, et al. ∙

research

∙ 07/13/2020

Regularized linear autoencoders recover the principal components, eventually

Our understanding of learning input-output relationships with neural net...

39 Xuchan Bao, et al. ∙

research

∙ 07/08/2020

The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning

In this work, we focus on an analogical reasoning task that contains ric...

1 Yuhuai Wu, et al. ∙

research

∙ 07/07/2020

Learning Branching Heuristics for Propositional Model Counting

Propositional model counting or #SAT is the problem of computing the num...

8 Pashootan Vaezipoor, et al. ∙

research

∙ 07/06/2020

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

In learning-assisted theorem proving, one of the most critical challenge...

0 Yuhuai Wu, et al. ∙

research

∙ 06/18/2020

When Does Preconditioning Help or Hurt Generalization?

While second order optimizers such as natural gradient descent (NGD) oft...

0 Shun-ichi Amari, et al. ∙

research

∙ 06/16/2020

Understanding and mitigating exploding inverses in invertible neural networks

Invertible neural networks (INNs) have been used to design generative mo...

14 Jens Behrmann, et al. ∙

research

∙ 02/18/2020

Picking Winning Tickets Before Training by Preserving Gradient Flow

Overparameterization has been shown to benefit both the optimization and...

2 Chaoqi Wang, et al. ∙

research

∙ 11/06/2019

Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse

Posterior collapse in Variational Autoencoders (VAEs) arises when the va...

33 James Lucas, et al. ∙

research

∙ 11/03/2019

Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks

Lipschitz constraints under L2 norm on deep neural networks are useful f...

4 Qiyang Li, et al. ∙

research

∙ 07/09/2019

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

Increasing the batch size is a popular way to speed up neural network tr...

3 Guodong Zhang, et al. ∙

research

∙ 05/27/2019

Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks

Natural gradient descent has proven effective at mitigating the effects ...

35 Guodong Zhang, et al. ∙

research

∙ 05/15/2019

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis

Reducing the test time resource requirements of a neural network while p...

10 Chaoqi Wang, et al. ∙

research

∙ 03/14/2019

Functional Variational Bayesian Neural Networks

Variational Bayesian neural networks (BNNs) perform variational inferenc...

34 Shengyang Sun, et al. ∙

research

∙ 03/07/2019

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions

Hyperparameter optimization can be formulated as a bilevel optimization ...

26 Matthew MacKay, et al. ∙

research

∙ 11/30/2018

Eigenvalue Corrected Noisy Natural Gradient

Variational Bayesian neural networks combine the flexibility of deep lea...

5 Juhan Bae, et al. ∙

research

∙ 11/13/2018

Sorting out Lipschitz function approximation

Training neural networks subject to a Lipschitz constraint is useful for...

8 Cem Anil, et al. ∙

research

∙ 10/29/2018

Three Mechanisms of Weight Decay Regularization

Weight decay is one of the standard tricks in the neural network toolbox...

2 Guodong Zhang, et al. ∙

research

∙ 10/25/2018

Reversible Recurrent Neural Networks

Recurrent neural networks (RNNs) provide state-of-the-art performance in...

6 Matthew MacKay, et al. ∙

research

∙ 08/30/2018

A Coordinate-Free Construction of Scalable Natural Gradient

Most neural networks are trained using first-order optimization methods,...

6 Kevin Luk, et al. ∙

research

∙ 06/27/2018

Adversarial Distillation of Bayesian Neural Network Posteriors

Bayesian neural networks (BNNs) allow us to reason about uncertainty in ...

4 Kuan-Chieh Wang, et al. ∙

research

∙ 06/12/2018

Differentiable Compositional Kernel Learning for Gaussian Processes

The generalization properties of Gaussian processes depend heavily on th...

2 Shengyang Sun, et al. ∙

research

∙ 04/01/2018

Aggregated Momentum: Stability Through Passive Damping

Momentum is a simple and widely used trick which allows gradient-based o...

0 James Lucas, et al. ∙

research

∙ 03/12/2018

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

Stochastic neural net weights are used in a variety of contexts, includi...

0 Yeming Wen, et al. ∙

research

∙ 03/06/2018

Understanding Short-Horizon Bias in Stochastic Meta-Optimization

Careful tuning of the learning rate, or even schedules thereof, can be c...

0 Yuhuai Wu, et al. ∙

research

∙ 02/14/2018

Isolating Sources of Disentanglement in Variational Autoencoders

We decompose the evidence lower bound to show the existence of a term me...

0 Tian Qi Chen, et al. ∙

research

∙ 12/06/2017

Noisy Natural Gradient as Variational Inference

Combining the flexibility of deep learning with Bayesian uncertainty est...

0 Guodong Zhang, et al. ∙

research

∙ 08/17/2017

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

In this work, we propose to apply trust region optimization to deep rein...

0 Yuhuai Wu, et al. ∙

research

∙ 11/14/2016

On the Quantitative Analysis of Decoder-Based Generative Models

The past several years have seen remarkable progress in generative model...

0 Yuhuai Wu, et al. ∙

research

∙ 02/03/2016

A Kronecker-factored approximate Fisher matrix for convolution layers

Second-order optimization methods such as natural gradient descent have ...

0 Roger Grosse, et al. ∙

research

∙ 09/22/2015

Learning Wake-Sleep Recurrent Attention Models

Despite their success, convolutional neural networks are computationally...

0 Jimmy Ba, et al. ∙

research

∙ 09/09/2015

Statistical Inference, Learning and Models in Big Data

The need for new methods to deal with big data is a common theme in most...

0 Beate Franke, et al. ∙

