BackPACK: Packing more into backprop

12/23/2019
by Felix Dangel, et al.

Automatic differentiation frameworks are optimized for exactly one thing: computing the average mini-batch gradient. Yet, other quantities such as the variance of the mini-batch gradients or many approximations to the Hessian can, in theory, be computed efficiently, and at the same time as the gradient. While these quantities are of great interest to researchers and practitioners, current deep-learning software does not support their automatic calculation. Manually implementing them is burdensome, inefficient if done naively, and the resulting code is rarely shared. This hampers progress in deep learning, and unnecessarily narrows research to focus on gradient descent and its variants; it also complicates replication studies and comparisons between newly developed methods that require those quantities, to the point of impossibility. To address this problem, we introduce BackPACK, an efficient framework built on top of PyTorch, that extends the backpropagation algorithm to extract additional information from first- and second-order derivatives. Its capabilities are illustrated by benchmark reports for computing additional quantities on deep neural networks, and an example application by testing several recent curvature approximations for optimization.
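To make the abstract's central claim concrete, the sketch below shows the kind of single-backward-pass usage it describes: requesting per-sample gradients, gradient variance, and a diagonal curvature approximation alongside the ordinary gradient. The page itself contains no code; the names here (extend, the backpack context manager, and extensions such as BatchGrad, Variance, and DiagGGNExact) follow my reading of BackPACK's published interface and should be checked against the library's current documentation.

    import torch
    from backpack import backpack, extend
    from backpack.extensions import BatchGrad, Variance, DiagGGNExact

    # Extend the model and the loss so BackPACK can hook into backpropagation.
    model = extend(torch.nn.Sequential(
        torch.nn.Linear(10, 5),
        torch.nn.ReLU(),
        torch.nn.Linear(5, 2),
    ))
    lossfunc = extend(torch.nn.CrossEntropyLoss())

    X = torch.randn(8, 10)
    y = torch.randint(0, 2, (8,))
    loss = lossfunc(model(X), y)

    # One backward pass yields the gradient plus the requested extra quantities.
    with backpack(BatchGrad(), Variance(), DiagGGNExact()):
        loss.backward()

    for param in model.parameters():
        # Attribute names per BackPACK's docs; verify against the current release.
        print(param.grad.shape)            # average mini-batch gradient (standard PyTorch)
        print(param.grad_batch.shape)      # per-sample contributions, shape [N, *param.shape]
        print(param.variance.shape)        # element-wise variance of the per-sample gradients
        print(param.diag_ggn_exact.shape)  # diagonal of the generalized Gauss-Newton

The design point worth noting, assuming the interface above, is that the backward call itself is unchanged: the extra quantities are computed during the same backpropagation and deposited as attributes next to each parameter's .grad.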


Related research

06/04/2021
ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) a...

07/09/2020
A Study of Gradient Variance in Deep Learning
The impact of gradient noise on training deep models is widely acknowled...

11/09/2020
Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Standard first-order stochastic optimization algorithms base their updat...

07/14/2021
Continuous vs. Discrete Optimization of Deep Neural Networks
Existing analyses of optimization in deep learning are either continuous...

05/19/2016
A Multi-Batch L-BFGS Method for Machine Learning
The question of how to parallelize the stochastic gradient descent (SGD)...

01/01/2022
Fitting Matérn Smoothness Parameters Using Automatic Differentiation
The Matérn covariance function is ubiquitous in the application of Gauss...

08/29/2020
Efficient Computation of Expectations under Spanning Tree Distributions
We give a general framework for inference in spanning tree models. We pr...
