ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

06/04/2021
by Felix Dangel, et al.

Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model of the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation or Kronecker-factored block diagonal approximations do not consider noise in the mini-batch. We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations. It allows for efficient computation of eigenvalues and eigenvectors, as well as per-sample first- and second-order directional derivatives. The representation is computed in parallel with gradients in one backward pass and offers a fine-grained cost-accuracy trade-off, which allows it to scale. As examples of ViViT's usefulness, we investigate the directional gradients and curvatures during training, and how noise information can be used to improve the stability of second-order methods.
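The low-rank idea mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the standard observation that a mini-batch GGN over N samples and C model outputs can be written as G = V Vᵀ with a factor V of shape (D, N·C), so its nonzero eigenvalues coincide with those of the much smaller Gram matrix Vᵀ V. The sizes `D`, `N`, `C` and the random stand-in for `V` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, C = 50, 4, 3  # hypothetical sizes: parameters, batch size, model outputs

# Assumption: a random matrix stands in for the GGN factor V, with G = V @ V.T.
# In a real network, V would be assembled from per-sample Jacobians and a
# factorization of the loss Hessian with respect to the model outputs.
V = rng.standard_normal((D, N * C))

# The Gram matrix is (N*C) x (N*C), far smaller than the D x D GGN when D >> N*C.
gram = V.T @ V
evals, gram_evecs = np.linalg.eigh(gram)  # eigenvalues in ascending order

# Keep only numerically nonzero eigenvalues.
keep = evals > 1e-10 * evals.max()
evals, gram_evecs = evals[keep], gram_evecs[:, keep]

# Map Gram eigenvectors to unit-norm GGN eigenvectors: e_k = V u_k / sqrt(lambda_k).
evecs = V @ gram_evecs / np.sqrt(evals)

# Sanity check against the explicit D x D matrix (only feasible for small D).
G = V @ V.T
residual = np.linalg.norm(G @ evecs - evecs * evals)
```

In this sketch the eigendecomposition costs scale with N·C rather than D, which is where the savings come from when the number of parameters is large. The explicit matrix `G` is formed only to verify the result.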


research
09/15/2020

Second-order Neural Network Training Using Complex-step Directional Derivative

While the superior performance of second-order optimization methods such...
research
12/23/2019

BackPACK: Packing more into backprop

Automatic differentiation frameworks are optimized for exactly one thing...
research
02/19/2018

BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks

We propose a block-diagonal approximation of the positive-curvature Hess...
research
05/01/2023

ISAAC Newton: Input-based Approximate Curvature for Newton's Method

We present ISAAC (Input-baSed ApproximAte Curvature), a novel method tha...
research
06/03/2020

On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs

Following early work on Hessian-free methods for deep learning, we study...
research
11/05/2016

Loss-aware Binarization of Deep Networks

Deep neural network models, though very powerful and highly successful, ...
research
06/30/2021

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

The Hessian of a neural network captures parameter interactions through ...
