Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization

by Elias Frantar, et al.

Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which can limit their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and can compute the IHVP for dimension d, if the Hessian is given as a sum of m rank-one matrices, using O(dm^2) precomputation, O(dm) cost for computing the IHVP, and query cost O(m) for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost O(dm + m^2) for computing the IHVP and O(dm + m^3) for adding or removing any gradient from the sliding window. These two algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [10] and [18].
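To make the setting concrete, here is a minimal NumPy sketch of a matrix-free IHVP for a dampened empirical Fisher matrix F = damp * I + (1/m) * sum_i g_i g_i^T. It applies the Woodbury identity directly and is not the M-FAC algorithm itself; it only illustrates how the rank-one structure avoids forming any d x d matrix. The function names `precompute` and `ihvp` and the dampening parameter `damp` are assumptions made for this illustration.

```python
import numpy as np

def precompute(G, damp):
    """Invert the small m x m system used by the Woodbury identity.

    G is an (m, d) array whose rows are the m gradients.
    Forming G @ G.T costs O(d m^2); the inverse costs O(m^3).
    """
    m = G.shape[0]
    K = G @ G.T + m * damp * np.eye(m)
    return np.linalg.inv(K)

def ihvp(G, K_inv, damp, x):
    """Return F^{-1} x for F = damp * I + (1/m) * G.T @ G.

    Uses F^{-1} x = (x - G.T @ K^{-1} @ (G @ x)) / damp, so the cost is
    O(d m + m^2) and no d x d matrix is ever materialized.
    """
    z = K_inv @ (G @ x)
    return (x - G.T @ z) / damp

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d, damp = 8, 1000, 0.1
    G = rng.standard_normal((m, d))   # stand-in for m per-sample gradients
    x = rng.standard_normal(d)
    K_inv = precompute(G, damp)
    print(ihvp(G, K_inv, damp, x).shape)
```

The sketch stores only the m gradients and an m x m matrix, which is what makes the approach usable when d is the parameter count of a deep network and m is small by comparison.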




