WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

04/29/2020
by Sidak Pal Singh et al.

Second-order information, in the form of Hessian- or inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been a tremendous amount of work on applying this information to today's compute- and memory-intensive deep neural networks, usually via coarse-grained approximations (such as diagonal, blockwise, or Kronecker factorizations). However, little is known about the quality of these approximations. Our work addresses this question, and in particular we propose a method called 'WoodFisher' that leverages the structure of the empirical Fisher information matrix, together with the Woodbury matrix identity, to compute a faithful and efficient estimate of the inverse Hessian. Our main application is to the task of compressing neural networks, where we build on the classical Optimal Brain Damage/Surgeon framework (LeCun et al., 1990; Hassibi and Stork, 1993). We demonstrate that WoodFisher significantly outperforms magnitude pruning (which corresponds to an isotropic Hessian), as well as methods that maintain other diagonal estimates. Further, even when gradual pruning is considered, our method yields gains in test accuracy over state-of-the-art approaches on standard image classification datasets such as CIFAR-10 and ImageNet. We also propose a variant called 'WoodTaylor', which takes into account the first-order gradient term and can lead to additional improvements. An important advantage of our methods is that they allow us to set the layer-wise pruning thresholds automatically, avoiding the need for manual tuning or sensitivity analysis.
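To make the recipe concrete, here is a minimal NumPy sketch of the two ingredients the abstract describes: estimating the inverse of a dampened empirical Fisher, F = lambda*I + (1/N) * sum_n g_n g_n^T, via repeated rank-one Sherman-Morrison (Woodbury) updates, and plugging that estimate into an Optimal Brain Surgeon pruning step. The function names and the dense d-by-d treatment are this sketch's own choices, not the paper's code; at the scale of real networks the paper maintains blockwise approximations instead of a full inverse.

```python
import numpy as np

def woodfisher_inverse(grads, damp=1e-4):
    """Inverse of the dampened empirical Fisher, F = damp*I + (1/N) sum_n g_n g_n^T,
    built one rank-one Sherman-Morrison (Woodbury) update at a time.

    grads: (N, d) array of per-sample gradients; damp: dampening lambda.
    """
    n, d = grads.shape
    f_inv = np.eye(d) / damp                      # inverse of damp * I
    for g in grads:
        fg = f_inv @ g                            # O(d^2) per rank-one update
        f_inv -= np.outer(fg, fg) / (n + g @ fg)  # Sherman-Morrison step
    return f_inv

def obs_prune_one(w, f_inv):
    """One Optimal Brain Surgeon step: zero out the weight with the smallest
    statistic rho_q = w_q^2 / (2 [F^-1]_qq) and compensate the remaining weights."""
    diag = np.diag(f_inv)
    rho = w ** 2 / (2.0 * diag)
    q = int(np.argmin(rho))
    w_new = w - (w[q] / diag[q]) * f_inv[:, q]    # delta_w = -w_q F^-1 e_q / [F^-1]_qq
    w_new[q] = 0.0                                # the chosen weight lands exactly at zero
    return w_new, q

# Toy usage: 50 per-sample gradients of a 10-parameter model.
rng = np.random.default_rng(0)
grads = rng.normal(size=(50, 10))
w = rng.normal(size=10)
f_inv = woodfisher_inverse(grads, damp=1e-3)
w_pruned, idx = obs_prune_one(w, f_inv)
```

Each rank-one update costs O(d^2), which is why a full inverse is only practical per block. Note also that the statistic rho_q gives every weight in the network a comparable score, which is what lets the layer-wise pruning thresholds emerge from a single global ranking rather than manual per-layer tuning.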

Related research

07/07/2021
Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization
Efficiently approximating local curvature information of the loss functi...

10/19/2021
SOSP: Efficiently Capturing Global Correlations by Second-Order Structured Pruning
Pruning neural networks reduces inference time and memory costs. On stan...

02/12/2021
Kronecker-factored Quasi-Newton Methods for Convolutional Neural Networks
Second-order methods have the capability of accelerating optimization by...

01/22/2021
Hessian-Aware Pruning and Optimal Neural Implant
Pruning is an effective method to reduce the memory footprint and FLOPs ...

02/01/2019
Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
Current methods to interpret deep learning models by generating saliency...

05/21/2018
Small steps and giant leaps: Minimal Newton solvers for Deep Learning
We propose a fast second-order method that can be used as a drop-in repl...

10/18/2019
First-Order Preconditioning via Hypergradient Descent
Standard gradient descent methods are susceptible to a range of issues t...
