PyHessian: Neural Networks Through the Lens of the Hessian

12/16/2019
by Zhewei Yao, et al.

We present PyHessian, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. The framework is developed in PyTorch and supports distributed-memory execution on cloud or supercomputer systems. PyHessian enables fast computation of the top Hessian eigenvalue, the Hessian trace, and the full Hessian eigenvalue density. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information), to gain insight into the behavior of different models and optimizers. To illustrate this, we apply PyHessian to analyze the effect of residual connections and batch normalization layers on the smoothness of the loss landscape. One recent claim, based on simpler first-order analysis, is that residual connections and batch normalization make the loss landscape "smoother", thus making it easier for stochastic gradient descent to converge to a good solution. We perform an extensive analysis by directly measuring the Hessian spectrum with PyHessian. This analysis leads to finer-scale insight, demonstrating that while conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that batch normalization layers do not necessarily make the loss landscape smoother, especially for shallow networks; the claimed smoothing only becomes evident for deep neural networks. We perform extensive experiments on four residual networks (ResNet20/32/38/56) on the CIFAR-10/100 datasets. We have open-sourced our PyHessian framework for Hessian spectrum computation.
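To make the abstract's three quantities concrete, below is a minimal, self-contained PyTorch sketch (not the PyHessian source code) of the matrix-free primitives such computations rest on: Hessian-vector products via Pearlmutter's double-backprop trick, Hutchinson's randomized estimator for the trace, and power iteration for the top eigenvalue. The toy model, random batch, and sample counts are illustrative assumptions, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec without materializing H (Pearlmutter's trick):
    differentiate the inner product <grad(loss), vec> a second time."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params, retain_graph=True)

def hutchinson_trace(loss, params, n_samples=100):
    """Estimate tr(H) as E[v^T H v] over Rademacher (+/-1) vectors v."""
    est = 0.0
    for _ in range(n_samples):
        vec = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]
        hv = hessian_vector_product(loss, params, vec)
        est += sum((v * h).sum().item() for v, h in zip(vec, hv))
    return est / n_samples

def top_eigenvalue(loss, params, n_iters=20):
    """Estimate the dominant Hessian eigenvalue by power iteration,
    using only Hessian-vector products."""
    vec = [torch.randn_like(p) for p in params]
    eigval = 0.0
    for _ in range(n_iters):
        norm = torch.sqrt(sum((v * v).sum() for v in vec))
        vec = [v / norm for v in vec]
        hv = hessian_vector_product(loss, params, vec)
        eigval = sum((v * h).sum().item() for v, h in zip(vec, hv))
        vec = [h.detach() for h in hv]
    return eigval

# Illustrative usage on a toy model and random batch (assumptions only).
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()
inputs, targets = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = criterion(model(inputs), targets)
params = [p for p in model.parameters() if p.requires_grad]
print("top eigenvalue ~", top_eigenvalue(loss, params))
print("trace ~", hutchinson_trace(loss, params))
```

The full eigenvalue density reported in the paper is obtained with a related matrix-free technique (stochastic Lanczos quadrature), which likewise needs only the Hessian-vector product primitive shown above.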


