A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization

12/07/2020
by Adepu Ravi Sankar, et al.

Loss landscape analysis is extremely useful for a deeper understanding of the generalization ability of deep neural network models. In this work, we propose a layerwise loss landscape analysis in which the loss surface at every layer is studied independently, along with how each layer's surface correlates with the overall loss surface. We characterize the layerwise loss landscape through the eigenspectra of the Hessian at each layer. In particular, our results show that the layerwise Hessian geometry is largely similar to that of the entire Hessian. We also report an interesting phenomenon: the Hessian eigenspectra of the middle layers of a deep neural network are observed to be the most similar to the overall Hessian eigenspectrum. We further show that the maximum eigenvalue and the trace of the Hessian (both for the full network and layerwise) decrease as training progresses. We leverage these observations to propose a new regularizer based on the trace of the layerwise Hessian. Penalizing the trace of the Hessian at every layer indirectly forces Stochastic Gradient Descent to converge to flatter minima, which have been shown to generalize better. In particular, such a layerwise regularizer can be applied to penalize only the middle layers, which yields promising results. Our empirical studies on well-known deep networks across datasets support the claims of this work.
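
To make the regularization idea concrete, here is a minimal sketch of how a layerwise Hessian-trace penalty could be computed with Hutchinson's stochastic trace estimator, tr(H) ~ E[v^T H v] for Rademacher-distributed v. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, the n_samples parameter, the coefficient lam, and the choice of "middle" layers are all hypothetical, and each parameter tensor is treated as one "layer".

```python
import torch

def layerwise_hessian_trace(loss, params, n_samples=1):
    """Hutchinson estimate of tr(H) for each parameter tensor.

    Hypothetical helper: tr(H) ~ E[v^T H v] with Rademacher probes v.
    create_graph=True keeps the estimates differentiable, so they can
    be penalized as part of the training loss.
    """
    grads = torch.autograd.grad(loss, params, create_graph=True)
    traces = [torch.zeros((), device=p.device) for p in params]
    for _ in range(n_samples):
        # Rademacher probe vectors: entries are +1 or -1
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Differentiating g.v w.r.t. the parameters yields the
        # Hessian-vector products Hv, one per parameter tensor
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvps = torch.autograd.grad(gv, params, create_graph=True)
        for i, (hvp, v) in enumerate(zip(hvps, vs)):
            # v^T (Hv) is an unbiased estimate of tr(H) for this layer
            traces[i] = traces[i] + (hvp * v).sum() / n_samples
    return traces

# Sketch of a training step that penalizes only the middle layers
# (lam and the middle-third slice are illustrative assumptions):
# loss = criterion(model(x), y)
# params = list(model.parameters())
# traces = layerwise_hessian_trace(loss, params)
# middle = traces[len(traces) // 3 : 2 * len(traces) // 3]
# (loss + lam * sum(middle)).backward()
```

Because each per-layer trace is built from Hessian-vector products with create_graph=True, the penalty backpropagates through a second differentiation, which is the standard double-backward pattern for curvature regularizers in PyTorch.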


Related research:

12/16/2019
PyHessian: Neural Networks Through the Lens of the Hessian
We present PyHessian, a new scalable framework that enables fast computa...

10/01/2019
How noise affects the Hessian spectrum in overparameterized neural networks
Stochastic gradient descent (SGD) forms the core optimization method for...

08/11/2022
Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
In this paper we develop a novel regularization method for deep neural n...

06/22/2023
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Recent works on over-parameterized neural networks have shown that the s...

05/29/2023
SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Modern neural networks are undeniably successful. Numerous studies have ...

09/19/2018
Identifying Generalization Properties in Neural Networks
While it has not yet been proven, empirical evidence suggests that model...

08/28/2022
Visualizing high-dimensional loss landscapes with Hessian directions
Analyzing geometric properties of high-dimensional loss functions, such ...
