Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

11/22/2016
by Levent Sagun, et al.

We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts: the bulk, which is concentrated around zero, and the edges, which are scattered away from zero. We present empirical evidence that the bulk indicates how over-parametrized the system is, and that the edges depend on the input data.
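The bulk-plus-edges picture is easy to reproduce at small scale. Below is a minimal sketch (not the authors' experimental setup) that computes the exact Hessian eigenspectrum of a tiny over-parametrized regression network in PyTorch; the architecture, the synthetic data, and the near-zero threshold are all illustrative assumptions. With far more parameters than data points, most eigenvalues typically land in a bulk near zero, with a few larger outliers at the edges.

```python
import torch

torch.manual_seed(0)

# Tiny over-parametrized regression problem: more weights than data points.
X = torch.randn(20, 5)
y = torch.randn(20, 1)

n_hidden = 30
n_params = 5 * n_hidden + n_hidden + n_hidden * 1 + 1  # 211 parameters

def loss_fn(theta):
    # Unpack a flat parameter vector into the weights of a one-hidden-layer MLP.
    i = 0
    W1 = theta[i:i + 5 * n_hidden].view(5, n_hidden); i += 5 * n_hidden
    b1 = theta[i:i + n_hidden]; i += n_hidden
    W2 = theta[i:i + n_hidden].view(n_hidden, 1); i += n_hidden
    b2 = theta[i:i + 1]
    pred = torch.tanh(X @ W1 + b1) @ W2 + b2
    return ((pred - y) ** 2).mean()

# A random point in parameter space, standing in for "before training".
theta = torch.randn(n_params) * 0.1

# Exact Hessian of the loss at theta, then its full eigenvalue spectrum.
H = torch.autograd.functional.hessian(loss_fn, theta)
eigvals = torch.linalg.eigvalsh(H)

print(f"{n_params} parameters, {X.shape[0]} data points")
print("smallest eigenvalues:", eigvals[:5])
print("largest eigenvalues: ", eigvals[-5:])
print("fraction near zero (|eig| < 1e-4):",
      (eigvals.abs() < 1e-4).float().mean().item())
```

For networks of realistic size the full Hessian is far too large to form explicitly, so in practice the spectrum is probed with iterative methods such as Lanczos built on Hessian-vector products.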


Related research

02/06/2019  Negative eigenvalues of the Hessian in deep neural networks
The loss function of deep networks is known to be non-convex but the pre...

12/14/2020  A spectral characterization and an approximation scheme for the Hessian eigenvalue
We revisit the k-Hessian eigenvalue problem on a smooth, bounded, (k-1)-...

01/31/2022  On the Power-Law Spectrum in Deep Learning: A Bridge to Protein Science
It is well-known that the Hessian matters to optimization, generalizatio...

06/16/2020  Flatness is a False Friend
Hessian based measures of flatness, such as the trace, Frobenius and spe...

11/19/2015  Universal halting times in optimization and machine learning
The authors present empirical distributions for the halting time (measur...

02/17/2023  SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
The Sharpness Aware Minimization (SAM) optimization algorithm has been s...

03/02/2021  Hessian Eigenspectra of More Realistic Nonlinear Models
Given an optimization problem, the Hessian matrix and its eigenspectrum ...
