The Full Spectrum of Deep Net Hessians At Scale: Dynamics with Sample Size

11/16/2018
by Vardan Papyan, et al.

Previous works observed the spectrum of the Hessian of the training loss of deep neural networks. However, the networks considered were of minuscule size. We apply state-of-the-art tools in modern high-dimensional numerical linear algebra to approximate the spectrum of the Hessian of deep nets with tens of millions of parameters. Our results corroborate previous findings, based on small-scale networks, that the Hessian exhibits 'spiked' behavior, with several outliers isolated from a continuous bulk. However, we find that the bulk does not follow a simple Marchenko-Pastur distribution, as previously suggested, but rather a heavier-tailed distribution. Finally, we document the dynamics of the outliers and the bulk with varying sample size.
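
To illustrate what approximating a spectrum from matrix-vector products can look like, here is a minimal sketch using the Lanczos iteration. The abstract does not name the specific tools used, so the choice of Lanczos is an assumption, as are the function name `lanczos`, the toy matrix `H` (a Wishart-style bulk plus three planted outliers standing in for a real Hessian), and all sizes. For an actual deep net, `matvec` would be a Hessian-vector product rather than a dense matrix multiply.

```python
import numpy as np


def lanczos(matvec, dim, num_steps, rng):
    """Run Lanczos with full reorthogonalization on a symmetric operator.

    Only products matvec(v) are needed, never the matrix itself.
    Returns the Ritz values (approximate eigenvalues) and the weights that
    turn them into a discrete approximation of the spectral density.
    """
    Q = np.zeros((num_steps, dim))      # Lanczos vectors, one per row
    alphas = np.zeros(num_steps)        # diagonal of the tridiagonal matrix
    betas = np.zeros(num_steps - 1)     # off-diagonal of the tridiagonal matrix

    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    for j in range(num_steps):
        Q[j] = q
        w = matvec(q)
        alphas[j] = q @ w
        # Remove components along all Lanczos vectors computed so far.
        w = w - Q[: j + 1].T @ (Q[: j + 1] @ w)
        if j == num_steps - 1:
            break
        beta = np.linalg.norm(w)
        if beta < 1e-12:
            # An invariant subspace was found; stop early and truncate.
            alphas, betas = alphas[: j + 1], betas[:j]
            break
        betas[j] = beta
        q = w / beta

    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz_values, vecs = np.linalg.eigh(T)
    weights = vecs[0] ** 2              # squared first components sum to 1
    return ritz_values, weights


# Toy stand-in for a Hessian (hypothetical): continuous bulk plus 3 outliers.
rng = np.random.default_rng(0)
dim, n = 300, 600
X = rng.standard_normal((n, dim))
bulk = X.T @ X / n
U, _ = np.linalg.qr(rng.standard_normal((dim, 3)))
H = bulk + U @ np.diag([20.0, 15.0, 10.0]) @ U.T

ritz_values, weights = lanczos(lambda v: H @ v, dim, 60, rng)
print("largest Ritz values:", ritz_values[-3:])
```

In this toy setting the largest Ritz values line up with the planted outliers after far fewer steps than the matrix dimension, while the remaining Ritz values and their weights give a coarse picture of the bulk. Full reorthogonalization is used here for simplicity; it keeps the sketch numerically well behaved at the cost of storing every Lanczos vector.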

research
01/24/2019

Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

We consider deep classifying neural networks. We expose a structure in t...
research
01/29/2019

An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

To understand the dynamics of optimization in deep neural networks, we d...
research
10/01/2019

The asymptotic spectrum of the Hessian of DNN throughout training

The dynamics of DNNs during gradient descent is described by the so-call...
research
02/22/2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Large batch size training of Neural Networks has been shown to incur acc...
research
01/31/2022

On the Power-Law Spectrum in Deep Learning: A Bridge to Protein Science

It is well-known that the Hessian matters to optimization, generalizatio...
research
07/24/2019

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

While stochastic gradient descent (SGD) and variants have been surprisin...
research
06/14/2021

Robust Inference for High-Dimensional Linear Models via Residual Randomization

We propose a residual randomization procedure designed for robust Lasso-...
