Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

01/24/2019
by Vardan Papyan, et al.

We consider deep classifying neural networks. We expose a structure in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
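The distinction between a covariance and a second moment matrix can be illustrated with a minimal NumPy sketch. It is not the paper's method or data: the "gradients" are synthetic Gaussian vectors, and the names `second_moment`, `covariance`, `grads`, and the chosen mean are assumptions made purely for illustration. The sketch checks the standard identity that the second moment equals the covariance plus the outer product of the mean, and shows that a nonzero mean can produce a large top eigenvalue in the second moment that is absent from the covariance.

```python
import numpy as np

def second_moment(g):
    """Uncentered second moment: (1/n) * sum_i g_i g_i^T."""
    return g.T @ g / g.shape[0]

def covariance(g):
    """Centered (population) covariance: second moment of g - mean(g)."""
    centered = g - g.mean(axis=0)
    return centered.T @ centered / g.shape[0]

rng = np.random.default_rng(0)
n, p = 2000, 50                      # hypothetical sample size and dimension
mean = np.zeros(p)
mean[0] = 5.0                        # hypothetical nonzero mean "gradient"
# Synthetic stand-in for per-example gradients (rows are examples).
grads = mean + rng.standard_normal((n, p))

m2 = second_moment(grads)
cov = covariance(grads)
mu = grads.mean(axis=0)

# Identity: second moment = covariance + outer product of the mean.
identity_holds = np.allclose(m2, cov + np.outer(mu, mu))

# Largest eigenvalues (eigvalsh returns them in ascending order).
top_m2 = np.linalg.eigvalsh(m2)[-1]
top_cov = np.linalg.eigvalsh(cov)[-1]
```

In this toy setting the top eigenvalue of `m2` is driven by the mean term, while the spectrum of `cov` stays close to that of pure noise; this is only an analogy for how means of gradients could contribute outliers.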

research
11/16/2018

The Full Spectrum of Deep Net Hessians At Scale: Dynamics with Sample Size

Previous works observed the spectrum of the Hessian of the training loss...
research
10/01/2019

The asymptotic spectrum of the Hessian of DNN throughout training

The dynamics of DNNs during gradient descent is described by the so-call...
research
09/04/2019

Theory of high-dimensional outliers

This study concerns the issue of high dimensional outliers which are cha...
research
05/29/2019

Spiked separable covariance matrices and principal components

We introduce a class of separable sample covariance matrices of the form...
research
03/04/2015

Large Dimensional Analysis of Robust M-Estimators of Covariance with Outliers

A large dimensional characterization of robust M-estimators of covarianc...
research
12/02/2019

On the Delta Method for Uncertainty Approximation in Deep Learning

The Delta method is a well known procedure used to quantify uncertainty ...
research
01/23/2019

High-dimensional Interactions Detection with Sparse Principal Hessian Matrix

In statistical methods, interactions are the contributions from the prod...
