The Full Spectrum of Deep Net Hessians At Scale: Dynamics with Sample Size

11/16/2018
by Vardan Papyan, et al.

Previous work has examined the spectrum of the Hessian of the training loss of deep neural networks, but only for networks of minuscule size. We apply state-of-the-art tools from modern high-dimensional numerical linear algebra to approximate the spectrum of the Hessian of deep nets with tens of millions of parameters. Our results corroborate previous findings, based on small-scale networks, that the Hessian exhibits 'spiked' behavior, with several outliers isolated from a continuous bulk. However, we find that the bulk does not follow a simple Marchenko-Pastur distribution, as previously suggested, but rather a heavier-tailed distribution. Finally, we document the dynamics of the outliers and the bulk with varying sample size.
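
For readers unfamiliar with the terminology, the following toy sketch illustrates what a 'spiked' spectrum with a Marchenko-Pastur bulk looks like. It is not the method of the paper and does not involve a neural network or its Hessian: it assumes a standard spiked sample covariance model with Gaussian data. The function name `spiked_spectrum`, the dimensions, and the spike strengths are all hypothetical choices made for illustration.

```python
import numpy as np


def spiked_spectrum(n=2000, p=500, spikes=(10.0, 6.0), seed=0):
    """Eigenvalues of a toy spiked sample covariance matrix.

    Assumed model (illustrative only): n samples of p-dimensional Gaussian
    data whose population covariance is the identity, except that the first
    len(spikes) coordinates have variance 1 + theta.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # Inflate the variance of a few coordinates to create the spikes.
    for i, theta in enumerate(spikes):
        X[:, i] *= np.sqrt(1.0 + theta)
    S = X.T @ X / n                    # p x p sample covariance
    eigs = np.linalg.eigvalsh(S)       # real eigenvalues, ascending order
    # Marchenko-Pastur bulk edges for unit variance and aspect ratio p / n.
    gamma = p / n
    lower = (1.0 - np.sqrt(gamma)) ** 2
    upper = (1.0 + np.sqrt(gamma)) ** 2
    return eigs, lower, upper


eigs, lower, upper = spiked_spectrum()
# Eigenvalues well above the upper bulk edge are the isolated outliers.
n_outliers = int((eigs > 1.1 * upper).sum())
print("bulk edges:", lower, upper)
print("largest eigenvalues:", eigs[-3:])
print("outliers above the bulk:", n_outliers)
```

In this toy model the two inflated directions produce two eigenvalues isolated above the bulk, while the remaining eigenvalues stay close to the Marchenko-Pastur support. The abstract's claim is that the Hessian of a real deep net shows similar outliers, but with a bulk that is heavier-tailed than this idealized one.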
