Beyond Random Matrix Theory for Deep Networks

06/13/2020
by Diego Granziol, et al.

We investigate whether the Wigner semi-circle and Marchenko-Pastur distributions, often used in the theoretical analysis of deep neural networks, match empirically observed spectral densities. We find that, even allowing for outliers, the observed spectral shapes deviate strongly from these theoretical predictions. This raises major questions about the usefulness of these models in deep learning. We further show that theoretical results, such as the layered nature of critical points, depend strongly on the exact form of these limiting spectral densities. We consider two new classes of matrix ensembles: random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match observed spectra. They also give large discrete spectral peaks at the origin, providing a theoretical explanation for the observation that various optima can be connected by one-dimensional paths of low loss. We further show that, in the case of a random matrix product, the weight of the discrete spectral component at 0 depends on the ratio of the dimensions of the weight matrices.
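The dependence of the zero-eigenvalue weight on the dimension ratio can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a product A = W1 W2 of i.i.d. Gaussian matrices of shapes n × k and k × n, and uses the fact that the symmetric matrix A Aᵀ has exactly n − rank(A) zero eigenvalues. The function names (`gaussian_matrix`, `numerical_rank`, `zero_mode_fraction`) and the tolerance are hypothetical choices made for this illustration.

```python
import random


def gaussian_matrix(rows, cols, rng):
    """Matrix with i.i.d. standard normal entries (an assumed ensemble)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def numerical_rank(matrix, rel_tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting.

    Pivots smaller than rel_tol times the largest entry are treated as zero.
    """
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    scale = max(abs(x) for row in m for x in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def zero_mode_fraction(n, k, seed=0):
    """Fraction of zero eigenvalues of A A^T, where A = W1 W2 is n x n.

    W1 is n x k and W2 is k x n, so rank(A) <= min(n, k) and the
    fraction of exact zero modes is 1 - rank(A) / n.
    """
    rng = random.Random(seed)
    a = matmul(gaussian_matrix(n, k, rng), gaussian_matrix(k, n, rng))
    return 1.0 - numerical_rank(a) / n


if __name__ == "__main__":
    for k in (10, 20, 40):
        print(k, zero_mode_fraction(40, k))
```

In this toy setting the zero-mode fraction is max(0, 1 − k/n) almost surely, so a narrower inner dimension k yields a heavier spike at the origin. The shape of the continuous part of the spectrum is not addressed by this sketch.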

research
04/25/2017

Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices

In this work a novel method to quantify spectral ergodicity for random m...
research
12/01/2020

The Perturbative Resolvent Method: spectral densities of random matrix ensembles via perturbation theory

We present a simple, perturbative approach for calculating spectral dens...
research
08/27/2020

Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra

Numerous researchers recently applied empirical spectral analysis to the...
research
02/12/2021

Applicability of Random Matrix Theory in Deep Learning

We investigate the local spectral statistics of the loss surface Hessian...
research
01/21/2018

Limiting Distributions of Spectral Radii for Product of Matrices from the Spherical Ensemble

Consider the product of m independent n× n random matrices from the sphe...
research
11/26/2021

Implicit Data-Driven Regularization in Deep Neural Networks under SGD

Much research effort has been devoted to explaining the success of deep ...
research
02/27/2018

The Emergence of Spectral Universality in Deep Networks

Recent work has shown that tight concentration of the entire spectrum of...
