The Emergence of Spectral Universality in Deep Networks

by Jeffrey Pennington, et al.

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. Therefore, to guide important design choices, it is important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.
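The quantity the abstract refers to, the singular value spectrum of the input-output Jacobian at initialization, can be inspected numerically. The following is a minimal sketch, not the paper's free-probability analysis: it assumes a bias-free fully connected network of constant width, and the function name `jacobian_singular_values` and its parameters (`depth`, `width`, `init`, `phi`, `sigma_w`, `seed`) are hypothetical names chosen for illustration.

```python
import numpy as np


def jacobian_singular_values(depth, width, init="orthogonal", phi="linear",
                             sigma_w=1.0, seed=0):
    """Singular values of the input-output Jacobian of a random network.

    Assumptions (illustrative only): no biases, constant width, and a
    Jacobian accumulated layer by layer as J = D_L W_L ... D_1 W_1, where
    D_l is the diagonal matrix of activation derivatives at layer l.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)      # random input vector
    J = np.eye(width)                   # running Jacobian product
    for _ in range(depth):
        G = rng.standard_normal((width, width))
        if init == "orthogonal":
            # QR of a Gaussian matrix; the sign fix on R's diagonal makes
            # Q uniformly (Haar) distributed over orthogonal matrices.
            Q, R = np.linalg.qr(G)
            W = sigma_w * Q * np.sign(np.diag(R))
        else:
            # i.i.d. Gaussian entries with variance sigma_w**2 / width.
            W = sigma_w * G / np.sqrt(width)
        pre = W @ h
        if phi == "tanh":
            h = np.tanh(pre)
            d = 1.0 - h ** 2            # derivative of tanh at pre
        else:
            h = pre
            d = np.ones(width)          # linear activation
        J = (d[:, None] * W) @ J        # left-multiply by D_l W_l
    return np.linalg.svd(J, compute_uv=False)


if __name__ == "__main__":
    for init in ("orthogonal", "gaussian"):
        s = jacobian_singular_values(depth=8, width=64, init=init)
        print(init, "min:", s.min(), "max:", s.max())
```

In the linear case with orthogonal weights and `sigma_w=1.0`, the Jacobian is itself orthogonal, so every singular value equals one, whereas the Gaussian product spreads out as depth grows. Passing `phi="tanh"` and varying `sigma_w` gives a rough empirical view of how the nonlinearity and the weight scale shape the spectrum.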



Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

It is well known that the initialization of weights in deep neural netwo...

Spectrum concentration in deep residual learning: a free probability approach

We revisit the initialization of deep residual networks (ResNets) by int...

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

In recent years, a critical initialization scheme with orthogonal initia...

The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

The Fisher information matrix (FIM) is fundamental for understanding the...

Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

We demonstrate that in residual neural networks (ResNets) dynamical isom...

Beyond Random Matrix Theory for Deep Networks

We investigate whether the Wigner semi-circle and Marcenko-Pastur distri...

The Lanczos Algorithm Under Few Iterations: Concentration and Location of the Ritz Values

We study the Lanczos algorithm where the initial vector is sampled unifo...