
The Emergence of Spectral Universality in Deep Networks

02/27/2018
by   Jeffrey Pennington, et al.

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. To guide design choices, it is therefore important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.
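
As a rough numerical illustration of the object the abstract describes, the sketch below builds the input-output Jacobian of a random fully connected network at initialization and returns its singular values. It is not the paper's free-probability calculation. The tanh nonlinearity, zero biases, square layers, the weight scale `sigma_w`, and the function names `random_orthogonal` and `jacobian_singular_values` are all assumptions made for this example.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the sample does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))

def jacobian_singular_values(depth, width, sigma_w, rng, orthogonal=True):
    """Singular values of the input-output Jacobian of a random tanh network.

    Assumed setup (illustrative only): square layers, zero biases,
    h_{l+1} = tanh(W_l h_l), so the Jacobian is the product of D_l W_l,
    where D_l is the diagonal matrix of tanh derivatives at layer l.
    """
    h = rng.standard_normal(width)      # random input
    jac = np.eye(width)                 # running product of layer Jacobians
    for _ in range(depth):
        if orthogonal:
            w = sigma_w * random_orthogonal(width, rng)
        else:
            w = sigma_w * rng.standard_normal((width, width)) / np.sqrt(width)
        h = np.tanh(w @ h)
        d = 1.0 - h ** 2                # tanh'(x) = 1 - tanh(x)^2
        jac = (d[:, None] * w) @ jac    # left-multiply by D_l W_l
    return np.linalg.svd(jac, compute_uv=False)
```

Calling `jacobian_singular_values(depth=8, width=32, sigma_w=1.0, rng=np.random.default_rng(0))` returns 32 singular values. Plotting a histogram of them for different depths, weight scales, and weight distributions is one way to see how tightly the spectrum clusters around one in a given configuration.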

11/13/2017

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

It is well known that the initialization of weights in deep neural netwo...
07/31/2018

Spectrum concentration in deep residual learning: a free probability approach

We revisit the initialization of deep residual networks (ResNets) by int...
04/13/2020

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

In recent years, a critical initialization scheme with orthogonal initia...
06/14/2020

The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

The Fisher information matrix (FIM) is fundamental for understanding the...
09/24/2018

Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

We demonstrate that in residual neural networks (ResNets) dynamical isom...
06/13/2020

Beyond Random Matrix Theory for Deep Networks

We investigate whether the Wigner semi-circle and Marcenko-Pastur distri...
04/12/2019

The Lanczos Algorithm Under Few Iterations: Concentration and Location of the Ritz Values

We study the Lanczos algorithm where the initial vector is sampled unifo...