The Emergence of Spectral Universality in Deep Networks

02/27/2018
by   Jeffrey Pennington, et al.
0

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. Therefore, to guide important design choices, it is important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/13/2017

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

It is well known that the initialization of weights in deep neural netwo...
research
07/31/2018

Spectrum concentration in deep residual learning: a free probability appproach

We revisit the initialization of deep residual networks (ResNets) by int...
research
04/13/2020

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

In recent years, a critical initialization scheme with orthogonal initia...
research
06/14/2020

The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

The Fisher information matrix (FIM) is fundamental for understanding the...
research
09/24/2018

Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

We demonstrate that in residual neural networks (ResNets) dynamical isom...
research
06/13/2020

Beyond Random Matrix Theory for Deep Networks

We investigate whether the Wigner semi-circle and Marcenko-Pastur distri...
research
04/12/2019

The Lanczos Algorithm Under Few Iterations: Concentration and Location of the Ritz Values

We study the Lanczos algorithm where the initial vector is sampled unifo...

Please sign up or login with your details

Forgot password? Click here to reset