Spectrum concentration in deep residual learning: a free probability approach

07/31/2018
by Zenan Ling, et al.

We revisit the initialization of deep residual networks (ResNets) by introducing a novel analytical tool from free probability to the deep learning community. This tool deals with non-Hermitian random matrices, rather than their conventional Hermitian counterparts in the literature. As a consequence, it enables us to evaluate the singular value spectrum of the input-output Jacobian of a fully-connected deep ResNet in both the linear and the nonlinear case. With this powerful tool, we conduct an asymptotic analysis of the spectrum in the single-layer case and then extend the analysis to the multi-layer case with an arbitrary number of layers. In particular, we propose to rescale the classical random initialization by the number of residual units, so that the spectrum remains of order O(1) even when the width and depth of the network are large. We empirically demonstrate that the proposed initialization scheme learns orders of magnitude faster than the classical one, attesting to the strong practical relevance of this investigation.
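As a concrete illustration of the idea described above, the following minimal sketch compares the singular value spectrum of the input-output Jacobian of a deep linear ResNet under the classical initialization and under a rescaled one. The function names (init_resnet_weights, jacobian_singular_values) and the exact rescaling exponent (dividing the weight variance by the number of residual units L) are assumptions made here for illustration; the paper's precise scheme may differ.

```python
import numpy as np

def init_resnet_weights(width, num_units, rescale=True, seed=0):
    """Draw Gaussian weights for the residual branches of a linear ResNet.

    Classical init: W_l has i.i.d. N(0, 1/width) entries.
    Rescaled init (sketch): additionally divide the variance by the number
    of residual units L, so that J = prod_l (I + W_l) keeps an O(1)
    singular value spectrum as L grows. The 1/L factor on the variance is
    a hypothetical choice for illustration, not necessarily the paper's.
    """
    rng = np.random.default_rng(seed)
    variance = 1.0 / width
    if rescale:
        variance /= num_units  # rescale by the number of residual units
    std = np.sqrt(variance)
    return [rng.normal(0.0, std, size=(width, width)) for _ in range(num_units)]

def jacobian_singular_values(weights):
    """Singular values of the input-output Jacobian J = prod_l (I + W_l)."""
    width = weights[0].shape[0]
    J = np.eye(width)
    for W in weights:
        J = (np.eye(width) + W) @ J
    return np.linalg.svd(J, compute_uv=False)

if __name__ == "__main__":
    for rescale in (False, True):
        svs = jacobian_singular_values(init_resnet_weights(256, 100, rescale))
        print(f"rescale={rescale}: max sv = {svs.max():.3g}, min sv = {svs.min():.3g}")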


research 11/13/2017
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
It is well known that the initialization of weights in deep neural netwo...

research 09/24/2018
Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function
We demonstrate that in residual neural networks (ResNets) dynamical isom...

research 08/11/2019
Almost Surely Asymptotic Freeness for Jacobian Spectrum of Deep Network
Free probability theory helps us to understand Jacobian spectrum of deep...

research 02/27/2018
The Emergence of Spectral Universality in Deep Networks
Recent work has shown that tight concentration of the entire spectrum of...

research 06/14/2020
The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry
The Fisher information matrix (FIM) is fundamental for understanding the...

research 05/31/2019
Residual Networks as Nonlinear Systems: Stability Analysis using Linearization
We regard pre-trained residual networks (ResNets) as nonlinear systems a...

research 07/07/2020
Doubly infinite residual networks: a diffusion process approach
When neural network's parameters are initialized as i.i.d., neural netwo...
