Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

09/24/2018
by   Wojciech Tarnowski, et al.

We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespective of the activation function used. We do so by deriving, with the help of free probability and random matrix theory, a universal formula for the spectral density of the input-output Jacobian at initialization, in the limit of large network width and depth. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions by analyzing signal propagation in the artificial neural network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study the consequences of this universal behavior for the initial and late phases of learning. We conclude by drawing attention to the simple fact that initialization acts as a confounding factor between the choice of activation function and the rate of learning, and we propose that in ResNets this can be resolved, based on our results, by ensuring the same level of dynamical isometry at initialization.
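The random-matrix simulations mentioned above can be illustrated numerically. The sketch below, which is an assumption about the setup rather than the authors' exact experiment, builds the input-output Jacobian of a linear residual network, J = ∏_l (I + a·W_l) with Gaussian branch weights W_l, and compares how the singular value spectrum concentrates around 1 as the residual branch scale a shrinks (the regime in which dynamical isometry is approached):

```python
import numpy as np

rng = np.random.default_rng(0)

def resnet_jacobian(width, depth, branch_scale):
    """Input-output Jacobian of a linear ResNet at initialization:
    J = prod_l (I + branch_scale * W_l), with i.i.d. Gaussian W_l whose
    entries have variance 1/width."""
    J = np.eye(width)
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        J = (np.eye(width) + branch_scale * W) @ J
    return J

# Smaller branch scale -> singular values cluster more tightly around 1,
# i.e. the network is closer to dynamical isometry.
for a in (0.5, 0.1):
    s = np.linalg.svd(resnet_jacobian(width=200, depth=20, branch_scale=a),
                      compute_uv=False)
    print(f"a={a}: min singular value {s.min():.3f}, max {s.max():.3f}")
```

For a nonlinear network the branch Jacobian would also include the diagonal matrix of activation derivatives, which is where the single spectrum-shaping parameter of the paper enters.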


research
03/18/2020

A Survey on Activation Functions and their relation with Xavier and He Normal Initialization

In artificial neural network, the activation function and the weight ini...
research
05/26/2023

Universal approximation with complex-valued deep narrow neural networks

We study the universality of complex-valued neural networks with bounded...
research
07/31/2018

Spectrum concentration in deep residual learning: a free probability approach

We revisit the initialization of deep residual networks (ResNets) by int...
research
05/26/2019

ProbAct: A Probabilistic Activation Function for Deep Neural Networks

Activation functions play an important role in the training of artificia...
research
06/25/2018

Propagating Uncertainty through the tanh Function with Application to Reservoir Computing

Many neural networks use the tanh activation function, however when give...
research
02/27/2018

The Emergence of Spectral Universality in Deep Networks

Recent work has shown that tight concentration of the entire spectrum of...
research
07/31/2022

Functional Rule Extraction Method for Artificial Neural Networks

The idea I propose in this paper is a method that is based on comprehens...
