
Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

by Wojciech Tarnowski, et al.
Jagiellonian University

We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespective of the activation function used. We do so by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the limit of large network width and depth. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions by analyzing signal propagation in the artificial neural network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study the consequences of this universal behavior for the initial and late phases of learning. We conclude by drawing attention to the simple fact that initialization acts as a confounding factor between the choice of activation function and the rate of learning. We propose that in ResNets this confound can be resolved, based on our results, by ensuring the same level of dynamical isometry at initialization.
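The claim can be checked numerically on a toy model. The sketch below (an illustration, not the paper's exact setup; the width, depth, and weight scale are arbitrary choices) builds a random residual network of blocks x → x + W φ(x), accumulates the input-output Jacobian layer by layer, and inspects its singular values, which should concentrate around 1 when the residual branch is weak, as dynamical isometry predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters: width N, depth L, residual-branch scale sigma.
N, depth, sigma = 200, 50, 0.1

phi = np.tanh                          # activation
dphi = lambda x: 1.0 - np.tanh(x) ** 2  # its derivative

x = rng.standard_normal(N)  # random input signal
J = np.eye(N)               # input-output Jacobian, accumulated layer by layer

for _ in range(depth):
    # Weights scaled as sigma / sqrt(N), the usual variance convention.
    W = rng.standard_normal((N, N)) * sigma / np.sqrt(N)
    # Jacobian of the block x -> x + W phi(x) is I + W diag(phi'(x)).
    J = (np.eye(N) + W * dphi(x)[None, :]) @ J
    x = x + W @ phi(x)

s = np.linalg.svd(J, compute_uv=False)
print(f"singular values: min={s.min():.3f}, max={s.max():.3f}")
```

With a weak residual branch (small sigma) the spectrum stays in a narrow band around 1 even at depth 50; increasing sigma spreads the singular values and degrades isometry, matching the qualitative picture in the abstract.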


