Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?

01/11/2018
by Boris Hanin, et al.

We give a rigorous analysis of the statistical behavior of gradients in randomly initialized feed-forward networks with ReLU activations. Our results show that a fully connected ReLU net of depth d with hidden layer widths n_j will have exploding and vanishing gradients if and only if ∑_{j=1}^{d-1} 1/n_j is large. The point of view of this article is that whether a given neural net will have exploding/vanishing gradients depends mainly on the architecture of the net, and hence can be tested at initialization. Our results imply that a fully connected network that produces manageable gradients at initialization must have hidden layers that are about as wide as the network is deep. This work is related to the mean field theory approach to random neural nets. From this point of view, we give a rigorous computation of the 1/n_j corrections to the propagation of gradients at the so-called edge of chaos.
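To make the architecture-dependent sum concrete, here is a minimal NumPy sketch (not the paper's code; the function names are ours, and He-style initialization with weight variance 2/fan_in and zero biases is an assumption about the setup). It compares two fully connected ReLU nets of the same depth: the narrow one has a large ∑_j 1/n_j and its gradient norms at initialization fluctuate wildly from one random draw to the next, while the wide one, with a small sum, produces much more stable gradients.

```python
import numpy as np

def grad_sq_norm(widths, rng):
    """Squared norm of d(first output)/d(input) for one randomly
    initialized fully connected ReLU net with layer sizes `widths`.
    Hidden layers use ReLU, the output layer is linear, weights are
    He-initialized (variance 2 / fan_in), and biases are zero."""
    x = rng.standard_normal(widths[0])
    Ws, masks, h = [], [], x
    depth = len(widths) - 1
    for i, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
        W = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
        z = W @ h
        # ReLU derivative for hidden layers, identity for the output layer
        mask = (z > 0).astype(float) if i < depth - 1 else np.ones(n_out)
        h = z * mask
        Ws.append(W)
        masks.append(mask)
    g = np.zeros(widths[-1])
    g[0] = 1.0                                # seed backprop at the first output unit
    for W, mask in zip(reversed(Ws), reversed(masks)):
        g = W.T @ (g * mask)                  # chain rule, one layer at a time
    return float(g @ g)

def summarize(widths, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    beta = sum(1.0 / n for n in widths[1:-1])   # sum of 1/n_j over hidden layers
    samples = np.array([grad_sq_norm(widths, rng) for _ in range(trials)])
    print(f"{len(widths) - 2} hidden layers of width {widths[1]}:  "
          f"sum 1/n_j = {beta:.2f},  mean |grad|^2 = {samples.mean():.3f},  "
          f"std = {samples.std():.3f}")

# Same depth (20 hidden layers), very different reciprocal-width sums.
summarize([10] + [5] * 20 + [1])      # sum 1/n_j = 4.0  -> widely varying gradients
summarize([10] + [100] * 20 + [1])    # sum 1/n_j = 0.2  -> stable gradients
```

This is only an illustration of the statistic the abstract refers to, not a reproduction of the paper's exact setup: with ReLU and He-style variance the average size of the squared gradient entries stays roughly constant with depth, and it is the fluctuation across random initializations that grows with ∑_j 1/n_j.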


Related research:

- Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory? (12/08/2020)
- Expected Gradients of Maxout Networks and Consequences to Parameter Initialization (01/17/2023)
- Products of Many Large Random Matrices and Gradients in Deep Neural Networks (12/14/2018)
- Mean Field Residual Networks: On the Edge of Chaos (12/24/2017)
- Stabilizing RNN Gradients through Pre-training (08/23/2023)
- Exact information propagation through fully-connected feed forward neural networks (06/17/2018)
- How to Start Training: The Effect of Initialization and Architecture (03/05/2018)
