Products of Many Large Random Matrices and Gradients in Deep Neural Networks

12/14/2018
by Boris Hanin, et al.

We study products of random matrices in the regime where the number of terms and the size of the matrices simultaneously tend to infinity. Our main theorem is that the logarithm of the ℓ_2 norm of such a product applied to any fixed vector is asymptotically Gaussian. The fluctuations we find can be thought of as a finite temperature correction to the limit in which first the size and then the number of matrices tend to infinity. Depending on the scaling limit considered, the mean and variance of the limiting Gaussian depend only on either the first two or the first four moments of the measure from which matrix entries are drawn. We also obtain explicit error bounds on the moments of the norm and the Kolmogorov-Smirnov distance to a Gaussian. Finally, we apply our result to obtain precise information about the stability of gradients in randomly initialized deep neural networks with ReLU activations. This provides a quantitative measure of the extent to which the exploding and vanishing gradient problem occurs in a fully connected neural network with ReLU activations and a given architecture.
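As a quick illustration of the main theorem (not code from the paper), the following minimal NumPy sketch samples products of N i.i.d. random matrices applied to a fixed unit vector and checks the empirical distribution of the log ℓ_2 norm against a Gaussian. All specifics are assumptions chosen for the demo: Gaussian entries with variance 1/n, N = n = 100, and 2000 Monte Carlo trials.

```python
# Monte Carlo sketch: for products of N i.i.d. n x n random matrices
# applied to a fixed unit vector, log ||W_N ... W_1 v||_2 should be
# approximately Gaussian when N and n are both large.
# Assumed setup (not from the paper): Gaussian entries of variance 1/n,
# N = n = 100, 2000 independent trials.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, N, trials = 100, 100, 2000  # matrix size, number of factors, samples

log_norms = np.empty(trials)
for t in range(trials):
    v = np.zeros(n)
    v[0] = 1.0                          # fixed unit vector e_1
    for _ in range(N):                  # apply N i.i.d. matrices in sequence
        W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
        v = W @ v
    log_norms[t] = np.log(np.linalg.norm(v))

# Standardize and compare to a standard normal. Note: estimating the mean
# and std from the same sample makes the KS p-value only approximate.
z = (log_norms - log_norms.mean()) / log_norms.std()
ks_stat, p_value = stats.kstest(z, "norm")
print(f"mean={log_norms.mean():.3f}, std={log_norms.std():.3f}, "
      f"KS statistic={ks_stat:.3f} (p={p_value:.3f})")
```

Varying the ratio N/n changes the mean and variance of the fitted Gaussian, which is how a result of this kind quantifies the exploding and vanishing gradient problem for a given depth and width.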


Related research

Convergence of Deep ReLU Networks (07/27/2021)
We explore convergence of deep neural networks with the popular ReLU act...

Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients? (01/11/2018)
We give a rigorous analysis of the statistical behavior of gradients in ...

Mixed Moments for the Product of Ginibre Matrices (07/20/2020)
We study the ensemble of a product of n complex Gaussian i.i.d. matrices...

Convergence of Deep Convolutional Neural Networks (09/28/2021)
Convergence of deep neural networks as the depth of the networks tends t...

Expected Gradients of Maxout Networks and Consequences to Parameter Initialization (01/17/2023)
We study the gradients of a maxout network with respect to inputs and pa...

Tensor Programs III: Neural Matrix Laws (09/22/2020)
In a neural network (NN), weight matrices linearly transform inputs into...

Quantitative Gaussian Approximation of Randomly Initialized Deep Neural Networks (03/14/2022)
Given any deep fully connected neural network, initialized with random G...
