Correlation Functions in Random Fully Connected Neural Networks at Finite Width
This article considers fully connected neural networks with Gaussian random weights and biases and with L hidden layers, each of width proportional to a large parameter n. For polynomially bounded non-linearities we give sharp estimates, in powers of 1/n, for the joint correlation functions of the network output and its derivatives. Moreover, we obtain exact layerwise recursions for these correlation functions and solve a number of special cases for classes of non-linearities including ReLU and tanh. For both ReLU and tanh we find that the depth-to-width ratio L/n plays the role of an effective network depth, controlling both the scale of fluctuations at individual neurons and the size of inter-neuron correlations. We use this to study a somewhat simplified version of the so-called exploding and vanishing gradient problem, proving that this particular variant occurs if and only if L/n is large. Several of the key ideas in this article were first developed at a physics level of rigor in a recent monograph with Roberts and Yaida.
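To make the setting concrete, below is a minimal Monte Carlo sketch (not from the paper) of the kind of random network being studied: a ReLU network with L hidden layers of width n, Gaussian weights with a He-style 2/fan-in variance, and biases omitted for brevity; the paper's exact normalization and the inclusion of biases may differ. The helper names `random_relu_network_output` and `fluctuation_ratio` are illustrative, and the printed quantity (a normalized fourth moment of the output, equal to 1 for a Gaussian) is just one simple probe of the finite-width effects the abstract describes.

```python
import numpy as np


def random_relu_network_output(x, L, n, rng):
    """One forward pass through a random ReLU network with L hidden layers of width n."""
    h = x
    fan_in = x.shape[0]
    for _ in range(L):
        # Gaussian weights with variance 2/fan_in (a common ReLU choice; biases omitted).
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(n, fan_in))
        h = np.maximum(W @ h, 0.0)
        fan_in = n
    # Linear read-out to a single scalar output neuron.
    w_out = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=fan_in)
    return w_out @ h


def fluctuation_ratio(L, n, trials=4000, d_in=10, seed=0):
    """Estimate E[z^4] / (3 E[z^2]^2) over random draws of the weights; equals 1 for a Gaussian."""
    rng = np.random.default_rng(seed)
    x = np.ones(d_in) / np.sqrt(d_in)  # fixed unit-norm input
    z = np.array([random_relu_network_output(x, L, n, rng) for _ in range(trials)])
    return np.mean(z**4) / (3.0 * np.mean(z**2) ** 2)


if __name__ == "__main__":
    # The excess of the ratio over 1 grows with L/n, consistent with the abstract's
    # claim that the depth-to-width ratio acts as an effective depth controlling
    # the size of finite-width fluctuations at a single neuron.
    for L, n in [(2, 64), (8, 64), (32, 64)]:
        print(f"L/n = {L / n:.3f}  ratio ~ {fluctuation_ratio(L, n):.3f}")
```

In this sketch the deviation of the ratio from its infinite-width (Gaussian) value of 1 is the kind of 1/n correction, growing with depth, that the paper's correlation-function estimates and layerwise recursions make precise.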