The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization

06/06/2022
by Mufan Bill Li, et al.

The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
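As a hedged illustration of the setup described above, the sketch below simulates the penultimate-layer covariance of a finite-width, finite-depth MLP at initialization with a shaped activation. The specific shaping form phi(x) = x + a * tanh(x) / sqrt(depth), the width and depth values, and the constant a are illustrative assumptions, not the paper's exact parameterization; the point is only to show the layer-to-layer fluctuations of the random covariance matrix that the Neural Covariance SDE is meant to capture in the joint depth-and-width limit.

```python
# Hypothetical sketch (not the paper's exact construction): follow one entry of
# the random covariance (Gram) matrix of a shaped MLP across layers at init.
import numpy as np

rng = np.random.default_rng(0)

width, depth = 512, 256          # fluctuations of the covariance scale with depth/width
a = 1.0                          # assumed shaping strength

def shaped_act(x, depth, a=1.0):
    """Activation that flattens toward the identity as depth grows (assumed form)."""
    return x + a * np.tanh(x) / np.sqrt(depth)

# Two inputs normalized to norm sqrt(width); we track their covariance layer by layer.
x1 = rng.standard_normal(width); x1 *= np.sqrt(width) / np.linalg.norm(x1)
x2 = rng.standard_normal(width); x2 *= np.sqrt(width) / np.linalg.norm(x2)

cov_path = []
h1, h2 = x1, x2
for _ in range(depth):
    # i.i.d. Gaussian weights with variance 1/width (standard initialization scaling).
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    h1 = shaped_act(W @ h1, depth, a)
    h2 = shaped_act(W @ h2, depth, a)
    # One entry of the random covariance matrix defined by the current layer.
    cov_path.append(h1 @ h2 / width)

# cov_path is one sample path; rerunning with fresh seeds gives different paths,
# and the spread across seeds is the randomness that an infinite-width analysis
# ignores but that accumulates over many layers.
print(cov_path[:: depth // 8])
```

Rerunning this with different depth-to-width ratios shows the dependence on that ratio: the smaller depth/width is, the closer the sample paths stay to the deterministic infinite-width prediction.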

Related research

06/30/2023
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
In deep learning theory, the covariance matrix of the representations se...

10/03/2022
On the infinite-depth limit of finite-width neural networks
In this paper, we study the infinite-depth limit of finite-width residua...

03/30/2023
Neural signature kernels as infinite-width-depth-limits of controlled ResNets
Motivated by the paradigm of reservoir computing, we consider randomly i...

04/06/2023
Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training
Recent developments in applications of artificial neural networks with o...

09/16/2020
Universality Laws for High-Dimensional Learning with Random Features
We prove a universality theorem for learning with random features. Our r...

03/14/2017
Convergence of Deep Neural Networks to a Hierarchical Covariance Matrix Decomposition
We show that in a deep neural network trained with ReLU, the low-lying l...

04/10/2023
Criticality versus uniformity in deep neural networks
Deep feedforward networks initialized along the edge of chaos exhibit ex...
