Tensor Programs III: Neural Matrix Laws

09/22/2020
by Greg Yang et al.

In a neural network (NN), weight matrices linearly transform inputs into preactivations that are then transformed nonlinearly into activations. A typical NN interleaves many such linear and nonlinear transforms to express complex functions. Thus, the (pre-)activations depend on the weights in an intricate manner. We show that, surprisingly, the (pre-)activations of a randomly initialized NN become independent of the weights as the NN's widths tend to infinity, in the sense of asymptotic freeness in random matrix theory. We call this the Free Independence Principle (FIP), which has these consequences: 1) It rigorously justifies the calculation of the asymptotic Jacobian singular value distribution of an NN in Pennington et al. [36,37], essential for training ultra-deep NNs [48]. 2) It gives a new justification of the gradient independence assumption used for calculating the Neural Tangent Kernel of a neural network. FIP and these results hold for any neural architecture. We show FIP by proving a Master Theorem for any Tensor Program, as introduced in Yang [50,51], generalizing the Master Theorems proved in those works. As warmup demonstrations of this new Master Theorem, we give new proofs of the semicircle and Marchenko-Pastur laws, benchmarking our framework against these fundamental mathematical results.
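The warmup results named at the end of the abstract (the semicircle and Marchenko-Pastur laws) and the notion of asymptotic freeness behind FIP can be spot-checked numerically. Below is a minimal NumPy sketch, not taken from the paper: the matrix sizes, the aspect ratio, and the diagonal test matrix used in the freeness probe are illustrative assumptions of ours.

```python
# Numerical spot-checks (illustrative only, not the paper's code) of three
# random-matrix facts mentioned in the abstract: the semicircle law, the
# Marchenko-Pastur law, and asymptotic freeness between a random matrix and
# an independent deterministic-like matrix.
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # illustrative size; the laws hold in the n -> infinity limit

# Semicircle law: a Wigner matrix W = (G + G^T) / sqrt(2n) has eigenvalues
# concentrating on [-2, 2]; its 2nd and 4th spectral moments tend to the
# Catalan numbers 1 and 2.
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)
eig_w = np.linalg.eigvalsh(W)
print("Wigner 2nd moment (expect ~1):", np.mean(eig_w**2))
print("Wigner 4th moment (expect ~2):", np.mean(eig_w**4))

# Marchenko-Pastur law: for X of shape n x m with iid N(0,1) entries and
# aspect ratio c = n/m <= 1, the eigenvalues of X X^T / m fill the interval
# [(1 - sqrt(c))^2, (1 + sqrt(c))^2].
c = 0.5
m = int(n / c)
X = rng.standard_normal((n, m))
eig_mp = np.linalg.eigvalsh(X @ X.T / m)
print("MP predicted support: [%.3f, %.3f]" % ((1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2))
print("MP empirical extremes:", (eig_mp.min(), eig_mp.max()))

# Asymptotic freeness probe: for freely independent elements a, b with zero
# trace, alternating traces such as tau(a b a b) vanish. A Wigner matrix and
# an independent diagonal matrix are asymptotically free, so the normalized
# trace of the centered alternating product should be small and shrink with n.
D = np.diag(rng.standard_normal(n))

def centered(A):
    # Subtract the normalized trace times identity: A - tau(A) * I.
    return A - (np.trace(A) / n) * np.eye(n)

Wc, Dc = centered(W), centered(D)
print("tau(Wc Dc Wc Dc) (expect ~0):", np.trace(Wc @ Dc @ Wc @ Dc) / n)
```

At n = 2000 the Wigner moments land close to 1 and 2, the Wishart spectrum fills roughly the predicted interval, and the alternating trace is small, shrinking further as n grows.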

Related research

06/25/2020 - Tensor Programs II: Neural Tangent Kernel for Any Architecture
We prove that a randomly initialized neural network of any architecture...

09/12/2023 - Neural Network Layer Matrix Decomposition reveals Latent Manifold Encoding and Memory Capacity
We prove the converse of the universal approximation theorem, i.e. a neu...

09/29/2012 - Self-Delimiting Neural Networks
Self-delimiting (SLIM) programs are a central concept of theoretical com...

03/13/2023 - Provable Convergence of Tensor Decomposition-Based Neural Network Training
Advanced tensor decomposition, such as tensor train (TT), has been widel...

12/14/2018 - Products of Many Large Random Matrices and Gradients in Deep Neural Networks
We study products of random matrices in the regime where the number of t...

10/03/2022 - Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
Since the recognition in the early nineties of the vanishing/exploding (...

12/22/2020 - Residual Matrix Product State for Machine Learning
Tensor network (TN), which originates from quantum physics, shows broad ...
