Tensor Programs II: Neural Tangent Kernel for Any Architecture
We prove that a randomly initialized neural network of *any architecture* has its Neural Tangent Kernel (NTK) converge to a deterministic limit as the network widths tend to infinity, and we show how to compute this limit. The prior literature's heuristic study of neural network gradients often assumes that every weight matrix used in forward propagation is independent of its transpose used in backpropagation (Schoenholz et al. 2017); this is known as the *gradient independence assumption (GIA)*. We identify a commonly satisfied condition, which we call the *Simple GIA Check*, under which the NTK limit computed via GIA is correct. Conversely, when the Simple GIA Check fails, we show that GIA can produce wrong answers. Our material here presents the NTK results of Yang (2019a) in a friendly manner and showcases the *tensor programs* technique for understanding wide neural networks. We provide reference implementations of the infinite-width NTKs of recurrent neural networks, transformers, and batch normalization at https://github.com/thegregyang/NTK4A.
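The convergence the abstract describes can be checked numerically. Below is a minimal illustrative sketch (not the paper's reference code) for a one-hidden-layer ReLU network f(x) = a·relu(Wx)/√n in NTK parameterization, whose empirical NTK — the inner product of parameter gradients at two inputs — can be written in closed form. As the width n grows, the kernel's fluctuation across random initializations shrinks toward the deterministic limit:

```python
import numpy as np

def empirical_ntk(x1, x2, W, a):
    """Empirical NTK of f(x) = a @ relu(W @ x) / sqrt(n) at inputs x1, x2.

    Sums gradient inner products over both parameter groups:
      d f/d a_i = relu(w_i . x) / sqrt(n)
      d f/d W_i = a_i * 1{w_i . x > 0} * x / sqrt(n)
    """
    n = W.shape[0]
    h1, h2 = W @ x1, W @ x2
    term_a = np.dot(np.maximum(h1, 0), np.maximum(h2, 0))          # gradients w.r.t. a
    term_w = np.dot(x1, x2) * np.dot(a * (h1 > 0), a * (h2 > 0))   # gradients w.r.t. W
    return (term_a + term_w) / n

rng = np.random.default_rng(0)
d = 5
x1, x2 = rng.normal(size=d), rng.normal(size=d)

# The spread of the kernel over random seeds shrinks as width grows,
# consistent with convergence to a deterministic limit.
for n in [50, 5000]:
    ks = [empirical_ntk(x1, x2,
                        rng.normal(size=(n, d)),
                        rng.normal(size=n))
          for _ in range(20)]
    print(f"width {n:5d}: mean {np.mean(ks):.4f}, std {np.std(ks):.4f}")
```

The variance of the empirical NTK here scales like 1/n, so the printed standard deviation drops roughly tenfold between the two widths, while the mean stays near the limiting kernel value.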