Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

10/28/2019
by Greg Yang

Wide neural networks with random weights and biases are Gaussian processes, as observed by Neal (1995) for shallow networks, and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks. We show that this Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptrons, RNNs (e.g. LSTMs, GRUs), (nD or graph) convolutions, pooling, skip connections, attention, batch normalization, and/or layer normalization. More generally, we introduce a language for expressing neural network computations, and our result encompasses all neural networks expressible in it. This work serves as a tutorial on the *tensor programs* technique formulated in Yang (2019) and elucidates the Gaussian process results obtained there. We provide open-source implementations of the Gaussian process kernels of the simple RNN, GRU, transformer, and batchnorm+ReLU networks at github.com/thegregyang/GP4A.
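The correspondence is easiest to see in the classic fully-connected case studied by Lee et al. (2018). The sketch below is illustrative only and is not taken from the GP4A repository: it compares the analytic NNGP kernel of a deep ReLU MLP, computed via the arc-cosine recursion of Cho & Saul (2009), against a Monte Carlo estimate from finite-width random networks. The function names (`nngp_kernel_relu`, `sample_mlp_outputs`) and all hyperparameter choices are hypothetical, chosen for this demo.

```python
import numpy as np

def nngp_kernel_relu(X, depth, sigma_w=1.0, sigma_b=0.1):
    """Analytic NNGP kernel of a depth-hidden-layer ReLU MLP.

    Recursion: K^{l+1}(x, x') = sigma_b^2 + sigma_w^2 * E[relu(u) relu(v)],
    with (u, v) ~ N(0, K^l). For ReLU the expectation has the closed
    arc-cosine form of Cho & Saul (2009). NOTE: illustrative sketch.
    """
    n, d = X.shape
    # Covariance of the first-layer preactivations.
    K = sigma_b**2 + sigma_w**2 * (X @ X.T) / d
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        cos_theta = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # E[relu(u) relu(v)] = sqrt(k11 k22) * (sin t + (pi - t) cos t) / (2 pi)
        J = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        K = sigma_b**2 + sigma_w**2 * np.outer(diag, diag) * J
    return K

def sample_mlp_outputs(X, depth, width, n_samples,
                       sigma_w=1.0, sigma_b=0.1, seed=0):
    """Monte Carlo: scalar outputs of random finite-width ReLU MLPs at X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    outs = np.empty((n_samples, n))
    for s in range(n_samples):
        h, fan_in = X, d
        for _ in range(depth):
            # NTK-style initialization: Var(W_ij) = sigma_w^2 / fan_in.
            W = rng.normal(0.0, sigma_w / np.sqrt(fan_in), (fan_in, width))
            b = rng.normal(0.0, sigma_b, width)
            h, fan_in = np.maximum(h @ W + b, 0.0), width
        w = rng.normal(0.0, sigma_w / np.sqrt(fan_in), fan_in)
        outs[s] = h @ w + rng.normal(0.0, sigma_b)
    return outs

X = np.random.default_rng(1).normal(size=(3, 5))
K_theory = nngp_kernel_relu(X, depth=2)
outs = sample_mlp_outputs(X, depth=2, width=512, n_samples=1000)
K_mc = outs.T @ outs / len(outs)  # empirical output covariance
print(np.round(K_theory, 3))     # should approximately match K_mc
print(np.round(K_mc, 3))
```

At width 512 the two 3x3 matrices should agree to within a few percent (Monte Carlo noise plus O(1/width) finite-width bias); the paper's contribution is proving that an analogous kernel limit exists for every architecture expressible in the tensor programs language, not just MLPs.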


Related research

A brief note on understanding neural networks as Gaussian processes (07/25/2021)
As a generalization of the work in [Lee et al., 2017], this note briefly...

Tensor Programs II: Neural Tangent Kernel for Any Architecture (06/25/2020)
We prove that a randomly initialized neural network of *any architecture...

Why Neural Networks Work (11/26/2022)
We argue that many properties of fully-connected feedforward neural netw...

Neural network identifiability for a family of sigmoidal nonlinearities (06/11/2019)
This paper addresses the following question of neural network identifiab...

A Gaussian Process perspective on Convolutional Neural Networks (10/25/2018)
In this paper we cast the well-known convolutional neural network in a G...

Infinite attention: NNGP and NTK for deep attention networks (06/18/2020)
There is a growing amount of literature on the relationship between wide...
