
Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

05/08/2021
by Greg Yang, et al.

Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at initialization has an infinite-width limit for a large class of architectures, including modern staples such as ResNet and Transformers. However, that analysis does not apply to training. Here, we show that the same neural networks (in the so-called NTK parametrization) during training follow kernel gradient descent dynamics in function space, where the kernel is the infinite-width NTK. This completes the proof of the *architectural universality* of NTK behavior. To achieve this result, we apply the Tensor Programs technique: write the entire SGD dynamics inside a Tensor Program and analyze it via the Master Theorem. To facilitate this proof, we develop a graphical notation for Tensor Programs.
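
To make the function-space claim concrete, below is a minimal numerical sketch of kernel gradient descent with a fixed kernel, in the standard NTK-regime setting of squared loss and full-batch gradient flow (a simplification: the paper treats SGD and general architectures). It is not the paper's code; all names (ntk_gram, f0, targets, lr, t) are illustrative assumptions, and the NTK Gram matrix here is a random PSD stand-in rather than an actual infinite-width NTK.

```python
# Minimal sketch: kernel gradient descent in function space under squared loss,
# i.e. the ODE d f/dt = -lr * K (f - y) with K the (fixed) NTK Gram matrix.
# Wide NTK-parametrized networks are shown to follow dynamics of this kind.

import numpy as np

def kernel_gd_predictions(ntk_gram, f0, targets, lr, t):
    """Closed-form solution f_t = y + exp(-lr * K * t) (f_0 - y) of the flow above.

    ntk_gram : (n, n) kernel Gram matrix on the training inputs
    f0       : (n,)   function values (network outputs) at initialization
    targets  : (n,)   regression targets y
    """
    # Diagonalize the symmetric PSD kernel to exponentiate it stably.
    eigvals, eigvecs = np.linalg.eigh(ntk_gram)
    decay = np.exp(-lr * eigvals * t)          # exp(-lr * K * t) in the eigenbasis
    residual0 = eigvecs.T @ (f0 - targets)     # initial residual, rotated
    residual_t = eigvecs @ (decay * residual0) # residual after time t
    return targets + residual_t

# Toy usage with a random PSD matrix standing in for the NTK Gram matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
K = A @ A.T                        # symmetric PSD stand-in for the NTK
y = rng.normal(size=5)             # targets
f0 = rng.normal(size=5)            # outputs at initialization
print(kernel_gd_predictions(K, f0, y, lr=0.1, t=50.0))  # approaches y as t grows
```

In this regime the predictions interpolate the targets along the kernel's eigendirections, with each mode decaying at a rate set by the corresponding NTK eigenvalue.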


Related research

11/30/2020 · Feature Learning in Infinite-Width Neural Networks
As its width tends to infinity, a deep neural network's behavior under g...

12/05/2019 · Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Neural Tangents is a library designed to enable research into infinite-w...

06/06/2022 · Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime
We provide quantitative bounds measuring the L^2 difference in function ...

06/20/2018 · Neural Tangent Kernel: Convergence and Generalization in Neural Networks
At initialization, artificial neural networks (ANNs) are equivalent to G...

06/30/2020 · Associative Memory in Iterated Overparameterized Sigmoid Autoencoders
Recent work showed that overparameterized autoencoders can be trained to...

06/25/2020 · Tensor Programs II: Neural Tangent Kernel for Any Architecture
We prove that a randomly initialized neural network of *any architecture...

11/28/2021 · Neural Tangent Kernel of Matrix Product States: Convergence and Applications
In this work, we study the Neural Tangent Kernel (NTK) of Matrix Product...