Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

05/08/2021
by Greg Yang, et al.

Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at initialization has an infinite-width limit for a large class of architectures, including modern staples such as ResNet and Transformers. However, their analysis does not apply to training. Here, we show that the same neural networks (in the so-called NTK parametrization) follow kernel gradient descent dynamics in function space during training, where the kernel is the infinite-width NTK. This completes the proof of the *architectural universality* of NTK behavior. To achieve this result, we apply the Tensor Programs technique: we express the entire SGD dynamics inside a Tensor Program and analyze it via the Master Theorem. To facilitate this proof, we develop a graphical notation for Tensor Programs.
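To make the claim concrete, the sketch below illustrates kernel gradient descent with the *empirical* NTK of a small finite-width MLP in the NTK parametrization. This is only an illustration under assumed choices (the two-layer tanh network, width 512, the learning rate, and the toy data are all placeholders, and the empirical kernel stands in for the deterministic infinite-width NTK); it is not the paper's Tensor Programs construction.

```python
# Minimal sketch: kernel gradient descent in function space with the empirical NTK.
# All architectural and hyperparameter choices here are illustrative assumptions.
import jax
import jax.numpy as jnp

WIDTH = 512  # finite proxy for the infinite-width limit

def init_params(key, d_in=3, d_out=1):
    k1, k2 = jax.random.split(key)
    # NTK parametrization: weights are i.i.d. N(0, 1); the 1/sqrt(fan_in)
    # factors live in the forward pass rather than in the initialization.
    return {"W1": jax.random.normal(k1, (WIDTH, d_in)),
            "W2": jax.random.normal(k2, (d_out, WIDTH))}

def f(params, x):
    h = jnp.tanh(params["W1"] @ x / jnp.sqrt(x.shape[0]))
    return (params["W2"] @ h / jnp.sqrt(WIDTH)).squeeze()

def empirical_ntk(params, X1, X2):
    # Theta(x, x') = <df/dtheta(x), df/dtheta(x')>, summed over all parameters.
    jac = jax.vmap(lambda x: jax.grad(f)(params, x))
    flat = lambda J: jnp.concatenate(
        [leaf.reshape(leaf.shape[0], -1) for leaf in jax.tree_util.tree_leaves(J)],
        axis=1)
    return flat(jac(X1)) @ flat(jac(X2)).T

param_key, data_key = jax.random.split(jax.random.PRNGKey(0))
params = init_params(param_key)
X = jax.random.normal(data_key, (8, 3))   # toy inputs
y = jnp.sin(X[:, 0])                      # toy regression targets

# Kernel gradient descent on the training outputs for squared loss:
#   f_{t+1}(X) = f_t(X) - lr * Theta(X, X) @ (f_t(X) - y)
Theta = empirical_ntk(params, X, X)
preds = jax.vmap(lambda x: f(params, x))(X)
lr = 0.5 / jnp.linalg.norm(Theta, ord=2)  # step size below 1/||Theta|| for stability
for _ in range(200):
    preds = preds - lr * Theta @ (preds - y)
```

In the infinite-width limit established by the paper, the empirical kernel `Theta` above would be replaced by the deterministic limiting NTK, and SGD on the network's parameters would trace out exactly this function-space trajectory.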


Related research

Feature Learning in Infinite-Width Neural Networks (11/30/2020)
As its width tends to infinity, a deep neural network's behavior under g...

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit (08/03/2023)
Going beyond stochastic gradient descent (SGD), what new phenomena emerg...

Neural Tangents: Fast and Easy Infinite Neural Networks in Python (12/05/2019)
Neural Tangents is a library designed to enable research into infinite-w...

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime (06/06/2022)
We provide quantitative bounds measuring the L^2 difference in function ...

Tensor Programs II: Neural Tangent Kernel for Any Architecture (06/25/2020)
We prove that a randomly initialized neural network of *any architecture...

Associative Memory in Iterated Overparameterized Sigmoid Autoencoders (06/30/2020)
Recent work showed that overparameterized autoencoders can be trained to...

Neural Tangent Kernel of Matrix Product States: Convergence and Applications (11/28/2021)
In this work, we study the Neural Tangent Kernel (NTK) of Matrix Product...
