Feature Learning in Infinite-Width Neural Networks

11/30/2020
by Greg Yang, et al.

As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g., given by the Neural Tangent Kernel (NTK)), provided it is parametrized appropriately (e.g., the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning, as with BERT. We propose simple modifications to the standard parametrization to allow for feature learning in the limit. Using the *Tensor Programs* technique, we derive explicit formulas for such limits. On Word2Vec and few-shot learning on Omniglot via MAML, two canonical tasks that rely crucially on feature learning, we compute these limits exactly. We find that they outperform both NTK baselines and finite-width networks, with the latter approaching the infinite-width feature-learning performance as width increases. More generally, we classify a natural space of neural network parametrizations that generalizes the standard, NTK, and Mean Field parametrizations. We show that 1) any parametrization in this space either admits feature learning or has infinite-width training dynamics given by kernel gradient descent, but not both; and 2) any such infinite-width limit can be computed using the Tensor Programs technique.
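The contrast the abstract draws between the kernel (NTK) limit and the feature-learning limit comes down to how initialization scale, width multipliers, and the learning rate are scaled with the width n. Below is a minimal numpy sketch (not the authors' code) of this parametrization-space idea: each weight matrix takes the form W = n^{-a} · w with w ~ N(0, n^{-2b}) entrywise, and SGD would use a learning rate scaled by n^{-c}. The specific (a, b) settings used here for the NTK row and the feature-learning row reflect one reading of the paper and should be treated as illustrative assumptions, not a verbatim table from it.

```python
# Minimal sketch (illustrative, not the authors' code): two width-n
# parametrizations of a 2-hidden-layer MLP. Each layer's effective weight
# is mult * w, where mult = n**(-a) and w ~ N(0, n**(-2b)) entrywise.

import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 4096  # fixed input dimension, large width

def layer(fan_out, fan_in, a, b):
    """One abc-parametrized layer: returns (width multiplier, raw weights)."""
    return n ** (-a), rng.normal(0.0, n ** (-b), size=(fan_out, fan_in))

def mlp(x, layers):
    """Forward pass with tanh hidden nonlinearities and multiplier-scaled weights."""
    h = x
    for i, (mult, w) in enumerate(layers):
        h = mult * (w @ h)
        if i < len(layers) - 1:
            h = np.tanh(h)
    return h

# NTK parametrization: N(0, 1) init (b = 0), 1/sqrt(n) multiplier (a = 1/2)
# on every layer whose fan-in grows with width.
ntk = [layer(n, d, a=0.0, b=0.0),
       layer(n, n, a=0.5, b=0.0),
       layer(1, n, a=0.5, b=0.0)]

# Feature-learning parametrization (assumed settings): shift the 1/sqrt(n)
# scale from multiplier into the init on hidden layers, and apply an extra
# 1/sqrt(n) to the output layer so its effective entries scale like 1/n.
mup = [layer(n, d, a=-0.5, b=0.5),
       layer(n, n, a=0.0, b=0.5),
       layer(1, n, a=0.5, b=0.5)]

x = rng.normal(size=d)
print("NTK output at init:", mlp(x, ntk))  # O(1) as n grows
print("FL  output at init:", mlp(x, mup))  # O(1/sqrt(n)), vanishing as n grows
```

At initialization both settings give hidden pre-activations of the same scale, but the feature-learning parametrization drives the network output to zero as n grows. Per the paper's analysis, it is this kind of rebalancing between multipliers, initialization, and learning rate that lets the hidden features move by a non-vanishing amount during training in the infinite-width limit, instead of staying effectively frozen as in the kernel regime.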


Related Research

04/06/2023 · Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks
We analyze the dynamics of finite width effects in wide but finite featu...

05/08/2021 · Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics
Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at ini...

08/03/2023 · Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit
Going beyond stochastic gradient descent (SGD), what new phenomena emerg...

07/21/2023 · Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks
Feature learning, or the ability of deep neural networks to automaticall...

05/29/2021 · Rapid Feature Evolution Accelerates Learning in Neural Networks
Neural network (NN) training and generalization in the infinite-width li...

09/07/2023 · Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
This work investigates the nuanced algorithm design choices for deep lea...

01/26/2023 · Neural networks learn to magnify areas near decision boundaries
We study how training molds the Riemannian geometry induced by neural ne...
