The Three Stages of Learning Dynamics in High-Dimensional Kernel Methods

11/13/2021
by Nikhil Ghosh, et al.

To understand how deep learning works, it is crucial to understand the training dynamics of neural networks. Several interesting hypotheses about these dynamics have been proposed based on empirically observed phenomena, but theoretical understanding of when and why such phenomena occur remains limited. In this paper, we consider the training dynamics of gradient flow on kernel least-squares objectives, which is a limiting dynamics of SGD-trained neural networks. Using precise high-dimensional asymptotics, we characterize the dynamics of the fitted model in two "worlds": in the Oracle World the model is trained on the population distribution, and in the Empirical World the model is trained on a sampled dataset. We show that under mild conditions on the kernel and the L^2 target regression function, the training dynamics undergo three stages characterized by the behaviors of the models in the two worlds. Our theoretical results also mathematically formalize several interesting deep learning phenomena. Specifically, in our setting we show that SGD progressively learns more complex functions and that there is a "deep bootstrap" phenomenon: during the second stage, the test errors of the two worlds remain close despite the empirical training error being much smaller. Finally, we give a concrete example comparing the dynamics of two different kernels, which shows that faster training is not necessary for better generalization.
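The setting can be illustrated with a small simulation. The sketch below is not the authors' code: it discretizes kernel gradient flow on a least-squares objective as functional gradient descent in the RKHS, fits an "Empirical World" model on a small sample, approximates the "Oracle World" by a much larger sample standing in for the population distribution, and tracks the test error of each along the training path. The RBF kernel, the target function, and all sizes and step sizes are illustrative assumptions.

```python
# A minimal sketch, not the authors' code: kernel gradient flow on a
# least-squares objective, discretized as functional gradient descent in the
# RKHS. The RBF kernel, the target function, and all sizes / step sizes are
# illustrative assumptions. The "Oracle World" is approximated here by a much
# larger sample standing in for the population distribution.
import numpy as np

rng = np.random.default_rng(0)
d, n_emp, n_oracle, n_test = 20, 200, 2000, 1000

def target(X):
    # A simple L^2 target: a linear term plus a degree-2 correction.
    return X[:, 0] + 0.5 * (X[:, 1] ** 2 - 1.0)

def rbf(A, B, gamma=1.0 / 20):
    # Gaussian (RBF) kernel computed via the squared-distance expansion.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def run(X, y, X_test, y_test, lr=0.5, steps=1000, log_every=100):
    # Discretized kernel gradient flow: with f = K a the iterates satisfy
    # f <- f - (lr / n) K (f - y), i.e. gradient descent on the least-squares
    # loss with respect to the RKHS geometry. Returns test MSE along the path.
    K, K_test = rbf(X, X), rbf(X_test, X)
    a, errs = np.zeros(len(X)), []
    for t in range(steps + 1):
        if t % log_every == 0:
            errs.append(np.mean((K_test @ a - y_test) ** 2))
        a -= lr * (K @ a - y) / len(X)
    return errs

X_emp, X_orc, X_tst = (rng.standard_normal((n, d)) for n in (n_emp, n_oracle, n_test))
emp = run(X_emp, target(X_emp), X_tst, target(X_tst))
orc = run(X_orc, target(X_orc), X_tst, target(X_tst))
for i, (e, o) in enumerate(zip(emp, orc)):
    print(f"step {i * 100:4d} | empirical-world test MSE {e:.3f} | oracle-world test MSE {o:.3f}")
```

In such a run one would expect the two test-error curves to track each other early in training and then separate once the empirical-world model starts to fit its own sample much more closely, in the spirit of the deep bootstrap behavior described above.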
