Connecting NTK and NNGP: A Unified Theoretical Framework for Neural Network Learning Dynamics in the Kernel Regime

09/08/2023
by Yehonatan Avidan, et al.

Artificial neural networks have revolutionized machine learning in recent years, but a complete theoretical framework for their learning process is still lacking. Substantial progress has been made for infinitely wide networks. In this regime, two disparate theoretical frameworks describe the network's output in terms of kernels: one is based on the Neural Tangent Kernel (NTK), which assumes linearized gradient-descent dynamics, while the other is based on the Neural Network Gaussian Process (NNGP) kernel, which assumes a Bayesian framework. However, the relation between these two frameworks has remained elusive. This work unifies the two theories using a Markov proximal learning model for the dynamics of an ensemble of randomly initialized, infinitely wide deep networks. We derive an exact analytical expression for the network input-output function during and after learning, and introduce a new time-dependent Neural Dynamical Kernel (NDK) from which both the NTK and the NNGP kernel can be derived. We identify two learning phases characterized by different time scales: gradient-driven and diffusive learning. In the initial gradient-driven phase, the dynamics is dominated by deterministic gradient descent and is described by NTK theory. This phase is followed by the diffusive learning stage, during which the network parameters sample the solution space, ultimately approaching the equilibrium distribution corresponding to the NNGP. Combined with numerical evaluations on synthetic and benchmark datasets, these results provide novel insights into the different roles of initialization, regularization, and network depth, as well as into phenomena such as early stopping and representational drift. This work closes the gap between the NTK and NNGP theories, providing a comprehensive framework for understanding the learning process of deep neural networks in the infinite-width limit.
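To make the two kernels in the abstract concrete, the following is a minimal sketch, not code from the paper: for a toy two-layer tanh network it computes (i) the empirical NTK from the parameter Jacobians of a single initialized network, the object that governs linearized gradient-descent dynamics, and (ii) a Monte-Carlo estimate of the NNGP kernel, the covariance of outputs over random initializations that underlies the Bayesian description. The network architecture, the width, and the helper names (empirical_ntk, mc_nngp) are illustrative assumptions; in the infinite-width limit both estimates concentrate around deterministic kernels.

```python
# Illustrative sketch (not from the paper): empirical NTK vs. Monte-Carlo NNGP
# for a toy two-layer tanh network with scalar inputs and outputs.
import jax
import jax.numpy as jnp

WIDTH = 512  # hidden width; both kernel estimates concentrate as WIDTH grows


def init_params(key):
    # Standard 1/fan_in weight variance so the kernels stay O(1) in WIDTH.
    k1, k2 = jax.random.split(key)
    w1 = jax.random.normal(k1, (1, WIDTH))                     # input dim 1
    w2 = jax.random.normal(k2, (WIDTH, 1)) / jnp.sqrt(WIDTH)
    return (w1, w2)


def forward(params, x):
    w1, w2 = params
    return jnp.tanh(x @ w1) @ w2                               # shape (n, 1)


def empirical_ntk(params, x1, x2):
    # Empirical NTK at one initialization: Theta(x1, x2) = J(x1) J(x2)^T,
    # where J is the Jacobian of the output w.r.t. all parameters.
    def flat_jac(x):
        j = jax.jacobian(lambda p: forward(p, x).ravel())(params)
        return jnp.concatenate(
            [leaf.reshape(leaf.shape[0], -1) for leaf in jax.tree_util.tree_leaves(j)],
            axis=1)
    return flat_jac(x1) @ flat_jac(x2).T


def mc_nngp(key, x1, x2, n_samples=200):
    # Monte-Carlo NNGP: K(x1, x2) = E[f(x1) f(x2)^T] over random initializations.
    def one(k):
        p = init_params(k)
        return forward(p, x1) @ forward(p, x2).T
    return jnp.mean(jax.vmap(one)(jax.random.split(key, n_samples)), axis=0)


key = jax.random.PRNGKey(0)
x = jnp.linspace(-1.0, 1.0, 5).reshape(-1, 1)
print("empirical NTK:\n", empirical_ntk(init_params(key), x, x))
print("Monte-Carlo NNGP:\n", mc_nngp(key, x, x))
```

Libraries such as neural_tangents can compute the exact infinite-width NNGP and NTK kernels analytically; the finite-width Jacobian and Monte-Carlo estimates above are used only to keep the sketch self-contained.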

Related research

Training Dynamics of Deep Networks using Stochastic Gradient Descent via Neural Tangent Kernel (05/31/2019)
Stochastic Gradient Descent (SGD) is widely used to train deep neural ne...

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent (02/18/2019)
A longstanding goal in deep learning research has been to precisely char...

Neural Tangent Kernel: Convergence and Generalization in Neural Networks (06/20/2018)
At initialization, artificial neural networks (ANNs) are equivalent to G...

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization (04/13/2020)
In recent years, a critical initialization scheme with orthogonal initia...

Phase diagram of training dynamics in deep neural networks: effect of learning rate, depth, and width (02/23/2023)
We systematically analyze optimization dynamics in deep neural networks ...

The Dynamics of Learning: A Random Matrix Approach (05/30/2018)
Understanding the learning dynamics of neural networks is one of the key...

On the Optimization Dynamics of Wide Hypernetworks (03/27/2020)
Recent results in the theoretical study of deep learning have shown that...
