Training Dynamics of Deep Networks using Stochastic Gradient Descent via Neural Tangent Kernel

05/31/2019
by   Soufiane Hayou, et al.
0

Stochastic Gradient Descent (SGD) is widely used to train deep neural networks. However, few theoretical results on the training dynamics of SGD are available. Recent work by Jacot et al. (2018) has showed that training a neural network of any kind with a full batch gradient descent in parameter space is equivalent to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result to show that the output of a neural network trained using full batch gradient descent can be approximated by a linear model for wide neural networks. We show here how these results can be extended to SGD. In this case, the resulting training dynamics is given by a stochastic differential equation dependent on the NTK which becomes a simple mean-reverting process for the squared loss. When the network depth is also large, we provide a comprehensive analysis on the impact of the initialization and the activation function on the NTK, and thus on the corresponding training dynamics under SGD. We provide experiments illustrating our theoretical results.

READ FULL TEXT
research
04/15/2019

The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent

The goal of this paper is to study why stochastic gradient descent (SGD)...
research
05/13/2021

The Dynamics of Gradient Descent for Overparametrized Neural Networks

We consider the dynamics of gradient descent (GD) in overparameterized s...
research
06/07/2023

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

In this paper, we first present an explanation regarding the common occu...
research
12/23/2015

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Effective training of deep neural networks suffers from two main issues....
research
05/02/2021

Universal scaling laws in the gradient descent training of neural networks

Current theoretical results on optimization trajectories of neural netwo...
research
04/22/2021

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

Linear interpolation between initial neural network parameters and conve...
research
09/08/2023

Connecting NTK and NNGP: A Unified Theoretical Framework for Neural Network Learning Dynamics in the Kernel Regime

Artificial neural networks have revolutionized machine learning in recen...

Please sign up or login with your details

Forgot password? Click here to reset