The Dynamics of Gradient Descent for Overparametrized Neural Networks

05/13/2021
by Siddhartha Satpathi, et al.

We consider the dynamics of gradient descent (GD) in overparameterized single-hidden-layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve zero training error and generalize well, provided the initial conditions are chosen appropriately. Here, through a Lyapunov analysis, we show that the neural network weights under GD converge to a point close to the minimum-norm solution among those achieving zero training error for the linear approximation of the neural network. To illustrate the application of this result, we show that GD converges to a prediction function that generalizes well, thereby providing an alternative proof of the generalization results in Arora et al. (2019).
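The following is a minimal NumPy sketch, not taken from the paper, intended only to illustrate the statement above: a wide single-hidden-layer ReLU network trained by GD on squared loss ends up predicting nearly the same values as the minimum-norm interpolating solution of the network's linearization at initialization. The architecture, width, data, step size, and iteration count are illustrative assumptions, not the paper's experimental setup.

import numpy as np

# Minimal sketch (not the paper's code): train a wide single-hidden-layer ReLU
# network with gradient descent on squared loss, and compare its predictions to
# the minimum-norm interpolating solution of the network linearized at its
# initialization. Width, step size, data, and iteration count are illustrative.

rng = np.random.default_rng(0)

n, d, m = 20, 5, 2000                              # samples, input dim, hidden width
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm inputs, common in NTK-style analyses
y = np.sin(X @ rng.standard_normal(d))             # arbitrary smooth target

W0 = rng.standard_normal((m, d))                   # hidden weights at initialization (trained)
a = rng.choice([-1.0, 1.0], size=m)                # output weights (held fixed, as in Arora et al.-style setups)

def predict(W):
    # f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x)
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

# Linearized model at initialization: features phi_i = grad_W f(x_i; W0), flattened.
act0 = (X @ W0.T > 0).astype(float)                # ReLU activation pattern at init
Phi = (act0 * a / np.sqrt(m))[:, :, None] * X[:, None, :]
Phi = Phi.reshape(n, m * d)

# Minimum-norm parameter change fitting the residual y - f(W0) in the linear model.
theta = Phi.T @ np.linalg.lstsq(Phi @ Phi.T, y - predict(W0), rcond=None)[0]
y_lin = predict(W0) + Phi @ theta                  # predictions of the min-norm linearized solution

# Gradient descent on the squared loss (1/(2n)) * sum_i (f(x_i) - y_i)^2 for the actual network.
W, lr = W0.copy(), 1.0
for _ in range(3000):
    err = predict(W) - y
    act = (X @ W.T > 0).astype(float)
    grad_W = ((act * a / np.sqrt(m)) * err[:, None]).T @ X / n
    W -= lr * grad_W

print("training MSE of GD solution            :", np.mean((predict(W) - y) ** 2))
print("gap to min-norm linearized predictions :", np.mean((predict(W) - y_lin) ** 2))

Fixing the output weights and scaling by 1/sqrt(m) mirrors the style of setup analyzed in Arora et al. (2019); under those choices both printed quantities should shrink as the width grows and training proceeds, consistent with the convergence result described above.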


Related research

05/31/2019 · Training Dynamics of Deep Networks using Stochastic Gradient Descent via Neural Tangent Kernel
Stochastic Gradient Descent (SGD) is widely used to train deep neural ne...

05/07/2018 · Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks
We analyze Gradient Descent applied to learning a bounded target functio...

11/07/2019 · How implicit regularization of Neural Networks affects the learned function – Part I
Today, various forms of neural networks are trained to perform approxima...

05/17/2023 · On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations
Recent research in neural networks and machine learning suggests that us...

04/22/2021 · Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
Linear interpolation between initial neural network parameters and conve...

04/19/2019 · Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process
We consider deep networks, trained via stochastic gradient descent to mi...

11/04/2019 · Persistency of Excitation for Robustness of Neural Networks
When an online learning algorithm is used to estimate the unknown parame...
