On the convergence of gradient descent for two layer neural networks

09/30/2019
by Lei Li, et al.

It has been shown that gradient descent can achieve zero training loss in the over-parametrized regime, where the width of the neural network is much larger than the number of data points. In this work, combining ideas from several existing works, we investigate the gradient descent method for training two-layer neural networks to approximate target continuous functions. By making use of the generic chaining technique from probability theory, we show that gradient descent yields an exponential convergence rate, while the width of the neural network needed is independent of the size of the training data. The result also implies a strong approximation ability of two-layer neural networks without the curse of dimensionality.
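To make the setting concrete, the sketch below trains a two-layer ReLU network of width m by full-batch gradient descent on a one-dimensional target function. It is an illustrative toy under stated assumptions, not the paper's construction: the target sin(2*pi*x), the width m = 200, the step size, and the iteration count are all arbitrary choices made for readability.

# Minimal sketch (assumptions: 1-D target f(x) = sin(2*pi*x), width m = 200,
# step size eta = 0.1): full-batch gradient descent on a two-layer ReLU
# network f(x; a, w, b) = sum_k a_k * relu(w_k * x + b_k).
import numpy as np

rng = np.random.default_rng(0)

# Training data: n points sampled from the target continuous function.
n = 50
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x)

# Over-parameterized hidden layer; the paper's point is that the width needed
# does not have to grow with the number of training points n.
m = 200
w = rng.normal(size=m)                # input weights
b = rng.normal(size=m)                # biases
a = rng.normal(size=m) / np.sqrt(m)   # output weights

eta = 0.1
for step in range(5000):
    pre = np.outer(x, w) + b          # (n, m) pre-activations
    h = np.maximum(pre, 0.0)          # ReLU features
    pred = h @ a                      # network output
    err = pred - y                    # residuals
    loss = 0.5 * np.mean(err ** 2)

    # Gradients of the mean-squared loss with respect to all parameters.
    grad_a = h.T @ err / n
    mask = (pre > 0).astype(float)
    grad_w = ((err[:, None] * mask * a) * x[:, None]).sum(axis=0) / n
    grad_b = (err[:, None] * mask * a).sum(axis=0) / n

    a -= eta * grad_a
    w -= eta * grad_w
    b -= eta * grad_b

print(f"final training loss: {loss:.2e}")

In this toy run the training loss decays steadily toward zero; the abstract's claim concerns the rate of that decay (exponential) and the fact that the required width is independent of the number of training samples.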


Related research

05/30/2019 · Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
We study the training and generalization of deep neural networks (DNNs) ...

09/17/2022 · Approximation results for Gradient Descent trained Shallow Neural Networks in 1d
Two aspects of neural networks that have been extensively studied in the...

10/03/2022 · Limitations of neural network training due to numerical instability of backpropagation
We study the training of deep neural networks by gradient descent where ...

08/23/2023 · Layer-wise Feedback Propagation
In this paper, we present Layer-wise Feedback Propagation (LFP), a novel...

07/04/2022 · Automating the Design and Development of Gradient Descent Trained Expert System Networks
Prior work introduced a gradient descent trained expert system that conc...

12/04/2020 · When does gradient descent with logistic loss find interpolating two-layer networks?
We study the training of finite-width two-layer smoothed ReLU networks f...
