# Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections

The behavior of the gradient descent (GD) algorithm is analyzed for a deep neural network model with skip-connections. It is proved that in the over-parametrized regime, for a suitable initialization, with high probability GD can find a global minimum exponentially fast. Generalization error estimates along the GD path are also established. As a consequence, it is shown that when the target function is in the reproducing kernel Hilbert space (RKHS) with a kernel defined by the initialization, there exist generalizable early-stopping solutions along the GD path. In addition, it is also shown that the GD path is uniformly close to the functions given by the related random feature model. Consequently, in this "implicit regularization" setting, the deep neural network model deteriorates to a random feature model. Our results hold for neural networks of any width larger than the input dimension.


## 1 Introduction

This paper is concerned with the following questions on the gradient descent (GD) algorithm for deep neural network models:

1. Under what conditions can the algorithm find a global minimum of the empirical risk?

2. Under what conditions can the algorithm find models that generalize, without using any explicit regularization?

These questions are addressed for a specific deep neural network model with skip-connections. For the first question, it is shown that with proper initialization, the gradient descent algorithm converges to a global minimum exponentially fast, as long as the network is deep enough. For the second question, it is shown that if in addition the target function belongs to a certain reproducing kernel Hilbert space (RKHS) with kernel defined by the initialization, then the gradient descent algorithm does find models that can generalize. This result is obtained as a consequence of the estimates on the generalization error along the GD path. However, it is also shown that the GD path is uniformly close to functions generated by the GD path for the related random feature model. Therefore in this particular setting, as far as "implicit regularization" is concerned, this deep neural network model is no better than the random feature model.

In recent years there has been a great deal of interest in the two questions raised above [10, 13, 12, 4, 2, 1, 32, 7, 17, 19, 29, 26, 9, 28, 5, 3, 11, 31, 21]. An important recent advance is the realization that over-parametrization can simplify the analysis of GD dynamics in two ways. The first is that in the over-parametrized regime, the parameters do not have to change much in order to make a significant change to the function that they represent [10, 19]. This gives rise to the possibility that only a local analysis in a neighborhood of the initialization is needed in order to analyze the GD algorithm. The second is that over-parametrization can improve the non-degeneracy of the associated Gram matrix [30], thereby ensuring exponential convergence of the GD algorithm [13].

Using these ideas, [2, 32, 12, 13] proved that (stochastic) gradient descent converges to a global minimum of the empirical risk at an exponential rate.

[17] showed that in the infinite-width limit, the GD dynamics for deep fully connected neural networks with Xavier initialization can be characterized by a fixed neural tangent kernel. [10, 19] considered the online learning setting and proved that stochastic gradient descent can drive the population error below any prescribed accuracy, given sufficiently many samples. [7] proved that GD can find generalizable solutions when the target function comes from a certain RKHS. These results all share one thing in common: they require the network width to grow polynomially with the network depth and the training set size; in fact, [10, 7] required even wider networks. In other words, these results are concerned with very wide networks. In contrast, in this paper we will focus on deep networks of fixed width, assumed only to be larger than the input dimension d.

### 1.1 The motivation

Our work is motivated strongly by the results of the companion paper [16], in which similar questions were addressed for the two-layer neural network model. It was proved in [16] that in the so-called "implicit regularization" setting, the GD dynamics for the two-layer neural network model is closely approximated by the GD dynamics for a random feature model with the features defined by the initialization. For over-parametrized models, this statement is valid uniformly for all time. In the general case, this statement is valid at least for finite time intervals during which early stopping leads to generalizable models for target functions in the relevant reproducing kernel Hilbert space (RKHS). The numerical results reported in [16] nicely corroborated these theoretical findings.

To understand what happens for deep neural network models, we first turn to the ResNet model:

 h(l+1)=h(l)+U(l)σ(V(l)h(l)),l=0,1,⋯,L−1 (1.1)
 h(0)=(xT,1)T∈Rd+1,fL(x,θ)=wTh(L)

where U(l) ∈ R(d+1)×m and V(l) ∈ Rm×(d+1) are the weight matrices, and θ denotes all the parameters to be trained.

A main observation exploited in [16] is the time scale separation between the GD dynamics for the coefficients inside and outside the activation function, i.e. the V(l)'s and the U(l)'s. In a typical practical setting, one would initialize the U(l)'s to be small and the V(l)'s to be O(1). This results in a slower dynamics for the V(l)'s, compared with the dynamics for the U(l)'s, due to the presence of an extra factor of U(l) in the dynamical equation for V(l). In the case of two-layer networks, this separation of time scales resulted in the fact that the parameters inside the activation function were effectively frozen during the time period of interest. Therefore the GD path stays close to the GD path for the random feature model with the features given by the initialization.

To see whether similar things happen for the ResNet model, we consider the following “compositional random feature model” in which (1.1) is replaced by

 h(l+1)=h(l)+U(l)σ(V(l)(0)h(l)),l=0,1,⋯,L−1 (1.2)

Note that in (1.2) the V(l)'s are fixed at their initial values V(l)(0); the only parameters to be updated by the GD dynamics are the U(l)'s.
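The two forward maps being compared can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's experimental code: the dimensions, the choice of tanh as the activation, and all variable names are our own assumptions.

```python
import numpy as np

def resnet_forward(x, Us, Vs, w):
    """ResNet model (1.1): h^{l+1} = h^l + U^l sigma(V^l h^l), sigma = tanh here."""
    h = np.concatenate([x, [1.0]])        # h^0 = (x^T, 1)^T in R^{d+1}
    for U, V in zip(Us, Vs):
        h = h + U @ np.tanh(V @ h)        # one residual block
    return w @ h                          # f_L(x; theta) = w^T h^L

def compositional_rf_forward(x, Us, Vs0, w):
    """Compositional random feature model (1.2): same architecture, but the
    V^l's stay frozen at their initial values V^l(0); only the U^l's train."""
    return resnet_forward(x, Us, Vs0, w)

# tiny illustrative instance
d, m, L = 2, 4, 3
x = np.array([0.3, -0.5])
w = np.ones(d + 1)
Us = [np.zeros((d + 1, m)) for _ in range(L)]   # U^l = 0: network acts as identity
Vs = [np.full((m, d + 1), 0.1) for _ in range(L)]
out = resnet_forward(x, Us, Vs, w)              # = w^T h^0 since all U^l = 0
```

With all U(l) = 0 both models reduce to the linear readout of h(0), which makes the role of the U(l)'s as the "outer" trainable coefficients explicit.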

##### Numerical experiments

Here we provide numerical evidence for the above intuition by considering a very simple target function. We initialize (1.1) and (1.2) in the same way. Since we are interested in the effect of depth, we vary the depth L of the networks. Please refer to Appendix A for more details.

Figure 1 compares the GD dynamics for the ResNet and the related "compositional random feature model". We see a clear indication that (1) the GD algorithm converges to a global minimum of the empirical risk for the deep residual network, and (2) for deep neural networks, the GD dynamics for the two models stay close to each other.

Figure 2 shows the testing error for the optimal (convergent) solution shown in Figure 1 as the depth of the ResNet changes. We see that the testing error seems to settle at a finite value as the network depth is increased. As a comparison, we also show the testing error for the minimizers of the regularized model proposed in [14] (see (A.2)). One can see that for this particular target function, the testing error for the minimizers of the regularized model is consistently very small as one varies the depth of the network.

These results are similar to the ones shown in [16] for two-layer neural networks. They suggest that for ResNets, the GD algorithm is able to find a global minimum of the empirical risk, but in terms of the generalization property, the resulting model may be no better than the compositional random feature model.

On the theoretical side, we have not yet succeeded in dealing directly with the ResNet model. Therefore in this paper we will deal instead with a modified model which shares a lot of common features with the ResNet model but simplifies the task of analyzing error propagation between the layers. We believe that the insight we have gained in this analysis is helpful for understanding general deep network models.

### 1.2 Our contribution

In this paper, we analyze the gradient descent algorithm for a particular class of deep neural networks with skip-connections. We consider the least squares loss and assume that the nonlinear activation function is Lipschitz continuous (e.g. Tanh, ReLU).

• We prove that if the depth L is sufficiently large, then gradient descent converges to a global minimum with zero training error at an exponential rate. This result is proved assuming only that the network width is larger than the input dimension. As noted above, the previous optimization results [12, 2, 32] require the width to grow polynomially with the training set size and the depth.

• We provide a general estimate for the generalization error along the GD path, assuming that the target function is in an RKHS with the kernel defined by the initialization. As a consequence, we show that the population risk is bounded from above by O(γ2(f∗)/√n) if certain early stopping rules are used. In contrast, the generalization result in [7] requires a much wider network.

• We prove that the GD path is uniformly close to the functions given by the related random feature model (see Theorem 6.6). Consequently the generalization property of the resulting model is no better than that of the random feature model. This allows us to conclude that in this “implicit regularization” setting, the deep neural network model deteriorates to a random feature model. In contrast, it has been established in [15, 14] that for suitable explicitly regularized models, optimal generalization error estimates (e.g. rates comparable to the Monte Carlo rate) can be proved for a much larger class of target functions.

These results are very much analogous to the ones proved in [16] for two-layer neural networks.

One main technical ingredient of this work is the use of a combination of the identity mapping and skip-connections to stabilize the forward and backward propagation in the neural network. This enables us to consider deep neural networks of fixed width. The second main ingredient is the exploitation of a possible time scale separation between the GD dynamics for the parameters inside and outside the activation function: the parameters inside the activation function are effectively frozen during the GD dynamics, compared with the parameters outside the activation function.

## 2 Preliminaries

Throughout this paper, we use ∥⋅∥ and ∥⋅∥F to denote the ℓ2 and Frobenius norms, respectively. For a matrix A, we use Ai,:, A:,j and Ai,j to denote its i-th row, j-th column and (i,j)-th entry, respectively. We let Sd−1 = {x ∈ Rd : ∥x∥ = 1} and use π0 to indicate the uniform distribution over Sd−1. We use a ≲ b as a shorthand notation for a ≤ Cb, where C is some absolute constant; a ≳ b is similarly defined.

### 2.1 Problem setup

We consider the regression problem with training data set {(xi,yi)}ni=1, where the xi's are i.i.d. samples drawn from a fixed (but unknown) distribution ρ. For simplicity, we assume ∥x∥ ≤ 1 and |y| ≤ 1. We use f(x;Θ) to denote the model with parameter Θ. We are interested in minimizing the empirical risk, defined by

 ^Rn(Θ)=12nn∑i=1(f(xi;Θ)−yi)2. (2.1)

We let ei = f(xi;Θ) − yi and e = (e1,⋯,en)T; then ^Rn(Θ) = ∥e∥2/(2n).

For the generalization problem, we need to specify how the yi's are obtained. Let f∗ be our target function; then yi = f∗(xi). We will assume that there are no measurement noises. This makes the argument more transparent but does not change things qualitatively: essentially the same argument applies to the case with measurement noise.

Our goal is to estimate the population risk, defined by

 R(Θ)=12Eρ[(f(x;Θ)−f∗(x))2]

#### Deep neural networks with skip-connections

We will consider a special deep neural network model with multiple skip-connections, defined by

 h(1) =(x,0)T∈Rd+1 (2.2) h(l+1) =(h(1)1:d, h(l)d+1)T+U(l)σ(V(l)h(l)),l=1,⋯,L−1 f(x;Θ) =wTh(L).

Here U(l) ∈ R(d+1)×m and V(l) ∈ Rm×(d+1). Note that L and m are the depth and width of the network, respectively. σ is a scalar nonlinear activation function, which is assumed to be 1-Lipschitz continuous with σ(0) = 0. For any vector v, σ(v) is defined entry-wise. For simplicity, we fix w to be (0,⋯,0,1)T. Thus the parameters that need to be estimated are Θ = {(U(l),V(l))}l. We also define g(l) = σ(V(l)h(l)), the output of the l-th nonlinear hidden layer.

This network model has the following feature: the first d entries of h(l+1) are directly connected to the input layer by a long-distance skip-connection; only the last entry is connected to the previous layer. As will be seen later, the long-distance skip-connections help to stabilize the deep network. We further let:

 U(l)=(C(l)(a(l))T),V(l)=(B(l)r(l)),h(l)=(z(l)y(l)),

where C(l) ∈ Rd×m, a(l) ∈ Rm, B(l) ∈ Rm×d, r(l) ∈ Rm, z(l) ∈ Rd and y(l) ∈ R. With these notations, we can re-write the model as

 z(1) =x,y(1)=0 (2.3) z(l+1) =z(1)+C(l)σ(B(l)z(l)+r(l)y(l)) y(l+1) =y(l)+(a(l))Tσ(B(l)z(l)+r(l)y(l)),l=1,⋯,L−1. f(x;Θ) =y(L).
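The recursion (2.3) can be sketched directly in NumPy. This is a minimal illustration under our own choices (tanh as the 1-Lipschitz activation, small dimensions, our own variable names), not the paper's code:

```python
import numpy as np

def skip_net_forward(x, params, sigma=np.tanh):
    """Forward pass of model (2.3): z^l carries the input via the long-distance
    skip-connection, while the scalar y^l accumulates each layer's output."""
    z, y = x.copy(), 0.0                  # z^1 = x, y^1 = 0
    for B, r, C, a in params:             # layer l = 1, ..., L-1
        g = sigma(B @ z + r * y)          # g^l = sigma(B^l z^l + r^l y^l) in R^m
        z = x + C @ g                     # z^{l+1} = z^1 + C^l g^l
        y = y + a @ g                     # y^{l+1} = y^l + (a^l)^T g^l
    return y                              # f(x; Theta) = y^L

d, m, L = 3, 4, 5
x = np.array([0.2, -0.1, 0.4])
# with C^l = 0, a^l = 0, r^l = 0 (as in the initialization (2.5) below),
# z^l stays equal to x, y^l stays 0, and the output is exactly 0:
params0 = [(np.full((m, d), 0.1), np.zeros(m), np.zeros((d, m)), np.zeros(m))
           for _ in range(L - 1)]
f0 = skip_net_forward(x, params0)
```

Note how the first d coordinates are reset to the input at every layer; this is the stabilizing identity-plus-skip structure exploited in Section 4.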

We will analyze the behavior of the gradient descent algorithm, defined by

 Θt+1=Θt−η∇^Rn(Θt),

where η > 0 is the learning rate. For simplicity, in most cases, we will focus on its continuous version:

 dΘtdt=−∇^Rn(Θt). (2.4)
##### Initialization

We will focus on a special class of initialization:

 C(l)0=0,a(l)0=0,√mrow(B(l)0)∼π0,r(l)0=0, (2.5)

where the third item means that each row of B(l)0 is independently drawn from the uniform distribution over the sphere of radius 1/√m. Thus for this initialization, f(x;Θ0) = 0.

Note that all the results in this paper also hold for slightly larger initializations of C(l)0 and a(l)0. But for simplicity, we will focus on the initialization (2.5).
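The initialization (2.5) can be sampled as follows; normalizing a Gaussian vector gives a uniform point on the sphere, which is then scaled to radius 1/√m. The function name and the tuple layout (B, r, C, a) are our own conventions:

```python
import numpy as np

def init_params(d, m, L, rng):
    """Initialization (2.5): C^l_0 = 0, a^l_0 = 0, r^l_0 = 0, and each row of
    B^l_0 drawn uniformly from the sphere of radius 1/sqrt(m)."""
    params = []
    for _ in range(L - 1):
        B = rng.standard_normal((m, d))
        # normalize each row to unit length, then scale to radius 1/sqrt(m)
        B /= np.linalg.norm(B, axis=1, keepdims=True) * np.sqrt(m)
        params.append((B, np.zeros(m), np.zeros((d, m)), np.zeros(m)))
    return params

params = init_params(d=3, m=5, L=4, rng=np.random.default_rng(0))
B0, r0, C0, a0 = params[0]
```

Since a(l)0 = 0 at every layer, y(l) never moves during the forward pass, so the network output at Θ0 is identically zero, as noted above.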

### 2.2 Assumption on the input data

For the given activation function σ, we can define a symmetric positive definite (SPD) function

 k0(x,x′)=Eω∼π0[σ(ωTx)σ(ωTx′)]. (2.6)

Denote by Hk0 the RKHS induced by k0. For the given training set, the (empirical) kernel matrix K ∈ Rn×n is defined as

 Ki,j=1nk0(xi,xj),i,j=1,⋯,n

We make the following assumption on the training data set.

###### Assumption 2.1.

For the given training data, we assume that the kernel matrix K is positive definite, i.e.

 λndef=λmin(K)>0. (2.7)
###### Remark 1.

Note that λn in general depends on the data set. If we assume that the xi's are independently drawn from π0, it was shown in [6] that with high probability λn is bounded from below in terms of λn(Tk0), the n-th eigenvalue of the Hilbert–Schmidt integral operator Tk0 defined by

 Tk0f(x)=∫Sd−1k0(x,x′)f(x′)dπ0(x′).

Using this result, [30] provided lower bounds for λn based on some geometric discrepancy analysis.

## 3 The main results

Let Θt be the solution of the GD dynamics (2.4) at time t, with the initialization defined in (2.5). We first show that with high probability, the landscape of ^Rn near the initialization has a coercive property which guarantees exponential convergence towards a global minimum.

###### Lemma 3.1.

Assume that there are constants b, c2, c3 > 0 such that 2√(c3J(Θ0))/c2 < b and, for any Θ with ∥Θ−Θ0∥ ≤ b,

 c2J(Θ)≤∥∇J(Θ)∥2≤c3J(Θ). (3.1)

Then for any t ≥ 0, we have

 J(Θt)≤e−c2tJ(Θ0).
###### Proof.

Let t0 = inf{t : ∥Θt−Θ0∥ > b}. Then for t ≤ t0, the condition (3.1) is satisfied. Thus we have

 dJ/dt=−∥∇J∥2≤−c2J.

Consequently, we have, for t ≤ t0,

 J(Θt)≤e−c2tJ(Θ0).

It remains to show that actually t0 = +∞. If t0 < +∞, then we have

 ∥Θt0−Θ0∥ ≤∫t00∥∇J(Θt)∥dt≤√c3J(Θ0)∫∞0e−c2t/2dt≤2√c3J(Θ0)/c2 (i)< b,

where (i) is due to the assumption that 2√(c3J(Θ0))/c2 < b. This contradicts the definition of t0. ∎
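The mechanism of Lemma 3.1 can be checked numerically on a toy objective of our own choosing: for J(θ) = ½∥θ∥² we have ∥∇J∥² = 2J, so the lower bound in (3.1) holds with c2 = 2 and the lemma predicts J(Θt) ≤ e^{−2t} J(Θ0) along the gradient flow (2.4):

```python
import numpy as np

# Toy objective: J(theta) = 0.5 * ||theta||^2, grad J = theta, ||grad J||^2 = 2 J,
# so condition (3.1) holds globally with c2 = c3 = 2.
theta = np.array([3.0, -4.0])
J0 = 0.5 * theta @ theta

eta, steps = 1e-3, 1000                   # Euler discretization of (2.4) up to t = 1
for _ in range(steps):
    theta -= eta * theta                  # dTheta/dt = -grad J(Theta) = -Theta
J1 = 0.5 * theta @ theta

print(J1 <= np.exp(-2.0) * J0)            # True: decay at rate c2 = 2
```

The same PL-type inequality c2·J ≤ ∥∇J∥² is what Theorem 3.2 establishes for ^Rn in a neighborhood of the initialization, with c2 proportional to Lλn.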

Our main result for optimization is as follows.

###### Theorem 3.2 (Optimization).

For any δ ∈ (0, 1), assume that the depth L is sufficiently large (depending on n, λn and δ). Then with probability at least 1 − δ over the initialization Θ0, we have for any t ≥ 0,

 ^Rn(Θt)≤e−Lλnt2^Rn(Θ0). (3.2)

In contrast to other recent results for multi-layer neural networks [13, 2, 32], we do not require the network width to increase with the size of the data set or the depth of the network.

As is the case for two-layer neural networks [16], the fact that the GD dynamics stays in a neighborhood of the initialization suggests that it resembles the situation of a random feature model. Consequently, the generalization error can be controlled if we assume that the target function is in the appropriate RKHS.

###### Assumption 3.3.

Assume that f∗ ∈ Hk0, i.e.

 f∗(x) =Eπ0[a∗(ω)σ(ωTx)] ∥f∗∥2Hk0 =Eπ0[|a∗(ω)|2]<+∞.

In addition, we also assume that a∗ is bounded: supω |a∗(ω)| < ∞.

In the following, we will denote γ(f∗) = supω |a∗(ω)|. Obviously, γ(f∗) ≥ ∥f∗∥Hk0.

###### Theorem 3.4 (Generalization).

Assume that the target function satisfies Assumption 3.3. For any δ ∈ (0, 1), assume that the depth L is sufficiently large. Then with probability at least 1 − δ over the random initialization, the following holds for any t ≥ 0,

 R(Θt)≲1L3/2λ2n+tλ2n+γ2(f∗)Lt+(1+√Lγ(f∗)t√n)2c3(δ)γ2(f∗)√n.

where c3(δ) is a constant depending only on δ.

In addition, by choosing the stopping time appropriately, we obtain the following result:

###### Corollary 3.5 (Early-stopping).

Assume that the conditions of Theorem 3.4 hold, and let the stopping time T be chosen to balance the terms in the bound of Theorem 3.4. Then we have

 R(ΘT)≲c3(δ)γ2(f∗)√n.

## 4 Landscape around the initialization

###### Definition 1.

For any c > 0, we define a neighborhood of the initialization Θ0 by

 Ic(Θ0)def={Θ:maxl∈[L]{∥a(l)−a(l)0∥,∥r(l)−r(l)0∥,∥B(l)−B(l)0∥F,∥C(l)−C(l)0∥F}≤c/L}. (4.1)

Let εc = c/L. We will assume that c ≲ 1. In the following, we first prove that both the forward and backward propagation are stable regardless of the depth. We then show that the norm of the gradient can be bounded from above and below by the loss function, similar to the condition required in Lemma 3.1. This implies that there are no issues with vanishing or exploding gradients.

### 4.1 Forward stability

At Θ = Θ0, it is easy to check that

 y(l)(x;Θ0)=0,z(l)(x;Θ0)=x,g(l)(x;Θ0)=σ(B(l)0x).

For simplicity, when it is clear from the context, we will omit the dependence on x and Θ in the notations.

###### Proposition 4.1.

If Θ ∈ Ic(Θ0), we have for any x with ∥x∥ ≤ 1 and any l ∈ [L] that

 |y(l)(x;Θ)−y(l)(x;Θ0)| ≤4c (4.2) ∥z(l)(x;Θ)−z(l)(x;Θ0)∥ ≤4c/L ∥g(l)(x;Θ)−g(l)(x;Θ0)∥ ≤6c/L.
###### Remark 2.

We see that all the variables are close to their initial values except y(l), which is used to accumulate the prediction from each layer.

###### Proof.

Let o(l) = ∥z(l)(x;Θ)−z(l)(x;Θ0)∥ = ∥z(l)(x;Θ)−x∥. Then by (2.3), we have

 o(l+1) ≤εc((1+εc)(1+o(l))+εc|y(l)|) |y(l+1)| ≤|y(l)|+εc((1+εc)(1+o(l))+εc|y(l)|),

with o(1) = |y(1)| = 0. Adding the two inequalities gives us:

 o(l+1)+|y(l+1)| ≤2εc(1+εc)o(l)+(1+2ε2c)|y(l)|+2εc(1+εc)

Since εc ≤ 1, the above inequality can be simplified as

 o(l+1)+|y(l+1)| ≤(1+2εc²)(o(l)+|y(l)|)+2.25εc.

Iterating and using o(1)+|y(1)| = 0, we get

 o(l)+|y(l)| ≤2.25εcl∑l′=0(1+2εc²)l′≤2.25Lεce2Lεc²≤4c.

Thus we obtain that |y(l)| ≤ 4c for any l ∈ [L]. Plugging this back into the recursive formula for o(l), we get

 o(l+1)≤1.25εco(l)+2.25εc

This gives us

 o(l)≤4c/L∀l∈[L].

Now the deviation of g(l) can be estimated by

 ∥g(l)(x;Θ)−g(l)(x;Θ0)∥ =∥σ(B(l)z(l)(x)+r(l)y(l)(x))−σ(B(l)0x)∥ ≤∥B(l)z(l)(x)+r(l)y(l)(x)−B(l)0x∥

By inserting the previous estimates, we obtain

 ∥g(l)(x;Θ)−g(l)(x;Θ0)∥≤6c/L. ∎

### 4.2 Backward stability

For convenience, we define the gradients with respect to the neurons by

 α(l)(x;Θ)=∇y(l)f(x;Θ)β(l)(x;Θ)=∇z(l)f(x;Θ)γ(l)(x;Θ)=∇g(l)f(x;Θ).

For simplicity, we will omit the explicit reference to x and Θ in these notations when it is clear from the context. Note that f(x;Θ) = y(L), and it is easy to derive the following back-propagation formula using the chain rule,

 γ(l) =a(l)α(l+1)+(C(l))Tβ(l+1) (4.3) β(l) =(B(l))Tγ(l) α(l) =α(l+1)+(r(l))Tγ(l).

At the top layer, we have that for any x and Θ:

 α(L)=1,β(L)=0.

At the initialization Θ0, we have for any l ∈ [L]:

 α(l)(x;Θ0)=1,β(l)(x;Θ0)=0,γ(l)(x;Θ0)=0.
###### Proposition 4.2.

If Θ ∈ Ic(Θ0), we have for any x and l ∈ [L]:

 |α(l)(x;Θ)−1|≤5c²/L,∥β(l)(x;Θ)∥≤4c/L,∥γ(l)(x;Θ)∥≤3c/L (4.4)
###### Proof.

According to (4.3), together with the bounds ∥a(l)∥ ≤ εc, ∥r(l)∥ ≤ εc, ∥C(l)∥F ≤ εc and ∥B(l)∥ ≤ 1+εc that hold on Ic(Θ0), we have

 ∥β(l)∥ ≤εc(1+εc)∥β(l+1)∥+εc(1+εc)α(l+1) (4.5) |α(l)| ≤ε2c∥β(l+1)∥+(1+ε2c)α(l+1). (4.6)

 εc∥β(l)∥+α(l)≤ε2c(2+εc)∥β(l+1)∥+(1+2.25ε2c)α(l+1)≤(1+2.25ε2c)(∥β(l+1)∥+α(l+1)).

Therefore, we have

 α(l)≤εc∥β(l)∥+α(l)≤(1+2.25εc²)L(εc∥β(L)∥+α(L))≤(1+2.25εc²)L≤1+5c²/L.

Inserting the above estimates back to (4.5) gives us

 ∥β(l)∥≤1.25εc∥β(l+1)∥+2.5εc,

from which we obtain that ∥β(l)∥ ≤ 4εc. Using (4.3) again, we get

 ∥γ(l)∥=∥a(l)α(l+1)+(C(l))Tβ(l+1)∥≤3εc.

For the lower bound, using (4.3), we get

 α(l) =α(l+1)+(r(l))Tγ(l)≥α(l+1)−3εc² ≥α(L)−3Lεc²≥1−3c²/L. ∎

We are now ready to bound the gradients. First note that we have

 ∇a(l)f(x) =α(l)(x)g(l)(x) ∇B(l)f(x) =γ(l)(x)(z(l)(x))T ∇C(l)f(x) =β(l)(x)(g(l)(x))T ∇r(l)f(x) =γ(l)(x)y(l)(x),

where we have omitted the dependence on Θ. Using the stability results, we can bound the gradients by the empirical loss.

###### Lemma 4.3 (Upper bound).

If Θ ∈ Ic(Θ0), then for any l ∈ [L] we have

 max{∥∇a(l)^Rn∥2,∥∇r(l)^Rn∥2} ≤(1+50c2L)^Rn (4.7) max{∥∇B(l)^Rn∥2,∥∇C(l)^Rn∥2} ≤20c2L2^Rn
###### Proof.

Using Propositions 4.1 and 4.2, we have

 ∥∇a(l)^Rn∥2 =∥1nn∑i=0e(xi,yi)α(l)(xi)g(l)(xi)∥2 ≤^Rn(Θ)1nn∑i=1∥α(l)(xi)g(l)(xi)∥2 ≤^Rn(Θ)nn∑i=0(1+5cL)2(1+6c2L)2 ≤(1+50c2L)^Rn(Θ)

Analogously, we have

 (A). ∥∇B(l)^Rn∥2F=∥1nn∑i=0e(xi,yi)γ(l)(xi)(z(l)(xi))T∥2F ≤^Rn(Θ)1nn∑i=1(3cL)2(1+4cL)2≲15c2L2^Rn(Θ); (B). ∥∇C(l)^Rn∥2F=∥1nn∑i=0eiβ(l)(xi)g(l)k(xi)∥F ≤^Rn(Θ)1nn∑i=0(4cL)2(1+6c2L)2≤20c2L2^Rn(Θ); (C). ∥∇r(l)^Rn∥2=∥1nn∑i=0eiy(l)(xi)γ(l)(xi)∥2 ≤^Rn(Θ)1nn∑i=0(4c)2(3cL)2≤12^Rn(Θ).

We now turn to the lower bound. The technique used is similar to the case of two-layer neural networks [13]. Define the Gram matrix H(Θ) ∈ Rn×n with

 Hi,j(Θ)=1nLL∑l=1⟨∇a(l)f(xi),∇a(l)f(xj)⟩. (4.8)

At the initialization, we have

 Hi,j(Θ0)=1nLL∑l=1⟨σ(B(l)0xi),σ(B(l)0xj)⟩.

This matrix can be viewed as an empirical approximation of the kernel matrix K defined in Section 2.2, since each row of B(l)0 is independently drawn from the uniform distribution over the sphere of radius 1/√m. Using standard concentration inequalities, we can prove that with high probability, the smallest eigenvalue of the Gram matrix is bounded from below by the smallest eigenvalue of the kernel matrix. This is stated in the following lemma, whose proof is deferred to Appendix B.

###### Lemma 4.4.

For any δ ∈ (0, 1), assume that L is sufficiently large (depending on n, λn and δ). Then with probability at least 1 − δ over the random initialization:

 λmin(H(Θ0))≥3λn4. (4.9)

Moreover, we can show that for any Θ ∈ Ic(Θ0), the Gram matrix is still strictly positive definite as long as L is large enough.

###### Lemma 4.5.

For any δ ∈ (0, 1), assume that L is sufficiently large. With probability 1 − δ over the random initialization, we have for any Θ ∈ Ic(Θ0),

 λmin(H(Θ))≥λn2. (4.10)
###### Proof.
 Hi,j(Θ)−Hi,j(