# A Comparative Analysis of the Optimization and Generalization Properties of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is shown that gradient descent dynamics can achieve zero training loss exponentially fast regardless of the quality of the labels. In addition, it is proved that throughout the training process the functions represented by the neural network model are uniformly close to those of a kernel method. For general values of the network width and training data size, sharp estimates of the generalization error are established for target functions in the appropriate reproducing kernel Hilbert space. Our analysis suggests strongly that in terms of `implicit regularization', two-layer neural network models do not outperform the kernel method.



## 1 Introduction

Optimization and generalization are two central issues in the theoretical analysis of machine learning models. These issues are of special interest for modern neural network models, not only because of their practical success [18, 19], but also because these models are often heavily over-parametrized and traditional machine learning theory does not seem to apply directly [21, 30]. For this reason, there has been a lot of recent theoretical work centered on these issues [15, 16, 12, 11, 2, 8, 10, 31, 29, 28, 25, 27]. One issue of particular interest is whether the gradient descent (GD) algorithm can produce models that optimize the empirical risk and at the same time generalize well for the population risk. In the case of over-parametrized two-layer neural network models, which will be the focus of this paper, it is generally understood that as a result of the non-degeneracy of the associated Gram matrix [29, 12], optimization can be accomplished using the gradient descent algorithm regardless of the quality of the labels, in spite of the fact that the empirical risk function is non-convex. In this regard, one can say that over-parametrization facilitates optimization.

The situation with generalization is a different story. There has been a lot of interest in the so-called "implicit regularization" effect [21], i.e., the idea that by tuning the parameters of the optimization algorithm, one might be able to guide it toward network models that generalize well, without the need to add any explicit regularization terms (see below for a review of the existing literature). But despite these efforts, it is fair to say that a general picture has yet to emerge.

In this paper, we perform a rather thorough analysis of the gradient descent algorithm for training two-layer neural network models. We study the case in which the parameters in both the input and output layers are updated, which is the case found in practice. In the heavily over-parametrized regime, for general initializations, we prove that the results of [12] still hold, namely, the gradient descent dynamics still converges to a global minimum exponentially fast, regardless of the quality of the labels. However, we also prove that the functions obtained are uniformly close to the ones found in an associated kernel method, with the kernel defined by the initialization. In the second part of the paper, we study the more general situation when the assumption of over-parametrization is relaxed. We provide sharp estimates for both the empirical and population risks. In particular, we prove that for target functions in the appropriate reproducing kernel Hilbert space (RKHS) [3], the generalization error can be made small if a certain early-stopping strategy is adopted for the gradient descent algorithm.

Our results imply that, in the absence of explicit regularization, over-parametrized two-layer neural networks behave a lot like kernel methods: they can always fit any set of random labels, but in order to generalize, the target function has to be in the right RKHS. In light of the optimal generalization error bounds proved in [13] for regularized models, one is tempted to conclude that explicit regularization is necessary for two-layer neural network models to fully realize their potential in expressing complex functional relationships.

### 1.1 Related work

The seminal work of [30] presented both numerical and theoretical evidence that over-parametrized neural networks can fit random labels. Building upon earlier work on the non-degeneracy of certain Gram matrices [29], Du et al. went a step further by proving that the GD algorithm can find global minima of the empirical risk for sufficiently over-parametrized two-layer neural networks [12]. This result was extended to multi-layer networks in [11, 2]. A related result for infinitely wide neural networks was obtained in [14], and a similar result in a general setting appears in [9].

The issue of generalization is less clear. [10] established generalization error bounds for solutions produced by the online stochastic gradient descent (SGD) algorithm with early stopping when the target function is in a certain RKHS. Similar results were proved in [20] for the classification problem, and in [8] for offline SGD algorithms. In [1], generalization results were proved for the GD algorithm for target functions that can be represented by the underlying neural network models. More recently, in [4], a generalization bound was derived for GD solutions using a data-dependent norm. This norm is bounded if the target function belongs to the appropriate RKHS. However, their error bounds are not strong enough to rule out the curse of dimensionality. Indeed, the results of the present paper suggest that the curse of dimensionality does occur in their setting (see Theorem 3.4).

## 2 Preliminaries

Throughout this paper, we will use the notation $[n] := \{1, \dots, n\}$ for a positive integer $n$. We use $\|\cdot\|$ and $\|\cdot\|_F$ to denote the $\ell_2$ and Frobenius norms for matrices, respectively. We let $S^{d-1} := \{x \in \mathbb{R}^d : \|x\| = 1\}$, and use $\pi_0$ to denote the uniform distribution over $S^{d-1}$. We use $X \lesssim Y$ to indicate that there exists an absolute constant $C$ such that $X \le CY$; $X \gtrsim Y$ is similarly defined. If $f$ is a function defined on $S^{d-1}$ and $\pi$ is a probability distribution on $S^{d-1}$, we let $\|f\|_{\pi}^2 := \mathbb{E}_{x\sim\pi}[f^2(x)]$.

### 2.1 Problem setup

We focus on the regression problem with a training data set $\{(x_i, y_i)\}_{i=1}^n$, i.i.d. samples drawn from a distribution $\rho$, which is assumed fixed but known only through the samples. In this paper, we assume $x \in S^{d-1}$ and $|y| \le 1$. We are interested in fitting the data by a two-layer neural network:

$$f_m(x;\Theta) = a^T\sigma(Bx), \tag{1}$$

where $a \in \mathbb{R}^m$ and $B = (b_1, \dots, b_m)^T \in \mathbb{R}^{m\times d}$, and $\Theta = (a, B)$ denotes all the parameters. Here $\sigma$ is the nonlinear activation function. We will omit the subscript $m$ in the notation $f_m$ if there is no danger of confusion. In formula (1), we omit the bias term for notational simplicity. The effect of the bias term can be incorporated if we think of $x$ as $(x^T, 1)^T$.

The ultimate goal is to minimize the population risk, defined by

$$R(\Theta) = \frac{1}{2}\mathbb{E}_{x,y}\left[(f(x;\Theta) - y)^2\right].$$

But in practice, we can only work with the following empirical risk:

$$\hat{R}_n(\Theta) = \frac{1}{2n}\sum_{i=1}^n \left(f(x_i;\Theta) - y_i\right)^2.$$

We are interested in analyzing the properties of the gradient descent algorithm $\Theta_{k+1} = \Theta_k - \eta\nabla\hat{R}_n(\Theta_k)$, where $\eta$ is the learning rate. For simplicity, we will focus on its continuous version, the gradient descent (GD) dynamics:

$$\frac{d\Theta_t}{dt} = -\nabla\hat{R}_n(\Theta_t). \tag{2}$$
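As an illustration, the gradient flow (2) can be simulated by a forward-Euler discretization, i.e. plain gradient descent with a small learning rate, with both layers of model (1) updated. The following sketch is not part of the paper's analysis; the ReLU activation, the sizes $d, m, n$, the learning rate, and the step count are all illustrative assumptions.

```python
import numpy as np

# Forward-Euler discretization of the gradient flow (2) for the two-layer
# ReLU network f(x; Theta) = a^T sigma(B x), training BOTH layers.
rng = np.random.default_rng(0)
d, m, n = 5, 200, 20
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # inputs on the unit sphere
y = rng.normal(size=n)                          # arbitrary (even random) labels

B = rng.normal(size=(m, d))
B /= np.linalg.norm(B, axis=1, keepdims=True)   # rows of B ~ uniform on the sphere
beta = 1.0 / np.sqrt(m)                         # initialization magnitude
a = beta * rng.choice([-1.0, 1.0], size=m)

def relu(z):
    return np.maximum(z, 0.0)

def empirical_risk(a, B):
    f = relu(X @ B.T) @ a                       # f(x_i; Theta) for all i
    return 0.5 * np.mean((f - y) ** 2)

lr, losses = 0.05, []
for _ in range(500):
    Z = X @ B.T                                 # (n, m) pre-activations
    f = relu(Z) @ a
    e = (f - y) / n                             # residuals, 1/n from the risk
    grad_a = relu(Z).T @ e                      # d R_hat / d a
    grad_B = ((e[:, None] * (Z > 0)) * a).T @ X # d R_hat / d B
    a -= lr * grad_a
    B -= lr * grad_B
    losses.append(empirical_risk(a, B))

print(losses[0], losses[-1])                    # the empirical risk decreases
```

With $m \ge n$ and a stable step size, the risk is driven toward zero even for the random labels used here, in line with Section 3.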
##### Initialization

We assume that $\{b_k(0)\}_{k=1}^m$ are i.i.d. random variables drawn from $\pi_0$, and that $\{a_k(0)\}_{k=1}^m$ are i.i.d. random variables drawn from the uniform distribution over $\{-\beta, \beta\}$. Here $\beta$ controls the magnitude of the initialization, and it may depend on $m$. Other initialization schemes can also be considered (e.g. distributions other than $\pi_0$, other ways of initializing $\{a_k(0)\}$); the arguments needed do not change much from the ones for this special case.

### 2.2 Assumption on the input data

With the activation function $\sigma$ and the distribution $\pi_0$, we can define two positive definite (PD) functions¹

$$k^{(a)}(x,x') := \mathbb{E}_{b\sim\pi_0}\left[\sigma(b^Tx)\sigma(b^Tx')\right], \qquad k^{(b)}(x,x') := \mathbb{E}_{b\sim\pi_0}\left[\sigma'(b^Tx)\sigma'(b^Tx')\langle x, x'\rangle\right].$$

¹We say that a continuous symmetric function $k$ is positive definite if and only if for any distinct $x_1, \dots, x_n \in S^{d-1}$, the kernel matrix $K$ with $K_{i,j} = k(x_i, x_j)$ is positive definite.

For a fixed training sample, the corresponding normalized kernel matrices are defined by

$$K^{(a)}_{i,j} = \frac{1}{n}k^{(a)}(x_i,x_j), \qquad K^{(b)}_{i,j} = \frac{1}{n}k^{(b)}(x_i,x_j). \tag{3}$$

Throughout this paper, we make the following assumption on the training set.

###### Assumption 1.

For the given training set $\{(x_i, y_i)\}_{i=1}^n$, we assume that the smallest eigenvalues of the two kernel matrices defined above are both positive, i.e.

$$\lambda_n^{(a)} := \lambda_{\min}\left(K^{(a)}\right) > 0, \qquad \lambda_n^{(b)} := \lambda_{\min}\left(K^{(b)}\right) > 0.$$

###### Remark 1.

Note that, in general, $\lambda_n^{(a)}$ and $\lambda_n^{(b)}$ depend on the data set. For any PD function $s$, the Hilbert-Schmidt integral operator $T_s$ is defined by

$$T_sf(x) = \int_{S^{d-1}} s(x,x')f(x')\,d\pi_0(x').$$

If $\{x_i\}$ are independently drawn from $\pi_0$, it was proved in [6] that with high probability $\lambda_n^{(a)}$ and $\lambda_n^{(b)}$ are bounded from below in terms of the eigenvalues of the corresponding integral operators. Using a similar idea, [29] provided lower bounds for these quantities based on a geometric discrepancy, which quantifies the degree of uniformity of $\{x_i\}$. In this paper, we take Assumption 1 as our basic assumption.
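For concreteness, the kernels $k^{(a)}, k^{(b)}$ and the matrices in (3) can be estimated by Monte Carlo sampling of $b \sim \pi_0$, and Assumption 1 can then be checked numerically. The sketch below is illustrative (ReLU activation, random data, and the sample sizes are all assumptions, not values from the paper).

```python
import numpy as np

# Monte Carlo estimate of the kernel matrices K^(a), K^(b) from (3) for
# ReLU with b ~ pi_0 (uniform on the sphere), plus a numerical check of
# Assumption 1 on a random data set.
rng = np.random.default_rng(1)
d, n, n_mc = 5, 10, 400_000

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # x_i on S^{d-1}

b = rng.normal(size=(n_mc, d))
b /= np.linalg.norm(b, axis=1, keepdims=True)   # b ~ pi_0

Z = b @ X.T                                     # b^T x_i, shape (n_mc, n)
S = np.maximum(Z, 0.0)                          # sigma(b^T x_i)
Sp = (Z > 0).astype(float)                      # sigma'(b^T x_i)

K_a = (S.T @ S) / n_mc / n                      # (1/n) E[sigma sigma]
K_b = (Sp.T @ Sp) / n_mc * (X @ X.T) / n        # (1/n) E[sigma' sigma' <x_i, x_j>]

lam_a = np.linalg.eigvalsh(K_a).min()
lam_b = np.linalg.eigvalsh(K_b).min()
print(lam_a, lam_b)   # both strictly positive for generic data (Assumption 1)
```

Note that here $n > d$, so the Gram matrix $XX^T$ alone is singular, yet $K^{(b)}$ is still positive definite: it is the Schur product of a positive definite matrix with a PSD matrix that has unit diagonal.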

### 2.3 The random feature model

We introduce the following random feature model [22] as a reference for the two-layer neural network model:

$$f_m(x;\tilde{a}, B_0) := \tilde{a}^T\sigma(B_0x), \tag{4}$$

where $\tilde{a} \in \mathbb{R}^m$. Here $B_0 = B(0)$ is fixed at the corresponding initial value for the neural network model and is not part of the parameters to be trained. The corresponding gradient descent dynamics is given by

$$\frac{d\tilde{a}_t}{dt} = -\frac{1}{n}\sum_{i=1}^n\left(\tilde{a}_t^T\sigma(B_0x_i) - y_i\right)\sigma(B_0x_i). \tag{5}$$

This dynamics is relatively simple to analyze since it is linear.
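Because (5) is linear in $\tilde{a}$, the training residual $e(t) = \Phi\tilde{a}_t - y$ with $\Phi_{ik} = \sigma(b_k(0)^Tx_i)$ satisfies $\dot{e} = -\frac{1}{n}\Phi\Phi^Te$, so $e(t) = e^{-t\,\Phi\Phi^T/n}e(0)$. The following sketch (all sizes and the step size are illustrative assumptions) integrates (5) by forward Euler, starting from $\tilde{a}_0 = 0$, and compares the result with this closed form.

```python
import numpy as np

# Random feature dynamics (5): only the output weights ~a are trained,
# B_0 stays frozen, so the residual dynamics is linear and solvable.
rng = np.random.default_rng(2)
d, m, n = 5, 100, 10
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)
B0 = rng.normal(size=(m, d)); B0 /= np.linalg.norm(B0, axis=1, keepdims=True)

Phi = np.maximum(X @ B0.T, 0.0)        # features sigma(B_0 x_i), shape (n, m)
a_t = np.zeros(m)                      # ~a starts from zero (beta = 0)

dt, T = 1e-3, 2000
for _ in range(T):
    e = Phi @ a_t - y                  # residuals
    a_t -= dt * (Phi.T @ e) / n        # Euler step of d~a/dt = -(1/n) Phi^T e

# Closed form: e(t) = exp(-t * Phi Phi^T / n) e(0) for the residual dynamics.
H = Phi @ Phi.T / n
w, V = np.linalg.eigh(H)
e0 = -y                                # residual at ~a = 0
e_exact = V @ (np.exp(-w * dt * T) * (V.T @ e0))
e_euler = Phi @ a_t - y
print(np.max(np.abs(e_exact - e_euler)))   # small discretization error
```

Every eigenvalue of $\Phi\Phi^T/n$ contributes an exponentially decaying mode, which is the linear analogue of the convergence results of Section 3.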

## 3 Analysis of the over-parameterized case

In this section, we consider the optimization and generalization properties of the GD dynamics in the over-parametrized regime. We introduce two Gram matrices $G^{(a)}, G^{(b)} \in \mathbb{R}^{n\times n}$, defined by

$$G^{(a)}_{i,j}(\Theta) = \frac{1}{nm}\sum_{k=1}^m \sigma(b_k^Tx_i)\sigma(b_k^Tx_j), \qquad G^{(b)}_{i,j}(\Theta) = \frac{1}{nm}\sum_{k=1}^m a_k^2\,x_i^Tx_j\,\sigma'(b_k^Tx_i)\sigma'(b_k^Tx_j).$$

Let $G = G^{(a)} + G^{(b)}$ and $e = (e_1, \dots, e_n)^T$ with $e_i = f(x_i;\Theta) - y_i$. It is easy to see that

$$\|\nabla_\Theta\hat{R}_n\|^2 = \frac{m}{n}\,e^TGe. \tag{6}$$

Since $\|e\|^2 = 2n\hat{R}_n$, we have

$$2m\lambda_{\min}(G)\hat{R}_n \le \|\nabla_\Theta\hat{R}_n\|^2 \le 2m\lambda_{\max}(G)\hat{R}_n.$$
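Identity (6) can be verified numerically. The sketch below (illustrative sizes, ReLU activation assumed) compares $\|\nabla_\Theta\hat{R}_n\|^2$ computed directly from the gradients with $(m/n)\,e^TGe$ computed from the Gram matrices.

```python
import numpy as np

# Numerical check of identity (6): ||grad R_hat||^2 = (m/n) e^T G e,
# where G = G^(a) + G^(b) and e_i = f(x_i; Theta) - y_i.
rng = np.random.default_rng(3)
d, m, n = 4, 30, 8
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)
B = rng.normal(size=(m, d)); B /= np.linalg.norm(B, axis=1, keepdims=True)
a = rng.normal(size=m) / np.sqrt(m)

Z = X @ B.T                                  # (n, m)
S, Sp = np.maximum(Z, 0.0), (Z > 0).astype(float)  # sigma and sigma' (ReLU)
e = S @ a - y                                # residuals

# Gradient of R_hat = (1/2n) sum_i e_i^2.
grad_a = S.T @ e / n                         # shape (m,)
grad_B = ((e[:, None] * Sp) * a).T @ X / n   # shape (m, d)
lhs = np.sum(grad_a**2) + np.sum(grad_B**2)

# Gram matrices from the definitions above.
G_a = (S @ S.T) / (n * m)
G_b = (Sp * a**2) @ Sp.T * (X @ X.T) / (n * m)
rhs = (m / n) * e @ (G_a + G_b) @ e
print(lhs, rhs)                              # the two agree up to rounding
```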

### 3.1 Properties of the initialization

###### Lemma 1.

For any fixed $\delta \in (0,1)$, with probability at least $1-\delta$ over the random initialization, we have

$$\hat{R}_n(\Theta_0) \le \frac{1}{2}\left(1 + c(\delta)\sqrt{m}\,\beta\right)^2,$$

where $c(\delta)$ is a constant depending only on $\delta$.

The proof of this lemma can be found in Appendix C.

In addition, at the initialization, the Gram matrices satisfy

$$G^{(a)}(\Theta_0) \to K^{(a)}, \qquad G^{(b)}(\Theta_0) \to \beta^2K^{(b)} \quad \text{as } m \to \infty.$$

In fact, we have

###### Lemma 2.

For any $\delta \in (0,1)$, if $m$ is sufficiently large, then with probability at least $1-\delta$ over the random choice of $\Theta_0$,

$$\lambda_{\min}\left(G(\Theta_0)\right) \ge \frac{3}{4}\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right).$$

The proof of this lemma is deferred to Appendix D.

### 3.2 Gradient descent near the initialization

We define a neighborhood of the initialization by

$$I(\Theta_0) := \left\{\Theta : \|G(\Theta) - G(\Theta_0)\|_F \le \frac{1}{4}\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)\right\}. \tag{7}$$

Using the lemma above, we conclude that for any fixed $\delta \in (0,1)$, with probability at least $1-\delta$ over the random choice of $\Theta_0$, we must have

$$\lambda_{\min}\left(G(\Theta)\right) \ge \lambda_{\min}\left(G(\Theta_0)\right) - \|G(\Theta) - G(\Theta_0)\|_F \ge \frac{1}{2}\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)$$

for all $\Theta \in I(\Theta_0)$.

For the GD dynamics, we define the exit time of $I(\Theta_0)$ by

$$t_0 := \inf\left\{t : \Theta_t \notin I(\Theta_0)\right\}. \tag{8}$$
###### Lemma 3.

For any fixed $\delta \in (0,1)$, assume that $m$ is sufficiently large. Then with probability at least $1-\delta$ over the random choice of $\Theta_0$, the following holds for any $t \le t_0$:

$$\hat{R}_n(\Theta_t) \le e^{-m(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)})t}\,\hat{R}_n(\Theta_0).$$

###### Proof.

We have

$$\frac{d\hat{R}_n(\Theta_t)}{dt} = -\|\nabla_\Theta\hat{R}_n\|^2 \le -m\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)\hat{R}_n(\Theta_t),$$

where the last inequality is due to the fact that $\lambda_{\min}(G(\Theta_t)) \ge \frac{1}{2}(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)})$ for $t \le t_0$. Grönwall's inequality then completes the proof. ∎

We define two quantities:

$$p_n := \frac{4\sqrt{\hat{R}_n(\Theta_0)}}{m\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)}, \qquad q_n := p_n^2 + \beta p_n. \tag{9}$$

The following is the most crucial characterization of the GD dynamics.

###### Proposition 3.1.

For any $\delta \in (0,1)$, assume that $m$ is sufficiently large. Then, with probability at least $1-\delta$, the following holds for any $t \le t_0$ and $k \in [m]$:

$$|a_k(t) - a_k(0)| \le 2p_n, \qquad \|b_k(t) - b_k(0)\| \le 2q_n.$$
###### Proof.

First, we have

$$\|\nabla_{a_k}\hat{R}_n\|^2 = \left(\frac{1}{n}\sum_{i=1}^n e_i\sigma(x_i^Tb_k)\right)^2 \le 2\|b_k\|^2\hat{R}_n(\Theta), \qquad \|\nabla_{b_k}\hat{R}_n\|^2 = \left\|\frac{1}{n}\sum_{i=1}^n e_ia_k\sigma'(x_i^Tb_k)x_i\right\|^2 \le 2a_k^2\hat{R}_n(\Theta).$$

To facilitate the analysis, we define the following two quantities:

$$\alpha_k(t) = \max_{s\in[0,t]}|a_k(s)|, \qquad \omega_k(t) = \max_{s\in[0,t]}\|b_k(s)\|.$$

Using Lemma 3, we have

$$\begin{aligned}
\|b_k(t) - b_k(0)\| &\le \int_0^t\|\nabla_{b_k}\hat{R}_n(\Theta_{t'})\|\,dt' \le 2\int_0^t\alpha_k(t)\sqrt{\hat{R}_n(\Theta_{t'})}\,dt' \le \frac{4\sqrt{\hat{R}_n(\Theta_0)}\,\alpha_k(t)}{m\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)} = p_n\alpha_k(t), \\
|a_k(t) - a_k(0)| &\le \int_0^t|\nabla_{a_k}\hat{R}_n(\Theta_{t'})|\,dt' \le 2\int_0^t\omega_k(t)\sqrt{\hat{R}_n(\Theta_{t'})}\,dt' \le \frac{4\sqrt{\hat{R}_n(\Theta_0)}\,\omega_k(t)}{m\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)} = p_n\omega_k(t).
\end{aligned} \tag{10}$$

Combining the two inequalities above and using $\|b_k(0)\| = 1$, we get

$$\alpha_k(t) \le |a_k(0)| + p_n\left(1 + p_n\alpha_k(t)\right).$$

Using Lemma 1 and the assumption that $m$ is sufficiently large, we have

$$p_n \le \frac{4\left(1 + c(\delta)\sqrt{m}\,\beta\right)}{m\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)} \le \frac{4}{m\lambda_n^{(a)}} + \frac{4c(\delta)}{\sqrt{m\lambda_n^{(a)}\lambda_n^{(b)}}} \le \frac{1}{2}. \tag{11}$$

Therefore,

$$\alpha_k(t) \le (1 - p_n^2)^{-1}(p_n + \beta) \le 2(p_n + \beta).$$

Inserting the above estimate back into (10), we obtain

$$\|b_k(t) - b_k(0)\| \le 2p_n^2 + 2\beta p_n = 2q_n.$$

Similarly, since $m$ is sufficiently large, we have

$$2\beta p_n \le \frac{8\beta\left(1 + c(\delta)\sqrt{m}\,\beta\right)}{m\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)} \le \frac{8\beta}{m\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right)} + \frac{8c(\delta)}{\sqrt{m}\,\lambda_n^{(b)}} \le \frac{4}{m\sqrt{\lambda_n^{(a)}\lambda_n^{(b)}}} + \frac{8c(\delta)}{\sqrt{m}\,\lambda_n^{(b)}} \le \frac{1}{2}. \tag{12}$$

Therefore $\omega_k(t) \le 1 + 2q_n \le 2$, which leads to

$$|a_k(t) - a_k(0)| \le p_n\omega_k(t) \le 2p_n. \qquad ∎$$

The following lemma shows how $p_n$ and $q_n$ depend on $m$ and $\beta$.

###### Lemma 4.

For any $\delta \in (0,1)$, assume that $m$ is sufficiently large, and let $C(\delta)$ denote a constant depending only on $\delta$. For small values of $\beta$, we have

$$p_n \le \frac{C(\delta)}{\sqrt{m}\,\lambda_n^{(a)}}\left(\frac{1}{\sqrt{m}} + \beta\right), \qquad q_n \le \frac{C(\delta)}{m\left(\lambda_n^{(a)}\right)^2}\left(\frac{1}{m} + \frac{2\beta}{\sqrt{m}} + \beta^2\right) + \frac{C(\delta)\beta}{m\lambda_n^{(a)}} + \frac{C(\delta)\beta^2}{\sqrt{m}\,\lambda_n^{(a)}}. \tag{13}$$

For general $\beta$, we have

$$p_n \le \frac{C(\delta)}{\sqrt{m\lambda_n^{(a)}\lambda_n^{(b)}}}, \qquad q_n \le \frac{C(\delta)}{\sqrt{m}\,\lambda_n^{(b)}}. \tag{14}$$

### 3.3 Global convergence for arbitrary labels

Proposition 3.1 and Lemma 4 tell us that, no matter how large $t$ is, the parameters stay within a small neighborhood of their initialization once $m$ is sufficiently large. This actually implies that the GD dynamics always stays in $I(\Theta_0)$, i.e. $t_0 = \infty$.

###### Theorem 3.2.

For any $\delta \in (0,1)$, assume that $m$ is sufficiently large. Then with probability at least $1-\delta$ over the random initialization, we have

$$\hat{R}_n(\Theta_t) \le e^{-m(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)})t}\,\hat{R}_n(\Theta_0)$$

for any $t \ge 0$.

###### Proof.

According to Lemma 3, we only need to prove that $t_0 = \infty$. Assume for contradiction that $t_0 < \infty$.

Let us first consider the Gram matrix $G^{(a)}$. Since $\sigma$ is Lipschitz and $\|x_i\| = 1$, we have

$$\begin{aligned}
\left|G^{(a)}_{i,j}(\Theta_{t_0}) - G^{(a)}_{i,j}(\Theta_0)\right| &= \frac{1}{nm}\left|\sum_{k=1}^m\left(\sigma(b_k^T(t_0)x_i)\sigma(b_k^T(t_0)x_j) - \sigma(b_k^T(0)x_i)\sigma(b_k^T(0)x_j)\right)\right| \\
&\le \frac{1}{nm}\sum_{k=1}^m\left(2\|b_k(t_0) - b_k(0)\| + \|b_k(t_0) - b_k(0)\|^2\right) \le \frac{3q_n}{n}.
\end{aligned}$$

Hence

$$\|G^{(a)}(\Theta_{t_0}) - G^{(a)}(\Theta_0)\|_F \le 3q_n. \tag{15}$$

Next we turn to the Gram matrix $G^{(b)}$. Define the event

$$D_{i,k} = \left\{b_k(0) : \|b_k(t_0) - b_k(0)\| \le q_n,\ \sigma'(b_k^T(t_0)x_i) \ne \sigma'(b_k^T(0)x_i)\right\}.$$

Since $\sigma$ is ReLU, this event happens only if $|b_k^T(0)x_i| \le q_n$. By the fact that $\|x_i\| = 1$ and $b_k(0)$ is drawn from the uniform distribution over the sphere, we have $\mathbb{P}(D_{i,k}) \lesssim q_n$. Therefore the entry-wise deviation of $G^{(b)}$ satisfies

$$n\left|G^{(b)}_{i,j}(\Theta_{t_0}) - G^{(b)}_{i,j}(\Theta_0)\right| \le \frac{|x_i^Tx_j|}{m}\left|\sum_{k=1}^m\left(a_k^2(t_0)\sigma'(b_k^T(t_0)x_i)\sigma'(b_k^T(t_0)x_j) - a_k^2(0)\sigma'(b_k^T(0)x_i)\sigma'(b_k^T(0)x_j)\right)\right| \le \frac{1}{m}\sum_{k=1}^m\left(a_k^2(t_0)Q_{k,i,j} + P_k\right),$$

where

$$Q_{k,i,j} = \left|\sigma'(x_i^Tb_k(t_0))\sigma'(x_j^Tb_k(t_0)) - \sigma'(x_i^Tb_k(0))\sigma'(x_j^Tb_k(0))\right|, \qquad P_k = \left|a_k^2(t_0) - a_k^2(0)\right|.$$

Note that $\mathbb{E}[Q_{k,i,j}] \le \mathbb{P}(D_{i,k}) + \mathbb{P}(D_{j,k}) \lesssim q_n$. In addition, by Proposition 3.1, we have

$$P_k \le (\beta + 2p_n)^2 - \beta^2 \lesssim q_n, \qquad a_k^2(t_0) \le a_k^2(0) + P_k \lesssim \beta^2 + q_n.$$

Hence, using $\mathbb{E}[Q_{k,i,j}] \lesssim q_n$, we obtain

$$n\,\mathbb{E}\left[\left|G^{(b)}_{i,j}(\Theta_{t_0}) - G^{(b)}_{i,j}(\Theta_0)\right|\right] \lesssim (\beta^2 + q_n)q_n + q_n \lesssim (1 + \beta^2)q_n. \tag{16}$$

By the Markov inequality, with probability at least $1-\delta$ we have

$$\left|G^{(b)}_{i,j}(\Theta_{t_0}) - G^{(b)}_{i,j}(\Theta_0)\right| \le \frac{(1+\beta^2)q_n}{\delta}.$$

Consequently, with probability at least $1-\delta$ we have

$$\|G^{(b)}(\Theta_{t_0}) - G^{(b)}(\Theta_0)\|_F \lesssim \frac{(1+\beta^2)nq_n}{\delta}. \tag{17}$$

Combining (15) and (17), we get

$$\|G(\Theta_{t_0}) - G(\Theta_0)\|_F \le \|G^{(a)}(\Theta_{t_0}) - G^{(a)}(\Theta_0)\|_F + \|G^{(b)}(\Theta_{t_0}) - G^{(b)}(\Theta_0)\|_F \lesssim 3q_n + \frac{(1+\beta^2)nq_n}{\delta} \lesssim \frac{(n\delta^{-1} + 1)C(\delta)}{\sqrt{m}\,\lambda_n^{(b)}} + \frac{\beta^2n\delta^{-1}C(\delta)}{\sqrt{m}\,\lambda_n^{(b)}},$$

where the last inequality follows from Lemma 4. Taking $m$ large enough, we get

$$\|G(\Theta_{t_0}) - G(\Theta_0)\|_F < \frac{1}{4}\left(\lambda_n^{(a)} + \beta^2\lambda_n^{(b)}\right).$$

This contradicts the definition of $t_0$. Therefore $t_0 = \infty$. ∎
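The mechanism behind Theorem 3.2 can be illustrated empirically: along training, the Gram matrix $G(\Theta_t)$ barely moves away from $G(\Theta_0)$, so $\lambda_{\min}(G)$ stays bounded away from zero and the risk decays even for pure-noise labels. The sketch below is illustrative only (ReLU, random $\pm 1$ labels, and all sizes and step counts are assumptions, not values from the paper).

```python
import numpy as np

# Empirical illustration of Theorem 3.2: the Gram matrix G(Theta_t) stays
# close to G(Theta_0) in Frobenius norm while the risk decays, even for
# random labels.
rng = np.random.default_rng(5)
d, m, n = 5, 1000, 10
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.choice([-1.0, 1.0], size=n)            # pure-noise labels
B = rng.normal(size=(m, d)); B /= np.linalg.norm(B, axis=1, keepdims=True)
beta = 1.0 / np.sqrt(m)
a = beta * rng.choice([-1.0, 1.0], size=m)

def gram(a, B):
    # G = G^(a) + G^(b) as defined in Section 3 (ReLU case).
    Z = X @ B.T
    S, Sp = np.maximum(Z, 0.0), (Z > 0).astype(float)
    return S @ S.T / (n * m) + (Sp * a**2) @ Sp.T * (X @ X.T) / (n * m)

G0 = gram(a, B)
risk0 = 0.5 * np.mean((np.maximum(X @ B.T, 0.0) @ a - y) ** 2)

dt = 2e-3
for _ in range(2000):
    Z = X @ B.T; S = np.maximum(Z, 0.0)
    e = (S @ a - y) / n
    a_new = a - dt * S.T @ e
    B -= dt * ((e[:, None] * (Z > 0)) * a).T @ X
    a = a_new

risk1 = 0.5 * np.mean((np.maximum(X @ B.T, 0.0) @ a - y) ** 2)
drift = np.linalg.norm(gram(a, B) - G0) / np.linalg.norm(G0)
print(risk0, risk1, drift)   # risk decreases; the Gram matrix moves only slightly
```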

###### Remark 2.

Compared with Proposition 3.1, the above theorem imposes a stronger assumption on the network width $m$. This is due to the lack of continuity of $\sigma'$ when handling $G^{(b)}$. If $\sigma'$ is continuous, we can get rid of this stronger dependence. In addition, it is also possible to remove this assumption when $\beta$ is small, since in that case the Gram matrix $G$ is dominated by $G^{(a)}$.

###### Remark 3.

Theorem 3.2 is closely related to the result of Du et al. [12], where exponential convergence to global minima was first proved for over-parametrized two-layer neural networks. But it improves the result of [12] in two respects. First, as is done in practice, we allow the parameters in both layers to be updated, while [12] chooses to freeze the parameters in the output layer. Second, our analysis does not impose any specific requirement on the scale of the initialization, whereas the proof of [12] relies on a specific scaling.

### 3.4 Characterization of the whole GD trajectory

In the last subsection, we showed that very wide networks can fit arbitrary labels. In this subsection, we study the functions represented by such networks. We show that for highly over-parametrized two-layer neural networks, the solution of the GD dynamics is uniformly close to the solution for the random feature model starting from the same initial function.

###### Theorem 3.3.

Assume that $\sigma$ is ReLU. Denote the solution of the GD dynamics for the random feature model by

$$f^{\mathrm{ker}}_m(x,t) = f_m(x;\tilde{a}_t, B_0),$$

where $\tilde{a}_t$ is the solution of the GD dynamics (5) with $\tilde{a}_0 = a_0$. For any $\delta \in (0,1)$, assume that $m$ is sufficiently large. Then with probability at least $1-\delta$ we have

$$\left|f_m(x;\Theta_t) - f^{\mathrm{ker}}_m(x,t)\right| \lesssim \frac{c^2(\delta)}{\lambda_n^{(a)}}\left(\frac{1}{\sqrt{m}} + \beta + \sqrt{m}\,\beta^3\right), \tag{18}$$

where $c(\delta)$ is a constant depending only on $\delta$.

###### Remark 4.

Again, the stronger requirement on $m$ can be removed if $\sigma'$ is assumed to be smooth or $\beta$ is assumed to be small (see the remark following Theorem 3.2).

###### Remark 5.

If $\beta = o(m^{-1/6})$, the right-hand side of (18) goes to $0$ as $m \to \infty$. For example, if we take $\beta \le 1/\sqrt{m}$, we have

$$\left|f_m(x;\Theta_t) - f^{\mathrm{ker}}_m(x,t)\right| \lesssim \frac{c^2(\delta)}{\lambda_n^{(a)}\sqrt{m}}. \tag{19}$$

Hence this theorem says that the GD trajectory of a very wide network is uniformly close to the GD trajectory of the related kernel method (5).
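The closeness asserted by Theorem 3.3 can be observed directly by running the two dynamics side by side from the same initialization and tracking the uniform deviation on held-out points. The following sketch is illustrative only: the sizes, the step size, the number of steps, and the choice $\beta = 1/\sqrt{m}$ are assumptions, not values from the paper.

```python
import numpy as np

# Side-by-side simulation: two-layer network GD (both layers move) versus
# the random feature model (5) (B_0 frozen), started from the same function.
rng = np.random.default_rng(4)
d, m, n = 5, 2000, 10
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)
B0 = rng.normal(size=(m, d)); B0 /= np.linalg.norm(B0, axis=1, keepdims=True)
beta = 1.0 / np.sqrt(m)
a0 = beta * rng.choice([-1.0, 1.0], size=m)

def relu(z):
    return np.maximum(z, 0.0)

# Held-out points on the sphere where the two functions are compared.
Xtest = rng.normal(size=(50, d))
Xtest /= np.linalg.norm(Xtest, axis=1, keepdims=True)

a, B = a0.copy(), B0.copy()       # neural network: both layers move
a_rf = a0.copy()                  # random feature model: only output layer moves
dt, max_dev = 2e-3, 0.0
for _ in range(1000):
    # network step (forward Euler on (2))
    Z = X @ B.T; S = relu(Z)
    e = (S @ a - y) / n
    a_new = a - dt * (S.T @ e)
    B -= dt * ((e[:, None] * (Z > 0)) * a).T @ X
    a = a_new
    # random feature step (forward Euler on (5))
    S0 = relu(X @ B0.T)
    a_rf -= dt * S0.T @ ((S0 @ a_rf - y) / n)
    # uniform deviation between the two functions on the test set
    dev = np.max(np.abs(relu(Xtest @ B.T) @ a - relu(Xtest @ B0.T) @ a_rf))
    max_dev = max(max_dev, dev)

print(max_dev)   # small when m is large, consistent with the m^{-1/2} rate in (19)
```

Rerunning with a smaller width $m$ makes the deviation grow, which is one way to see the $1/\sqrt{m}$ scaling suggested by (19).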

#### Proof of Theorem 3.3

We define

$$g^{(a)}(x,x') = \frac{1}{mn}\sum_{k=1}^m\sigma(b_k(0)^Tx)\sigma(b_k(0)^Tx'), \qquad g(x,x',t) = \frac{1}{mn}\sum_{k=1}^m\left(\sigma(b_k(t)^Tx)\sigma(b_k(t)^Tx') + a_k(t)^2\sigma'(b_k(t)^Tx)\sigma'(b_k(t)^Tx')x^Tx'\right). \tag{20}$$

Recalling the definition of $G$ in Section 3, we see that $g^{(a)}(x_i,x_j) = G^{(a)}_{i,j}(\Theta_0)$ and $g(x_i,x_j,t) = G_{i,j}(\Theta_t)$. For any $x$, let $g^{(a)}(x)$ and $g(x,t)$ be two $n$-dimensional vectors defined by

$$g^{(a)}_i(x) = g^{(a)}(x,x_i), \qquad g_i(x,t) = g(x,x_i,t). \tag{21}$$

For the GD dynamics (2), define $e_i(t) = f_m(x_i;\Theta_t) - y_i$ and $e(t) = (e_1(t),\dots,e_n(t))^T$. Then we have

$$\frac{d}{dt}e(t) = -mG(\Theta_t)e(t), \qquad \frac{d}{dt}f_m(x;a_t,B_t) = -m\,g(x,t)^Te(t). \tag{22}$$

For the GD dynamics (5) of the random feature model, we define $\tilde{e}_i(t) = f_m(x_i;\tilde{a}_t,B_0) - y_i$. Then we have

$$\frac{d}{dt}\tilde{e}(t) = -mG^{(a)}(\Theta_0)\tilde{e}(t). \tag{23}$$