Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

05/30/2019
by Yuan Cao, et al.

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., the number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound of order Õ(n^{-1/2}) that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
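
To make the NTRF construction concrete, the sketch below (not taken from the paper) treats the gradient of the network output at its random initialization as a fixed feature map and learns a linear model on top of it. The two-layer ReLU architecture, the width m, and every function name here are illustrative assumptions, not the authors' code.

```python
# Minimal NTRF sketch (illustrative, not the authors' implementation):
# the feature map is the gradient of the network output at random
# initialization, and the NTRF model is linear in these features.
import jax
import jax.numpy as jnp

def init_params(key, d, m):
    # Hypothetical two-layer ReLU network of width m with
    # NTK-style scaling of the initialization.
    k1, k2 = jax.random.split(key)
    W = jax.random.normal(k1, (m, d)) / jnp.sqrt(d)
    v = jax.random.normal(k2, (m,)) / jnp.sqrt(m)
    return (W, v)

def net(params, x):
    W, v = params
    return v @ jax.nn.relu(W @ x)  # scalar network output

def ntrf_features(params0, x):
    # Neural tangent random features: gradient of the output w.r.t. all
    # parameters, evaluated at the initialization params0, flattened.
    grads = jax.grad(net)(params0, x)
    return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])

key = jax.random.PRNGKey(0)
params0 = init_params(key, d=10, m=512)
x = jax.random.normal(jax.random.PRNGKey(1), (10,))
phi = ntrf_features(params0, x)  # feature vector of length m*d + m
```

Fitting a linear coefficient vector w on these features (e.g., by minimizing a convex surrogate loss over the training set) gives an NTRF predictor of the kind whose training loss appears in the bound; the inner product of two such feature vectors is the finite-width analogue of the NTK connection noted above.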

Related research

- 02/04/2019: A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
  Empirical studies show that gradient based methods can learn deep neural...

- 11/21/2018: Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
  We study the problem of training deep neural networks with Rectified Lin...

- 09/30/2019: On the convergence of gradient descent for two layer neural networks
  It has been shown that gradient descent can yield the zero training loss...

- 12/08/2020: Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?
  Neural Tangent Kernel (NTK) theory is widely used to study the dynamics ...

- 06/07/2022: Generalization Error Bounds for Deep Neural Networks Trained by SGD
  Generalization error bounds for deep neural networks trained by stochast...

- 01/07/2019: Generalization in Deep Networks: The Role of Distance from Initialization
  Why does training deep neural networks using stochastic gradient descent...

- 04/25/2023: Learning Trajectories are Generalization Indicators
  The aim of this paper is to investigate the connection between learning ...
