A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks

02/04/2019
by Yuan Cao, et al.

Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterization regime, where DNNs can easily fit a random labeling of the training data. While a line of recent work explains in theory that gradient-based methods with proper random initialization can find the global minima of the training loss in over-parameterized DNNs, it does not explain their good generalization performance when learning such networks. In this work, we take a step further and prove that, under a certain assumption on the data distribution that is milder than linear separability, gradient descent (GD) with proper random initialization is able to train a sufficiently over-parameterized DNN to achieve arbitrarily small expected error (i.e., population error). This leads to a non-vacuous, algorithm-dependent generalization error bound for deep learning. To the best of our knowledge, this is the first result that explains the good generalization performance of over-parameterized deep neural networks learned by gradient descent.
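The setting analyzed in the abstract is concrete enough to sketch in code. Below is a minimal, hypothetical illustration (not the authors' implementation): a deep ReLU network made deliberately wide relative to the sample size, given Gaussian random initialization, and trained with full-batch gradient descent on the logistic loss. The width, depth, step size, and toy data are all assumed for illustration.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes: far more parameters than the n = 50 training points,
# which is the over-parameterization regime in question.
n, d, width, depth, lr, steps = 50, 10, 1024, 3, 0.1, 500

# Toy binary labels in {-1, +1}; the paper's actual assumption on the data
# distribution (milder than linear separability) is not certified here.
X = torch.randn(n, d)
y = (X[:, 0] * X[:, 1] > 0).float() * 2 - 1

# Deep ReLU network: `depth` hidden layers of the given width.
layers, in_dim = [], d
for _ in range(depth):
    layers += [torch.nn.Linear(in_dim, width), torch.nn.ReLU()]
    in_dim = width
layers.append(torch.nn.Linear(in_dim, 1))
net = torch.nn.Sequential(*layers)

# He-style Gaussian initialization, a common stand-in for the
# "proper random initialization" the abstract refers to.
for m in net.modules():
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.normal_(m.weight, std=(2.0 / m.in_features) ** 0.5)
        torch.nn.init.zeros_(m.bias)

# Full-batch gradient descent: SGD with the entire sample as one batch.
opt = torch.optim.SGD(net.parameters(), lr=lr)
for step in range(steps):
    opt.zero_grad()
    margin = y * net(X).squeeze(-1)
    loss = torch.nn.functional.softplus(-margin).mean()  # logistic loss log(1 + e^-margin)
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step:4d}  train loss {loss.item():.4f}")
```

With enough width, a network like this can typically drive the training loss close to zero on a sample of this size; the paper's contribution is to show that, under its distributional assumption, the resulting interpolating solution also has small population error.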


research · 05/30/2019
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
We study the training and generalization of deep neural networks (DNNs) ...

research · 08/13/2018
Understanding training and generalization in deep learning by Fourier analysis
Background: It is still an open research area to theoretically understan...

research · 02/29/2020
Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning
The success of deep learning is due, to a great extent, to the remarkabl...

research · 05/19/2019
A type of generalization error induced by initialization in deep neural networks
How different initializations and loss functions affect the learning of ...

research · 05/25/2019
Global Minima of DNNs: The Plenty Pantry
A common strategy to train deep neural networks (DNNs) is to use very la...

research · 04/01/2019
On the Power and Limitations of Random Features for Understanding Neural Networks
Recently, a spate of papers have provided positive theoretical results f...

research · 11/02/2022
POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks
Deep Neural Networks (DNNs) outshine alternative function approximators ...
