An Improved Analysis of Training Over-parameterized Deep Neural Networks

06/11/2019
by Difan Zou, et al.

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the network width required to ensure global convergence is very stringent, typically a high-degree polynomial in the training sample size n (e.g., O(n^24)). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis are (a) a tighter gradient lower bound that leads to faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. When specialized to two-layer (i.e., one-hidden-layer) neural networks, our result also yields a milder over-parameterization condition than the best-known result in prior work.
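For readers who want a concrete picture of the training regime the abstract refers to, the sketch below runs full-batch gradient descent on an over-parameterized two-layer (one-hidden-layer) ReLU network with random Gaussian initialization. All specifics here (the synthetic data, the width m, the step size, the number of iterations) are hypothetical placeholders chosen for illustration; this is a minimal sketch of the setting only, not the paper's analysis or its over-parameterization condition.

```python
# Minimal sketch (assumed, not from the paper): full-batch gradient descent on an
# over-parameterized two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x),
# with Gaussian-initialized first-layer weights and fixed random signs a_r.
import numpy as np

rng = np.random.default_rng(0)

n, d = 100, 10   # training sample size and input dimension (hypothetical values)
m = 5000         # hidden width; "over-parameterized" means m is large relative to n
eta = 1e-4       # step size (hypothetical)

# Synthetic training data with unit-norm inputs, as is common in this line of analysis.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.standard_normal(n)

# Random initialization: W ~ N(0, 1) entrywise, second-layer signs a_r in {-1, +1} kept fixed.
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

def predict(W):
    """Network output on all training inputs, shape (n,)."""
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

for step in range(500):
    residual = predict(W) - y                  # gradient of the squared loss w.r.t. the outputs
    act = (X @ W.T > 0.0).astype(float)        # ReLU activation pattern, shape (n, m)
    # dL/dw_r = (a_r / sqrt(m)) * sum_i residual_i * 1{w_r^T x_i > 0} * x_i
    grad_W = (a[:, None] / np.sqrt(m)) * ((act * residual[:, None]).T @ X)
    W -= eta * grad_W                          # gradient descent update on the first layer
    if step % 100 == 0:
        print(f"step {step:4d}  training loss {0.5 * np.sum(residual**2):.4f}")
```

With a sufficiently large width m, the training loss in such a sketch decreases steadily toward zero from random initialization, which is the phenomenon whose required width the paper's analysis tightens.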


