Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network

02/19/2019
by   Xiaoxia Wu, et al.

Adaptive gradient methods like AdaGrad are widely used in optimizing neural networks. Yet, existing convergence guarantees for adaptive gradient methods require either convexity or smoothness, and, in the smooth setting, only guarantee convergence to a stationary point. We propose an adaptive gradient method and show that for two-layer over-parameterized neural networks -- if the width is sufficiently large (polynomially) -- the proposed method converges to the global minimum in polynomial time. Moreover, convergence is robust, without the need to fine-tune hyper-parameters such as the step-size schedule, and with the level of over-parametrization independent of the training error. Our analysis indicates in particular that over-parametrization is crucial for harnessing the full potential of adaptive gradient methods in the setting of neural networks.
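The paper's exact method is not given in this abstract; as a point of reference, the classic AdaGrad update the abstract alludes to scales each step by the root of the accumulated squared gradients. A minimal sketch on a toy one-dimensional objective (the function `adagrad` and the quadratic objective are illustrative, not the paper's algorithm):

```python
import math

def adagrad(grad, x0, lr=1.0, eps=1e-8, steps=500):
    """Scalar AdaGrad: step = lr * g / (sqrt(sum of past g^2) + eps)."""
    x, g2 = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        g2 += g * g                      # accumulate squared gradients
        x -= lr * g / (math.sqrt(g2) + eps)  # adaptively scaled step
    return x

# Toy objective f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_star = adagrad(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because the accumulated sum `g2` only grows, the effective step size shrinks automatically, which is the "no need to fine-tune the step-size schedule" robustness the abstract refers to.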


Related research

AdaLoss: A computationally-efficient and provably convergent adaptive gradient method (09/17/2021)
We propose a computationally-friendly adaptive learning rate schedule, "...

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search) (06/11/2020)
As adaptive gradient methods are typically used for training over-parame...

An Improved Analysis of Training Over-parameterized Deep Neural Networks (06/11/2019)
A recent line of research has shown that gradient-based algorithms with ...

AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization (06/05/2018)
Adaptive gradient methods such as AdaGrad and its variants update the st...

Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks (04/30/2019)
When optimizing over-parameterized models, such as deep neural networks,...

On Generalization of Adaptive Methods for Over-parameterized Linear Regression (11/28/2020)
Over-parameterization and adaptive methods have played a crucial role in...

Disentangling Adaptive Gradient Methods from Learning Rates (02/26/2020)
We investigate several confounding factors in the evaluation of optimiza...
