Accelerated Convergence of Nesterov's Momentum for Deep Neural Networks under Partial Strong Convexity

06/13/2023
by Fangshuo Liao, et al.

Current state-of-the-art analyses of the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Łojasiewicz (PL) condition and restricted strong convexity. While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov's momentum enjoys accelerated convergence under similar settings and assumptions. In this work, we consider a new class of objective functions in which only a subset of the parameters satisfies strong convexity, and show that Nesterov's momentum provably achieves acceleration for this objective class. We provide two realizations of the problem class, one of which is deep ReLU networks, making this work, to the best of our knowledge, the first to prove an accelerated convergence rate for a non-trivial neural network architecture.
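To make the method under discussion concrete, here is a minimal sketch of Nesterov's accelerated gradient update on a toy strongly convex quadratic. The quadratic, the step size, and the constant momentum coefficient below are illustrative assumptions chosen for the classical strongly convex setting; they are not the partial-strong-convexity setting analyzed in the paper.

```python
import numpy as np

def nesterov(grad, x0, lr, momentum, steps):
    """Nesterov's accelerated gradient with a constant momentum coefficient.

    Illustrative sketch only: assumes a fixed step size `lr` and a fixed
    `momentum` coefficient, as in the classical strongly convex analysis.
    """
    x = x0.copy()
    y = x0.copy()  # look-ahead iterate
    for _ in range(steps):
        x_next = y - lr * grad(y)            # gradient step at the look-ahead point
        y = x_next + momentum * (x_next - x)  # momentum extrapolation
        x = x_next
    return x

# Toy problem: f(x) = 0.5 * x^T A x with smoothness L = 100 and
# strong convexity mu = 1, i.e. condition number kappa = 100.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
L, mu = 100.0, 1.0

# Classical accelerated choice: lr = 1/L, momentum = (sqrt(L)-sqrt(mu))/(sqrt(L)+sqrt(mu)).
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
x_final = nesterov(grad, np.array([1.0, 1.0]), lr=1.0 / L, momentum=beta, steps=300)
print(np.linalg.norm(x_final))  # distance to the minimizer at the origin
```

With these parameters the iterates contract roughly like (1 - sqrt(mu/L))^t per step, versus (1 - mu/L)^t for plain gradient descent, which is the acceleration gap the paper's analysis extends beyond the fully strongly convex case.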


Related research

05/09/2023: Accelerated gradient descent method for functionals of probability measures by new convexity and smoothness based on transport maps
04/18/2022: A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks
12/10/2020: A Study of Condition Numbers for First-Order Optimization
06/20/2021: Memory Augmented Optimizers for Deep Learning
04/28/2021: FastAdaBelief: Improving Convergence Rate for Belief-based Adaptive Optimizer by Strong Convexity
09/29/2022: Restricted Strong Convexity of Deep Learning Models with Smooth Activations
08/08/2022: A high-resolution dynamical view on momentum methods for over-parameterized neural networks
