A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks

04/18/2022
by Xin Liu, et al.

Momentum methods, including the heavy-ball (HB) method and Nesterov's accelerated gradient (NAG), are widely used in training neural networks because of their fast convergence. However, theoretical guarantees for their convergence and acceleration are lacking, since the optimization landscape of a neural network is non-convex. Recently, several works have made progress toward understanding the convergence of momentum methods in the over-parameterized regime, where the number of parameters exceeds the number of training instances. Nonetheless, current results mainly focus on two-layer neural networks, which is far from explaining the remarkable success of momentum methods in training deep neural networks. Motivated by this, we investigate the convergence of NAG with a constant learning rate and momentum parameter in training two architectures of deep linear networks: deep fully-connected linear neural networks and deep linear ResNets. In the over-parameterized regime, we first analyze the residual dynamics induced by the training trajectory of NAG for a deep fully-connected linear neural network under random Gaussian initialization. Our results show that NAG converges to the global minimum at a (1 - 𝒪(1/√(κ)))^t rate, where t is the iteration number and κ > 1 is a constant depending on the condition number of the feature matrix. Compared with the (1 - 𝒪(1/κ))^t rate of gradient descent (GD), NAG thus achieves an acceleration over GD. To the best of our knowledge, this is the first theoretical guarantee for the convergence of NAG to the global minimum in training deep neural networks. Furthermore, we extend our analysis to deep linear ResNets and derive a similar convergence result.
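To make the setting concrete, below is a minimal numerical sketch of NAG with a constant learning rate and momentum parameter applied to a deep fully-connected linear network with random Gaussian initialization on a least-squares objective. The network depth, layer widths, data sizes, learning rate eta, and momentum beta are illustrative assumptions chosen for a quick demo, not values or scalings taken from the paper, and the snippet is not the paper's analysis setup.

```python
# Minimal sketch: NAG with constant step size and momentum on a deep
# fully-connected linear network (all hyperparameters are illustrative).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: targets generated by a ground-truth linear map.
d_in, d_out, n = 20, 10, 100
X = rng.standard_normal((d_in, n))
W_star = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
Y = W_star @ X

# Deep fully-connected linear network: the product W_L ... W_1, with
# random Gaussian initialization scaled by 1/sqrt(fan_in).
L, width = 3, 50
dims = [d_in] + [width] * (L - 1) + [d_out]
Ws = [rng.standard_normal((dims[l + 1], dims[l])) / np.sqrt(dims[l]) for l in range(L)]

def loss_and_grads(Ws, X, Y):
    """Return 0.5 * ||W_L ... W_1 X - Y||_F^2 / n and the per-layer gradients."""
    acts = [X]
    for W in Ws:                      # forward pass, caching activations
        acts.append(W @ acts[-1])
    residual = acts[-1] - Y
    m = X.shape[1]
    loss = 0.5 * np.sum(residual ** 2) / m
    grads = [None] * len(Ws)
    delta = residual / m              # backpropagate through the linear layers
    for l in range(len(Ws) - 1, -1, -1):
        grads[l] = delta @ acts[l].T
        delta = Ws[l].T @ delta
    return loss, grads

# NAG with constant learning rate eta and momentum beta (velocity form):
#   v_{t+1} = beta * v_t - eta * grad(W_t + beta * v_t)   (gradient at the look-ahead point)
#   W_{t+1} = W_t + v_{t+1}
eta, beta, T = 1e-2, 0.9, 500
velocities = [np.zeros_like(W) for W in Ws]
for t in range(T):
    lookahead = [W + beta * v for W, v in zip(Ws, velocities)]
    _, grads = loss_and_grads(lookahead, X, Y)
    velocities = [beta * v - eta * g for v, g in zip(velocities, grads)]
    Ws = [W + v for W, v in zip(Ws, velocities)]
    if t % 100 == 0 or t == T - 1:
        print(f"iter {t:4d}  loss {loss_and_grads(Ws, X, Y)[0]:.3e}")
```

Evaluating the gradient at the look-ahead point W_t + beta * v_t is what distinguishes NAG from the heavy-ball method; using the current iterate instead recovers HB, and setting beta = 0 recovers plain GD.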

Related research

07/05/2021 · Provable Convergence of Nesterov Accelerated Method for Over-Parameterized Neural Networks
03/01/2023 · AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
06/13/2023 · Accelerated Convergence of Nesterov's Momentum for Deep Neural Networks under Partial Strong Convexity
04/12/2021 · A Recipe for Global Convergence Guarantee in Deep Neural Networks
07/04/2020 · DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths
08/08/2022 · A high-resolution dynamical view on momentum methods for over-parameterized neural networks
01/08/2022 · Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer
