Provable Convergence of Nesterov Accelerated Method for Over-Parameterized Neural Networks

07/05/2021
by Xin Liu, et al.

Despite the empirical success of deep learning, a theoretical understanding of why randomly initialized neural networks trained by first-order optimization methods can achieve zero training loss is still lacking, even though the loss landscape is non-convex and non-smooth. Recently, several works have demystified this phenomenon in the over-parameterized regime. In this work, we make further progress in this area by considering a commonly used momentum optimization algorithm: Nesterov's accelerated gradient method (NAG). We analyze the convergence of NAG for a two-layer fully connected neural network with ReLU activation. Specifically, we prove that the error of NAG converges to zero at a linear rate 1-Θ(1/√κ), where κ > 1 is determined by the initialization and the architecture of the neural network. Compared to the rate 1-Θ(1/κ) of gradient descent, NAG achieves an acceleration. Moreover, this result validates that NAG and the heavy-ball method can achieve a similar convergence rate.
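As a rough illustration of the setting described above (not the paper's exact construction or constants), the following sketch runs NAG on a width-m two-layer ReLU network with fixed second-layer weights, training only the first layer under the squared loss; the step size eta and momentum beta are hypothetical choices.

# Minimal sketch, assuming a standard NTK-style setup: a width-m two-layer ReLU
# network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x) with fixed second-layer
# weights a_r, trained on the squared loss by Nesterov's accelerated gradient (NAG).
# The step size eta and momentum beta are illustrative values, not those from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 32, 10, 1024                      # samples, input dimension, hidden width
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)

W = rng.normal(size=(m, d))                 # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)         # second-layer weights (fixed)

def predict(W):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def grad(W):
    pre = X @ W.T                           # pre-activations, shape (n, m)
    err = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y
    # dL/dw_r = (1/sqrt(m)) * a_r * sum_i err_i * 1[pre_ir > 0] * x_i
    return ((err[:, None] * (pre > 0.0)) * a).T @ X / np.sqrt(m)

eta, beta = 0.01, 0.9                       # hypothetical step size and momentum
W_prev = W.copy()
for t in range(301):
    Y = W + beta * (W - W_prev)             # Nesterov look-ahead point
    W_prev = W
    W = Y - eta * grad(Y)                   # gradient step taken at the look-ahead point
    if t % 100 == 0:
        print(f"iter {t:3d}  loss {0.5 * np.sum((predict(W) - y) ** 2):.6f}")

In the over-parameterized regime analyzed in the paper, the training error of such an iteration contracts geometrically, with the momentum step yielding the improved 1-Θ(1/√κ) rate over plain gradient descent.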


Related research

04/18/2022
A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks
Momentum methods, including heavy-ball (HB) and Nesterov's accelerated g...

10/04/2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
One of the mystery in the success of neural networks is randomly initial...

10/25/2020
A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks
When equipped with efficient optimization algorithms, the over-parameter...

08/08/2022
A high-resolution dynamical view on momentum methods for over-parameterized neural networks
In this paper, we present the convergence analysis of momentum methods i...

11/27/2019
Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis
A remarkable recent discovery in machine learning has been that deep neu...

09/12/2023
Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding
Solving linear inverse problems plays a crucial role in numerous applica...

06/23/2021
Understanding Modern Techniques in Optimization: Frank-Wolfe, Nesterov's Momentum, and Polyak's Momentum
In the first part of this dissertation research, we develop a modular fr...
