Convergence guarantees for RMSProp and ADAM in non-convex optimization and their comparison to Nesterov acceleration on autoencoders

07/18/2018
by Amitabh Basu, et al.

RMSProp and ADAM remain extremely popular algorithms for training neural nets, but their theoretical foundations have remained unclear. In this work we make progress on this question by proving that these adaptive gradient algorithms are guaranteed to reach criticality for smooth non-convex objectives, and we give bounds on their running time. We then design experiments to compare the performance of RMSProp and ADAM against the Nesterov Accelerated Gradient (NAG) method on a variety of autoencoder setups. Through these experiments we demonstrate ADAM's interesting sensitivity to its momentum parameter β_1. We show that, in terms of achieving lower training and test losses, ADAM at a very high value of the momentum parameter (β_1 = 0.99), and on large enough nets when using mini-batches, outperforms NAG at every momentum value tried for the latter. On the other hand, NAG can sometimes do better when ADAM's β_1 is set to its most commonly used value, β_1 = 0.9. We also report experiments on different autoencoders demonstrating that NAG is better at reducing gradient norms and at finding weights that increase the minimum eigenvalue of the Hessian of the loss function.
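For concreteness, below is a minimal PyTorch sketch of the optimizer configurations the abstract compares: RMSProp, ADAM at the two β_1 values discussed (0.9 and 0.99), and NAG. Only those optimizer choices and β_1 values come from the abstract; the autoencoder architecture, learning rates, NAG momentum value, and data are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (not the paper's exact setup): the optimizers compared in the
# experiments -- RMSProp, ADAM at two beta_1 values, and Nesterov Accelerated
# Gradient (NAG) -- applied to a toy fully connected autoencoder in PyTorch.
# Architecture, learning rates, NAG momentum, and data are placeholders.
import torch
import torch.nn as nn


def make_autoencoder(dim=784, hidden=128):
    # Small single-hidden-layer autoencoder; depth and width are arbitrary here.
    return nn.Sequential(
        nn.Linear(dim, hidden), nn.ReLU(),
        nn.Linear(hidden, dim),
    )


def make_optimizer(name, params):
    params = list(params)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=1e-3)
    if name == "adam_b1=0.9":
        # The most commonly used ADAM momentum value.
        return torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999))
    if name == "adam_b1=0.99":
        # Very high momentum, the regime where the abstract reports
        # ADAM outperforming NAG.
        return torch.optim.Adam(params, lr=1e-3, betas=(0.99, 0.999))
    # NAG baseline; momentum=0.9 here is an illustrative choice.
    return torch.optim.SGD(params, lr=1e-2, momentum=0.9, nesterov=True)


if __name__ == "__main__":
    x = torch.randn(64, 784)   # placeholder mini-batch standing in for real data
    loss_fn = nn.MSELoss()     # reconstruction loss
    for name in ["rmsprop", "adam_b1=0.9", "adam_b1=0.99", "nag"]:
        model = make_autoencoder()
        opt = make_optimizer(name, model.parameters())
        for _ in range(100):   # a few illustrative steps
            opt.zero_grad()
            loss = loss_fn(model(x), x)
            loss.backward()
            opt.step()
        print(f"{name}: final reconstruction loss {loss.item():.4f}")
```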


