Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties

10/03/2020
by   Brett Daley, et al.

Many popular adaptive gradient methods such as Adam and RMSProp rely on an exponential moving average (EMA) to normalize their stepsizes. While the EMA makes these methods highly responsive to new gradient information, recent research has shown that it also causes divergence on at least one convex optimization problem. We propose a novel method called Expectigrad, which adjusts stepsizes according to a per-component unweighted mean of all historical gradients and computes a bias-corrected momentum term jointly between the numerator and denominator. We prove that Expectigrad cannot diverge on any instance of the optimization problem known to cause Adam to diverge. We also establish a regret bound in the general stochastic nonconvex setting that suggests Expectigrad is less susceptible to gradient variance than existing methods. Testing Expectigrad on several high-dimensional machine learning tasks, we find it often compares favorably to state-of-the-art methods with little hyperparameter tuning.
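To make the description above concrete, here is a minimal sketch of an Expectigrad-style update, written only from what the abstract states: the denominator is the square root of an unweighted (arithmetic) mean of all past squared gradients rather than an EMA, and momentum is applied jointly to the normalized step and bias-corrected. This is an illustrative assumption-laden sketch, not the authors' reference implementation; the function name, step counting, and hyperparameter defaults are hypothetical.

```python
import numpy as np

def expectigrad_sketch(grad_fn, x0, lr=1e-3, beta=0.9, eps=1e-8, steps=1000):
    """Hedged sketch of an Expectigrad-style optimizer (not the paper's reference code).

    grad_fn: callable returning the (stochastic) gradient at a point x.
    x0: initial parameter vector.
    """
    x = np.asarray(x0, dtype=float)
    s = np.zeros_like(x)   # running sum of squared gradients, per component
    m = np.zeros_like(x)   # momentum of the already-normalized step
    for t in range(1, steps + 1):
        g = grad_fn(x)
        s += g * g
        mean_sq = s / t                       # unweighted mean of all past squared gradients
        step = g / (eps + np.sqrt(mean_sq))   # normalized gradient (no EMA in the denominator)
        m = beta * m + (1.0 - beta) * step    # momentum applied jointly to numerator and denominator
        x = x - lr * m / (1.0 - beta ** t)    # bias correction of the momentum term
    return x

# Usage example on a simple quadratic: minimize ||x||^2.
x_star = expectigrad_sketch(grad_fn=lambda x: 2.0 * x, x0=np.array([5.0, -3.0]))
```

Dividing by the step count `t` is a simplification; a per-component count of observed gradients would be a natural alternative in sparse settings.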


