Factorial Powers for Stochastic Optimization

06/01/2020
by Aaron Defazio, et al.

The convergence rates for convex and non-convex optimization methods depend on the choice of a host of constants, including step sizes, Lyapunov function constants, and momentum constants. In this work we propose the use of factorial powers as a flexible tool for defining constants that appear in convergence proofs. We list a number of remarkable properties that these sequences enjoy, and show how they can be applied to convergence proofs to simplify or improve the convergence rates of the momentum method, the accelerated gradient method, and the stochastic variance reduced gradient method (SVRG).
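For readers unfamiliar with factorial powers, the sketch below illustrates the standard rising and falling factorial sequences and the discrete analogue of the power rule that makes them telescope exactly under summation. This is a general illustration of why such sequences yield clean closed forms for proof constants; the function names and the averaging-weight identity at the end are illustrative assumptions, not the paper's own construction or code.

```python
# Minimal sketch (not the paper's code): rising/falling factorial powers and
# the discrete "power rule" that makes them convenient as proof constants.

def rising_factorial(n: int, r: int) -> int:
    """n^(r, rising) = n (n+1) ... (n+r-1); equals 1 when r == 0."""
    out = 1
    for i in range(r):
        out *= n + i
    return out

def falling_factorial(n: int, r: int) -> int:
    """n^(r, falling) = n (n-1) ... (n-r+1); equals 1 when r == 0."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

# Discrete analogue of d/dx x^r = r x^(r-1):
#   (n+1)^(r, falling) - n^(r, falling) = r * n^(r-1, falling),
# so sums of factorial powers telescope exactly, with no rounding slack.
r = 3
for n in range(1, 6):
    lhs = falling_factorial(n + 1, r) - falling_factorial(n, r)
    rhs = r * falling_factorial(n, r - 1)
    assert lhs == rhs

# Hypothetical usage: weights proportional to k^(r, rising) have a closed-form
# normalizer, since sum_{k=1}^{K} k^(r, rising) = K^(r+1, rising) / (r+1).
# Closed forms like this are what make factorial powers handy in proofs.
K, r = 10, 2
total = sum(rising_factorial(k, r) for k in range(1, K + 1))
assert total * (r + 1) == rising_factorial(K, r + 1)
```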


Related research

08/10/2018 · On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks
Adaptive stochastic gradient descent methods, such as AdaGrad, Adam, Ada...

01/23/2021 · Acceleration Methods
This monograph covers some recent advances on a range of acceleration te...

02/29/2020 · Dimension-free convergence rates for gradient Langevin dynamics in RKHS
Gradient Langevin dynamics (GLD) and stochastic GLD (SGLD) have attracte...

08/30/2018 · A Unified Analysis of Stochastic Momentum Methods for Deep Learning
Stochastic momentum methods have been widely adopted in training deep ne...

04/09/2023 · μ^2-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
We consider stochastic convex optimization problems where the objective ...

07/02/2019 · The Role of Memory in Stochastic Optimization
The choice of how to retain information about past gradients dramaticall...

06/25/2017 · A Unified Analysis of Stochastic Optimization Methods Using Jump System Theory and Quadratic Constraints
We develop a simple routine unifying the analysis of several important r...
