Variance Reduction on Adaptive Stochastic Mirror Descent

12/26/2020
by Wenjie Li et al.

We study the idea of variance reduction applied to adaptive stochastic mirror descent algorithms in nonsmooth nonconvex finite-sum optimization problems. We propose a simple yet general adaptive mirror descent algorithm with variance reduction, named SVRAMD, and provide its convergence analysis in different settings. We prove that variance reduction reduces the gradient complexity of most adaptive mirror descent algorithms and boosts their convergence. In particular, our general theory implies that variance reduction can be applied to algorithms using time-varying step sizes and to self-adaptive algorithms such as AdaGrad and RMSProp. Moreover, our convergence rates recover the best existing rates of non-adaptive algorithms. We check the validity of our claims with experiments in deep learning.
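To make the idea concrete, here is a minimal sketch of combining an SVRG-style variance-reduced gradient estimate with an RMSProp-style adaptive step, applied to a toy quadratic finite-sum problem. This is an illustrative assumption-laden sketch of the general pattern the abstract describes, not the authors' exact SVRAMD algorithm; the toy objective, step sizes, and epoch counts are all made up for illustration.

```python
import math
import random

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2,
# whose minimizer is the mean of the a_i.
random.seed(0)
a = [random.gauss(3.0, 1.0) for _ in range(50)]
mean_a = sum(a) / len(a)

def grad_i(x, i):
    """Stochastic component gradient of 0.5 * (x - a[i])^2."""
    return x - a[i]

def full_grad(x):
    """Full-batch gradient, recomputed only at each snapshot."""
    return x - mean_a

def svrg_rmsprop(x0, lr=0.05, beta=0.9, eps=1e-8, epochs=20):
    """SVRG-style variance reduction plus an RMSProp-style adaptive
    step size (a sketch of the variance-reduced adaptive pattern)."""
    x, v = x0, 0.0
    for _ in range(epochs):
        snap, mu = x, full_grad(x)  # snapshot point and its full gradient
        for _ in range(len(a)):
            i = random.randrange(len(a))
            # SVRG variance-reduced gradient estimate:
            # unbiased, with variance shrinking as x approaches snap.
            g = grad_i(x, i) - grad_i(snap, i) + mu
            # RMSProp-style second-moment accumulator and adaptive step.
            v = beta * v + (1 - beta) * g * g
            x -= lr * g / (math.sqrt(v) + eps)
    return x

x_final = svrg_rmsprop(x0=0.0)
```

On this toy problem the variance-reduced estimate drives `x_final` close to the minimizer `mean_a`; the same control-variate construction is what lets variance reduction pair with self-adaptive step-size rules such as RMSProp.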


Related research

07/27/2016 · Stochastic Frank-Wolfe Methods for Nonconvex Optimization
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum opt...

07/03/2020 · Variance reduction for Riemannian non-convex optimization with batch size adaptation
Variance reduction techniques are popular in accelerating gradient desce...

08/31/2012 · On the convergence of maximum variance unfolding
Maximum Variance Unfolding is one of the main methods for (nonlinear) di...

09/19/2016 · Geometrically Convergent Distributed Optimization with Uncoordinated Step-Sizes
A recent algorithmic family for distributed optimization, DIGing, has...

08/11/2023 · Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction
The recently proposed stochastic Polyak stepsize (SPS) and stochastic li...

11/21/2022 · Adaptive Stochastic Optimisation of Nonconvex Composite Objectives
In this paper, we propose and analyse a family of generalised stochastic...

06/10/2018 · Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs
Techniques for reducing the variance of gradient estimates used in stoch...
