SVRG Meets AdaGrad: Painless Variance Reduction

02/18/2021
by   Benjamin Dubois-Taine, et al.
4

Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a fully adaptive variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step-size, and allowing it to adaptively determine the length of each inner-loop. When minimizing a sum of n smooth convex functions, we prove that AdaSVRG requires O(n + 1/ϵ) gradient evaluations to achieve an ϵ-suboptimality, matching the typical rate, but without needing to know problem-dependent constants. However, VR methods including AdaSVRG are slower than SGD when used with over-parameterized models capable of interpolating the training data. Hence, we also propose a hybrid algorithm that can adaptively switch from AdaGrad to AdaSVRG, achieving the best of both stochastic gradient and VR methods, but without needing to tune their step-sizes. Via experiments on synthetic and standard real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over other "tune-free" VR methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

01/28/2022

Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

In this paper, we study the finite-sum convex optimization problem focus...
08/25/2019

Almost Tune-Free Variance Reduction

The variance reduction class of algorithms including the representative ...
10/02/2020

Variance-Reduced Methods for Machine Learning

Stochastic optimization lies at the heart of machine learning, and its c...
09/09/2020

Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

We study stochastic decentralized optimization for the problem of traini...
11/25/2021

Randomized Stochastic Gradient Descent Ascent

An increasing number of machine learning problems, such as robust or adv...
07/20/2017

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

We present a unified framework to analyze the global convergence of Lang...
07/11/2022

Randomized Kaczmarz Method for Single Particle X-ray Image Phase Retrieval

In this paper, we investigate phase retrieval algorithm for the single p...