SVRG Meets AdaGrad: Painless Variance Reduction

by   Benjamin Dubois-Taine, et al.

Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a fully adaptive variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step-size, and allowing it to adaptively determine the length of each inner-loop. When minimizing a sum of n smooth convex functions, we prove that AdaSVRG requires O(n + 1/ϵ) gradient evaluations to achieve an ϵ-suboptimality, matching the typical rate, but without needing to know problem-dependent constants. However, VR methods including AdaSVRG are slower than SGD when used with over-parameterized models capable of interpolating the training data. Hence, we also propose a hybrid algorithm that can adaptively switch from AdaGrad to AdaSVRG, achieving the best of both stochastic gradient and VR methods, but without needing to tune their step-sizes. Via experiments on synthetic and standard real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over other "tune-free" VR methods.


page 1

page 2

page 3

page 4


Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

In this paper, we study the finite-sum convex optimization problem focus...

Almost Tune-Free Variance Reduction

The variance reduction class of algorithms including the representative ...

Variance-Reduced Methods for Machine Learning

Stochastic optimization lies at the heart of machine learning, and its c...

Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

We study stochastic decentralized optimization for the problem of traini...

Randomized Stochastic Gradient Descent Ascent

An increasing number of machine learning problems, such as robust or adv...

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

We present a unified framework to analyze the global convergence of Lang...

Randomized Kaczmarz Method for Single Particle X-ray Image Phase Retrieval

In this paper, we investigate phase retrieval algorithm for the single p...