SVRG Meets AdaGrad: Painless Variance Reduction

02/18/2021
by Benjamin Dubois-Taine, et al.

Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a fully adaptive variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step-size, and allowing it to adaptively determine the length of each inner loop. When minimizing a sum of n smooth convex functions, we prove that AdaSVRG requires O(n + 1/ϵ) gradient evaluations to achieve an ϵ-suboptimality, matching the typical rate, but without needing to know problem-dependent constants. However, VR methods including AdaSVRG are slower than SGD when used with over-parameterized models capable of interpolating the training data. Hence, we also propose a hybrid algorithm that can adaptively switch from AdaGrad to AdaSVRG, achieving the best of both stochastic gradient and VR methods, but without needing to tune their step-sizes. Via experiments on synthetic and standard real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over other "tune-free" VR methods.
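The core idea described above, running AdaGrad on SVRG's variance-reduced gradient estimates inside each inner loop, can be illustrated with a short sketch. This is a minimal, hypothetical NumPy rendering under stated assumptions, not the authors' implementation: the names `adasvrg_sketch` and `grad_i`, the fixed default inner-loop length, and the iterate averaging between outer loops are illustrative choices; the actual AdaSVRG additionally adapts the inner-loop length.

```python
import numpy as np

def adasvrg_sketch(grad_i, n, x0, eta=1.0, outer_iters=20, inner_iters=None, eps=1e-8):
    """Sketch of SVRG whose inner loop takes AdaGrad steps on the
    variance-reduced gradient.

    grad_i(i, x): gradient of the i-th component function at x.
    n:            number of component functions in the finite sum.
    """
    x = x0.copy()
    m = inner_iters if inner_iters is not None else n  # assumed default length
    for _ in range(outer_iters):
        # Snapshot point and full gradient, as in standard SVRG.
        x_ref = x.copy()
        full_grad = np.mean([grad_i(i, x_ref) for i in range(n)], axis=0)
        G = np.zeros_like(x)            # AdaGrad accumulator, reset each outer loop
        iterates = [x.copy()]
        for _ in range(m):
            i = np.random.randint(n)
            # Variance-reduced (SVRG) gradient estimate.
            g = grad_i(i, x) - grad_i(i, x_ref) + full_grad
            # Coordinate-wise AdaGrad step on the VR estimate.
            G += g * g
            x = x - eta * g / (np.sqrt(G) + eps)
            iterates.append(x.copy())
        # Restart the next outer loop from the average inner iterate (an assumption).
        x = np.mean(iterates, axis=0)
    return x
```

In this rendering, the robustness to the step-size comes from the AdaGrad scaling `1 / (sqrt(G) + eps)`, which shrinks the effective step-size automatically as squared gradients accumulate, so a single untuned `eta` can be reused across problems.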

