Variance-Reduced Methods for Machine Learning

10/02/2020
by Robert M. Gower, et al.

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD in theory as well as in practice. These speedups underlie the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting and provide pointers for readers interested in extensions to minimizing non-convex functions.
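The setting in question is the finite-sum problem min_w (1/n) Σ_i f_i(w), where plain SGD updates with the gradient of a single randomly drawn f_i. As a rough, hedged illustration (not taken from the review itself), the sketch below shows how SVRG, a canonical VR method, corrects that single-sample gradient with a periodically recomputed full gradient so that the estimator's variance shrinks near the solution; the least-squares data, step size, and loop lengths are placeholder choices for the example.

```python
# Minimal sketch of SVRG (a canonical variance-reduced method) on a
# least-squares finite sum f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2.
# Plain SGD would simply step along grad_i(w, i); SVRG adds a correction
# built from a snapshot point and its full gradient.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def grad_i(w, i):
    # Gradient of the i-th component function 0.5 * (a_i^T w - b_i)^2.
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    # Full (batch) gradient of the finite sum.
    return A.T @ (A @ w - b) / n

def svrg(w, step=0.01, epochs=20, m=None):
    m = m or n  # inner-loop length (placeholder choice)
    for _ in range(epochs):
        w_snap = w.copy()
        g_snap = full_grad(w_snap)  # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased for full_grad(w), and its
            # variance vanishes as w and w_snap approach the minimizer.
            g = grad_i(w, i) - grad_i(w_snap, i) + g_snap
            w = w - step * g
    return w

w_star = np.linalg.lstsq(A, b, rcond=None)[0]
w_svrg = svrg(np.zeros(d))
print("SVRG distance to least-squares solution:", np.linalg.norm(w_svrg - w_star))
```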
