Variance-Reduced Methods for Machine Learning

by   Robert M. Gower, et al.

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving a faster convergence than SGD in theory as well as practice. These speedups underline the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting, and leave pointers to readers interested in extensions for minimizing non-convex functions.


Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

In this paper, we propose a simple variant of the original stochastic va...

Variance Reduction for Distributed Stochastic Gradient Descent

Variance reduction (VR) methods boost the performance of stochastic grad...

VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning

In this paper, we propose a simple variant of the original SVRG, called ...

SVRG Meets AdaGrad: Painless Variance Reduction

Variance reduction (VR) methods for finite-sum minimization typically re...

Randomized Kaczmarz Method for Single Particle X-ray Image Phase Retrieval

In this paper, we investigate phase retrieval algorithm for the single p...

META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

We study the application of variance reduction (VR) techniques to genera...

Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise

In this paper, we propose a unified view of gradient-based algorithms fo...