Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

03/02/2016
by Ohad Shamir, et al.

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, for three types of algorithms: any algorithm with online regret guarantees, stochastic gradient descent, and SVRG. A useful application of our SVRG analysis is a nearly-optimal algorithm for regularized least squares in a distributed setting, in terms of both communication complexity and runtime complexity, when the data is randomly partitioned and the condition number can be as large as the data size per machine (up to logarithmic factors). Our proof techniques combine ideas from stochastic optimization, adversarial online learning, and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
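
To make the contrast concrete, the sketch below (a minimal Python illustration, not code from the paper; the function name, step size, and other parameters are placeholders) runs plain SGD on a least-squares objective under both sampling schemes: with-replacement draws an independent index at every step, while without-replacement reshuffles the data each epoch and sweeps through it once.

import numpy as np

def sgd_least_squares(X, y, lr=0.01, epochs=5, with_replacement=False, seed=0):
    # Plain SGD on the objective (1/n) * sum_i 0.5 * (x_i . w - y_i)^2.
    # with_replacement=True : i.i.d. index at each step (the usual analysis setting).
    # with_replacement=False: fresh random permutation per epoch, i.e. sampling
    #                         without replacement.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        if with_replacement:
            order = rng.integers(0, n, size=n)   # duplicates allowed
        else:
            order = rng.permutation(n)           # each point visited exactly once
        for i in order:
            grad = (X[i] @ w - y[i]) * X[i]      # gradient of the i-th loss term
            w -= lr * grad
    return w

# Toy usage: recover a planted weight vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(500)
w_shuffled = sgd_least_squares(X, y, with_replacement=False)
w_iid = sgd_least_squares(X, y, with_replacement=True)

The without-replacement variant is also what most training pipelines implement in practice, since shuffling once per epoch is simpler than drawing independent indices at every step, which matches the paper's motivation.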


