A Guide Through the Zoo of Biased SGD

05/25/2023
by Yury Demidovich, et al.

Stochastic Gradient Descent (SGD) is arguably the most important single algorithm in modern machine learning. Although SGD with unbiased gradient estimators has been studied extensively for at least half a century, SGD variants relying on biased estimators are rare; interest in this topic has nevertheless grown in recent years. However, the existing literature on SGD with biased estimators (BiasedSGD) lacks coherence: each new paper relies on a different set of assumptions, with no clear understanding of how they are connected, which may lead to confusion. We address this gap by establishing connections among the existing assumptions and by presenting a comprehensive map of the underlying relationships. Additionally, we introduce a new set of assumptions that is provably weaker than all previous ones, and we use it to give a thorough analysis of BiasedSGD in both convex and non-convex settings, offering advantages over previous results. We also provide examples in which biased estimators outperform their unbiased counterparts or in which unbiased versions are simply not available. Finally, we demonstrate the effectiveness of our framework through experimental results that validate our theoretical findings.
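To make the biased-versus-unbiased distinction concrete, here is a minimal sketch (not taken from the paper) that runs plain SGD with two estimators: an unbiased minibatch gradient, and a biased Top-k sparsified version of it. The least-squares objective, the step size, the batch size, and all helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative least-squares objective f(x) = (1/2n) * ||A x - b||^2.
n, d = 200, 50
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

def full_grad(x):
    return A.T @ (A @ x - b) / n

def minibatch_grad(x, batch=10):
    # Unbiased estimator: its expectation over the sampled batch
    # equals full_grad(x).
    idx = rng.choice(n, size=batch, replace=False)
    return A[idx].T @ (A[idx] @ x - b[idx]) / batch

def topk(g, k=5):
    # Biased estimator: keep only the k largest-magnitude coordinates
    # (greedy sparsification); its expectation is not the true gradient.
    out = np.zeros_like(g)
    keep = np.argsort(np.abs(g))[-k:]
    out[keep] = g[keep]
    return out

def sgd(estimator, steps=2000, lr=0.05):
    x = np.zeros(d)
    for _ in range(steps):
        x -= lr * estimator(x)
    return x

x_unbiased = sgd(minibatch_grad)
x_biased = sgd(lambda x: topk(minibatch_grad(x)))
print("grad norm, unbiased minibatch:", np.linalg.norm(full_grad(x_unbiased)))
print("grad norm, biased Top-k:      ", np.linalg.norm(full_grad(x_biased)))
```

Top-k is a standard example of a deliberately biased estimator: zeroing all but the largest coordinates cuts communication in distributed training, at the cost of a systematic (not merely random) deviation from the true gradient, which is exactly the kind of estimator the biased-compression literature below studies.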


Related research

07/31/2020 · Analysis of SGD with Biased Gradient Estimators
We analyze the complexity of biased stochastic gradient methods (SGD), w...

07/31/2018 · Stochastic Gradient Descent with Biased but Consistent Gradient Estimators
Stochastic gradient descent (SGD), which dates back to the 1950s, is one...

02/18/2018 · Optimizing Spectral Sums using Randomized Chebyshev Expansions
The trace of matrix functions, often called spectral sums, e.g., rank, l...

12/11/2021 · Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD
Stochastic gradient descent (SGD) is a cornerstone of machine learning. ...

02/27/2020 · On Biased Compression for Distributed Learning
In the last few years, various communication compression techniques have...

09/24/2021 · Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
Training large-scale mixture of experts models efficiently on modern har...

02/22/2021 · Dither computing: a hybrid deterministic-stochastic computing framework
Stochastic computing has a long history as an alternative method of perf...
