Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

07/31/2018
by Jie Chen et al.

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss functions as well as training nonconvex deep neural networks. The theory assumes that one can easily compute an unbiased gradient estimator, which is usually the case due to the sample average nature of empirical risk minimization. There exist, however, many scenarios (e.g., graph learning) where an unbiased estimator may be as expensive to compute as the full gradient, because training examples are interconnected. In a recent work, Chen et al. (2018) proposed using a consistent gradient estimator as an economical alternative. Encouraged by empirical success, we show, in a general setting, that consistent estimators result in the same convergence behavior as do unbiased ones. Our analysis covers strongly convex, convex, and nonconvex objectives. This work opens several new research directions, including the development of more efficient SGD updates with consistent estimators and the design of efficient training algorithms for large-scale graphs.
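As a toy illustration of the idea (not the authors' graph-learning setting), here is a minimal NumPy sketch of SGD in which the per-step gradient estimator is biased at any finite batch size but consistent as the batch grows. The objective F(w) = (mu^T w)^2 and the helper consistent_grad are hypothetical choices made only for this sketch: plugging a batch mean into the nonlinearity yields an estimator whose bias is of order 1/batch_size, yet which converges to the true gradient as the batch size increases.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
mu = rng.normal(size=d)   # true mean of the data distribution
w = rng.normal(size=d)    # parameters to optimize

def consistent_grad(w, batch_size):
    """Plug-in gradient of F(w) = (mu^T w)^2 using a batch mean.

    E[g] = 2 (mu mu^T + I / batch_size) w, so the estimator is biased by a
    term of order 1/batch_size, but it converges to the true gradient
    2 (mu^T w) mu as batch_size grows (consistency).
    """
    x = mu + rng.normal(size=(batch_size, d))   # samples x ~ N(mu, I)
    xbar = x.mean(axis=0)                       # batch mean, consistent for mu
    return 2.0 * (xbar @ w) * xbar

batch_size = 64
for t in range(1, 2001):
    g = consistent_grad(w, batch_size)
    w = w - (0.1 / np.sqrt(t)) * g              # plain SGD with diminishing steps

print("mu^T w after training:", mu @ w)         # near 0 at a minimizer of F
```

The appeal of such estimators, as the abstract notes for graph learning, is that forming the inner average over a small sample (here, a batch; in graph settings, a sampled neighborhood) is far cheaper than the exact computation needed for an unbiased gradient.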
