Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm

10/25/2019


This paper provides a framework for analyzing stochastic gradient algorithms in a mean squared error (MSE) sense, using the asymptotic normality result for the stochastic gradient descent (SGD) iterates. We perform this analysis by applying the asymptotic normality result to the finite-iteration case. Specifically, we consider gradient estimators that are biased but have reduced variance, and compare the iterates they generate with the iterates generated by the SGD algorithm. We use the work of Fabian to characterize the mean and the variance of the distribution of the iterates in terms of the bias and the covariance matrix of the gradient estimators. We introduce the sliding window SGD (SW-SGD) algorithm, together with a proof of its convergence, and show that it incurs a lower MSE than the SGD algorithm on quadratic and convex problems. Lastly, we present numerical results that show the effectiveness of this framework and the superiority of the SW-SGD algorithm over the SGD algorithm.
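To make the bias-variance idea concrete, the following is a minimal sketch of a sliding-window gradient estimator. The abstract does not give the update rule, so this is an assumption: each step moves along the mean of the most recent `window` stochastic gradients, each evaluated at the iterate where it was sampled. Reusing stale gradients introduces bias, while averaging reduces variance. The names `sw_sgd`, `noisy_quadratic_grad`, `window`, and `step` are hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np


def sw_sgd(grad_fn, x0, n_iters, window, step, rng):
    """Illustrative sliding-window SGD (an assumed form, not the paper's exact algorithm).

    grad_fn(x, rng) returns one stochastic gradient at x.
    step(k) returns the step size at iteration k (k starts at 1).
    window=1 reduces to plain SGD.
    """
    x = np.asarray(x0, dtype=float).copy()
    recent = deque(maxlen=window)  # holds at most `window` most recent gradients
    for k in range(1, n_iters + 1):
        recent.append(grad_fn(x, rng))
        # Averaging gradients taken at older iterates is biased with
        # respect to the true gradient at x, but has lower variance.
        g = np.mean(list(recent), axis=0)
        x = x - step(k) * g
    return x


def noisy_quadratic_grad(x, rng):
    """Gradient of f(x) = 0.5 * ||x||^2 plus standard normal noise (toy problem)."""
    return x + rng.standard_normal(x.shape)


def empirical_mse(window, n_runs=50, n_iters=2000):
    """Mean squared distance to the minimizer x* = 0 over independent runs."""
    errs = []
    for seed in range(n_runs):
        rng = np.random.default_rng(seed)
        x = sw_sgd(noisy_quadratic_grad, [5.0, 5.0], n_iters,
                   window, lambda k: 1.0 / k, rng)
        errs.append(float(np.sum(x ** 2)))
    return float(np.mean(errs))
```

Calling `empirical_mse(1)` and `empirical_mse(8)` gives a rough comparison of plain SGD with the windowed variant on this toy quadratic. Which one comes out ahead depends on the step-size schedule, the window length, and the noise level, so the sketch is a way to explore the tradeoff and not a reproduction of the paper's results.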


Related Research

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

A Rule for Gradient Estimator Selection, with an Application to Variational Inference

Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis

Dither computing: a hybrid deterministic-stochastic computing framework

Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

Stochastic Doubly Robust Gradient

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD