Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm

10/25/2019
by Yakup Ceki Papo, et al.

This paper provides a framework for analyzing stochastic gradient algorithms in the mean squared error (MSE) sense, using the asymptotic normality of the stochastic gradient descent (SGD) iterates. The analysis applies this asymptotic result to the finite-iteration case. Specifically, we consider problems where the gradient estimators are biased but have reduced variance, and we compare the iterates they generate with those generated by the SGD algorithm. We use the work of Fabian to characterize the mean and variance of the distribution of the iterates in terms of the bias and covariance matrix of the gradient estimators. We introduce the sliding window SGD (SW-SGD) algorithm, together with a proof of its convergence, which incurs a lower MSE than the SGD algorithm on quadratic and convex problems. Lastly, we present numerical results that demonstrate the effectiveness of this framework and the superiority of the SW-SGD algorithm over the SGD algorithm.
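
The comparison described in the abstract can be illustrated with a small simulation. The abstract does not define the SW-SGD update, so the sketch below rests on an assumption: the sliding-window estimator is taken to be the average of the most recent `window` noisy gradients, each evaluated at the iterate where it was sampled. That makes the estimator biased (stale gradients) with lower variance. The objective f(x) = 0.5 x^2, the step size 1/(k+1), and the names `run` and `empirical_mse` are likewise hypothetical choices for illustration, not the paper's setup.

```python
import random
from collections import deque


def run(window, steps=200, noise_std=1.0, x0=5.0, seed=0):
    """Minimise f(x) = 0.5 * x**2 from noisy gradients; return the final iterate.

    window=1 is plain SGD. window>1 averages the last `window` noisy
    gradients, which were evaluated at older iterates. This is an ASSUMED
    form of a sliding-window estimator, not taken from the paper.
    """
    rng = random.Random(seed)
    recent = deque(maxlen=window)  # keeps only the newest `window` gradients
    x = x0
    for k in range(1, steps + 1):
        noisy_grad = x + rng.gauss(0.0, noise_std)  # true gradient of f is x
        recent.append(noisy_grad)
        direction = sum(recent) / len(recent)  # windowed gradient estimate
        x -= direction / (k + 1)  # diminishing step size a_k = 1 / (k + 1)
    return x


def empirical_mse(window, runs=500, **kwargs):
    """Average squared distance to the minimiser x* = 0 over independent runs."""
    finals = [run(window, seed=s, **kwargs) for s in range(runs)]
    return sum(x * x for x in finals) / runs


if __name__ == "__main__":
    print("SGD    (window=1):", empirical_mse(1))
    print("SW-SGD (window=5):", empirical_mse(5))
```

Which variant ends up with the lower empirical MSE in this toy problem depends on the window length, the step-size schedule, and the noise level; the sketch only shows how such a comparison can be set up and is not meant to reproduce the paper's results.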

research
07/13/2023

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Stochastic Gradient Descent (SGD) is one of the simplest and most popula...
research
11/05/2019

A Rule for Gradient Estimator Selection, with an Application to Variational Inference

Stochastic gradient descent (SGD) is the workhorse of modern machine lea...
research
06/01/2022

Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis

When solving finite-sum minimization problems, two common alternatives t...
research
02/22/2021

Dither computing: a hybrid deterministic-stochastic computing framework

Stochastic computing has a long history as an alternative method of perf...
research
04/06/2022

Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

We introduce a general framework for nonlinear stochastic gradient desce...
research
12/21/2018

Stochastic Doubly Robust Gradient

When training a machine learning model with observational data, it is of...
research
12/11/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Stochastic gradient descent (SGD) is a cornerstone of machine learning. ...
