Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

07/13/2023
by Ziyang Wei, et al.

Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning, owing to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate its convergence in different settings. In this paper, we study a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad class of weighted averaged SGD solutions and provide asymptotically valid online inference procedures. Furthermore, drawing insight from the weighting that minimizes the non-asymptotic mean squared error (MSE) in the linear model, we propose an adaptive averaging scheme that achieves both the optimal statistical rate and favorable non-asymptotic convergence.
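The paper's specific weighting and adaptive scheme are given in the full text. As a rough, non-authoritative illustration of the general template the abstract describes, the sketch below runs SGD on a least-squares problem while maintaining a weighted running average of the iterates. Every concrete choice here is an assumption made for illustration, not the paper's algorithm: the polynomial weights w_t = t^p (p = 0 recovers the classical uniform Polyak-Ruppert average), the step-size schedule eta_t = lr0 * t^(-gamma), and the function name weighted_averaged_sgd.

```python
import numpy as np

def weighted_averaged_sgd(X, y, lr0=0.5, gamma=0.51, p=1.0):
    """Weighted averaged SGD for least squares: a minimal sketch.

    Runs plain SGD with step size lr0 * t^(-gamma) and maintains a
    weighted average of the iterates with hypothetical polynomial
    weights w_t = t^p (p = 0 gives uniform Polyak-Ruppert averaging).
    This only illustrates the general averaging template; it does not
    reproduce the paper's adaptive weighting.
    """
    n, d = X.shape
    theta = np.zeros(d)       # current SGD iterate
    theta_bar = np.zeros(d)   # weighted average of iterates
    weight_sum = 0.0
    for t in range(1, n + 1):
        x_t, y_t = X[t - 1], y[t - 1]
        grad = (x_t @ theta - y_t) * x_t         # stochastic gradient of squared loss
        theta -= lr0 * t ** (-gamma) * grad      # SGD step
        w_t = t ** p                             # polynomial weight (assumed form)
        weight_sum += w_t
        # Online update of the weighted mean in O(d) memory.
        theta_bar += (w_t / weight_sum) * (theta - theta_bar)
    return theta, theta_bar

# Toy usage: recover a linear model from streaming data.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(5000, 3))
y = X @ theta_star + 0.1 * rng.normal(size=5000)
last_iterate, averaged = weighted_averaged_sgd(X, y)
print("last-iterate error:", np.linalg.norm(last_iterate - theta_star))
print("weighted-average error:", np.linalg.norm(averaged - theta_star))
```

The online update theta_bar += (w_t / W_t) * (theta - theta_bar), with W_t the running weight sum, keeps the exact weighted mean of all iterates in O(d) memory, which is part of what makes averaged SGD attractive for online inference.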


Related research

05/28/2023
Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality
Stochastic gradient descent with momentum (SGDM) has been widely used in...

10/25/2019
Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm
This paper provides a framework to analyze stochastic gradient algorithm...

06/04/2020
Towards Asymptotic Optimality with Conditioned Stochastic Gradient Descent
In this paper, we investigate a general class of stochastic gradient des...

12/08/2012
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Stochastic Gradient Descent (SGD) is one of the simplest and most popula...

06/06/2021
Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling
We develop a new method of online inference for a vector of parameters e...

04/06/2022
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
We introduce a general framework for nonlinear stochastic gradient desce...

08/28/2020
ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm
The theory and practice of stochastic optimization has focused on stocha...
