Towards stability and optimality in stochastic gradient descent

05/10/2015
by Panos Toulis, et al.

Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets. However, in both theory and practice, they suffer from numerical instability. Moreover, they are statistically inefficient as estimators of the true parameter value. To address these two issues, we propose a new iterative procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency, AI-SGD employs averaging of the iterates, which achieves the optimal Cramér-Rao bound under strong convexity, i.e., it is an optimal unbiased estimator of the true parameter value. For numerical stability, AI-SGD employs an implicit update at each iteration, which is related to proximal operators in optimization. In practice, AI-SGD achieves competitive performance with other state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.
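To make the two ingredients concrete, here is a minimal sketch of the AI-SGD idea for least-squares regression, where the implicit (proximal) update has a simple closed form. The function name, learning-rate schedule, and synthetic data below are illustrative assumptions, not the authors' reference implementation or their recommended settings.

    import numpy as np

    def ai_sgd_least_squares(X, y, lr0=1.0, alpha=0.75):
        """Sketch of averaged implicit SGD (AI-SGD) for least squares.

        Each step solves theta_n = theta_{n-1} + gamma_n * (y_n - x_n' theta_n) * x_n
        implicitly in theta_n; for squared-error loss this has the closed form used
        below. The estimate returned is the running (Polyak-Ruppert) average of the
        iterates. lr0 and alpha are illustrative choices, not values from the paper.
        """
        n, p = X.shape
        theta = np.zeros(p)       # implicit SGD iterate
        theta_bar = np.zeros(p)   # running average of iterates
        for i in range(n):
            x_i, y_i = X[i], y[i]
            gamma = lr0 * (i + 1) ** (-alpha)            # decaying learning rate
            resid = y_i - x_i @ theta                    # residual at current iterate
            # Closed-form solution of the implicit update for squared-error loss:
            # theta_new = theta + gamma / (1 + gamma * ||x_i||^2) * resid * x_i
            theta = theta + (gamma / (1.0 + gamma * (x_i @ x_i))) * resid * x_i
            theta_bar += (theta - theta_bar) / (i + 1)   # update running average
        return theta_bar

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        theta_true = np.array([1.0, -2.0, 0.5])
        X = rng.normal(size=(50_000, 3))
        y = X @ theta_true + rng.normal(scale=0.1, size=50_000)
        print(ai_sgd_least_squares(X, y))  # should be close to theta_true

In this sketch the implicit step shrinks the effective learning rate by a factor 1/(1 + gamma_n * ||x_n||^2), which is what gives the procedure its robustness to a poorly chosen initial rate, while the averaged iterate theta_bar is what is reported as the final estimate.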


Related research

10/04/2015 · Implicit stochastic approximation
The need to carry out parameter estimation from massive data has reinvig...

09/22/2015 · Stochastic gradient descent methods for estimation with large data sets
We develop methods for parameter estimation in settings with large-scale...

06/25/2022 · Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert
The implicit stochastic gradient descent (ISGD), a proximal version of S...

09/10/2013 · Exponentially Fast Parameter Estimation in Networks Using Distributed Dual Averaging
In this paper we present an optimization-based view of distributed param...

04/19/2013 · Optimal Stochastic Strongly Convex Optimization with a Logarithmic Number of Projections
We consider stochastic strongly convex optimization with a complex inequ...

02/17/2023 · (S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large step...

02/13/2018 · Statistical Inference for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent
Stochastic gradient descent (SGD) is an immensely popular approach for o...
