Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

12/08/2012
by   Ohad Shamir, et al.

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. In this framework, we prove that after T rounds, the suboptimality of the last SGD iterate scales as O(log(T)/√T) for non-smooth convex objective functions, and O(log(T)/T) in the non-smooth strongly convex case. To the best of our knowledge, these are the first bounds of this kind, and they almost match the minimax-optimal rates obtainable by appropriate averaging schemes. We also propose a new and simple averaging scheme, which not only attains optimal rates, but can also be easily computed on-the-fly (in contrast to the suffix averaging scheme proposed in Rakhlin et al. (2011), which is not as simple to implement). Finally, we provide some experimental illustrations.
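To make the "computed on-the-fly" point concrete, below is a minimal sketch of SGD on a non-smooth, strongly convex objective (L2-regularized hinge loss) together with a running, polynomial-decay weighted average of the iterates that is updated in O(d) per round. The recursion avg_t = (1 - (eta+1)/(t+eta)) avg_{t-1} + ((eta+1)/(t+eta)) w_t reduces to the plain running average when eta = 0; the step size, the synthetic data, and the choice eta = 3 are illustrative assumptions, not the paper's experimental setup.

    # Hypothetical sketch, not the authors' code: SGD with step size 1/(lam*t)
    # on a lam-strongly convex, non-smooth objective, plus an on-the-fly
    # weighted average of the iterates.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam, eta = 1000, 20, 0.01, 3.0   # eta is the averaging parameter (assumed value)

    # Synthetic classification data for the hinge loss.
    X = rng.normal(size=(n, d))
    y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

    w = np.zeros(d)        # current SGD iterate w_t
    w_avg = np.zeros(d)    # running weighted average of the iterates

    T = 10_000
    for t in range(1, T + 1):
        i = rng.integers(n)
        # Stochastic subgradient of lam/2*||w||^2 + max(0, 1 - y_i*<w, x_i>).
        g = lam * w
        if y[i] * (X[i] @ w) < 1.0:
            g -= y[i] * X[i]
        w -= g / (lam * t)                      # step size 1/(lam * t)
        rho = (eta + 1.0) / (t + eta)           # polynomial-decay weight
        w_avg = (1.0 - rho) * w_avg + rho * w   # O(d) update, no history stored

    def objective(v):
        margins = np.maximum(0.0, 1.0 - y * (X @ v))
        return 0.5 * lam * (v @ v) + margins.mean()

    print("last iterate:", objective(w), " averaged:", objective(w_avg))

Unlike suffix averaging, which requires either knowing T in advance or storing a window of iterates, this weighted average needs only a single extra vector, which is what makes it easy to compute on-the-fly.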


Related research:
- 07/13/2023: Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality
- 03/09/2020: Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives
- 09/19/2022: Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers
- 08/15/2020: Obtaining Adjustable Regularization for Free via Iterate Averaging
- 06/04/2019: Embedded hyper-parameter tuning by Simulated Annealing
- 03/29/2023: Unified analysis of SGD-type methods
- 06/06/2023: Understanding Progressive Training Through the Framework of Randomized Coordinate Descent
