Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)

06/10/2013
by Francis Bach, et al.

We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/√n). We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments on standard machine learning benchmarks showing that they often outperform existing approaches.
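For concreteness, the sketch below shows one way to implement the first algorithm mentioned in the abstract: averaged stochastic gradient descent with a constant step-size for least-squares regression. The function name, the particular step-size value, and the synthetic-data usage at the end are illustrative choices made here, not details taken from the paper.

```python
import numpy as np

def averaged_constant_step_sgd(stream, d, gamma):
    """Constant-step-size SGD with Polyak-Ruppert averaging for least-squares.

    `stream` yields pairs (x, y) with x a feature vector of dimension `d`
    and y a scalar response; `gamma` is the constant step-size (an
    assumption of this sketch, left to the user to tune).
    """
    theta = np.zeros(d)      # current SGD iterate
    theta_bar = np.zeros(d)  # running average of the iterates (returned estimate)
    for n, (x, y) in enumerate(stream, start=1):
        # Unbiased stochastic gradient of the squared loss 0.5 * (x @ theta - y)^2
        grad = (x @ theta - y) * x
        theta = theta - gamma * grad
        # Online averaging: theta_bar_n = ((n-1)/n) * theta_bar_{n-1} + (1/n) * theta_n
        theta_bar += (theta - theta_bar) / n
    return theta_bar

# Hypothetical usage on synthetic data (sizes and step-size are illustrative):
rng = np.random.default_rng(0)
d, n = 10, 100_000
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = averaged_constant_step_sgd(zip(X, y), d, gamma=0.01)
```

The key design point reflected here is that the step-size stays constant while the averaged iterate, rather than the last iterate, is returned as the estimator.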
