Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression

02/17/2016
by Aymeric Dieuleveut, et al.

We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point, plus a zero-mean, finite-variance random error. We present the first algorithm that jointly achieves the optimal prediction error rates for least-squares regression, both in terms of forgetting of initial conditions, in O(1/n^2), and in terms of dependence on the noise and the dimension d of the problem, as O(d/n). Our new algorithm is based on averaged accelerated regularized gradient descent, and may also be analyzed through finer assumptions on the initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the "optimal" terms above are large. In order to characterize the tightness of these new bounds, we consider an application to non-parametric regression and use the known lower bounds on the statistical performance (without computational limits), which happen to match our bounds obtained from a single pass over the data, and thus show the optimality of our algorithm in a wide variety of particular trade-offs between bias and variance.

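To make the kind of recursion the abstract refers to concrete, here is a minimal sketch of a single-pass, averaged, accelerated stochastic gradient method with optional ridge regularization on synthetic least-squares data. The function name, step-size heuristic, momentum constant, and regularization default are illustrative assumptions, not the tuned constants or the exact algorithm analyzed in the paper.

```python
import numpy as np

def averaged_accelerated_sgd_ls(X, y, gamma=None, momentum=None, lam=0.0):
    """Single-pass averaged accelerated (regularized) SGD for least squares.

    Illustrative sketch only: the step size, momentum, and regularization
    defaults below are generic heuristics, not the constants from the paper.
    """
    n, d = X.shape
    if gamma is None:
        # heuristic step size, roughly 1 / (average squared feature norm)
        gamma = 1.0 / np.mean(np.sum(X ** 2, axis=1))
    if momentum is None:
        momentum = 1.0 - 1.0 / np.sqrt(n)  # generic acceleration parameter

    theta_prev = np.zeros(d)   # previous iterate theta_{k-1}
    theta = np.zeros(d)        # current iterate theta_k
    theta_bar = np.zeros(d)    # running average of the iterates

    for k in range(n):
        x_k, y_k = X[k], y[k]
        eta = theta + momentum * (theta - theta_prev)   # extrapolation step
        grad = (eta @ x_k - y_k) * x_k + lam * eta      # stochastic regularized gradient
        theta_prev, theta = theta, eta - gamma * grad   # accelerated gradient step
        theta_bar += (theta - theta_bar) / (k + 1)      # online (Polyak-Ruppert) averaging

    return theta_bar

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
n, d = 10000, 20
theta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_star + 0.1 * rng.normal(size=n)
print(np.linalg.norm(averaged_accelerated_sgd_ls(X, y) - theta_star))
```

The returned average of the iterates, rather than the last iterate, is what the prediction-error guarantees in this line of work are stated for.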

Related research

03/03/2022 - Accelerated SGD for Non-Strongly-Convex Least Squares
We consider stochastic approximation for the least squares regression pr...

09/29/2010 - Optimal learning rates for Kernel Conjugate Gradient regression
We prove rates of convergence in the statistical sense for kernel-based ...

07/04/2023 - Accelerated stochastic approximation with state-dependent noise
We consider a class of stochastic smooth convex optimization problems un...

02/03/2022 - Multiclass learning with margin: exponential rates with no bias-variance trade-off
We study the behavior of error bounds for multiclass classification unde...

05/08/2019 - Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
We analyse the learning performance of Distributed Gradient Descent in t...

06/17/2022 - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization
We consider the smooth convex-concave bilinearly-coupled saddle-point pr...

05/27/2019 - Robustness of accelerated first-order algorithms for strongly convex optimization problems
We study the robustness of accelerated first-order algorithms to stochas...
