Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

06/15/2020
by Raphaël Berthier, et al.

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation Y = ⟨θ_*, Φ(U)⟩ between the random output Y and the random feature vector Φ(U), a potentially non-linear transformation of the inputs U. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum θ_* and the decay of the generalization error follow polynomial rates, with exponents that both depend on the regularities of the optimum θ_* and of the feature vectors Φ(u). We interpret our result in the reproducing kernel Hilbert space framework; as a special case, we analyze an online algorithm for estimating a real function on the unit interval from noiseless observations of its values at randomly sampled points. The convergence rate depends on the Sobolev smoothness of the function and of the chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph, with rates depending on its spectral dimension.
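The following is a minimal sketch, not the authors' code, of the algorithm the abstract describes: single-pass, constant step-size SGD on the least-squares risk with noiseless observations y = ⟨θ_*, Φ(u)⟩. A truncated Fourier feature map on the unit interval stands in for the kernel feature vector; the dimension, frequency decay, step size, and iteration counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncated Fourier feature map on [0, 1] (illustrative stand-in for Phi).
d = 200
freqs = np.arange(1, d + 1)

def features(u):
    """Feature vector Phi(u); the 1/k decay controls the regularity of Phi."""
    return np.sqrt(2.0) * np.sin(np.pi * freqs * u) / freqs

# A smooth optimum theta_*, so the target function is f_*(u) = <theta_*, Phi(u)>.
theta_star = rng.standard_normal(d) / freqs

def sgd_noiseless(n_iter=10_000, step=0.5):
    """Single-pass, fixed step-size SGD on the least-squares risk,
    with noiseless observations y = <theta_*, Phi(u)>."""
    theta = np.zeros(d)
    for _ in range(n_iter):
        u = rng.uniform(0.0, 1.0)          # fresh sample at each step (single pass)
        phi = features(u)
        y = theta_star @ phi               # noiseless linear model
        residual = theta @ phi - y
        theta -= step * residual * phi     # stochastic gradient step
    return theta

theta_hat = sgd_noiseless()

# Monte-Carlo estimate of the generalization (excess least-squares) error.
us = rng.uniform(0.0, 1.0, size=5000)
Phi = np.stack([features(u) for u in us])
gen_error = np.mean((Phi @ (theta_hat - theta_star)) ** 2)
print(f"estimated generalization error: {gen_error:.3e}")
```

Rerunning with larger n_iter should show the polynomial decay of the generalization error that the paper quantifies; the step size is kept below 2 / sup_u ||Phi(u)||^2 so that the iterates remain stable.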


