Optimal Rates for Multi-pass Stochastic Gradient Methods

by Junhong Lin, et al.
Istituto Italiano di Tecnologia

We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. We study how regularization properties are controlled by the step-size, the number of passes and the mini-batch size. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases. As a byproduct, we derive optimal convergence results for batch gradient methods (even in the non-attainable cases).
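
To make the roles of the step-size, the number of passes and the mini-batch size concrete, the following is a minimal sketch of multi-pass mini-batch SGD for the square loss with early stopping. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `multipass_sgd`, the linear model, uniform sampling with replacement, the constant step-size, and the use of a held-out validation set to choose the stopping pass are all choices made here for the example.

```python
import numpy as np

def multipass_sgd(X, y, X_val, y_val, step_size, batch_size, n_passes, seed=0):
    """Mini-batch SGD on the square loss; returns the iterate from the best pass.

    Hypothetical helper for illustration. One "pass" is taken to mean
    n // batch_size mini-batch updates (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    # Track the best iterate seen so far on held-out data (pass 0 = initial w).
    best_w = w.copy()
    best_err = np.mean((X_val @ w - y_val) ** 2)
    best_pass = 0

    steps_per_pass = max(1, n // batch_size)
    for p in range(1, n_passes + 1):
        for _ in range(steps_per_pass):
            # Sample a mini-batch uniformly at random, with replacement.
            idx = rng.integers(0, n, size=batch_size)
            residual = X[idx] @ w - y[idx]
            # Gradient of 0.5 * (mean squared residual) over the mini-batch.
            grad = X[idx].T @ residual / batch_size
            w = w - step_size * grad

        # Early stopping: the number of passes plays the role of the
        # regularization parameter, so keep the pass with lowest held-out error.
        err = np.mean((X_val @ w - y_val) ** 2)
        if err < best_err:
            best_w, best_err, best_pass = w.copy(), err, p

    return best_w, best_pass, best_err
```

In this sketch there is no explicit penalty term: `step_size` and `batch_size` are fixed in advance and only the stopping pass `best_pass` is tuned, which mirrors the abstract's point that the number of passes acts as the regularization parameter.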



Optimal Rates for Learning with Nyström Stochastic Gradient Methods

In the setting of nonparametric regression, we propose and study a combi...

Learning with SGD and Random Features

Sketching and stochastic gradient methods are arguably the most common t...

Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

Stochastic gradient descent is a canonical tool for addressing stochasti...

Learning with incremental iterative regularization

Within a statistical learning setting, we propose and study an iterative...

Statistical Inference with Stochastic Gradient Algorithms

Tuning of stochastic gradient algorithms (SGAs) for optimization and sam...

Generalization Properties and Implicit Regularization for Multiple Passes SGM

We study the generalization properties of stochastic gradient methods fo...

Optimal mini-batch and step sizes for SAGA

Recently it has been shown that the step sizes of a family of variance r...
