The Statistical Complexity of Early Stopped Mirror Descent

02/01/2020
by Tomas Vaškevičius, et al.

Recently there has been a surge of interest in understanding the implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss, for linear models and kernel methods. We identify a link between offset Rademacher complexities and the potential-based analysis of mirror descent that allows disentangling statistics from optimization in the analysis of such algorithms. Our main result characterizes the statistical performance of the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step size, and number of iterations. We apply our theory to recover, in a rather clean and elegant manner, some recent results from the implicit regularization literature, while also showing how to improve upon them in some settings.
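As a rough illustration of the setting only (not the paper's algorithm or its analysis), the sketch below runs unconstrained mirror descent on the unregularized empirical squared loss for a linear model and picks an iterate along the optimization path by early stopping; the mirror map, initialization, step size, and number of iterations are exactly the quantities the paper's offset-complexity bounds depend on. The Euclidean mirror map used as the default (for which the update reduces to plain gradient descent), the held-out-set stopping rule, and all function names are illustrative assumptions.

```python
import numpy as np

def squared_loss_grad(theta, X, y):
    """Gradient of the empirical risk (1/2n) * ||X theta - y||^2."""
    n = X.shape[0]
    return X.T @ (X @ theta - y) / n

def mirror_descent_path(X, y, theta0, step_size, num_iters,
                        nabla_phi=lambda t: t,        # gradient of the mirror map (identity: Euclidean case)
                        nabla_phi_star=lambda z: z):  # gradient of its convex conjugate
    """Return the whole path of iterates; early stopping then selects one of them."""
    theta = theta0.copy()
    path = [theta.copy()]
    for _ in range(num_iters):
        # Mirror descent step in dual coordinates, mapped back to the primal space.
        z = nabla_phi(theta) - step_size * squared_loss_grad(theta, X, y)
        theta = nabla_phi_star(z)
        path.append(theta.copy())
    return path

def early_stop(path, X_val, y_val):
    """Pick the iterate with the smallest held-out risk (a simple proxy for a stopping rule)."""
    risks = [np.mean((X_val @ th - y_val) ** 2) / 2 for th in path]
    best_t = int(np.argmin(risks))
    return path[best_t], best_t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 50
    X = rng.standard_normal((n, d))
    theta_star = np.zeros(d)
    theta_star[:5] = 1.0                                  # sparse ground truth, purely illustrative
    y = X @ theta_star + 0.5 * rng.standard_normal(n)
    X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

    path = mirror_descent_path(X_tr, y_tr, np.zeros(d), step_size=0.1, num_iters=500)
    theta_hat, t_hat = early_stop(path, X_val, y_val)
    print(f"stopped at iteration {t_hat}")
```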


Related research

10/15/2021
Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent
We study the statistical and computational complexities of the Polyak st...

08/12/2021
Implicit Sparse Regularization: The Impact of Depth and Early Stopping
In this paper, we study the implicit bias of gradient descent for sparse...

05/16/2022
An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models
Using gradient descent (GD) with fixed or decaying step-size is standard...

02/09/2022
Improving Computational Complexity in Statistical Models with Second-Order Information
It is known that when the statistical models are singular, i.e., the Fis...

07/05/2017
Early stopping for kernel boosting algorithms: A general analysis with localized complexities
Early stopping of iterative algorithms is a widely-used form of regulari...

08/13/2019
Distributionally Robust Optimization: A Review
The concepts of risk-aversion, chance-constrained optimization, and robu...

10/06/2022
Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias
Gradient regularization (GR) is a method that penalizes the gradient nor...
