Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent

10/15/2021
by Tongzheng Ren, et al.

We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions on the population loss function, namely the limit of the empirical loss function as the sample size goes to infinity, together with a stability condition between the gradients of the empirical and population loss functions, namely a polynomial growth bound on the concentration between the gradients of the sample and population losses. We demonstrate that the Polyak step size gradient descent iterates reach a final statistical radius of convergence around the true parameter after a number of iterations that is logarithmic in the sample size. This is computationally cheaper than fixed step size gradient descent, which requires a number of iterations polynomial in the sample size to reach the same final statistical radius when the population loss function is not locally strongly convex. Finally, we illustrate the general theory with three statistical examples: generalized linear models, mixture models, and mixed linear regression.
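At each iteration the Polyak step size sets the learning rate to eta_t = (f(theta_t) - f*) / ||grad f(theta_t)||^2, which requires knowing (or estimating) the optimal loss value f*. The snippet below is a minimal Python/NumPy sketch of this update, not the authors' code; the least-squares instance, the helper polyak_gd, and all variable names are illustrative, and f* is computable in closed form only because the example loss is quadratic.

import numpy as np

def polyak_gd(grad_f, f, theta0, f_star, n_iters=200, tol=1e-12):
    """Gradient descent with the Polyak step size
    eta_t = (f(theta_t) - f_star) / ||grad f(theta_t)||^2."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        g = grad_f(theta)
        gap = f(theta) - f_star
        gnorm2 = float(g @ g)
        if gnorm2 < tol or gap <= 0.0:  # numerically optimal, stop
            break
        theta = theta - (gap / gnorm2) * g
    return theta

# Illustrative example: least squares, i.e. a generalized linear model
# with the identity link, on synthetic data.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

f = lambda th: 0.5 * np.mean((X @ th - y) ** 2)
grad_f = lambda th: X.T @ (X @ th - y) / n

# For this quadratic loss the minimizer, and hence f_star, has a closed
# form; in general f_star must be known or estimated.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
f_star = f(theta_ls)

theta_hat = polyak_gd(grad_f, f, np.zeros(d), f_star)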
