
Bounding the expected run-time of nonconvex optimization with early stopping

by Thomas Flynn, et al.

This work examines the convergence of stochastic gradient-based optimization algorithms that use early stopping based on a validation function. In the form of early stopping we consider, optimization terminates when the norm of the gradient of a validation function falls below a threshold. We derive conditions that guarantee this stopping rule is well-defined, and we provide bounds on the expected number of iterations and gradient evaluations needed to meet this criterion. The guarantee accounts for the distance between the training and validation sets, measured with the Wasserstein distance. We develop the approach in the general setting of a first-order optimization algorithm whose update directions may be biased but are subject to a geometric drift condition. We then derive bounds on the expected running time of early stopping variants of several algorithms, including stochastic gradient descent (SGD), decentralized SGD (DSGD), and the stochastic variance reduced gradient (SVRG) algorithm. Finally, we consider the generalization properties of the iterate returned by early stopping.
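To make the stopping rule concrete, here is a minimal sketch assuming plain mini-batch SGD on a least-squares objective. The names `grad_sq_loss`, `sgd_early_stopping`, and `make_data`, the synthetic data, the step size, and the threshold `eps` are hypothetical illustration choices, not the paper's algorithms or experiments. Before each SGD step the loop evaluates the full validation gradient and stops the first time its norm is at most `eps`, so the returned iteration count plays the role of the stopping time whose expectation the paper bounds.

```python
import math
import random
from typing import List, Tuple

# A sample is (feature vector a, target b); its loss is 0.5 * (a . x - b)^2.
Sample = Tuple[List[float], float]


def grad_sq_loss(x: List[float], data: List[Sample]) -> List[float]:
    """Gradient of the mean squared loss over `data`, evaluated at `x`."""
    g = [0.0] * len(x)
    for a, b in data:
        r = sum(ai * xi for ai, xi in zip(a, x)) - b  # residual a . x - b
        for j, aj in enumerate(a):
            g[j] += r * aj
    return [gj / len(data) for gj in g]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


def sgd_early_stopping(train, val, x0, step, eps,
                       batch_size=1, max_iters=100_000, seed=0):
    """Hypothetical sketch: SGD on `train`, stopped once the validation
    gradient norm is at most `eps`. Returns (iterate, iterations taken)."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(max_iters):
        # Stopping rule: full validation gradient at the current iterate.
        if norm(grad_sq_loss(x, val)) <= eps:
            return x, k
        # Stochastic update direction from a random mini-batch of training data.
        batch = rng.sample(train, batch_size)
        g = grad_sq_loss(x, batch)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    # Safety cap: the rule may never fire if eps is too small for this data.
    return x, max_iters


# Usage: synthetic 1-D regression y = 2t + 1 + noise, with features [t, 1].
data_rng = random.Random(1)


def make_data(n: int) -> List[Sample]:
    out = []
    for _ in range(n):
        t = data_rng.uniform(-1.0, 1.0)
        out.append(([t, 1.0], 2.0 * t + 1.0 + data_rng.gauss(0.0, 0.1)))
    return out


train, val = make_data(200), make_data(100)
x, k = sgd_early_stopping(train, val, x0=[0.0, 0.0], step=0.1, eps=0.05)
print(f"stopped after {k} iterations at x = {x}")
```

The `max_iters` cap is a practical safeguard: if the training and validation sets differ enough that the validation gradient stays above `eps` near the training minimizer, the rule never fires. This is the kind of situation the abstract's conditions on the training/validation distance are meant to rule out. Checking the full validation gradient at every step is also a simplification; its cost counts toward the gradient evaluations the paper bounds.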



