
Bounding the expected run-time of nonconvex optimization with early stopping

02/20/2020
by Thomas Flynn, et al.

This work examines the convergence of stochastic gradient-based optimization algorithms that use early stopping based on a validation function. Under the stopping rule we consider, optimization terminates when the norm of the gradient of a validation function falls below a threshold. We derive conditions that guarantee this stopping rule is well-defined, and provide bounds on the expected number of iterations and gradient evaluations needed to meet the criterion. The guarantee accounts for the distance between the training and validation sets, measured with the Wasserstein distance. We develop the approach in the general setting of a first-order optimization algorithm, allowing possibly biased update directions subject to a geometric drift condition. We then derive bounds on the expected running time for early-stopping variants of several algorithms, including stochastic gradient descent (SGD), decentralized SGD (DSGD), and the stochastic variance reduced gradient (SVRG) algorithm. Finally, we consider the generalization properties of the iterate returned by early stopping.
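The stopping rule described above is simple to state operationally: run a stochastic first-order method on the training objective and halt once the gradient norm of a separate validation loss drops below a threshold. As a minimal sketch, the Python/NumPy snippet below applies this rule to SGD on a synthetic least-squares problem; the data, step size, threshold tau, and checking interval are illustrative assumptions, not the paper's setting or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares data: training and validation sets drawn
# from the same model, so the two gradients agree near the optimum.
d = 10
w_true = rng.normal(size=d)
A_train = rng.normal(size=(200, d))
b_train = A_train @ w_true + 0.1 * rng.normal(size=200)
A_val = rng.normal(size=(100, d))
b_val = A_val @ w_true + 0.1 * rng.normal(size=100)

def val_grad(x):
    """Gradient of the validation loss (1/2n) * ||A_val x - b_val||^2."""
    return A_val.T @ (A_val @ x - b_val) / len(b_val)

def sgd_early_stopping(step=0.01, tau=0.1, check_every=50, max_iters=100_000):
    """SGD on the training loss; stop when ||grad f_val(x)|| <= tau."""
    x = np.zeros(d)
    for t in range(max_iters):
        i = rng.integers(len(b_train))                  # sample a training example
        g = (A_train[i] @ x - b_train[i]) * A_train[i]  # stochastic gradient
        x -= step * g
        # Evaluate the stopping rule only periodically, so the cost of a
        # full validation-gradient computation is amortized over iterations.
        if t % check_every == 0 and np.linalg.norm(val_grad(x)) <= tau:
            return x, t
    return x, max_iters

x_hat, stop_iter = sgd_early_stopping()
print(f"stopped at iteration {stop_iter}; "
      f"validation gradient norm = {np.linalg.norm(val_grad(x_hat)):.4f}")
```

In these terms, the quantities the paper bounds are the expected value of `stop_iter` and the expected number of gradient evaluations incurred before such a rule fires.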


Related research

01/09/2020 · Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms
In this paper, we propose an adaptive stopping rule for kernel-based gra...

04/01/2020 · Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions
While Stochastic Gradient Descent (SGD) is a rather efficient algorithm ...

03/28/2017 · Early Stopping without a Validation Set
Early stopping is a widely used technique to prevent poor generalization...

04/06/2015 · Early Stopping is Nonparametric Variational Inference
We show that unconverged stochastic gradient descent can be interpreted ...

12/24/2020 · AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy
Optimizers that further adjust the scale of gradient, such as Adam, Natu...

10/07/2021 · G̅_mst: An Unbiased Stratified Statistic and a Fast Gradient Optimization Algorithm Based on It
The fluctuation effect of gradient expectation and variance caused by p...

04/23/2013 · The Stochastic Gradient Descent for the Primal L1-SVM Optimization Revisited
We reconsider the stochastic (sub)gradient approach to the unconstrained...