Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions

by Vivak Patel et al.

While Stochastic Gradient Descent (SGD) is a rather efficient algorithm for data-driven problems, it is an incomplete optimization algorithm as it lacks stopping criteria, which has limited its adoption in situations where such criteria are necessary. Unlike stopping criteria for deterministic methods, stopping criteria for SGD require a detailed understanding of (A) strong convergence, (B) whether the criteria will be triggered, (C) how false negatives are controlled, and (D) how false positives are controlled. To address these issues, we first prove strong global convergence (i.e., convergence with probability one) of SGD on a popular and general class of convex and nonconvex functions that are specified by what we call the Bottou-Curtis-Nocedal structure. Our proof of strong global convergence refines many techniques currently in the literature and employs new ones that are of independent interest. With strong convergence established, we then present several stopping criteria, rigorously explore whether they will be triggered in finite time, and supply bounds on false negative probabilities. Ultimately, we lay a foundation for rigorously developing stopping criteria for SGD methods for a broad class of functions, in the hope of making SGD a more complete optimization algorithm with greater adoption for data-driven problems.
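To make the setting concrete, the sketch below runs SGD with diminishing step sizes on a noisy quadratic and stops when a moving average of squared stochastic gradient norms falls below a tolerance. This is an illustrative stand-in only, not the criteria developed in the paper; the function, noise model, and threshold rule are all assumptions chosen for the example. It also shows the failure mode the abstract highlights: if the criterion is never triggered within the iteration budget, the run ends without a stopping signal.

```python
import numpy as np

# Illustrative sketch (NOT the paper's criteria): SGD on the quadratic
# f(x) = 0.5 * ||x||^2, with stochastic gradients g(x) = x + noise.
# A simple stopping rule: halt when an exponential moving average (EMA)
# of ||g||^2 drops below a tolerance.

def sgd_with_stopping(x0, tol=1e-3, beta=0.99, max_iter=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    ema = None  # running average of squared stochastic gradient norms
    for k in range(1, max_iter + 1):
        g = x + 0.01 * rng.standard_normal(x.shape)  # unbiased noisy gradient
        x = x - (1.0 / k) * g  # diminishing steps: sum diverges, sum of squares converges
        sq = float(g @ g)
        ema = sq if ema is None else beta * ema + (1 - beta) * sq
        if ema < tol:
            return x, k, True  # stopping criterion triggered
    return x, max_iter, False  # never triggered within the budget

x, iters, triggered = sgd_with_stopping(np.array([5.0]))
```

Because the stochastic gradient norm is itself a random quantity, thresholding it can stop too early (false positive) or fail to stop near a stationary point (false negative), which is precisely why the abstract calls for rigorous control of both error types.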




