DeepAI AI Chat
Log In Sign Up

Continuous and Discrete-Time Analysis of Stochastic Gradient Descent for Convex and Non-Convex Functions

by   Xavier Fontaine, et al.

This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with decreasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) in a weak and strong sense. Then, motivated by recent analyses of deterministic and stochastic optimization methods by their continuous counterpart, we study the long-time convergence of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which we think are of independent interest. This continuous analysis allows us to develop an intuition on the convergence of SGD and, adapting the technique to the discrete setting, we show that the same results hold to the corresponding sequences. In our analysis, we notably obtain non-asymptotic bounds in the convex setting for SGD under weaker assumptions than the ones considered in previous works. Finally, we also establish finite time convergence results under various conditions, including relaxations of the famous Łojasiewicz inequality, which can be applied to a class of non-convex functions.


page 1

page 2

page 3

page 4


Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent

The gradient noise of Stochastic Gradient Descent (SGD) is considered to...

Quantitative W_1 Convergence of Langevin-Like Stochastic Processes with Non-Convex Potential State-Dependent Noise

We prove quantitative convergence rates at which discrete Langevin-like ...

On uniform-in-time diffusion approximation for stochastic gradient descent

The diffusion approximation of stochastic gradient descent (SGD) in curr...

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Sto...

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) has been the method of choice for lear...

Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD

We study Stochastic Gradient Descent (SGD) with diminishing step sizes f...

Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions

While Stochastic Gradient Descent (SGD) is a rather efficient algorithm ...