Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality

05/10/2023 ∙ by Andre Milzarek, et al.

In this paper, we present a novel stochastic normal map-based algorithm (norM-SGD) for nonconvex composite-type optimization problems and discuss its convergence properties. Using a time window-based strategy, we first analyze the global convergence behavior of norM-SGD and show that every accumulation point of the generated sequence of iterates {x^k}_k corresponds to a stationary point almost surely and in an expectation sense. The obtained results hold under standard assumptions and extend the more limited convergence guarantees of the basic proximal stochastic gradient method. In addition, based on the well-known Kurdyka-Łojasiewicz (KL) analysis framework, we provide novel point-wise convergence results for the iterates {x^k}_k and derive convergence rates that depend on the underlying KL exponent θ and the step size dynamics {α_k}_k. Specifically, for the popular step size scheme α_k = 𝒪(1/k^γ), γ ∈ (2/3,1], (almost sure) rates of the form ‖x^k - x^*‖ = 𝒪(1/k^p), p ∈ (0,1/2), can be established. The obtained rates are faster than related and existing convergence rates for SGD and improve on the non-asymptotic complexity bounds for norM-SGD.
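
The abstract does not spell out the update rule, so the following is only a minimal sketch of what a normal map-based prox-SGD iteration might look like. It assumes the commonly used normal map F(z) = ∇f(prox_{λφ}(z)) + (z - prox_{λφ}(z))/λ and the update z^{k+1} = z^k - α_k (g^k + (z^k - x^k)/λ) with x^k = prox_{λφ}(z^k), where g^k is a stochastic gradient of f at x^k. The toy objective (a separable quadratic plus an ℓ1 term), the function names, and all parameter values are hypothetical choices for illustration; only the step size form α_k = c/k^γ with γ ∈ (2/3,1] is taken from the abstract.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def norm_sgd(b, mu, lam=0.5, c=0.5, gamma=0.75, sigma=0.1, iters=5000, seed=0):
    """Sketch of a normal map-based prox-SGD loop (assumed form, see lead-in).

    Toy problem: minimize 0.5 * ||x - b||^2 + mu * ||x||_1, where the
    gradient of the smooth part is observed with Gaussian noise.
    """
    rng = random.Random(seed)
    z = [0.0] * len(b)  # auxiliary iterate z^k
    for k in range(iters):
        alpha = c / (k + 1) ** gamma  # step size of order 1/k^gamma
        # x^k = prox_{lam * phi}(z^k), here coordinate-wise soft-thresholding
        x = [soft_threshold(zi, lam * mu) for zi in z]
        # noisy gradient of the smooth part, evaluated at x^k
        g = [xi - bi + sigma * rng.gauss(0.0, 1.0) for xi, bi in zip(x, b)]
        # step along the stochastic normal map g^k + (z^k - x^k) / lam
        z = [zi - alpha * (gi + (zi - xi) / lam) for zi, gi, xi in zip(z, g, x)]
    return [soft_threshold(zi, lam * mu) for zi in z]


b = [2.0, 0.2, -1.0]
mu = 0.5
x_final = norm_sgd(b, mu)
# Closed-form minimizer of the toy problem, used only as a reference point.
x_star = [soft_threshold(bi, mu) for bi in b]
```

On this toy problem the returned point can be compared against `x_star`; the gap depends on the noise level `sigma` and on the step size parameters, which is the kind of dependence the stated rates describe.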


research
∙ 10/10/2021

Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality

We study the random reshuffling (RR) method for smooth nonconvex optimiz...
research
∙ 08/22/2018

Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

Cubic-regularized Newton's method (CR) is a popular algorithm that guara...
research
∙ 04/29/2019

Making the Last Iterate of SGD Information Theoretically Optimal

Stochastic gradient descent (SGD) is one of the most widely used algorit...
research
∙ 04/03/2019

Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT

We provide non-asymptotic convergence rates of the Polyak-Ruppert averag...
research
∙ 02/05/2021

Last iterate convergence of SGD for Least-Squares in the Interpolation regime

Motivated by the recent successes of neural networks that have the abili...
research
∙ 02/09/2016

Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods

In this paper, we study the Kurdyka-Łojasiewicz (KL) exponent, an import...
research
∙ 02/10/2019

Deducing Kurdyka-Łojasiewicz exponent via inf-projection

Kurdyka-Łojasiewicz (KL) exponent plays an important role in estimating ...
