Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates

10/25/2021
by   Theodoros Mamalis, et al.
0

Large-scale optimization problems require algorithms both effective and efficient. One such popular and proven algorithm is Stochastic Gradient Descent which uses first-order gradient information to solve these problems. This paper studies almost-sure convergence rates of the Stochastic Gradient Descent method when instead of deterministic, its learning rate becomes stochastic. In particular, its learning rate is equipped with a multiplicative stochasticity, producing a stochastic learning rate scheme. Theoretical results show accelerated almost-sure convergence rates of Stochastic Gradient Descent in a nonconvex setting when using an appropriate stochastic learning rate, compared to a deterministic-learning-rate scheme. The theoretical results are verified empirically.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/20/2021

Stochastic Learning Rate Optimization in the Stochastic Approximation and Online Learning Settings

In this work, multiplicative stochasticity is applied to the learning ra...
research
05/23/2018

Predictive Local Smoothness for Stochastic Gradient Methods

Stochastic gradient methods are dominant in nonconvex optimization espec...
research
05/22/2017

Training Deep Networks without Learning Rates Through Coin Betting

Deep learning methods achieve state-of-the-art performance in many appli...
research
10/31/2022

Almost Sure Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

We prove almost sure convergence rates of Zeroth-order Gradient Descent ...
research
02/14/2020

Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function

This article suggests that deterministic Gradient Descent, which does no...
research
08/15/2022

Convergence Rates for Stochastic Approximation on a Boundary

We analyze the behavior of projected stochastic gradient descent focusin...
research
09/30/2020

Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

We study the role that a finite timescale separation parameter τ has on ...

Please sign up or login with your details

Forgot password? Click here to reset