DTN: A Learning Rate Scheme with Convergence Rate of O(1/t) for SGD

01/22/2019
by Lam M. Nguyen et al.

We propose a novel diminishing learning rate scheme, coined Decreasing-Trend-Nature (DTN), which allows us to prove fast convergence of the Stochastic Gradient Descent (SGD) algorithm to a first-order stationary point for smooth general convex problems and for a class of nonconvex problems that includes neural network applications to classification. We are the first to prove that SGD with a diminishing learning rate achieves a convergence rate of O(1/t) for these problems, and our theory applies to such neural network classification problems in a straightforward way.
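
The abstract does not spell out the DTN schedule itself, so the snippet below is only a minimal sketch of plain SGD with a generic diminishing step size of order O(1/t), run on a smooth convex least-squares problem; the constants eta0 and decay are hypothetical placeholders for illustration, not the schedule proposed in the paper.

# Sketch of SGD with a diminishing step size on a smooth convex problem.
# The 1/(1 + decay*t) decay and the constants eta0, decay are illustrative
# stand-ins, not the DTN schedule from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares objective: f(w) = (1/2n) * ||X w - y||^2
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def stochastic_grad(w, i):
    # Gradient of the i-th component f_i(w) = 0.5 * (x_i^T w - y_i)^2
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
eta0, decay = 0.1, 0.01          # hypothetical constants
T = 20000

for t in range(T):
    eta_t = eta0 / (1.0 + decay * t)   # diminishing step size, O(1/t) decay
    i = rng.integers(n)                # sample one data point uniformly
    w -= eta_t * stochastic_grad(w, i)

# Squared gradient norm of the full objective at the final iterate,
# the quantity whose decay the paper's O(1/t) rate concerns.
full_grad = X.T @ (X @ w - y) / n
print("||grad f(w_T)||^2 =", float(full_grad @ full_grad))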


Related research

- 04/15/2020, On Learning Rates and Schrödinger Operators: The learning rate is perhaps the single most important parameter in the ...
- 02/22/2021, Super-Convergence with an Unstable Learning Rate: Conventional wisdom dictates that learning rate should be in the stable ...
- 10/10/2021, Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits: Embedding learning has found widespread applications in recommendation s...
- 09/17/2023, Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets: In this note, we demonstrate a first-of-its-kind provable convergence of...
- 06/04/2015, Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA: We study the problem of recovering the subspace spanned by the first k p...
- 09/27/2022, The Curse of Unrolling: Rate of Differentiating Through Optimization: Computing the Jacobian of the solution of an optimization problem is a c...
- 09/05/2017, Stochastic Gradient Descent: Going As Fast As Possible But Not Faster: When applied to training deep neural networks, stochastic gradient desce...
