DeepAI AI Chat
Log In Sign Up

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

by   Yunwen Lei, et al.
Southern University of Science & Technology
Wuhan University

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require to impose a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this paper, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex objective functions and gradient-dominated objective functions. A linear convergence is further derived in the case with zero variances.


page 1

page 2

page 3

page 4


Convergence rates for the stochastic gradient descent method for non-convex objective functions

We prove the local convergence to minima and estimates on the rate of co...

Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD

We study Stochastic Gradient Descent (SGD) with diminishing step sizes f...

Learning Rates for Nonconvex Pairwise Learning

Pairwise learning is receiving increasing attention since it covers many...

Self-learn to Explain Siamese Networks Robustly

Learning to compare two objects are essential in applications, such as d...

On Learning Rates and Schrödinger Operators

The learning rate is perhaps the single most important parameter in the ...

Non-Convergence and Limit Cycles in the Adam optimizer

One of the most popular training algorithms for deep neural networks is ...

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

In this paper, we propose a novel stochastic gradient estimator—ProbAbil...