Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima

12/18/2017
by   Yaodong Yu, et al.
0

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. More specifically, the proposed algorithm only needs Õ(ϵ^-10/3) stochastic gradient evaluations to converge to an approximate local minimum x, which satisfies ∇ f(x)_2≤ϵ and λ_(∇^2 f(x))≥ -√(ϵ) in the general stochastic optimization setting, where Õ(·) hides logarithm polynomial terms and constants. This improves upon the Õ(ϵ^-7/2) gradient complexity achieved by the state-of-the-art stochastic local minima finding algorithms by a factor of Õ(ϵ^-1/6). For nonconvex finite-sum optimization, our algorithm also outperforms the best known algorithms in a certain regime.

READ FULL TEXT

page 1

page 2

page 3

page 4

06/22/2018

Finding Local Minima via Stochastic Nested Variance Reduction

We propose two algorithms that can find local minima faster than the sta...
10/25/2021

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

Escaping from saddle points and finding local minima is a central proble...
12/11/2017

Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently

We propose a family of nonconvex optimization algorithms that are able t...
11/17/2017

Neon2: Finding Local Minima via First-Order Oracles

We propose a reduction for non-convex optimization that can (1) turn a s...
06/30/2019

A Note On The Popularity of Stochastic Optimization Algorithms in Different Fields: A Quantitative Analysis from 2007 to 2017

Stochastic optimization algorithms are often used to solve complex large...
03/01/2017

Learning to Optimize Neural Nets

Learning to Optimize is a recently proposed framework for learning optim...
05/22/2017

Follow the Signs for Robust Stochastic Optimization

Stochastic noise on gradients is now a common feature in machine learnin...