Natasha 2: Faster Non-Convex Optimization Than SGD

08/29/2017
by Zeyuan Allen-Zhu, et al.

We design a stochastic algorithm that trains any smooth neural network to ε-approximate local minima using O(ε^-3.25) backpropagations. The previous best result, achieved by SGD, was essentially O(ε^-4). More broadly, the algorithm finds ε-approximate local minima of any smooth nonconvex function at a rate of O(ε^-3.25), with only oracle access to stochastic gradients and Hessian-vector products.
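
As a minimal, purely illustrative sketch (not the paper's algorithm or implementation), the two oracle types the rate statement assumes can be realized with automatic differentiation: a stochastic gradient costs one backpropagation, and a Hessian-vector product costs roughly one more, without ever forming the Hessian. The toy objective f, the noise model, and all names below are assumptions made only for illustration.

# Illustrative only: the two oracle types assumed above, sketched in JAX.
import jax
import jax.numpy as jnp

def f(x):
    # Toy smooth non-convex objective standing in for a mini-batch training loss.
    return jnp.sum(jnp.sin(x) ** 2) + 0.1 * jnp.sum(x ** 4)

def stochastic_gradient(x, key):
    # Stochastic gradient oracle: one backpropagation plus synthetic Gaussian
    # noise that mimics mini-batch sampling (the noise model is an assumption).
    noise = 0.01 * jax.random.normal(key, x.shape)
    return jax.grad(f)(x) + noise

def hessian_vector_product(x, v):
    # Hessian-vector product oracle: forward-over-reverse differentiation,
    # i.e. the directional derivative of grad(f) along v. Costs about one
    # extra backpropagation and never materializes the full Hessian.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

x = jnp.ones(5)
v = jnp.full(5, 0.5)
g = stochastic_gradient(x, jax.random.PRNGKey(0))
hv = hessian_vector_product(x, v)
print(g, hv)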

research 11/08/2017
Stochastic Cubic Regularization for Fast Nonconvex Optimization
This paper proposes a stochastic variant of a classic algorithm---the cu...

research 02/02/2017
Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter
Given a nonconvex function f(x) that is an average of n smooth functions...

research 11/17/2017
Neon2: Finding Local Minima via First-Order Oracles
We propose a reduction for non-convex optimization that can (1) turn a s...

research 03/25/2018
Minimizing Nonconvex Population Risk from Rough Empirical Risk
Population risk---the expectation of the loss over the sampling mechanis...

research 08/12/2015
Inappropriate use of L-BFGS, Illustrated on frame field design
L-BFGS is a hill climbing method that is guaranteed to converge only for...

research 06/14/2023
Noise Stability Optimization for Flat Minima with Optimal Convergence Rates
We consider finding flat, local minimizers by adding average weight pert...

research 11/23/2019
A Stochastic Tensor Method for Non-convex Optimization
We present a stochastic optimization method that uses a fourth-order reg...
