Natasha 2: Faster Non-Convex Optimization Than SGD

08/29/2017
by Zeyuan Allen-Zhu, et al.

We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^-3.25) backpropagations. The best previously known result was essentially O(ε^-4), achieved by SGD. More broadly, the algorithm finds ε-approximate local minima of any smooth nonconvex function at rate O(ε^-3.25), with only oracle access to stochastic gradients and Hessian-vector products.
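
The abstract assumes only two oracles: stochastic gradients and Hessian-vector products. Below is a minimal sketch of how such oracles can be implemented with double backpropagation; it illustrates the oracle model only, not the Natasha 2 algorithm itself, and `loss_fn`, `params`, and `batch` are hypothetical placeholders rather than names from the paper.

```python
# Sketch of the two oracles assumed in the abstract, using PyTorch autograd.
# loss_fn(params, batch) is a hypothetical mini-batch loss; params is a
# sequence of tensors with requires_grad=True.
import torch

def stochastic_gradient(loss_fn, params, batch):
    """Stochastic gradient of the loss on one mini-batch."""
    loss = loss_fn(params, batch)
    return torch.autograd.grad(loss, params)

def hessian_vector_product(loss_fn, params, batch, v):
    """Hessian-vector product H(params) @ v on one mini-batch, obtained by
    differentiating the inner product of the gradient with v (no explicit
    Hessian is ever formed)."""
    loss = loss_fn(params, batch)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    inner = sum((g * vi).sum() for g, vi in zip(grads, v))
    return torch.autograd.grad(inner, params)
```

Each Hessian-vector product costs roughly two backpropagations, which is why complexity results of this kind are stated in terms of backpropagation counts.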

Related research

Stochastic Cubic Regularization for Fast Nonconvex Optimization (11/08/2017)
This paper proposes a stochastic variant of a classic algorithm---the cu...

Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter (02/02/2017)
Given a nonconvex function f(x) that is an average of n smooth functions...

Neon2: Finding Local Minima via First-Order Oracles (11/17/2017)
We propose a reduction for non-convex optimization that can (1) turn a s...

Inappropriate use of L-BFGS, Illustrated on frame field design (08/12/2015)
L-BFGS is a hill climbing method that is guaranteed to converge only for...

Minimizing Nonconvex Population Risk from Rough Empirical Risk (03/25/2018)
Population risk---the expectation of the loss over the sampling mechanis...

A Stochastic Tensor Method for Non-convex Optimization (11/23/2019)
We present a stochastic optimization method that uses a fourth-order reg...

Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization (12/19/2018)
Langevin dynamics (LD) has been proven to be a powerful technique for op...