Natasha 2: Faster Non-Convex Optimization Than SGD

08/29/2017 · by Zeyuan Allen-Zhu, et al.

We design a stochastic algorithm that trains any smooth neural network to ε-approximate local minima using O(ε^-3.25) backpropagations. The previously best-known result was essentially O(ε^-4), achieved by SGD. More broadly, the algorithm finds ε-approximate local minima of any smooth nonconvex function at a rate of O(ε^-3.25), with only oracle access to stochastic gradients and Hessian-vector products.
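Below is a minimal sketch of the oracle model assumed in the abstract: per-minibatch stochastic gradients and Hessian-vector products, each computable at roughly the cost of a backpropagation. The JAX code, the toy loss, and the names (loss, stochastic_grad, hessian_vector_product) are illustrative assumptions, not the paper's Natasha 2 algorithm itself.

```python
import jax
import jax.numpy as jnp

def loss(params, batch):
    # Hypothetical smooth nonconvex objective: a tiny one-layer tanh network.
    x, y = batch
    pred = jnp.tanh(x @ params["w"] + params["b"])
    return jnp.mean((pred - y) ** 2)

def stochastic_grad(params, batch):
    # Stochastic gradient oracle: gradient of the loss on a single minibatch.
    return jax.grad(loss)(params, batch)

def hessian_vector_product(params, batch, v):
    # Hessian-vector product oracle via forward-over-reverse differentiation;
    # costs only a constant factor more than one gradient (backprop) call.
    return jax.jvp(lambda p: jax.grad(loss)(p, batch), (params,), (v,))[1]

# Example usage on random data.
key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (3, 1)), "b": jnp.zeros((1,))}
batch = (jax.random.normal(key, (8, 3)), jnp.zeros((8, 1)))
v = jax.tree_util.tree_map(jnp.ones_like, params)  # direction for the HVP
g = stochastic_grad(params, batch)
hv = hessian_vector_product(params, batch, v)
```

With access to these two oracles, as the abstract assumes, an algorithm never needs to form the full Hessian explicitly.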
