Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

10/11/2019
by Si Yi Meng, et al.

We consider stochastic second order methods for minimizing strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized sub-sampled Newton method (R-SSN) achieves global linear convergence with an adaptive step size and a constant batch size. By growing the batch size for both the sub-sampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyse stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and consider real medium-size datasets under a kernel mapping. Our experimental results show the fast convergence of these methods both in terms of the number of iterations and wall-clock time.
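To make the update described in the abstract concrete, here is a minimal sketch of one regularized sub-sampled Newton step for binary logistic regression, matching the paper's binary-classification experiments only in spirit. The function name `rssn_step` and the constants `tau` (damping), `eta` (step size), and `batch_size` are illustrative assumptions; in particular, the paper's R-SSN uses an adaptive step size, whereas this sketch fixes `eta` for simplicity.

```python
# Sketch of a regularized sub-sampled Newton (R-SSN-style) step for logistic
# regression. Illustrative only: constant damping `tau` and fixed step size
# `eta` stand in for the paper's regularization and adaptive step-size rules.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rssn_step(w, X, y, batch_size, tau=1e-3, eta=1.0, rng=None):
    """One step: Newton direction from a sub-sampled gradient and Hessian,
    with tau * I added to keep the sub-sampled Hessian invertible."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]                      # mini-batch, labels in {0, 1}
    p = sigmoid(Xb @ w)
    grad = Xb.T @ (p - yb) / batch_size          # sub-sampled gradient
    D = p * (1.0 - p)                            # logistic curvature weights
    H = (Xb * D[:, None]).T @ Xb / batch_size    # sub-sampled Hessian
    H += tau * np.eye(w.size)                    # regularization (damping)
    direction = np.linalg.solve(H, grad)
    return w - eta * direction

# Usage on synthetic, linearly separable data (interpolation holds):
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
w_true = rng.standard_normal(20)
y = (X @ w_true > 0).astype(float)
w = np.zeros(20)
for _ in range(20):
    w = rssn_step(w, X, y, batch_size=64, rng=rng)
```

A "Hessian-free" variant, as evaluated in the paper, would replace the explicit Hessian construction and `np.linalg.solve` with Hessian-vector products inside a conjugate-gradient solver; the dense solve above is used only to keep the sketch short.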



Code Repositories

ssn

Official code for the "Stochastic Second Order Methods under Interpolation" paper.

