Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

by Si Yi Meng, et al.

We consider stochastic second order methods for minimizing strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized sub-sampled Newton method (R-SSN) achieves global linear convergence with an adaptive step size and a constant batch size. By growing the batch size for both the sub-sampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyse stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets, and on real medium-sized datasets under a kernel mapping. Our experimental results show the fast convergence of these methods both in terms of the number of iterations and wall-clock time.





Related Papers

- Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)
- Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
- Sub-Sampled Newton Methods II: Local Convergence Rates
- Newton Sketch: A Linear-time Optimization Algorithm with Linear-Quadratic Convergence
- On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs
- Escaping Saddle-Points Faster under Interpolation-like Conditions
- Sub-sampled Cubic Regularization for Non-convex Optimization

Code Repositories


Official code for the paper "Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation".
