Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

04/20/2022
by Sen Na et al.

We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given oracle access to a stochastic estimate of the Hessian matrix. The oracle model covers popular algorithms such as Subsampled Newton and Newton Sketch, which can efficiently construct stochastic Hessian estimates for many tasks. Despite using second-order information, these existing methods do not exhibit superlinear convergence unless the stochastic noise is gradually reduced to zero over the iterations, which would cause the per-iteration cost to blow up. We address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme enjoys local Q-superlinear convergence with a non-asymptotic rate of (Υ√(log(t)/t))^t, where Υ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the iteration, i.e., from before the iterates reach a local neighborhood of the solution. This distortion may substantially delay the onset of superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that superlinear convergence sets in sooner, albeit at a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage and still enjoys a superlinear convergence rate that nearly matches (up to a logarithmic factor) that of uniform Hessian averaging.
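To make the averaging scheme concrete, here is a minimal Python sketch of a stochastic Newton loop with weighted Hessian averaging. It is illustrative only, not the authors' implementation: the subsampled Hessian oracle, the weight sequence weight_fn, the step size of one, and the use of exact gradients are all assumptions made for the example.

```python
import numpy as np

def subsampled_hessian(hess_fn, X, x, batch_size, rng):
    """Stochastic Hessian oracle: average per-example Hessians over a random subsample.
    (Illustrative stand-in for oracles such as Subsampled Newton or Newton Sketch.)"""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return np.mean([hess_fn(X[i], x) for i in idx], axis=0)

def averaged_newton(grad_fn, hess_fn, X, x0, steps=100, batch_size=32,
                    weight_fn=lambda t: 1.0, seed=0):
    """Stochastic Newton with weighted Hessian averaging (sketch).

    weight_fn(t) = 1 gives uniform averaging, i.e. H_bar_t = (1/t) * sum_s H_s.
    An increasing weight_fn, e.g. lambda t: (t + 1)**2, puts more mass on recent
    Hessians, in the spirit of the weighted schemes discussed in the abstract.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = len(x0)
    H_avg = np.zeros((d, d))
    w_sum = 0.0
    for t in range(1, steps + 1):
        H_t = subsampled_hessian(hess_fn, X, x, batch_size, rng)
        w_t = weight_fn(t)
        w_sum += w_t
        # Running weighted average: H_bar_t = sum_s w_s H_s / sum_s w_s
        H_avg += (w_t / w_sum) * (H_t - H_avg)
        g = grad_fn(X, x)                  # exact full gradient (assumption)
        x -= np.linalg.solve(H_avg, g)     # Newton step with the averaged Hessian
    return x
```

The key point the sketch illustrates is that each iteration draws only one fresh stochastic Hessian, so the per-iteration cost stays constant, while the running average H_avg becomes progressively less noisy.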


Related research

- Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates (12/03/2019): We present two new remarkably simple stochastic second-order methods for...
- Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods (03/30/2020): In this paper, we study the non-asymptotic superlinear convergence rate...
- Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update (07/15/2021): In second-order optimization, a potential bottleneck can be computing th...
- On the asymptotic rate of convergence of Stochastic Newton algorithms and their Weighted Averaged versions (11/19/2020): The majority of machine learning methods can be regarded as the minimiza...
- Scale Invariant Power Iteration (05/23/2019): Power iteration has been generalized to solve many interesting problems...
- Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization (07/02/2020): In distributed second order optimization, a standard strategy is to aver...
- Distributed estimation of the inverse Hessian by determinantal averaging (05/28/2019): In distributed optimization and distributed numerical linear algebra, we...
