Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

by   Michał Dereziński, et al.

In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver. Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.



There are no comments yet.


page 1

page 2

page 3

page 4


SPAN: A Stochastic Projected Approximate Newton Method

Second-order optimization methods have desirable convergence properties....

OverSketched Newton: Fast Convex Optimization for Serverless Systems

Motivated by recent developments in serverless systems for large-scale m...

Newton retraction as approximate geodesics on submanifolds

Efficient approximation of geodesics is crucial for practical algorithms...

A Unifying Framework for Convergence Analysis of Approximate Newton Methods

Many machine learning models are reformulated as optimization problems. ...

Faster Least Squares Optimization

We investigate randomized methods for solving overdetermined linear leas...

Scale Invariant Power Iteration

Power iteration has been generalized to solve many interesting problems ...

Global Convergence of Hessenberg Shifted QR I: Dynamics

Rapid convergence of the shifted QR algorithm on symmetric matrices was ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.