Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization

07/02/2020
by   Michał Dereziński, et al.
1

In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data. However, the local estimates on each machine are typically biased, relative to the full solution on all of the data, and this can limit the effectiveness of averaging. Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods. Our technique has two novel components: (1) modifying standard sketching techniques to obtain what we call a surrogate sketch; and (2) carefully scaling the global regularization parameter for local computations. Our surrogate sketches are based on determinantal point processes, a family of distributions for which the bias of an estimate of the inverse Hessian can be computed exactly. Based on this computation, we show that when the objective being minimized is l_2-regularized with parameter λ and individual machines are each given a sketch of size m, then to eliminate the bias, local estimates should be computed using a shrunk regularization parameter given by λ^'=λ·(1-d_λ/m), where d_λ is the λ-effective dimension of the Hessian (or, for quadratic problems, the data matrix).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/16/2020

Distributed Averaging Methods for Randomized Second Order Optimization

We consider distributed optimization problems where forming the Hessian ...
research
05/28/2019

Distributed estimation of the inverse Hessian by determinantal averaging

In distributed optimization and distributed numerical linear algebra, we...
research
04/20/2022

Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

We consider minimizing a smooth and strongly convex objective function u...
research
09/05/2017

A Generic Approach for Escaping Saddle points

A central challenge to using first-order methods for optimizing nonconve...
research
05/22/2016

The De-Biased Whittle Likelihood

The Whittle likelihood is a widely used and computationally efficient ps...
research
03/18/2022

Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds

We consider distributed optimization methods for problems where forming ...
research
02/16/2017

Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

We address the statistical and optimization impacts of using classical s...

Please sign up or login with your details

Forgot password? Click here to reset