A Distributed Second-Order Algorithm You Can Trust

06/20/2018
by Celestine Dünner et al.

Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods seem to dominate the field, second-order methods are nevertheless attractive as they potentially require fewer communication rounds to converge. However, significant drawbacks impede their wide adoption, such as the cost of computing and communicating a large Hessian matrix. In this paper we present a new algorithm for distributed training of generalized linear models that requires only the computation of diagonal blocks of the Hessian matrix on the individual workers. To deal with this approximate information, we propose an adaptive approach that, akin to trust-region methods, dynamically adapts the auxiliary model to compensate for modeling errors. We provide theoretical rates of convergence for a wide class of problems, including L1-regularized objectives. We also demonstrate that our approach achieves state-of-the-art results on multiple large benchmark datasets.
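The abstract only sketches the method at a high level, so the following is a minimal, illustrative Python sketch of the general pattern it describes, not the authors' algorithm. Simulated workers each build a local quadratic model from a diagonal block of the Hessian, and a trust-region-like damping parameter (called sigma here, a name chosen for this sketch) is tightened or loosened depending on how well the model's predicted decrease matches the actual decrease. The L2-regularized logistic regression problem and the feature partitioning are assumptions made purely for illustration.

```python
import numpy as np

# Hedged, self-contained sketch: NOT the paper's algorithm, only an
# illustration of the pattern from the abstract. Each simulated worker
# owns a block of coordinates, builds a local quadratic model from the
# corresponding diagonal block of the Hessian, and a trust-region-like
# damping parameter sigma is adapted from the ratio of actual to
# predicted decrease. Assumed problem: L2-regularized logistic
# regression with features split across 4 simulated workers.

def loss(A, y, w, lam):
    z = A @ w
    return np.mean(np.log1p(np.exp(-y * z))) + 0.5 * lam * w @ w

def gradient(A, y, w, lam):
    z = A @ w
    s = -y / (1.0 + np.exp(y * z))            # per-sample loss derivative
    return A.T @ s / len(y) + lam * w

def block_hessian(A, y, w, lam, idx):
    # Diagonal Hessian block restricted to one worker's coordinates.
    p = 1.0 / (1.0 + np.exp(-(A @ w)))
    d = p * (1.0 - p) / len(y)                # per-sample curvature
    A_k = A[:, idx]
    return A_k.T @ (d[:, None] * A_k) + lam * np.eye(len(idx))

def distributed_step(A, y, w, lam, blocks, sigma):
    # Each worker solves its damped local Newton system independently.
    g = gradient(A, y, w, lam)
    delta, pred = np.zeros_like(w), 0.0
    for idx in blocks:
        H_k = block_hessian(A, y, w, lam, idx)
        d_k = np.linalg.solve(H_k + sigma * np.eye(len(idx)), -g[idx])
        delta[idx] = d_k
        pred += -(g[idx] @ d_k) - 0.5 * d_k @ H_k @ d_k  # model decrease
    return delta, pred

rng = np.random.default_rng(0)
n, d, lam = 200, 20, 0.1
A = rng.standard_normal((n, d))
y = np.where(A @ rng.standard_normal(d) > 0, 1.0, -1.0)
blocks = np.array_split(np.arange(d), 4)      # 4 simulated workers
w, sigma = np.zeros(d), 1.0

for it in range(30):
    delta, pred = distributed_step(A, y, w, lam, blocks, sigma)
    actual = loss(A, y, w, lam) - loss(A, y, w + delta, lam)
    rho = actual / max(pred, 1e-12)
    if rho > 0.75:        # model predicted well: accept, relax damping
        w, sigma = w + delta, max(0.5 * sigma, 1e-6)
    elif rho > 0.1:       # acceptable prediction: accept the step
        w = w + delta
    else:                 # model too optimistic: reject, increase damping
        sigma *= 2.0

print("final regularized loss:", loss(A, y, w, lam))
```

Note that the block-separable model above ignores cross-worker curvature; the ratio test is what compensates for that modeling error, which is the trust-region-like mechanism the abstract refers to.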

