Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation

06/07/2022
by Rustem Islamov et al.

Despite their high computation and communication costs, Newton-type methods remain an appealing option for distributed training due to their robustness against ill-conditioned convex problems. In this work, we study communication compression and aggregation mechanisms for curvature information in order to reduce these costs while preserving theoretically superior local convergence guarantees. We prove that the recently developed class of three point compressors (3PC) of Richtarik et al. [2022] for gradient communication can be generalized to Hessian communication as well. This result opens up a wide variety of communication strategies, such as contractive compression and lazy aggregation, for compressing prohibitively costly curvature information. Moreover, we discover several new 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, which require reduced communication and only occasional Hessian computations. Furthermore, we extend and analyze our approach to bidirectional communication compression and partial device participation setups to address the practical considerations of federated learning applications. For all our methods, we derive fast condition-number-independent local linear and/or superlinear convergence rates. Finally, through extensive numerical evaluations on convex optimization problems, we show that our schemes achieve state-of-the-art communication complexity compared to several key baselines that use second-order information.
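To make the communication pattern concrete, the sketch below illustrates the general idea in Python: a contractive Top-K compressor applied to Hessian differences, combined with a Bernoulli switch that decides whether a worker recomputes and communicates its local Hessian in a given round. The names and parameters here (top_k, bernoulli_hessian_step, hessian_fn, p, k) are our own illustrative choices, not the paper's; this is a minimal sketch of the mechanism under those assumptions, not the authors' exact algorithm.

```python
import numpy as np

def top_k(matrix, k):
    """Contractive Top-K compressor: keep the k largest-magnitude entries, zero the rest."""
    flat = matrix.ravel()
    if k >= flat.size:
        return matrix.copy()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(matrix.shape)

def bernoulli_hessian_step(H_est, hessian_fn, x, p, k, rng):
    """One illustrative worker-side round (hypothetical helper, not the paper's exact update):
    with probability p, recompute the local Hessian at x and send a compressed correction to
    the running estimate; otherwise send nothing and keep the old estimate, saving both the
    Hessian computation and the communication."""
    if rng.random() < p:
        correction = top_k(hessian_fn(x) - H_est, k)  # compressed Hessian difference
        return H_est + correction, correction          # new estimate, message to the server
    return H_est, None                                  # skip this round: no message sent

# Toy usage on assumed example data: a quadratic objective with constant Hessian A.
rng = np.random.default_rng(0)
d = 5
A = np.diag(np.arange(1.0, d + 1))   # true local Hessian
H = np.zeros((d, d))                 # worker's running Hessian estimate
for _ in range(50):
    H, msg = bernoulli_hessian_step(H, lambda x: A, np.zeros(d), p=0.3, k=5, rng=rng)
print(np.linalg.norm(H - A))         # estimate approaches the true Hessian
```

In this sketch, p < 1 corresponds to the "occasional Hessian computations" the abstract mentions, while p = 1 with k equal to the full matrix size would reduce to plain uncompressed Hessian communication.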


Related research:

[11/02/2021] Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning
Recent advances in distributed optimization have shown that Newton-type ...

[06/05/2021] FedNL: Making Newton-Type Methods Applicable to Federated Learning
Inspired by recent work of Islamov et al (2021), we propose a family of ...

[02/15/2021] MARINA: Faster Non-Convex Distributed Learning with Compression
We develop and analyze MARINA: a new communication efficient method for ...

[07/18/2018] Distributed Second-order Convex Optimization
Convex optimization problems arise frequently in diverse machine learning ...

[02/02/2022] 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation
We propose and study a new class of gradient communication mechanisms for ...

[10/24/2017] Avoiding Communication in Proximal Methods for Convex Optimization Problems
The fast iterative soft thresholding algorithm (FISTA) is used to solve ...

[06/01/2022] Stochastic Gradient Methods with Preconditioned Updates
This work considers non-convex finite sum minimization. There are a number ...
