LocalNewton: Reducing Communication Bottleneck for Distributed Learning

by Vipul Gupta, et al.

To address the communication bottleneck in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, each worker machine updates its model in every iteration by finding a suitable second-order descent direction using only the data and model stored in its own local memory. The workers run multiple such iterations locally and communicate their models to the master node only once every few (say, L) iterations. LocalNewton is highly practical since it requires only one hyperparameter: the number L of local iterations. We use novel matrix concentration-based techniques to obtain theoretical guarantees for LocalNewton, and we validate them with detailed empirical evaluation. To enhance practicability, we devise an adaptive scheme for choosing L, and we show that it reduces the number of local iterations between two model synchronizations as training proceeds, successively refining the model quality at the master. Via extensive experiments using several real-world datasets with AWS Lambda workers and an AWS EC2 master, we show that LocalNewton requires fewer than 60% of the communication rounds (between master and workers) and less than 40% of the end-to-end running time, compared to state-of-the-art algorithms, to reach the same training loss.
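The communication pattern described above can be sketched in a few lines. The snippet below is a minimal, illustrative simulation (not the authors' implementation): each worker holds a local data shard, takes L Newton steps on its local ridge-regression loss, and the master averages the worker models once per communication round. All names and the choice of a quadratic loss are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_local = 5, 4, 50           # dims, workers, rows per shard
w_true = rng.normal(size=d)

# Local shards: (A_k, b_k) with b_k = A_k @ w_true + small noise
shards = []
for _ in range(n_workers):
    A = rng.normal(size=(n_local, d))
    b = A @ w_true + 0.01 * rng.normal(size=n_local)
    shards.append((A, b))

def local_newton_steps(w, A, b, L, lam=1e-3):
    """Run L Newton steps on the local loss 0.5*||A w - b||^2 + 0.5*lam*||w||^2."""
    H = A.T @ A + lam * np.eye(d)          # local Hessian (constant for quadratics)
    for _ in range(L):
        g = A.T @ (A @ w - b) + lam * w    # local gradient
        w = w - np.linalg.solve(H, g)      # second-order descent direction
    return w

w = np.zeros(d)
L = 3                                      # local iterations between two syncs
for _ in range(5):                         # each round costs ONE communication
    w = np.mean([local_newton_steps(w, A, b, L) for A, b in shards], axis=0)

print(np.linalg.norm(w - w_true))          # small: averaged model near w_true
```

The key point is that communication happens once per outer round, not once per Newton step; the adaptive scheme in the paper would additionally shrink L as training proceeds.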




Distributed Newton Can Communicate Less and Resist Byzantine Workers

We develop a distributed second order optimization algorithm that is com...

Fundamental Limits of Distributed Data Shuffling

Data shuffling of training data among different computing nodes (workers...

Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding

We consider the problem of training a least-squares regression model on ...

L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method

This work proposes a distributed algorithm for solving empirical risk mi...

Asynchronous Distributed Learning with Sparse Communications and Identification

In this paper, we present an asynchronous optimization algorithm for dis...

Efficient Distributed Learning with Sparsity

We propose a novel, efficient approach for distributed sparse learning i...

DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

We present DUAL-LOCO, a communication-efficient algorithm for distribute...