Communication-Efficient Distributed Optimization with Quantized Preconditioners

by   Foivos Alimisis, et al.

We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among n different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have linear dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton's method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing faster convergence and reduced communication relative to previous methods.



There are no comments yet.


page 1

page 2

page 3

page 4


Achieving the fundamental convergence-communication tradeoff with Differentially Quantized Gradient Descent

The problem of reducing the communication cost in distributed training t...

Finite-Bit Quantization For Distributed Algorithms With Linear Convergence

This paper studies distributed algorithms for (strongly convex) composit...

Communication-efficient Variance-reduced Stochastic Gradient Descent

We consider the problem of communication efficient distributed optimizat...

Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients

The present paper develops a novel aggregated gradient approach for dist...

Quantized Frank-Wolfe: Communication-Efficient Distributed Optimization

How can we efficiently mitigate the overhead of gradient communications ...

Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for Heterogeneous Distributed Datasets

We study distributed optimization algorithms for minimizing the average ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.