Distributed Learning and Democratic Embeddings: Polynomial-Time Source Coding Schemes Can Achieve Minimax Lower Bounds for Distributed Gradient Descent under Communication Constraints

by Rajarshi Saha, et al.

In this work, we consider the distributed optimization setting in which information exchange between the computation nodes and the parameter server is subject to a maximum bit-budget. We first consider the problem of compressing a vector in n-dimensional Euclidean space subject to a bit-budget of R bits per dimension, for which we introduce Democratic and Near-Democratic source-coding schemes. We show that these coding schemes are (near) optimal in the sense that the covering efficiency of the resulting quantizer is either dimension-independent or has only a weak logarithmic dependence on the dimension. Subsequently, we propose a distributed optimization algorithm, DGD-DEF, which employs our proposed coding strategy and achieves the minimax optimal convergence rate to within (near) constant factors for a class of communication-constrained distributed optimization algorithms. Furthermore, we extend the utility of our proposed source-coding scheme by showing that it can substantially improve performance when used in conjunction with other compression schemes. We validate our theoretical claims through numerical simulations.

Keywords: Fast democratic (Kashin) embeddings, Distributed optimization, Data-rate constraint, Quantized gradient descent, Error feedback.
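The core idea behind democratic embeddings is to apply a randomized orthonormal transform so that the vector's energy is spread roughly evenly across coefficients, which makes uniform scalar quantization at R bits per dimension much more effective. The sketch below is a simplified stand-in for the paper's scheme, not its actual algorithm: it uses a random sign flip followed by a normalized Hadamard transform (the function names and the choice of R are illustrative assumptions).

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an orthonormal Hadamard matrix;
    # n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def near_democratic_quantize(x, R, seed=0):
    """Quantize x to R bits per dimension after a randomized
    orthonormal transform that spreads the energy (a simplified
    sketch of a near-democratic embedding, not the paper's scheme)."""
    n = x.size
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n)  # random diagonal of +/-1
    H = hadamard(n)
    y = H @ (signs * x)                      # coefficients are spread out
    # Uniform scalar quantization with 2**R levels on [-s, s].
    s = np.max(np.abs(y))
    levels = 2 ** R
    step = 2 * s / (levels - 1)
    q = np.round((y + s) / step) * step - s
    # Decode: invert the orthonormal transform and the sign flip.
    return signs * (H.T @ q)

x = np.random.default_rng(1).standard_normal(64)
x_hat = near_democratic_quantize(x, R=4)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

Because the transform is orthonormal, the decoder recovers the vector exactly when no quantization is applied; with quantization, spreading the energy keeps the per-coefficient dynamic range small, so even 4 bits per dimension yields a small relative error here.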




