Improved Communication Lower Bounds for Distributed Optimisation

10/16/2020
by Dan Alistarh, et al.

Motivated by the interest in communication-efficient methods for distributed machine learning, we consider the communication complexity of minimising a sum of functions over d-dimensional inputs, ∑_{i=1}^N f_i(x), where each function f_i is held by one of N different machines. Such tasks arise naturally in large-scale optimisation, where a standard solution is to apply variants of (stochastic) gradient descent. As our main result, we show that Ω(Nd log d / ε) bits in total need to be communicated between the machines to find an additive ε-approximation to the minimum of ∑_{i=1}^N f_i(x). The result holds for deterministic algorithms, and for randomised algorithms under some restrictions on the parameter values. Importantly, our lower bounds require no assumptions on the structure of the algorithm, and are matched within constant factors for strongly convex objectives by a new variant of quantised gradient descent. The lower bounds are obtained by bringing over tools from communication complexity to distributed optimisation, an approach we hope will find further use in the future.
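The matching upper bound is achieved by a quantised variant of gradient descent, in which each machine transmits a compressed encoding of its local gradient every round rather than full-precision coordinates. The sketch below is not the paper's algorithm, only a minimal illustration of the general idea under simplifying assumptions: an unbiased uniform quantiser applied to each machine's local gradient before the coordinator averages the results. All names (quantise, distributed_quantised_gd) and parameter choices (levels, lr, rounds) are hypothetical.

```python
import numpy as np

def quantise(v, levels=16):
    """Unbiased uniform quantisation of a vector to a small number of levels.

    Each coordinate is encoded with roughly log2(levels) bits plus the vector
    norm, so a machine communicates O(d log levels) bits per round instead of
    d full-precision floats. (Illustrative quantiser, not the paper's scheme.)
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels            # each entry lies in [0, levels]
    lower = np.floor(scaled)
    # Randomised rounding keeps the quantiser unbiased: E[quantise(v)] = v.
    q = lower + (np.random.rand(*v.shape) < (scaled - lower))
    return np.sign(v) * q * norm / levels


def distributed_quantised_gd(grads, x0, lr=0.1, rounds=200, levels=16):
    """Quantised gradient descent sketch for N machines.

    grads[i] is the gradient oracle of the local function f_i held by machine i.
    Per round, every machine sends a quantised gradient at the current iterate;
    the coordinator averages them and broadcasts the next iterate.
    """
    x = x0.copy()
    for _ in range(rounds):
        avg = np.mean([quantise(g(x), levels) for g in grads], axis=0)
        x = x - lr * avg
    return x


if __name__ == "__main__":
    # Toy example: N = 4 machines, each holding f_i(x) = 0.5 * ||x - c_i||^2,
    # so the sum is minimised at the mean of the centres c_i.
    rng = np.random.default_rng(0)
    centres = [rng.normal(size=10) for _ in range(4)]
    grads = [lambda x, c=c: x - c for c in centres]
    x_hat = distributed_quantised_gd(grads, x0=np.zeros(10))
    print(np.linalg.norm(x_hat - np.mean(centres, axis=0)))  # should be small
```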

Related research

09/01/2021 · The Minimax Complexity of Distributed Optimization
In this thesis, I study the minimax oracle complexity of distributed sto...

08/24/2016 · AIDE: Fast and Communication Efficient Distributed Optimization
In this paper, we present two new communication-efficient methods for di...

02/05/2015 · Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds
We study the following generalized matrix rank estimation problem: given...

12/31/2020 · CADA: Communication-Adaptive Distributed Adam
Stochastic gradient descent (SGD) has taken the stage as the primary wor...

05/12/2019 · Theoretical Limits of One-Shot Distributed Learning
We consider a distributed system of m machines and a server. Each machin...

09/08/2019 · Convex Set Disjointness, Distributed Learning of Halfspaces, and LP Feasibility
We study the Convex Set Disjointness (CSD) problem, where two players ha...

05/08/2019 · Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
We analyse the learning performance of Distributed Gradient Descent in t...
