Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

06/24/2015
by Mark Braverman, et al.

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of m machines receives n data points from a d-dimensional Gaussian distribution with unknown mean θ, which is promised to be k-sparse. The machines communicate by message passing and aim to estimate the mean θ. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least Ω(min{n,d}·m), where n is the number of observations that each machine receives and d is the ambient dimension. These lower bounds improve upon [Sha14, SD'14] by allowing a multi-round, interactive communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, a generalization of the usual data processing inequality, which may be of independent interest and useful for other problems.
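As a rough formalization of the setting described above (our own paraphrase of the abstract, not notation taken from the paper; the symbols σ², B, and θ̂ are introduced here for illustration), the sparse mean estimation problem and the stated regression lower bound can be written as:

\[
  X^{(i)}_1, \ldots, X^{(i)}_n \;\overset{\text{i.i.d.}}{\sim}\; \mathcal{N}(\theta, \sigma^2 I_d),
  \qquad i = 1, \ldots, m, \qquad \|\theta\|_0 \le k,
\]

where machine i observes only its own n samples and the machines exchange at most B bits in total to produce an estimate \(\hat{\theta}\) of θ. For distributed sparse linear regression, the lower bound asserts that any protocol attaining the statistical minimax error (up to logarithmic factors) must satisfy

\[
  B \;=\; \Omega\big(\min\{n, d\}\cdot m\big).
\]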
