Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints

by   Wei-Ning Chen, et al.

We consider the problem of estimating a d-dimensional s-sparse discrete distribution from its samples observed under a b-bit communication constraint. The best-known previous result on ℓ_2 estimation error for this problem is O( slog( d/s)/n2^b). Surprisingly, we show that when sample size n exceeds a minimum threshold n^*(s, d, b), we can achieve an ℓ_2 estimation error of O( s/n2^b). This implies that when n>n^*(s, d, b) the convergence rate does not depend on the ambient dimension d and is the same as knowing the support of the distribution beforehand. We next ask the question: “what is the minimum n^*(s, d, b) that allows dimension-free convergence?”. To upper bound n^*(s, d, b), we develop novel localization schemes to accurately and efficiently localize the unknown support. For the non-interactive setting, we show that n^*(s, d, b) = O( min( d^2log^2 d/2^b, s^4log^2 d/2^b) ). Moreover, we connect the problem with non-adaptive group testing and obtain a polynomial-time estimation scheme when n = Ω̃(s^4log^4 d/2^b). This group testing based scheme is adaptive to the sparsity parameter s, and hence can be applied without knowing it. For the interactive setting, we propose a novel tree-based estimation scheme and show that the minimum sample-size needed to achieve dimension-free convergence can be further reduced to n^*(s, d, b) = Õ( s^2log^2 d/2^b).



There are no comments yet.


page 1

page 2

page 3

page 4


Pointwise Bounds for Distribution Estimation under Communication Constraints

We consider the problem of estimating a d-dimensional discrete distribut...

Mean Estimation from One-Bit Measurements

We consider the problem of estimating the mean of a symmetric log-concav...

Rate of convergence for geometric inference based on the empirical Christoffel function

We consider the problem of estimating the support of a measure from a fi...

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimens...

Some effects in adaptive robust estimation under sparsity

Adaptive estimation in the sparse mean model and in sparse regression ex...

Geometric Lower Bounds for Distributed Parameter Estimation under Communication Constraints

We consider parameter estimation in distributed networks, where each nod...

Unsupervised Discretization by Two-dimensional MDL-based Histogram

Unsupervised discretization is a crucial step in many knowledge discover...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.