Bias Reduction for Sum Estimation

08/02/2022
by Talya Eden, et al.

In classical statistics and distribution testing, it is often assumed that elements can be sampled from some distribution P, and that when an element x is sampled, the probability P(x) of sampling x is also known. Recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution Q that is sufficiently close to P. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution Q sufficient, and what is the algorithmic cost of coping with this noise? We investigate these questions for the problem of estimating the sum of a multiset of N real values x_1, …, x_N. This problem is well-studied in the statistical literature in the case P = Q, where the Hansen-Hurwitz estimator is frequently used. We assume that for some known distribution P, values are sampled from a distribution Q that is pointwise γ-close to P. For every positive integer k we define an estimator ζ_k for μ = ∑_i x_i whose bias is proportional to γ^k (where our ζ_1 reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if Q is pointwise γ-close to uniform and all x_i ∈ {0, 1}, then for any ϵ > 0 we can estimate μ to within additive error ϵN using m = Θ(N^{1-1/k} / ϵ^{2/k}) samples, where k = ⌈(log ϵ)/(log γ)⌉. We show that this sample complexity is essentially optimal. Our bounds show that the sample complexity need not vary uniformly with the desired error parameter ϵ: for some values of ϵ, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.
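To make the setting concrete, the following is a minimal sketch of the classical Hansen-Hurwitz estimator (the k = 1 case mentioned above) when samples actually come from a noisy distribution Q. It is not the paper's ζ_k construction for k > 1. The function names are hypothetical, and the sketch assumes that "pointwise γ-close" means (1-γ)·P(i) ≤ Q(i) ≤ (1+γ)·P(i) for every i, which the abstract does not spell out. Under that assumption and nonnegative values, the bias of the estimator is at most γ·μ.

```python
import math
import random


def hansen_hurwitz(values, p, sample_indices):
    """Average of x_i / P(i) over the sampled indices (Hansen-Hurwitz)."""
    return sum(values[i] / p[i] for i in sample_indices) / len(sample_indices)


def expected_hh_under_q(values, p, q):
    """Exact expectation of the estimator when each index is drawn from q."""
    return sum(q[i] * values[i] / p[i] for i in range(len(values)))


def order_k(eps, gamma):
    """k = ceil(log(eps) / log(gamma)); requires 0 < eps < 1 and 0 < gamma < 1."""
    return math.ceil(math.log(eps) / math.log(gamma))


if __name__ == "__main__":
    values = [1, 0, 1, 1]            # x_i in {0, 1}, so mu = 3
    p = [0.25, 0.25, 0.25, 0.25]     # known distribution P (uniform)
    q = [0.30, 0.20, 0.30, 0.20]     # assumed noisy Q, within gamma = 0.2 of P
    mu = sum(values)

    # Bias of the k = 1 estimator: E_Q[x_i / P(i)] - mu
    print("bias:", expected_hh_under_q(values, p, q) - mu)

    # One random run with m samples drawn from Q rather than P
    m = 10_000
    draws = random.choices(range(len(values)), weights=q, k=m)
    print("estimate:", hansen_hurwitz(values, p, draws))

    # Order of the estimator suggested by the abstract for a target error
    print("k:", order_k(0.05, 0.5))
```

In this toy instance the expected estimate is about 3.2 against a true sum of 3, so the estimator stays biased no matter how many samples are drawn; this is the gap the higher-order estimators ζ_k are meant to shrink.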


