
Estimating the Effective Support Size in Constant Query Complexity

by Shyam Narayanan, et al.

Estimating the support size of a distribution is a well-studied problem in statistics. Motivated by the fact that this problem is highly non-robust (small perturbations of the distribution can drastically change the support size) and thus hard to estimate, Goldreich [ECCC 2019] studied the query complexity of estimating the ϵ-effective support size Ess_ϵ of a distribution P, defined as the smallest support size of a distribution that is ϵ-close in total variation distance to P. In that paper, he gives an algorithm in the dual access setting (where we may both receive random samples from P and query the sampling probability p(x) for any x) for a bicriteria approximation, returning an answer in [Ess_(1+β)ϵ, (1+γ) Ess_ϵ] for some values β, γ > 0. However, his algorithm has either super-constant query complexity in the support size or super-constant approximation ratio 1+γ = ω(1). He then asked whether this is necessary, or whether a constant-factor approximation is possible with a number of queries independent of the support size. We answer his question in the affirmative, showing that complexity independent of n is possible not only for γ > 0 but also for γ = 0; that is, the bicriteria relaxation is not necessary. Specifically, we give an algorithm with query complexity O(1/(β^3 ϵ^3)): for any 0 < ϵ, β < 1, it outputs in this complexity a number ñ ∈ [Ess_(1+β)ϵ, Ess_ϵ]. We also show that the approximate version with approximation ratio 1+γ can be solved in complexity O(1/(β^2 ϵ) + 1/(β ϵ γ^2)). Our algorithm is very simple, consisting of only 4 short lines of pseudocode.
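To make the dual access setting concrete, here is a minimal illustrative sketch in Python. It is not the authors' four-line algorithm; it only demonstrates the model (a `sample()` oracle drawing x ~ P and a `prob(x)` oracle returning p(x)) together with the standard dual-access identity E_{x~P}[1/p(x)] = |supp(P)|, combined with a naive heuristic of discarding the ϵ fraction of samples with the smallest probabilities to approximate an effective support size. The names `sample`, `prob`, `eps`, and `m` are assumptions for illustration.

```python
import random


def effective_support_estimate(sample, prob, eps, m):
    """Heuristic dual-access estimate of the eps-effective support size.

    sample: callable drawing x ~ P (sample oracle)
    prob:   callable returning p(x) for any x (probability oracle)
    eps:    fraction of low-probability mass to ignore
    m:      number of queries

    Uses the identity E_{x~P}[1/p(x)] = |supp(P)|: averaging 1/p(x)
    over samples estimates the support size. Dropping the eps*m
    samples with smallest p(x) crudely ignores the tail that an
    eps-close distribution could remove.
    """
    probs = sorted(prob(sample()) for _ in range(m))
    k = int(eps * m)           # discard the eps fraction of lowest-probability samples
    kept = probs[k:]
    return sum(1.0 / p for p in kept) / m


# Sanity check on a uniform distribution over n elements: every sample
# has p(x) = 1/n, so the estimate is (1 - eps) * n.
if __name__ == "__main__":
    random.seed(0)
    n = 50
    est = effective_support_estimate(
        sample=lambda: random.randrange(n),
        prob=lambda x: 1.0 / n,
        eps=0.1,
        m=10_000,
    )
    print(est)  # 45.0 for the uniform case: (1 - 0.1) * 50
```

On the uniform distribution this heuristic is exact by symmetry; on skewed distributions it is only a rough proxy, which is precisely why achieving a guaranteed output in [Ess_(1+β)ϵ, Ess_ϵ] with constant query complexity is nontrivial.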



