# Estimating the Effective Support Size in Constant Query Complexity

Estimating the support size of a distribution is a well-studied problem in statistics. Because support size is highly non-robust (small perturbations of the distribution can change it drastically) and therefore hard to estimate, Goldreich [ECCC 2019] studied the query complexity of estimating the ϵ-effective support size Ess_ϵ of a distribution P, defined as the smallest support size of any distribution within total variation distance ϵ of P. He gives an algorithm in the dual access setting (where we may both draw random samples and query the sampling probability p(x) for any x) that achieves a bicriteria approximation, returning an answer in [Ess_{(1+β)ϵ}, (1+γ) Ess_ϵ] for some values β, γ > 0. However, his algorithm has either query complexity that is super-constant in the support size or a super-constant approximation ratio 1+γ = ω(1). He then asked whether this is necessary, or whether a constant-factor approximation is possible with a number of queries independent of the support size. We answer his question by showing that query complexity independent of the support size n is possible not only for γ > 0 but also for γ = 0; that is, the multiplicative relaxation 1+γ is not necessary. Specifically, we give an algorithm with query complexity O(1/(β^3 ϵ^3)): for any 0 < ϵ, β < 1, it outputs a number ñ ∈ [Ess_{(1+β)ϵ}, Ess_ϵ]. We also show that the approximate version with approximation ratio 1+γ can be solved with query complexity O(1/(β^2 ϵ) + 1/(βϵγ^2)). Our algorithm is very simple and has 4 short lines of pseudocode.
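To make the definition concrete, the following is a minimal sketch that computes Ess_ϵ exactly when the full probability vector is known. This is not the constant-query algorithm of the paper, which sees P only through samples and probability queries; it is only an illustration of the quantity being estimated. It relies on the fact that the closest distribution supported on a set S lies at total variation distance 1 − P(S) from P, so Ess_ϵ is the smallest k for which the k largest probabilities sum to at least 1 − ϵ. The function name `effective_support_size` and the example distribution are assumptions made for illustration.

```python
from fractions import Fraction


def effective_support_size(probs, eps):
    """Exact eps-effective support size of an explicitly given distribution.

    Returns the smallest k such that the k largest probabilities carry
    mass at least 1 - eps. Illustrative only: it reads the whole
    probability vector, unlike a constant-query algorithm.
    """
    if not 0 <= eps < 1:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    # Zero-probability elements never belong to the support.
    heaviest_first = sorted((p for p in probs if p > 0), reverse=True)
    target = 1 - eps
    running = 0
    for k, p in enumerate(heaviest_first, start=1):
        running += p
        if running >= target:
            return k
    # Only reachable through rounding error when floats are used.
    return len(heaviest_first)


# Hypothetical example: three heavy elements plus 100 very light ones.
# Fractions keep the threshold comparisons exact.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)] + [Fraction(1, 800)] * 100

eps = Fraction(1, 10)
beta = Fraction(1, 4)
lower = effective_support_size(probs, (1 + beta) * eps)  # Ess_{(1+beta)eps}
upper = effective_support_size(probs, eps)               # Ess_eps
print(lower, upper, effective_support_size(probs, 0))
```

In this example the plain support size is 103, while Ess_ϵ is 23 for ϵ = 1/10 and drops to 3 for (1+β)ϵ = 1/8, so any output in [3, 23] would satisfy the guarantee ñ ∈ [Ess_{(1+β)ϵ}, Ess_ϵ] with β = 1/4. The gap between 103 and 3 also shows why the plain support size is so sensitive to small perturbations.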
