Estimating the Effective Support Size in Constant Query Complexity

11/21/2022
by Shyam Narayanan, et al.

Estimating the support size of a distribution is a well-studied problem in statistics. Motivated by the fact that the support size is highly non-robust (small perturbations of the distribution can change it drastically) and thus hard to estimate, Goldreich [ECCC 2019] studied the query complexity of estimating the ϵ-effective support size Ess_ϵ of a distribution P, defined as the smallest support size of a distribution that is ϵ-close to P in total variation distance. He gives an algorithm in the dual access setting (where we may both receive random samples and query the sampling probability p(x) for any x) for a bicriteria approximation, returning an answer in [Ess_{(1+β)ϵ}, (1+γ)·Ess_ϵ] for some values β, γ > 0. However, his algorithm has either query complexity that is super-constant in the support size or a super-constant approximation ratio 1+γ = ω(1). He then asked whether this is necessary, or whether a constant-factor approximation is possible with a number of queries independent of the support size. We answer his question by showing that query complexity independent of the support size is possible not only for γ > 0 but also for γ = 0; that is, the relaxation of the approximation ratio is not necessary. Specifically, we give an algorithm with query complexity O(1/(β^3 ϵ^3)): for any 0 < ϵ, β < 1, it outputs a number ñ ∈ [Ess_{(1+β)ϵ}, Ess_ϵ]. We also show that the approximate version with approximation ratio 1+γ can be solved with query complexity O(1/(β^2 ϵ) + 1/(β ϵ γ^2)). Our algorithm is very simple, taking four short lines of pseudocode.
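
To make the definition concrete, the sketch below computes the effective support size exactly when the whole probability vector is known. It is not the constant-query algorithm of the paper, which only sees samples and probability queries. It relies on a standard observation: the closest distribution with support size k keeps the k heaviest elements, so its total variation distance from P is the mass outside them. The function names, the example distribution, and the use of exact fractions are illustrative assumptions.

```python
from fractions import Fraction


def effective_support_size(probs, eps):
    """Smallest k such that the k heaviest elements carry mass >= 1 - eps.

    Brute-force reference that reads the entire distribution; it is NOT
    the paper's query-efficient algorithm. Pass Fractions for exactness.
    """
    if not probs or eps < 0:
        raise ValueError("need a non-empty distribution and eps >= 0")
    covered = 0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        covered += p
        if covered >= 1 - eps:
            return k
    return len(probs)


def is_valid_estimate(probs, eps, beta, estimate):
    """Check the guarantee from the abstract: Ess_{(1+beta)eps} <= estimate <= Ess_eps."""
    lower = effective_support_size(probs, (1 + beta) * eps)
    upper = effective_support_size(probs, eps)
    return lower <= estimate <= upper


# Hypothetical example: one heavy element plus 100 very light ones.
P = [Fraction(9, 10)] + [Fraction(1, 1000)] * 100

print(effective_support_size(P, Fraction(0)))       # 101 (the plain support size)
print(effective_support_size(P, Fraction(1, 20)))   # 51
print(effective_support_size(P, Fraction(1, 10)))   # 1
```

The example also shows the non-robustness mentioned above: the plain support size is 101, yet discarding only a 1/10 fraction of the mass leaves a single element. With ϵ = 1/20 and β = 1, any estimate between 1 and 51 satisfies the guarantee.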


