Truly Perfect Samplers for Data Streams and Sliding Windows
In the G-sampling problem, the goal is to output an index i of a vector f ∈ℝ^n, such that for all coordinates j ∈ [n], Pr[i=j] = (1 ±ϵ) G(f_j)/∑_k∈[n] G(f_k) + γ, where G:ℝ→ℝ_≥ 0 is some non-negative function. If ϵ = 0 and γ = 1/poly(n), the sampler is called perfect. In the data stream model, f is defined implicitly by a sequence of updates to its coordinates, and the goal is to design such a sampler in small space. Jayaram and Woodruff (FOCS 2018) gave the first perfect L_p samplers in turnstile streams, where G(x)=|x|^p, using polylog(n) space for p∈(0,2]. However, to date all known sampling algorithms are not truly perfect, since their output distribution is only point-wise γ = 1/poly(n) close to the true distribution. This small error can be significant when samplers are run many times on successive portions of a stream, and leak potentially sensitive information about the data stream. In this work, we initiate the study of truly perfect samplers, with ϵ = γ = 0, and comprehensively investigate their complexity in the data stream and sliding window models. Abstract truncated due to arXiv limits; please see paper for full abstract.
READ FULL TEXT