Truly Perfect Samplers for Data Streams and Sliding Windows

08/26/2021
by   Rajesh Jayaram, et al.

In the G-sampling problem, the goal is to output an index i of a vector f ∈ ℝ^n such that, for all coordinates j ∈ [n], Pr[i=j] = (1 ± ϵ) · G(f_j) / ∑_{k∈[n]} G(f_k) + γ, where G: ℝ → ℝ_{≥0} is some non-negative function. If ϵ = 0 and γ = 1/poly(n), the sampler is called perfect. In the data stream model, f is defined implicitly by a sequence of updates to its coordinates, and the goal is to design such a sampler in small space. Jayaram and Woodruff (FOCS 2018) gave the first perfect L_p samplers in turnstile streams, where G(x) = |x|^p, using polylog(n) space for p ∈ (0,2]. However, to date no known sampling algorithm is truly perfect, since the output distribution of each is only point-wise γ = 1/poly(n) close to the true distribution. This small error can be significant when samplers are run many times on successive portions of a stream, and it can leak potentially sensitive information about the data stream. In this work, we initiate the study of truly perfect samplers, with ϵ = γ = 0, and comprehensively investigate their complexity in the data stream and sliding window models. Abstract truncated due to arXiv limits; please see paper for full abstract.
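
To make the definition concrete, the following minimal sketch computes the exact target distribution G(f_j) / ∑_k G(f_k) that a truly perfect sampler (ϵ = γ = 0) would have to match. It is an offline reference that stores f in full, not the small-space streaming algorithm from the paper. The function names, the 0-based indexing, and the example stream are assumptions made for illustration.

```python
from fractions import Fraction


def frequency_vector(updates, n):
    """Build f from a stream of (index, delta) updates (0-based indices, an assumption)."""
    f = [0] * n
    for index, delta in updates:
        f[index] += delta
    return f


def target_distribution(f, G):
    """Return the exact probabilities G(f_j) / sum_k G(f_k) as Fractions."""
    weights = [Fraction(G(x)) for x in f]
    total = sum(weights)
    if total == 0:
        raise ValueError("G is zero on every coordinate; the distribution is undefined")
    return [w / total for w in weights]


# Hypothetical turnstile stream: deltas may be negative.
updates = [(0, 3), (1, 1), (0, -1), (2, 2)]
f = frequency_vector(updates, n=4)

# L_2 sampling corresponds to G(x) = |x|^2.
l2_target = target_distribution(f, lambda x: abs(x) ** 2)
print(f, l2_target)
```

Exact rational arithmetic is used so that the reference distribution carries no rounding error, which matters when the point of comparison is a sampler with zero additive error.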

research
08/16/2018

Perfect L_p Sampling in a Data Stream

In this paper, we resolve the one-pass space complexity of L_p sampling ...
research
11/15/2020

Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators

We introduce difference estimators for data stream computation, which pr...
research
03/23/2018

Data Streams with Bounded Deletions

Two prevalent models in the data stream literature are the insertion-onl...
research
03/06/2018

Revisiting Frequency Moment Estimation in Random Order Streams

We revisit one of the classic problems in the data stream literature, na...
research
05/08/2021

Separations for Estimating Large Frequency Moments on Data Streams

We study the classical problem of moment estimation of an underlying vec...
research
09/20/2023

Testing frequency distributions in a stream

We study how to verify specific frequency distributions when we observe ...
research
05/04/2022

A Perfect Sampler for Hypergraph Independent Sets

The problem of uniformly sampling hypergraph independent sets is revisit...
