Perfect L_p Sampling in a Data Stream
In this paper, we resolve the one-pass space complexity of L_p sampling for p ∈ (0,2). Given a stream of updates (insertions and deletions) to the coordinates of an underlying vector f ∈ R^n, a perfect L_p sampler must output an index i with probability |f_i|^p/‖f‖_p^p, and is allowed to fail with some probability δ. So far, for p > 0 no algorithm has been shown to solve the problem exactly using poly(log n) bits of space. In 2010, Monemizadeh and Woodruff introduced an approximate L_p sampler, which outputs i with probability (1 ± ν)|f_i|^p/‖f‖_p^p, using space polynomial in ν^-1 and log(n). The space complexity was later reduced by Jowhari, Sağlam, and Tardos to roughly O(ν^-p log^2 n log δ^-1) for p ∈ (0,2), which tightly matches the Ω(log^2 n log δ^-1) lower bound in terms of n and δ, but is loose in terms of ν. Given these nearly tight bounds, it is perhaps surprising that no lower bound at all exists in terms of ν; not even a bound of Ω(ν^-1) is known. In this paper, we explain this phenomenon by demonstrating the existence of an O(log^2 n log δ^-1)-bit perfect L_p sampler for p ∈ (0,2). This shows that ν need not factor into the space of an L_p sampler, which completely closes the complexity of the problem for this range of p. For p = 2, our bound is O(log^3 n log δ^-1) bits, which matches the prior best known upper bound of O(ν^-2 log^3 n log δ^-1), but has no dependence on ν. Finally, we show improved upper and lower bounds for returning a (1 ± ϵ) relative error estimate of the frequency f_i of the sampled index i.
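To make the target distribution concrete, the following is a minimal offline sketch of what a perfect L_p sampler is required to output. It is not the small-space algorithm of the paper: it stores the whole vector f explicitly (linear space) and only illustrates the probabilities |f_i|^p/‖f‖_p^p. The function names `lp_distribution` and `sample_index`, and the representation of the stream as (index, delta) pairs, are assumptions made for illustration.

```python
import random
from collections import defaultdict


def lp_distribution(updates, p):
    """Return {i: |f_i|^p / ||f||_p^p} for the vector f defined by the stream.

    `updates` is an iterable of (index, delta) pairs; delta may be negative,
    which models deletions. This is a linear-space reference computation,
    not a streaming algorithm.
    """
    f = defaultdict(float)
    for i, delta in updates:
        f[i] += delta  # apply the insertion or deletion to coordinate i

    # Coordinates that cancelled to zero carry no probability mass.
    weights = {i: abs(v) ** p for i, v in f.items() if v != 0}
    total = sum(weights.values())  # this is ||f||_p^p
    if total == 0:
        return {}
    return {i: w / total for i, w in weights.items()}


def sample_index(updates, p, rng=random):
    """Draw one index from the exact L_p distribution, or None if f = 0."""
    dist = lp_distribution(updates, p)
    if not dist:
        return None
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]
```

For example, the stream (0, +3), (1, -4), (2, +5), (2, -5) defines f = (3, -4, 0). With p = 1 the sampler should return index 0 with probability 3/7 and index 1 with probability 4/7; with p = 2 the probabilities are 9/25 and 16/25. An approximate sampler in the sense of the abstract would be allowed to distort each of these probabilities by a factor of (1 ± ν).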