Towards Optimal Moment Estimation in Streaming and Distributed Models
One of the oldest problems in the data stream model is to approximate the p-th moment ‖X‖_p^p = ∑_{i=1}^n |X_i|^p of an underlying vector X ∈ R^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is the case p ∈ (0,2]. Although a tight space bound of Θ(ϵ^{-2} log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity when all updates are positive. Specifically, the upper bound is O(ϵ^{-2} log n) bits, while the lower bound is only Ω(ϵ^{-2} + log n) bits. Recently, an upper bound of Õ(ϵ^{-2} + log n) bits was obtained assuming that the updates arrive in a random order. We show that for p ∈ (0,1], the random order assumption is not needed. Namely, we give an upper bound of Õ(ϵ^{-2} + log n) bits for estimating ‖X‖_p^p on worst-case streams. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p ∈ (1,2], in the natural coordinator and blackboard communication topologies, there is an Õ(ϵ^{-2}) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an Õ(ϵ^{-2} log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Ω(ϵ^{-2} log n) bit lower bound for p ∈ (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.
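To make the quantity being estimated concrete, the following is a minimal Python sketch of the exact p-th moment of a vector defined by a stream of coordinate updates. It is only a reference baseline, not the algorithm of the paper: it stores every nonzero coordinate, so its space is linear in the number of distinct indices, whereas the results above concern algorithms that use far less space and return a (1 ± ϵ) approximation. The function name `exact_moment` and the `(index, delta)` update format are assumptions made for illustration.

```python
from collections import defaultdict


def exact_moment(updates, p):
    """Return sum_i |X_i|^p for the vector X defined by the stream.

    `updates` is an iterable of (index, delta) pairs; each pair adds
    `delta` to coordinate `index`. This format is an illustrative
    assumption, not a specification taken from the paper.
    """
    x = defaultdict(int)
    for index, delta in updates:
        x[index] += delta
    # Zero coordinates contribute nothing to the moment for p > 0.
    return sum(abs(value) ** p for value in x.values() if value != 0)


if __name__ == "__main__":
    # Insertion-only stream (all updates positive): X_0 = 4, X_2 = 1, X_5 = 2.
    stream = [(0, 3), (2, 1), (0, 1), (5, 2)]
    print(exact_moment(stream, 1.0))  # 4 + 1 + 2
    print(exact_moment(stream, 2.0))  # 16 + 1 + 4
```

A small-space estimator would be judged against this baseline: its output should fall within a (1 ± ϵ) factor of the value `exact_moment` returns.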