Towards Optimal Moment Estimation in Streaming and Distributed Models

07/12/2019
by Rajesh Jayaram, et al.

One of the oldest problems in the data stream model is to approximate the p-th moment ‖X‖_p^p = ∑_{i=1}^n |X_i|^p of an underlying vector X ∈ ℝ^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is the case p ∈ (0,2]. Although a tight space bound of Θ(ϵ^{-2} log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity when all updates are positive. Specifically, the upper bound is O(ϵ^{-2} log n) bits, while the lower bound is only Ω(ϵ^{-2} + log n) bits. Recently, an upper bound of Õ(ϵ^{-2} + log n) bits was obtained assuming that the updates arrive in a random order. We show that for p ∈ (0,1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of Õ(ϵ^{-2} + log n) bits for estimating ‖X‖_p^p. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p ∈ (1,2], in the natural coordinator and blackboard communication topologies, there is an Õ(ϵ^{-2}) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an Õ(ϵ^{-2} log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Ω(ϵ^{-2} log n) bit lower bound for p ∈ (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.
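As background for the problem statement above, here is a minimal Python sketch of the classic p-stable linear sketch (due to Indyk) for p ∈ (0,2]. It is the standard approach for streams with positive and negative updates, not the algorithm of this paper. It assumes a dense k × n matrix S with i.i.d. symmetric p-stable entries, drawn with the Chambers-Mallows-Stuck method; each update (i, Δ) adds Δ times column i of S to the counters y = SX. By p-stability each y_j is distributed as ‖X‖_p times a p-stable variable Z, so the median of |y_j| divided by the median of |Z| (estimated empirically here) approximates ‖X‖_p. The names `stable_sample`, `PStableSketch` and `exact_moment` are hypothetical, and storing S explicitly is a simplification: a space-efficient algorithm would generate its entries pseudorandomly.

```python
import math
import random
from typing import List, Tuple


def stable_sample(p: float, rng: random.Random) -> float:
    """Draw a symmetric p-stable variate (0 < p <= 2) via Chambers-Mallows-Stuck."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)  # Exp(1) variate
    if p == 1.0:
        return math.tan(theta)  # standard Cauchy
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)) * (
        math.cos((1.0 - p) * theta) / w
    ) ** ((1.0 - p) / p)


def exact_moment(n: int, updates: List[Tuple[int, float]], p: float) -> float:
    """Exact ||X||_p^p, storing all of X (the trivial baseline)."""
    x = [0.0] * n
    for i, delta in updates:
        x[i] += delta
    return sum(abs(v) ** p for v in x)


class PStableSketch:
    """Hypothetical linear sketch y = S X with k rows of i.i.d. p-stable entries."""

    def __init__(self, n: int, p: float, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.p = p
        # Simplification: S is stored explicitly instead of derived from hash functions.
        self.S = [[stable_sample(p, rng) for _ in range(n)] for _ in range(k)]
        self.y = [0.0] * k
        # Empirical median of |Z| for Z p-stable, used to rescale the estimate.
        calib = sorted(abs(stable_sample(p, rng)) for _ in range(20001))
        self.median_abs = calib[len(calib) // 2]

    def update(self, i: int, delta: float) -> None:
        # Linearity: the update (i, delta) adds delta * S[:, i] to y.
        for j, row in enumerate(self.S):
            self.y[j] += row[i] * delta

    def estimate_moment(self) -> float:
        # Each y_j ~ ||X||_p * Z, so median_j |y_j| / median|Z| estimates ||X||_p.
        abs_y = sorted(abs(v) for v in self.y)
        norm = abs_y[len(abs_y) // 2] / self.median_abs
        return norm ** self.p
```

For example, feeding the same stream of (i, Δ) updates to `PStableSketch(n, p, k=2000)` and to `exact_moment` should give estimates reasonably close to the exact moment for p = 0.5, 1 and 2. The k counters play the role of the ϵ^{-2} factor in the space bounds, since the median estimator's relative error shrinks roughly as 1/√k.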

Related research

03/06/2018
Revisiting Frequency Moment Estimation in Random Order Streams
We revisit one of the classic problems in the data stream literature, na...

07/17/2022
Streaming Algorithms with Large Approximation Factors
We initiate a broad study of classical problems in the streaming model w...

05/17/2019
Separating k-Player from t-Player One-Way Communication, with Applications to Data Streams
In a k-party communication problem, the k players with inputs x_1, x_2, ...

02/05/2015
Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds
We study the following generalized matrix rank estimation problem: given...

05/08/2021
Separations for Estimating Large Frequency Moments on Data Streams
We study the classical problem of moment estimation of an underlying vec...

08/16/2018
Perfect L_p Sampling in a Data Stream
In this paper, we resolve the one-pass space complexity of L_p sampling ...

04/11/2019
Tight Bounds for the Subspace Sketch Problem with Applications
In the subspace sketch problem one is given an n × d matrix A with O((nd)...