Streaming Lower Bounds and Asymmetric Set-Disjointness

01/13/2023
by   Shachar Lovett, et al.

Frequency estimation in data streams is one of the classical problems in streaming algorithms. Following much research, there are now almost matching upper and lower bounds for the trade-off between the number of samples and the space complexity of the algorithm when the data streams are adversarial. However, when the data stream is given in a random order, or is stochastic, only weaker lower bounds exist. In this work we close this gap, up to logarithmic factors. To do so we consider the needle problem, a natural hard problem for frequency estimation studied in (Andoni et al. 2008; Crouch et al. 2016). Here, the goal is to distinguish between two distributions over data streams with t samples. The first is uniform over a large enough domain. The second is a planted model: a secret "needle" is chosen uniformly, and then each element in the stream equals the needle with probability p, and otherwise is chosen uniformly from the domain. It is simple to design streaming algorithms that distinguish the distributions using space s ≈ 1/(p^2 t). It was unclear whether this is tight, as the existing lower bounds are weaker. We show that this trade-off is near optimal, up to a logarithmic factor. Our proof builds on and extends classical connections between streaming algorithms and communication complexity, concretely multi-party unique set-disjointness. We introduce two new ingredients that allow us to prove sharp bounds. The first is a lower bound for an asymmetric version of multi-party unique set-disjointness, where players receive input sets of different sizes, and where the communication of each player is normalized relative to their input length. The second is a combinatorial technique that allows us to sample needles in the planted model by first sampling intervals, and then sampling a uniform needle in each interval.
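
To make the two distributions in the needle problem concrete, the following is a minimal Python sketch that samples from each of them. The function names, the integer domain {0, ..., n-1}, and the `max_frequency` helper are assumptions made for illustration and do not come from the paper. In particular, `max_frequency` stores the whole stream and is not the space-bounded algorithm the abstract refers to; it only shows that the planted needle appears as an unusually frequent element.

```python
import random
from collections import Counter


def uniform_stream(t, n, rng):
    """Return t samples, each uniform over the domain {0, ..., n-1}."""
    return [rng.randrange(n) for _ in range(t)]


def planted_stream(t, n, p, rng):
    """Return (needle, stream) drawn from the planted model.

    A secret needle is chosen uniformly from the domain. Each of the t
    stream elements equals the needle with probability p, and is
    otherwise uniform over the domain.
    """
    needle = rng.randrange(n)
    stream = [
        needle if rng.random() < p else rng.randrange(n)
        for _ in range(t)
    ]
    return needle, stream


def max_frequency(stream):
    """Count of the most frequent element.

    Illustration only: this keeps a counter for every distinct element,
    so it is NOT a small-space streaming algorithm.
    """
    return Counter(stream).most_common(1)[0][1]


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed, chosen arbitrarily
    t, n, p = 2000, 10**9, 0.05  # hypothetical parameters
    needle, planted = planted_stream(t, n, p, rng)
    uniform = uniform_stream(t, n, rng)
    print("needle occurrences:", planted.count(needle))
    print("max frequency (planted):", max_frequency(planted))
    print("max frequency (uniform):", max_frequency(uniform))
```

With a domain much larger than t, repeated elements in the uniform stream are rare, while the planted stream contains roughly p·t copies of the needle. The question studied in the paper is how much space is needed to detect this difference in a single pass.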


