Testing frequency distributions in a stream

09/20/2023
by Claire Mathieu, et al.

We study how to verify specific frequency distributions when observing a stream of N data items drawn from a universe of n distinct items. We introduce the relative Fréchet distance to compare two frequency functions in a homogeneous manner. We consider two streaming models: insertions only and sliding windows. We present a Tester for a certain class of functions which, given a function f and a frequency function g defined by the stream, decides with high probability whether f is close to g or far from g. If f is uniform, we show an Ω(n) space lower bound. If f decreases fast enough, the Tester uses only O(log^2 n · log log n) space. The analysis relies on the Spacesaving algorithm and on sampling the stream.
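For readers unfamiliar with the Spacesaving (Space-Saving) algorithm mentioned above, the following is a minimal sketch of its standard form in the insertions-only model. It is not the Tester from the paper, and it omits the sampling step and the sliding-window variant. The names `space_saving`, `stream`, and `k` are hypothetical choices for illustration.

```python
from collections import Counter


def space_saving(stream, k):
    """Approximate item frequencies using at most k counters.

    Standard Space-Saving update rule (an illustration, not the paper's Tester):
    - a monitored item has its counter incremented;
    - an unmonitored item takes a free counter if one exists;
    - otherwise it replaces the item with the smallest counter
      and inherits that counter's value plus one.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the current minimum; the newcomer starts at min + 1,
            # so estimates never fall below the true frequency.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


# Hypothetical toy stream: one heavy item ("a") and several rare ones.
stream = list("aaaabbbcadaeaf")
k = 3
estimates = space_saving(stream, k)
exact = Counter(stream)

for item, est in sorted(estimates.items()):
    print(item, "estimate:", est, "exact:", exact[item])
```

Each estimate overcounts the true frequency by at most N/k, where N is the stream length, which is why a small number of counters suffices when the frequency function decreases quickly. A linear scan for the minimum keeps the sketch short; practical implementations typically use a structure with constant-time minimum lookup.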


research
11/15/2020

Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators

We introduce difference estimators for data stream computation, which pr...
research
01/13/2023

Streaming Lower Bounds and Asymmetric Set-Disjointness

Frequency estimation in data streams is one of the classical problems in...
research
07/17/2018

Tracking the ℓ_2 Norm with Constant Update Time

The ℓ_2 tracking problem is the task of obtaining a streaming algorithm ...
research
04/03/2020

Relative Error Streaming Quantiles

Approximating ranks, quantiles, and distributions over streaming data is...
research
10/29/2018

Distinct Sampling on Streaming Data with Near-Duplicates

In this paper we study how to perform distinct sampling in the streaming...
research
06/12/2020

Streaming Computations with Region-Based State on SIMD Architectures

Streaming computations on massive data sets are an attractive candidate ...
research
08/26/2021

Truly Perfect Samplers for Data Streams and Sliding Windows

In the G-sampling problem, the goal is to output an index i of a vector ...
