Relative Error Streaming Quantiles

04/03/2020
by Graham Cormode, et al.

Approximating ranks, quantiles, and distributions over streaming data is a central task in data analysis and monitoring. Given a stream of n items from a data universe U (equipped with a total order), the task is to compute a sketch (data structure) of size poly(log(n), 1/ε). Given the sketch and a query item y ∈ U, one should be able to approximate its rank in the stream, i.e., the number of stream elements smaller than y. Most work to date has focused on additive εn error approximation, culminating in the KLL sketch, which achieves optimal asymptotic behavior. This paper investigates multiplicative (1±ε)-error approximations to the rank. The motivation stems from the practical demand to understand the tails of distributions, and hence for sketches to be more accurate near extreme values. The most space-efficient algorithms that can be derived from prior work store either O(log(ε^2 n)/ε^2) or O(log^3(ε n)/ε) universe items. This paper presents a sketch of size O(log^1.5(ε n)/ε) (ignoring poly(log log n, log(1/ε)) factors) that achieves a (1±ε) multiplicative error guarantee, without prior knowledge of the stream length or dependence on the size of the data universe. This is within an O(√(log(ε n))) factor of optimal.
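
The difference between the two guarantees can be illustrated with a small sketch. It is not the paper's algorithm: it computes exact ranks on a sorted list and checks whether a hypothetical estimate would satisfy the additive εn bound or the multiplicative (1±ε) bound. The function names, the sample data, and the chosen estimate are assumptions made for illustration only.

```python
from bisect import bisect_left


def exact_rank(sorted_items, y):
    """Rank of y: the number of stream items strictly smaller than y."""
    return bisect_left(sorted_items, y)


def within_additive(estimate, true_rank, n, eps):
    """Additive guarantee: the error is at most eps * n, regardless of rank."""
    return abs(estimate - true_rank) <= eps * n


def within_multiplicative(estimate, true_rank, eps):
    """Multiplicative guarantee: the error is at most eps * true_rank."""
    return abs(estimate - true_rank) <= eps * true_rank


# Hypothetical stream of n = 1000 distinct items, stored sorted for exact ranks.
stream = list(range(1, 1001))
n, eps = len(stream), 0.01

true_rank = exact_rank(stream, 6)  # five items (1..5) are smaller than 6
estimate = true_rank + 8           # a made-up estimate that is off by 8

# An error of 8 fits inside the additive budget eps * n = 10,
# but is far outside the multiplicative budget eps * true_rank = 0.05.
print(within_additive(estimate, true_rank, n, eps))
print(within_multiplicative(estimate, true_rank, eps))
```

Under the additive guarantee an estimate of 13 for a true rank of 5 is acceptable, even though it is wrong by more than a factor of two. The multiplicative guarantee rules this out, which is why it suits queries near the tail of the distribution.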
