Data stream fusion for accurate quantile tracking and analysis

by Massimo Cafaro, et al.

UDDSKETCH is a recent algorithm for accurate tracking of quantiles in data streams, derived from the DDSKETCH algorithm. UDDSKETCH provides accuracy guarantees covering the full range of quantiles independently of the input distribution and greatly improves on the accuracy of DDSKETCH. In this paper we show how to compress and fuse data streams (or datasets) using UDDSKETCH data summaries: the input summaries are fused into a new summary of the union of the streams (or datasets) they processed, whilst preserving both the error and size guarantees provided by UDDSKETCH. This property of sketches, known as mergeability, enables parallel and distributed processing. We prove that UDDSKETCH is fully mergeable and introduce a parallel version of UDDSKETCH suitable for message-passing based architectures. We formally prove its correctness and compare it to a parallel version of DDSKETCH, showing through extensive experimental results that our parallel algorithm almost always outperforms the parallel DDSKETCH algorithm in the overall accuracy of the estimated quantiles.
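
To make the idea of mergeability concrete, here is a minimal Python sketch of a quantile summary in the spirit of UDDSKETCH. It is an illustration under stated assumptions, not the authors' implementation: the class name ToyUDDSketch, its methods, and the parameters alpha and max_buckets are hypothetical, and the logarithmic bucket mapping and pairwise "uniform collapse" follow the way DDSKETCH-style summaries are commonly described rather than anything stated in this abstract. It handles positive values only and assumes both summaries were created with the same alpha and max_buckets.

```python
import copy
import math
from collections import Counter


class ToyUDDSketch:
    """Illustrative toy (hypothetical API), positive values only."""

    def __init__(self, alpha=0.01, max_buckets=64):
        if max_buckets < 2:
            raise ValueError("max_buckets must be at least 2")
        self.alpha0 = alpha                      # initial relative-error target
        self.gamma = (1 + alpha) / (1 - alpha)   # bucket i covers (gamma**(i-1), gamma**i]
        self.max_buckets = max_buckets
        self.buckets = Counter()                 # bucket index -> item count
        self.collapses = 0                       # uniform collapses applied so far

    @property
    def alpha(self):
        # Relative-error bound implied by the current gamma.
        return (self.gamma - 1) / (self.gamma + 1)

    def _collapse(self):
        # Fold buckets (2j-1, 2j) into bucket j, which amounts to squaring gamma.
        folded = Counter()
        for i, count in self.buckets.items():
            folded[(i + 1) // 2] += count        # integer ceil(i / 2), also for i < 0
        self.buckets = folded
        self.gamma **= 2
        self.collapses += 1

    def _shrink(self):
        # Trade accuracy for space until the size bound holds again.
        while len(self.buckets) > self.max_buckets:
            self._collapse()

    def add(self, x):
        if x <= 0:
            raise ValueError("this toy handles positive values only")
        self.buckets[math.ceil(math.log(x) / math.log(self.gamma))] += 1
        self._shrink()

    def merge(self, other):
        """Return a new summary of the union of both inputs (inputs untouched)."""
        if (self.alpha0, self.max_buckets) != (other.alpha0, other.max_buckets):
            raise ValueError("summaries must share alpha and max_buckets")
        a, b = copy.deepcopy(self), copy.deepcopy(other)
        # Bring both summaries to the same gamma before adding counts.
        while a.collapses < b.collapses:
            a._collapse()
        while b.collapses < a.collapses:
            b._collapse()
        a.buckets.update(b.buckets)              # Counter.update adds counts
        a._shrink()
        return a

    def quantile(self, q):
        total = sum(self.buckets.values())
        if total == 0:
            raise ValueError("empty summary")
        rank = int(q * (total - 1))              # 0-based rank of the target item
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                # Point estimate inside bucket i with relative error <= alpha.
                return 2 * self.gamma ** i / (self.gamma + 1)


# Usage: summarise two streams separately, then fuse the summaries.
left, right = ToyUDDSketch(), ToyUDDSketch()
for k in range(1, 5001):
    left.add(k)
    right.add(k * 3.5)
merged = left.merge(right)
median_estimate = merged.quantile(0.5)
```

In a message-passing setting, each process would build its own summary and a reduction step would apply merge pairwise; that wiring is omitted here and would depend on the communication library in use.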







UDDSketch: Accurate Tracking of Quantiles in Data Streams

We present UDDSketch (Uniform DDSketch), a novel sketch for fast and acc...

DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees

Summary statistics such as the mean and variance are easily maintained f...

Tight Lower Bound for Comparison-Based Quantile Summaries

Quantiles, such as the median or percentiles, provide concise and useful...

Distributed mining of time--faded heavy hitters

We present P2PTFHH (Peer--to--Peer Time--Faded Heavy Hitters) which, to ...

Joint Tracking of Multiple Quantiles Through Conditional Quantiles

Estimation of quantiles is one of the most fundamental real-time analysi...

Computing Extremely Accurate Quantiles Using t-Digests

We present on-line algorithms for computing approximations of rank-based...