Data stream fusion for accurate quantile tracking and analysis

01/17/2021
by Massimo Cafaro, et al.

UDDSKETCH is a recent algorithm for accurately tracking quantiles in data streams, derived from the DDSKETCH algorithm. UDDSKETCH provides accuracy guarantees that cover the full range of quantiles independently of the input distribution, and it greatly improves accuracy relative to DDSKETCH. In this paper we show how to compress and fuse data streams (or datasets) using UDDSKETCH summaries: the input summaries are fused into a new summary of the union of the streams (or datasets) they processed, while preserving both the error and size guarantees provided by UDDSKETCH. This property of sketches, known as mergeability, enables parallel and distributed processing. We prove that UDDSKETCH is fully mergeable and introduce a parallel version of UDDSKETCH suitable for message-passing architectures. We formally prove its correctness and compare it with a parallel version of DDSKETCH, showing through extensive experimental results that our parallel algorithm almost always outperforms parallel DDSKETCH in overall quantile accuracy.
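To make the idea of mergeability concrete, the following is a minimal Python sketch of a UDDSKETCH-style summary, not the authors' implementation. It assumes logarithmic buckets with base gamma = (1 + alpha) / (1 - alpha) and a uniform collapse that folds each adjacent pair of buckets into one, which squares gamma. The class name `UniformCollapseSketch`, its parameters `alpha0` and `max_buckets`, and the restriction to positive values are all illustrative assumptions. Merging first brings both summaries to the same number of collapses, then adds the bucket counts, then collapses again if the size bound is exceeded.

```python
import math
from collections import Counter


class UniformCollapseSketch:
    """Toy UDDSketch-style summary for positive values (illustrative only)."""

    def __init__(self, alpha0=0.01, max_buckets=64):
        if not 0 < alpha0 < 1 or max_buckets < 2:
            raise ValueError("need 0 < alpha0 < 1 and max_buckets >= 2")
        self.alpha0 = alpha0
        self.gamma0 = (1 + alpha0) / (1 - alpha0)  # initial bucket base
        self.max_buckets = max_buckets
        self.collapses = 0        # number of uniform collapses applied so far
        self.buckets = Counter()  # bucket index -> item count

    @property
    def gamma(self):
        # Each collapse squares the bucket base.
        return self.gamma0 ** (2 ** self.collapses)

    @property
    def alpha(self):
        # Relative-error bound implied by the current bucket base.
        g = self.gamma
        return (g - 1) / (g + 1)

    @staticmethod
    def _halve(buckets):
        # Fold buckets 2j-1 and 2j into bucket j (integer ceiling of i / 2).
        merged = Counter()
        for i, count in buckets.items():
            merged[-(-i // 2)] += count
        return merged

    def _shrink(self):
        # Collapse until the size bound holds again.
        while len(self.buckets) > self.max_buckets:
            self.buckets = self._halve(self.buckets)
            self.collapses += 1

    def add(self, x):
        if x <= 0:
            raise ValueError("this sketch handles positive values only")
        # Index at the initial resolution, then coarsened by 2**collapses,
        # so the result does not depend on when collapses happened.
        i0 = math.ceil(math.log(x) / math.log(self.gamma0))
        self.buckets[-(-i0 // (1 << self.collapses))] += 1
        self._shrink()

    def merge(self, other):
        if (self.alpha0, self.max_buckets) != (other.alpha0, other.max_buckets):
            raise ValueError("sketches must share alpha0 and max_buckets")
        # Bring the finer summary to the resolution of the coarser one.
        while self.collapses < other.collapses:
            self.buckets = self._halve(self.buckets)
            self.collapses += 1
        theirs = other.buckets
        for _ in range(self.collapses - other.collapses):
            theirs = self._halve(theirs)
        self.buckets.update(theirs)  # Counter.update adds counts
        self._shrink()

    def quantile(self, q):
        rank = int(q * (sum(self.buckets.values()) - 1))
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                # Representative value of the bucket (gamma^(i-1), gamma^i].
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise ValueError("empty sketch")


# Usage: summarize two halves of a dataset separately, then fuse them.
left, right = UniformCollapseSketch(), UniformCollapseSketch()
for v in range(1, 501):
    left.add(float(v))
for v in range(501, 1001):
    right.add(float(v))
left.merge(right)
print(left.quantile(0.5), left.alpha)
```

Under these assumptions the merged summary has the same buckets as a single summary built over the concatenated input, which is the behaviour that makes a reduction across processes in a message-passing setting possible. The sketch computes every index at the initial resolution before coarsening it, a design choice made here so that results are independent of insertion order; a production implementation would also need to handle zero and negative values.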

04/18/2020

UDDSketch: Accurate Tracking of Quantiles in Data Streams

We present UDDSketch (Uniform DDSketch), a novel sketch for fast and acc...
08/28/2019

DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees

Summary statistics such as the mean and variance are easily maintained f...
05/09/2019

Tight Lower Bound for Comparison-Based Quantile Summaries

Quantiles, such as the median or percentiles, provide concise and useful...
12/01/2018

Distributed mining of time-faded heavy hitters

We present P2PTFHH (Peer-to-Peer Time-Faded Heavy Hitters) which, to ...
02/13/2019

Joint Tracking of Multiple Quantiles Through Conditional Quantiles

Estimation of quantiles is one of the most fundamental real-time analysi...
02/11/2019

Computing Extremely Accurate Quantiles Using t-Digests

We present on-line algorithms for computing approximations of rank-based...