DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees

08/28/2019
by Charles Masson, et al.

Summary statistics such as the mean and variance are easily maintained for large, distributed data streams, but order statistics (i.e., sample quantiles) can only be approximately summarized. There is extensive literature on maintaining quantile sketches, where the emphasis has been on bounding the rank error of the sketch while using little memory. Unfortunately, rank-error guarantees do not preclude arbitrarily large relative errors, which often occur in practice when the data is heavily skewed. Given the distributed nature of contemporary large-scale systems, another crucial property for quantile sketches is mergeability, i.e., several combined sketches must be as accurate as a single sketch of the same data. We present the first fully-mergeable, relative-error quantile sketching algorithm with formal guarantees. The sketch is extremely fast and accurate, and is currently being used by Datadog at wide scale.
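The abstract does not spell out the construction, so the following is only a minimal sketch of one way to get a relative-error, mergeable quantile summary. It assumes positive values mapped into logarithmically sized buckets with growth factor gamma = (1 + alpha) / (1 - alpha). The class name LogBucketSketch and its methods are hypothetical, chosen for illustration. Bucket collapsing, negative values and memory bounds are deliberately left out.

```python
import math
from collections import Counter


class LogBucketSketch:
    """Toy relative-error quantile sketch for positive values (illustrative only)."""

    def __init__(self, alpha=0.01):
        # alpha is the target relative accuracy (an assumption of this sketch).
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values
        self.count = 0

    def add(self, x):
        if x <= 0:
            raise ValueError("this toy version handles positive values only")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        i = math.ceil(math.log(x) / self.log_gamma)
        self.buckets[i] += 1
        self.count += 1

    def merge(self, other):
        # Merging adds per-bucket counts, so the result matches a single
        # sketch built over the union of both inputs.
        if other.alpha != self.alpha:
            raise ValueError("sketches must share the same alpha")
        self.buckets.update(other.buckets)
        self.count += other.count

    def quantile(self, q):
        # Walk buckets in increasing order until the target rank is covered.
        rank = q * (self.count - 1)
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                # This representative is within a factor (1 +/- alpha) of
                # every value in bucket i.
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise ValueError("empty sketch")
```

Under these assumptions, the error bound is relative to the value itself rather than to its rank, which is the distinction the abstract draws for heavily skewed data. Two sketches built on separate hosts can be combined with merge before querying.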

research
04/18/2020

UDDSketch: Accurate Tracking of Quantiles in Data Streams

We present UDDSketch (Uniform DDSketch), a novel sketch for fast and acc...
research
04/15/2023

Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees

An ε-approximate quantile sketch over a stream of n inputs approximates ...
research
02/18/2021

Theory meets Practice at the Median: a worst case comparison of relative error quantile algorithms

Estimating the distribution and quantiles of data is a foundational task...
research
04/17/2020

A Survey of Approximate Quantile Computation on Large-scale Data (Technical Report)

As data volume grows extensively, data profiling helps to extract metada...
research
01/17/2021

Data stream fusion for accurate quantile tracking and analysis

UDDSKETCH is a recent algorithm for accurate tracking of quantiles in da...
research
03/06/2018

Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries

Interactive analytics increasingly involves querying for quantiles over ...
research
08/09/2022

Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams

Today's large-scale services (e.g., video streaming platforms, data cent...