SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

01/06/2022
by Rana Shahout, et al.

Stream monitoring is fundamental to many data stream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often capture the user's utility. For example, if a video connection has high tail latency, the perceived quality will suffer, even if the average and median latencies are low. In this work, we consider the problem of approximating per-item quantiles. Elements in our stream are (ID, latency) tuples, and we wish to track the latency quantiles for each ID. Existing quantile sketches are designed for a stream of single numbers (e.g., one containing just the latencies). While one could allocate a separate sketch instance for each ID, this may require an infeasible amount of memory. Instead, we consider tracking the quantiles of the heavy hitters (the most frequent items), which are often considered particularly important, without knowing them beforehand. We first present a simple sampling algorithm that serves as a benchmark. Then, we design an algorithm that augments each entry of a heavy hitter algorithm with a quantile sketch, resulting in similar space complexity but with a deterministic error guarantee. Finally, we present SQUAD, a method that combines sampling and sketching while improving the asymptotic space complexity. Intuitively, SQUAD uses a background sampling process to capture the behaviour of an item's latencies before the item is allocated a sketch, thereby allowing us to use fewer samples and fewer sketches. Our solutions are rigorously analyzed, and we demonstrate the superiority of our approach using extensive simulations.
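To make the problem setting concrete, the following is a minimal sketch of what a sampling-only approach to per-item quantiles could look like. It is not the paper's benchmark algorithm or SQUAD itself: the abstract does not specify either, so the use of uniform reservoir sampling (Algorithm R) and every name here (`SampledPerItemQuantiles`, `update`, `quantile`, `capacity`) are assumptions made purely for illustration.

```python
import random


class SampledPerItemQuantiles:
    """Illustrative only: keep a uniform reservoir sample of (item_id, latency)
    tuples and answer per-item quantile queries from the sampled tuples.
    This is an assumed design, not the algorithm from the paper."""

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity      # maximum number of stored tuples
        self.reservoir = []           # sampled (item_id, latency) tuples
        self.seen = 0                 # stream elements processed so far
        self.rng = random.Random(seed)

    def update(self, item_id, latency):
        """Process one stream element using reservoir sampling (Algorithm R)."""
        self.seen += 1
        if len(self.reservoir) < self.capacity:
            self.reservoir.append((item_id, latency))
            return
        # Keep the new element with probability capacity / seen,
        # overwriting a uniformly chosen slot.
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.reservoir[slot] = (item_id, latency)

    def quantile(self, item_id, q):
        """Estimate the q-quantile (0 <= q <= 1) of item_id's latencies.
        Returns None when no sample of item_id is currently stored."""
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        values = sorted(lat for i, lat in self.reservoir if i == item_id)
        if not values:
            return None
        rank = min(len(values) - 1, int(q * len(values)))
        return values[rank]


if __name__ == "__main__":
    sampler = SampledPerItemQuantiles(capacity=200, seed=7)
    for step in range(10_000):
        # Hypothetical workload: "video" is a heavy hitter, "rare" is not.
        item = "video" if step % 10 else "rare"
        sampler.update(item, random.expovariate(1.0))
    print("estimated p99 latency for 'video':", sampler.quantile("video", 0.99))
```

In a sketch like this, frequent IDs naturally receive most of the reservoir slots, so their quantile estimates rest on more samples than those of rare IDs, which fits the abstract's focus on heavy hitters. The error is probabilistic rather than deterministic, in contrast to the guarantee the abstract attributes to the sketch-based design.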

