Cardinality estimation using Gumbel distribution

08/17/2020
by Aleksander Łukasiewicz, et al.

Cardinality estimation is the task of approximating the number of distinct elements in a large dataset that may contain repeated elements. LogLog and HyperLogLog (cf. Durand and Flajolet [ESA 2003], Flajolet et al. [Discrete Math Theor. 2007]) are small-space sketching schemes for cardinality estimation that combine strong theoretical performance guarantees with high effectiveness in practice. This makes them a popular solution with many implementations in big-data systems (e.g. Algebird, Apache DataSketches, BigQuery, Presto and Redis). However, despite their simple and elegant formulation, the analyses of both LogLog and HyperLogLog are extremely involved, spanning tens of pages of analytic combinatorics and complex function analysis. We propose a modification to both LogLog and HyperLogLog that replaces the discrete geometric distribution with a continuous Gumbel distribution. This leads to a very short, simple and elementary analysis of the estimation guarantees, and to smoother behavior of the estimator.
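
To make the idea concrete, the following is a minimal Python sketch of a Gumbel-based estimator. It is an illustration under stated assumptions, not the construction from the paper: the class name GumbelSketch, the use of SHA-256, and the choice to update every register for every element (rather than routing each element to a single bucket, or discretizing register values to save space) are all assumptions made here for brevity. It relies on two standard facts: the maximum of n independent standard Gumbel variables is Gumbel-distributed with location ln n, and if M is such a maximum then exp(-M) is exponentially distributed with rate n. Summing exp(-M) over m registers therefore gives a Gamma(m, n) variable, and (m - 1) divided by that sum is an unbiased estimate of n.

```python
import hashlib
import math


class GumbelSketch:
    """Illustrative Gumbel-max sketch (hypothetical name, not the paper's code)."""

    def __init__(self, m=256):
        if m < 2:
            raise ValueError("m must be at least 2")
        self.m = m
        # Each register keeps the largest Gumbel value seen so far.
        self.registers = [float("-inf")] * m

    @staticmethod
    def _uniform(item, j):
        # Assumption: SHA-256 of "j:item" behaves like an independent
        # uniform hash for each register index j.
        digest = hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()
        k = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
        # (k + 0.5) / 2^52 is exact in double precision and lies
        # strictly inside (0, 1), so both logarithms below are defined.
        return (k + 0.5) / 2.0**52

    def add(self, item):
        for j in range(self.m):
            u = self._uniform(item, j)
            g = -math.log(-math.log(u))  # standard Gumbel via inverse CDF
            if g > self.registers[j]:
                self.registers[j] = g

    def estimate(self):
        # exp(-register) ~ Exponential(rate n); an empty register
        # contributes +inf, which makes the estimate 0.0.
        total = sum(math.exp(-r) for r in self.registers)
        return (self.m - 1) / total
```

Because each register depends only on the hash of the element, adding the same element again leaves the sketch unchanged, which is what makes it count distinct elements. With m registers the relative standard error of this estimator is roughly 1/sqrt(m - 2), so m = 256 gives about 6%. Updating all m registers per element is costly; it is done here only to keep the distributional argument visible.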

research
11/20/2020

HyperLogLog (HLL) Security: Inflating Cardinality Estimates

Counting the number of distinct elements on a set is needed in many appl...
research
11/30/2018

Per-Flow Cardinality Estimation Based On Virtual LogLog Sketching

Flow cardinality estimation is the problem of estimating the number of d...
research
03/13/2019

Cardinality Estimation in a Virtualized Network Device Using Online Machine Learning

Cardinality estimation algorithms receive a stream of elements, with pos...
research
08/22/2022

Simpler and Better Cardinality Estimators for HyperLogLog and PCSA

Cardinality Estimation (aka Distinct Elements) is a classic problem in s...
research
05/23/2018

Construnctions of LOCC indistinguishable set of generalized Bell states

In this paper, we mainly consider the local indistinguishability of the ...
research
08/17/2018

Cardinality Estimators do not Preserve Privacy

Cardinality estimators like HyperLogLog are sketching algorithms that es...
research
01/01/2021

SetSketch: Filling the Gap between MinHash and HyperLogLog

MinHash and HyperLogLog are sketching algorithms that have become indisp...
