Count-min sketch with variable number of hash functions: an experimental study

02/10/2023
by Éric Fusy, et al.

Conservative Count-Min, an improved version of the Count-Min sketch [Cormode, Muthukrishnan 2005], is an online-maintained, hash-based data structure that summarizes element frequencies without storing the elements themselves. Although several works have attempted to analyze the error of Count-Min, the behavior of this data structure remains poorly understood. In [Fusy, Kucherov 2022], we demonstrated that under the uniform distribution of input elements, the error of conservative Count-Min follows two distinct regimes depending on its load factor. In this work, we present a series of experimental results that provide new insights into the behavior of conservative Count-Min. Our contributions are twofold. On the one hand, we provide a detailed experimental analysis of the behavior of the Count-Min sketch in different regimes and under several representative probability distributions of input elements. On the other hand, we demonstrate improvements that can be made by assigning a variable number of hash functions to different elements. These include, in particular, a reduction in the space used by the data structure while still maintaining a small error.
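To make the conservative update rule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes a single shared array of counters, and it derives hash positions from BLAKE2b digests. The class name `ConservativeCountMin`, its methods, and the per-call parameter `k` (the number of hash functions used for an element) are hypothetical. The abstract does not say how the number of hash functions is assigned to each element, so the caller chooses `k` here and must pass the same value when inserting and when querying that element.

```python
import hashlib


class ConservativeCountMin:
    """Illustrative conservative Count-Min over one shared counter array."""

    def __init__(self, num_counters):
        self.m = num_counters
        self.counters = [0] * num_counters

    def _positions(self, item, k):
        # Derive k counter positions for the item (assumed hashing scheme).
        positions = []
        for i in range(k):
            digest = hashlib.blake2b(
                f"{i}:{item}".encode(), digest_size=8
            ).digest()
            positions.append(int.from_bytes(digest, "big") % self.m)
        return positions

    def add(self, item, k=2):
        # Conservative update: increment only the counters that currently
        # hold the minimum value among the item's positions.
        positions = set(self._positions(item, k))
        low = min(self.counters[p] for p in positions)
        for p in positions:
            if self.counters[p] == low:
                self.counters[p] += 1

    def estimate(self, item, k=2):
        # The estimate is the minimum over the item's counters; it can
        # overestimate the true count but never underestimate it.
        return min(self.counters[p] for p in self._positions(item, k))


sketch = ConservativeCountMin(num_counters=64)
stream = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
# Hypothetical assignment: a different number of hash functions per element.
hashes_per_item = {"a": 1, "b": 2, "c": 3}
for x in stream:
    sketch.add(x, k=hashes_per_item[x])
```

Because only the minimal counters are incremented, fewer counters grow on each insertion than in the plain Count-Min update, which is where the reduced error comes from.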


