On Occupancy Moments and Bloom Filter Efficiency

08/13/2019
by Jonathan Burns, et al.

Two multivariate committee distributions are shown to belong to Berg's family of factorial series distributions and to Kemp's family of generalized hypergeometric factorial moment distributions. Exact moment formulas, upper and lower bounds, and statistical parameter estimators are provided for the classic occupancy and committee distributions. The derived moment equations are used to obtain exact formulas for the false-positive rate and efficiency of Bloom filters, which are probabilistic data structures used to solve the set membership problem. This study reveals that the conventional Bloom filter analysis overestimates the number of hash functions required to minimize the false-positive rate, and shows that Bloom filter efficiency is monotonic in the number of hash functions.
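The link between occupancy moments and the Bloom filter false-positive rate can be explored numerically. The following is a minimal sketch, not the authors' derivation. It assumes the standard model in which each of the k hash functions sets one of m bits chosen independently and uniformly at random, so two hashes of the same key may land on the same bit. Under that assumption it computes the exact distribution of X, the number of set bits after kn placements (the classic occupancy distribution), and evaluates the false-positive rate as E[(X/m)^k]. It also checks the classical falling-factorial moments of the number of zero bits Y = m - X, namely E[(Y)_r] = (m)_r (1 - r/m)^{kn}, and reports the conventional approximation (1 - e^{-kn/m})^k together with its optimum k = (m/n) ln 2. All function names and the example sizes are hypothetical.

```python
from fractions import Fraction
import math


def occupied_bits_distribution(m: int, throws: int) -> list[Fraction]:
    """Exact distribution of X, the number of set bits after `throws`
    independent, uniform placements into an m-bit array (classic occupancy).
    Returns dist with dist[i] = P(X = i)."""
    dist = [Fraction(0)] * (m + 1)
    dist[0] = Fraction(1)  # empty filter before any placement
    for _ in range(throws):
        nxt = [Fraction(0)] * (m + 1)
        for i, p in enumerate(dist):
            if p == 0:
                continue
            # Placement hits an already-set bit with probability i/m ...
            nxt[i] += p * Fraction(i, m)
            # ... or sets a fresh bit with probability (m - i)/m.
            if i < m:
                nxt[i + 1] += p * Fraction(m - i, m)
        dist = nxt
    return dist


def falling_factorial(x: int, r: int) -> int:
    """(x)_r = x (x - 1) ... (x - r + 1); equals 1 when r == 0."""
    out = 1
    for j in range(r):
        out *= x - j
    return out


def empty_bits_factorial_moment(m: int, throws: int, r: int) -> Fraction:
    """Classical closed form E[(Y)_r] = (m)_r (1 - r/m)^throws, where
    Y = m - X is the number of bits still zero."""
    return falling_factorial(m, r) * Fraction(m - r, m) ** throws


def exact_false_positive_rate(m: int, n: int, k: int) -> Fraction:
    """P(false positive) = E[(X/m)^k] after inserting n keys with k hashes
    each, under the independent-uniform-hash assumption."""
    dist = occupied_bits_distribution(m, k * n)
    return sum(p * Fraction(i, m) ** k for i, p in enumerate(dist))


def approx_false_positive_rate(m: int, n: int, k: int) -> float:
    """Conventional approximation (1 - e^{-kn/m})^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def approx_optimal_k(m: int, n: int) -> float:
    """Conventional optimum k = (m/n) ln 2 for the approximation above."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    m, n = 64, 8  # illustrative sizes, not values from the paper
    for k in range(1, 9):
        exact = float(exact_false_positive_rate(m, n, k))
        approx = approx_false_positive_rate(m, n, k)
        print(f"k={k}: exact={exact:.6f} approx={approx:.6f}")
    print(f"conventional optimal k ~ {approx_optimal_k(m, n):.2f}")
```

Exact rational arithmetic with `fractions.Fraction` keeps the comparison free of rounding error, which matters when exact and approximate rates differ only slightly, but its cost grows quickly with m and kn. For large filters, floating point or a closed form in Stirling numbers of the second kind would be more practical.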
