On Occupancy Moments and Bloom Filter Efficiency

by Jonathan Burns, et al.

Two multivariate committee distributions are shown to belong to Berg's family of factorial series distributions and Kemp's family of generalized hypergeometric factorial moment distributions. Exact moment formulas, upper and lower bounds, and statistical parameter estimators are provided for the classic occupancy and committee distributions. The derived moment equations are used to determine exact formulas for the false-positive rate and efficiency of Bloom filters -- probabilistic data structures used to solve the set membership problem. This study reveals that the conventional Bloom filter analysis overestimates the number of hash functions required to minimize the false-positive rate, and shows that Bloom filter efficiency is monotonic in the number of hash functions.
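To make the gap between the conventional analysis and an exact calculation concrete, here is a minimal sketch under a textbook model that may differ from the one used in the paper: each of the n inserted items sets k bits chosen independently and uniformly from m bits (so one item's hash values may collide). The function names `occupancy_distribution`, `exact_fpr`, and `classic_fpr` are illustrative, not taken from the paper. Under this assumed model the number of set bits follows the classic occupancy distribution, and the false-positive rate is the expectation of (X/m)^k over that distribution.

```python
import math


def occupancy_distribution(m, throws):
    """Return P(X = j), j = 0..m, for the number of set bits X after
    `throws` independent, uniform bit-setting operations on m bits."""
    dist = [0.0] * (m + 1)
    dist[0] = 1.0
    for _ in range(throws):
        nxt = [0.0] * (m + 1)
        for j, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[j] += p * j / m                # hit a bit that is already set
            if j < m:
                nxt[j + 1] += p * (m - j) / m  # set a previously clear bit
        dist = nxt
    return dist


def exact_fpr(m, n, k):
    """False-positive rate E[(X/m)^k] under the assumed model."""
    dist = occupancy_distribution(m, n * k)
    return sum(p * (j / m) ** k for j, p in enumerate(dist))


def classic_fpr(m, n, k):
    """Conventional approximation (1 - exp(-k n / m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    m, n = 64, 8  # hypothetical filter size and item count
    for k in range(1, 11):
        print(k, exact_fpr(m, n, k), classic_fpr(m, n, k))
```

Because x^k is convex, the exact value under this model is never smaller than the conventional approximation, so the two curves can reach their minima at different values of k. Scanning k as in the loop above shows where each minimum falls for a given m and n.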




