Daisy Bloom Filters

05/30/2022
by Ioana O. Bercea, et al.

Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) are Bloom filters that adapt the number of hash functions to the query element. That is, they use a sequence of hash functions h_1, h_2, … and insert x by setting the bits at the k_x positions h_1(x), h_2(x), …, h_{k_x}(x) to 1, where the parameter k_x depends on x. Similarly, a query for x checks whether the bits at positions h_1(x), h_2(x), …, h_{k_x}(x) contain a 0 (in which case we know that x was not inserted) or contain only 1s (in which case x may have been inserted, but the answer could also be a false positive). In this paper, we determine a near-optimal choice of the parameters k_x in a model where n elements are inserted independently from a probability distribution 𝒫 and query elements are chosen from a probability distribution 𝒬, under a bound on the false positive probability F. In contrast, the parameter choice of Bruck et al., as well as follow-up work by Wang et al., does not guarantee a nontrivial bound on the false positive rate. We refer to our parameterization of the weighted Bloom filter as a Daisy Bloom filter. For many distributions 𝒫 and 𝒬, the space usage of the Daisy Bloom filter is significantly smaller than that of standard Bloom filters. Our upper bound is complemented by an information-theoretic lower bound showing that, under mild restrictions on the distributions 𝒫 and 𝒬, the space usage of Daisy Bloom filters is the best possible up to a constant factor. Daisy Bloom filters can be seen as a fine-grained variant of a recent data structure of Vaidya, Knorr, Mitzenmacher and Kraska. Like their work, we are motivated by settings in which we have prior knowledge of the workload of the filter, possibly in the form of advice from a machine learning algorithm.
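
The insert and query procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `WeightedBloomFilter`, the salted BLAKE2b construction used to stand in for h_1, h_2, …, and the callback `k_of` that supplies k_x are all assumptions made for illustration. In particular, the example rule for k_x at the bottom is an arbitrary placeholder and not the near-optimal Daisy parameterization derived in the paper.

```python
import hashlib
from typing import Callable


class WeightedBloomFilter:
    """Illustrative weighted Bloom filter: element x uses k_of(x) hash functions."""

    def __init__(self, m: int, k_of: Callable[[str], int]) -> None:
        self.m = m                # number of bits in the filter
        self.k_of = k_of          # hypothetical callback returning k_x for x
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _position(self, i: int, x: str) -> int:
        # Assumption: h_i(x) is simulated by salting BLAKE2b with the index i.
        digest = hashlib.blake2b(
            x.encode("utf-8"), digest_size=8, salt=i.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.m

    def insert(self, x: str) -> None:
        # Set the bits at positions h_1(x), ..., h_{k_x}(x) to 1.
        for i in range(1, self.k_of(x) + 1):
            self.bits[self._position(i, x)] = 1

    def might_contain(self, x: str) -> bool:
        # False: some checked bit is 0, so x was definitely not inserted.
        # True: all checked bits are 1, so x was inserted or is a false positive.
        # If k_of(x) == 0, no bits are checked and the result is always True.
        return all(
            self.bits[self._position(i, x)] == 1
            for i in range(1, self.k_of(x) + 1)
        )


# Placeholder rule for k_x (NOT the paper's choice): fewer hash functions
# for keys with a hypothetical "hot:" prefix, more for everything else.
def example_k(x: str) -> int:
    return 2 if x.startswith("hot:") else 6


f = WeightedBloomFilter(m=1024, k_of=example_k)
f.insert("hot:alice")
f.insert("cold:bob")
print(f.might_contain("hot:alice"), f.might_contain("cold:bob"))
```

Because a query re-checks exactly the positions that the insert set, the sketch never produces a false negative. How often it produces a false positive depends on m and on the rule behind `k_of`, which is the quantity the paper optimizes.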


