Multiple Set Matching and Pre-Filtering with Bloom Multifilters

01/07/2019
by Francesco Concas, et al.

A Bloom filter is a space-efficient probabilistic data structure for checking an element's membership in a set. Given multiple sets, however, a standard Bloom filter is not sufficient for finding the sets to which an element, or a set of input elements, belongs. In this article, we solve the multiple set matching problem by proposing two efficient Bloom Multifilters called Bloom Matrix and Bloom Vector. Both are space efficient and answer multiple set matching queries with a set of identifiers. We show that the space efficiency can be optimized further according to the distribution of labels among the multiple sets: Uniform and Zipf. Bloom Vector in particular can efficiently exploit a Zipf distribution of the data for further space reduction. Our results also highlight that the basic ADD and LOOKUP operations are faster on Bloom Matrix than on Bloom Vector. However, Bloom Matrix does not meet the theoretical false positive rate of less than 10^-2 for LOOKUP operations if the represented data or the labels are not uniformly distributed among the multiple sets. Consequently, we introduce Bloom Test, which uses Bloom Matrix as the pre-filter structure to determine which structure is suitable for improved performance with an arbitrary input dataset.
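To make the ADD and LOOKUP operations concrete, the following is a minimal Python sketch of a multi-set Bloom filter that answers a query with a set of identifiers. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the internal layout, so the class name `BloomMatrixSketch`, the one-bitmask-per-bit-position layout, the parameters `num_bits` and `num_hashes`, and the salted BLAKE2b hashing are all hypothetical choices made for this example.

```python
import hashlib


class BloomMatrixSketch:
    """Illustrative multi-set Bloom filter (assumed layout, not the paper's).

    Row i holds one integer bitmask; bit j of that mask is set when some
    element added to set j hashed to position i.
    """

    def __init__(self, num_sets, num_bits=1024, num_hashes=4):
        self.num_sets = num_sets
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.rows = [0] * num_bits

    def _positions(self, element):
        # Derive num_hashes positions by salting BLAKE2b with the hash index.
        data = str(element).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, element, set_id):
        """ADD: record that `element` belongs to the set `set_id`."""
        if not 0 <= set_id < self.num_sets:
            raise ValueError("set_id out of range")
        for pos in self._positions(element):
            self.rows[pos] |= 1 << set_id

    def lookup(self, element):
        """LOOKUP: return identifiers of sets that may contain `element`.

        False positives are possible; false negatives are not.
        """
        mask = (1 << self.num_sets) - 1
        for pos in self._positions(element):
            mask &= self.rows[pos]
            if mask == 0:
                break
        return {j for j in range(self.num_sets) if (mask >> j) & 1}


bm = BloomMatrixSketch(num_sets=3)
bm.add("apple", 0)
bm.add("apple", 2)
print(bm.lookup("apple"))  # contains 0 and 2; extra ids would be false positives
```

In this sketch a lookup intersects the bitmasks at the hashed positions, so the result is always a superset of the true set identifiers. How often extra identifiers appear depends on `num_bits`, `num_hashes`, and how elements are distributed across the sets.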

research
01/07/2019

Bloom Multifilters for Multiple Set Matching

Bloom filter is a space-efficient probabilistic data structure for check...
research
10/17/2019

The Distributed Bloom Filter

The Distributed Bloom Filter is a space-efficient, probabilistic data st...
research
08/28/2019

Bloom filter variants for multiple sets: a comparative assessment

In this paper we compare two probabilistic data structures for associati...
research
06/13/2021

Hash Adaptive Bloom Filter

Bloom filter is a compact memory-efficient probabilistic data structure ...
research
12/16/2019

Matrix Bloom Filter: An Efficient Probabilistic Data Structure for 2-tuple Batch Lookup

With the growing scale of big data, probabilistic structures receive inc...
research
06/11/2023

Time-limited Bloom Filter

A Bloom Filter is a probabilistic data structure designed to check, rapi...
research
09/24/2020

A Case for Partitioned Bloom Filters

In a partitioned Bloom Filter the m bit vector is split into k disjoint ...