Matrix Bloom Filter: An Efficient Probabilistic Data Structure for 2-tuple Batch Lookup

12/16/2019
by Yue Fu, et al.

With the growing scale of big data, probabilistic data structures are becoming increasingly popular for efficient approximate storage and query processing. For example, Bloom filters (BF) achieve satisfactory performance for approximate membership queries at the expense of false positives. However, a standard Bloom filter can only handle univariate data and single membership queries, which is insufficient for OLAP and machine learning applications. In this paper, we focus on a common multivariate data type, namely 2-tuples, or equivalently, key-value pairs. We design the matrix Bloom filter as a high-dimensional extension of the standard Bloom filter. This new probabilistic data structure can not only insert and look up a single 2-tuple efficiently, but also support these operations efficiently in batches, a key requirement for OLAP and machine learning tasks. To further balance insertion and query efficiency across different workload patterns, we propose two variants: the maximum adaptive matrix BF and the minimum storage matrix BF. Through both theoretical and empirical studies, we show that the matrix Bloom filter achieves superior performance on datasets with common statistical distributions; without such distributions, it simply degrades to a standard Bloom filter.
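To make the baseline concrete, the following is a minimal sketch of a standard Bloom filter storing 2-tuples by encoding each pair as a single item. It is not the matrix Bloom filter proposed in the paper, whose layout the abstract does not describe. The class `BloomFilter`, the helper `encode_pair`, the double-hashing scheme, and the parameter values are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter over byte strings (illustrative sketch)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing (an assumption): derive all bit positions
        # from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def encode_pair(key: str, value: str) -> bytes:
    # Hypothetical helper: length-prefix the key so that
    # ("ab", "c") and ("a", "bc") map to different byte strings.
    k = key.encode("utf-8")
    return len(k).to_bytes(4, "big") + k + value.encode("utf-8")


bf = BloomFilter(num_bits=1 << 16, num_hashes=4)
pairs = [("user42", "item7"), ("user42", "item9"), ("user99", "item7")]
for k, v in pairs:
    bf.add(encode_pair(k, v))

# On a plain filter, a batch lookup is one independent probe per pair.
hits = [encode_pair(k, v) in bf for k, v in pairs]
```

In this sketch every pair costs its own set of hash probes, so a batch of lookups that share a key gains nothing from that sharing. That per-item cost is the limitation of the univariate filter that the abstract points to.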


research
03/17/2019

Shed More Light on Bloom Filter's Variants

Bloom Filter is a probabilistic membership data structure and it is exce...
research
01/18/2018

NAE-SAT-based probabilistic membership filters

Probabilistic membership filters are a type of data structure designed t...
research
10/07/2019

RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time

Approximate set membership is a common problem with wide applications in...
research
06/27/2020

Optimizing Cuckoo Filter for high burst tolerance,low latency, and high throughput

In this paper, we present an implementation of a cuckoo filter for membe...
research
05/11/2022

Raw Filtering of JSON Data on FPGAs

Many Big Data applications include the processing of data streams on sem...
research
01/07/2019

Multiple Set Matching and Pre-Filtering with Bloom Multifilters

Bloom filter is a space-efficient probabilistic data structure for check...
research
10/15/2018

Preventing DDoS using Bloom Filter: A Survey

Distributed Denial-of-Service (DDoS) is a menace for service provider an...
