A Critical Analysis of Classifier Selection in Learned Bloom Filters

11/28/2022
by Dario Malchiodi, et al.

Learned Bloom Filters, i.e., models induced from data via machine learning techniques that solve the approximate set membership problem, have recently been introduced with the aim of enhancing the performance of standard Bloom Filters, with a special focus on space occupancy. Unlike in the classical case, the "complexity" of the data used to build the filter might heavily impact its performance. Therefore, we propose here what is, to the best of our knowledge, the first in-depth analysis for assessing the performance of a given Learned Bloom Filter, in conjunction with a given classifier, on a dataset of a given classification complexity. Specifically, we propose a novel methodology, supported by software, for designing, analyzing and implementing Learned Bloom Filters as a function of specific constraints on their multi-criteria nature (that is, constraints involving space efficiency, false positive rate, and reject time). Our experiments show that the proposed methodology and the supporting software are valid and useful: we find that only two classifiers have desirable properties in relation to problems with different data complexity and, interestingly, neither of them has been considered so far in the literature. We also show experimentally that the Sandwiched variant of Learned Bloom Filters is the most robust to data complexity and classifier performance variability, and that it usually also has smaller reject times. The software can be readily used to test new Learned Bloom Filter proposals, which can be compared with the best ones identified here.
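For readers unfamiliar with the structures named in the abstract, the following is a minimal sketch of a Learned Bloom Filter and of its Sandwiched variant, as these constructions are commonly described. It is not the software accompanying the paper. The class names, the `toy_score` function standing in for a trained classifier, the threshold, and the filter sizes are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings: k hash positions on m slots."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _positions(self, key: str):
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


class LearnedBloomFilter:
    """Classifier first; a backup Bloom filter stores the keys it misses."""

    def __init__(self, keys, score, threshold, m, k):
        self.score, self.threshold = score, threshold
        self.backup = BloomFilter(m, k)
        for key in keys:
            if score(key) < threshold:  # classifier false negative
                self.backup.add(key)

    def __contains__(self, key: str) -> bool:
        return self.score(key) >= self.threshold or key in self.backup


class SandwichedLearnedBloomFilter:
    """Initial Bloom filter placed in front of a Learned Bloom Filter."""

    def __init__(self, keys, score, threshold, m_init, k_init, m_back, k_back):
        keys = list(keys)
        self.initial = BloomFilter(m_init, k_init)
        for key in keys:
            self.initial.add(key)
        self.learned = LearnedBloomFilter(keys, score, threshold, m_back, k_back)

    def __contains__(self, key: str) -> bool:
        # A key rejected by the initial filter never reaches the classifier.
        return key in self.initial and key in self.learned


def toy_score(key: str) -> float:
    """Hypothetical stand-in for a trained classifier's score in [0, 1]."""
    return 1.0 if "evil" in key else 0.0


keys = ["evil.example", "evil-two.example", "quiet.example", "plain.example"]
lbf = LearnedBloomFilter(keys, toy_score, threshold=0.5, m=64, k=3)
slbf = SandwichedLearnedBloomFilter(keys, toy_score, 0.5, 64, 3, 64, 3)

# Neither structure produces false negatives on the stored keys.
print(all(key in lbf for key in keys), all(key in slbf for key in keys))
```

In this sketch, a non-key that the toy classifier scores above the threshold (for example `"evil-unknown.example"`) is accepted by the plain Learned Bloom Filter as a false positive. In the Sandwiched variant, such a query must first pass the initial filter, which is the intuition behind placing a filter before the classifier.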

