RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time
Approximate set membership is a common problem with wide applications in databases, networking, and search. Given a set S and a query q, the task is to determine whether q is in S. The Bloom Filter (BF) is a popular data structure for approximate membership testing due to its simplicity. In particular, a BF consists of a bit array that can be incrementally updated. A related problem, and the subject of this paper, is the Multiple Set Membership Testing (MSMT) problem. Here we are given K different sets, and for any given query q the goal is to find all of the sets containing the query element. Trivially, a multiple set membership instance can be reduced to K membership testing instances, each with the same q, leading to O(K) query time. A simple array of Bloom Filters can achieve that. In this paper, we show the first non-trivial data structure for streaming keys, RAMBO (Repeated And Merged Bloom Filter), that achieves expected O(sqrt(K) log K) query time, at the cost of a worst-case memory overhead of a factor of O(log K) relative to the array of Bloom Filters. The proposed data structure is simply a count-min sketch arrangement of Bloom Filters and retains all of the count-min sketch's favorable properties. We replace the addition operation with a set union and the minimum operation with a set intersection during estimation.
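To make the count-min-sketch analogy concrete, the following is a minimal Python sketch of the idea, not the authors' implementation. It assumes R repetitions, each hashing the K set identifiers into B buckets, with one Bloom Filter per bucket. The class names `BloomFilter` and `Rambo`, the parameters `num_buckets` and `num_reps`, and the use of BLAKE2b as the hash are all illustrative assumptions. An insertion writes the key into one merged filter per repetition; a query takes the union of the set identifiers of all matching buckets within a repetition and intersects these unions across repetitions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array probed by several salted hashes."""

    def __init__(self, num_bits=1 << 14, num_hashes=4):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from salted BLAKE2b digests (illustrative choice).
        for i in range(self.k):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class Rambo:
    """R repetitions x B buckets of Bloom filters; each bucket merges several sets."""

    def __init__(self, num_sets, num_buckets, num_reps):
        self.K, self.B, self.R = num_sets, num_buckets, num_reps
        self.tables = [[BloomFilter() for _ in range(self.B)] for _ in range(self.R)]
        # members[r][b]: identifiers of the sets merged into bucket b of repetition r.
        self.members = [[set() for _ in range(self.B)] for _ in range(self.R)]
        for r in range(self.R):
            for s in range(self.K):
                self.members[r][self._bucket(r, s)].add(s)

    def _bucket(self, r, set_id):
        # Independent partition of the set identifiers for each repetition.
        digest = hashlib.blake2b(
            str(set_id).encode(), digest_size=8, salt=r.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.B

    def insert(self, set_id, key):
        # Analogue of the count-min "add": the key joins one merged filter per repetition.
        for r in range(self.R):
            self.tables[r][self._bucket(r, set_id)].add(key)

    def query(self, key):
        # Union within a repetition, intersection across repetitions.
        candidates = set(range(self.K))
        for r in range(self.R):
            hit = set()
            for b in range(self.B):
                if key in self.tables[r][b]:
                    hit |= self.members[r][b]
            candidates &= hit
            if not candidates:
                break
        return candidates


rambo = Rambo(num_sets=16, num_buckets=4, num_reps=3)
rambo.insert(3, "ACGT")
rambo.insert(11, "ACGT")
print(sorted(rambo.query("ACGT")))  # always includes 3 and 11; may include false positives
```

Each query in this sketch probes B × R filters instead of K. Choosing B on the order of sqrt(K) and R on the order of log K would be consistent with the query-time and memory bounds stated above, though these parameter choices are an assumption here. Like a single Bloom Filter, the structure produces no false negatives: every set that truly contains the key survives the intersection.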