Don't Thrash: How to Cache Your Hash on Flash

08/01/2012
by Michael A. Bender, et al.

This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclusively in RAM, limiting the size of the sets it can represent. This paper first describes the quotient filter, which supports the basic operations of the Bloom filter with roughly comparable space and time performance but better data locality: operations on the quotient filter require only a small number of contiguous accesses. The quotient filter has other advantages over the Bloom filter: it supports deletions, it can be dynamically resized, and two quotient filters can be efficiently merged. The paper then gives two data structures, the buffered quotient filter and the cascade filter, which exploit these advantages to serve as SSD-optimized alternatives to the Bloom filter. The cascade filter has better asymptotic I/O performance than the buffered quotient filter, but the buffered quotient filter outperforms the cascade filter on small to medium data sets. Both data structures significantly outperform recently proposed SSD-optimized Bloom filter variants, such as the elevator Bloom filter, the buffered Bloom filter, and the forest-structured Bloom filter. In experiments, the cascade filter and buffered quotient filter performed insertions 8.6-11 times faster than the fastest Bloom filter variant and performed lookups 0.94-2.56 times as fast.
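The abstract describes the quotient filter only at a high level. What follows is a minimal in-memory Python sketch of the commonly described quotient-filter layout, offered for illustration only. It is not the authors' implementation. The class name `QuotientFilter`, the `q`/`r` parameters, the blake2b-based fingerprint, and all method names are assumptions. Each key's fingerprint is split into a quotient, which names a home slot, and a remainder, which is what gets stored. Three bits per slot (occupied, continuation, shifted) let remainders that share a home slot sit in a sorted run at or to the right of it. Lookups, insertions, and deletions only scan the contiguous cluster around the home slot, which is the locality and deletion support the abstract credits. Several simplifications are assumed. Fields are stored in Python lists rather than packed bits. Fingerprints are treated as a multiset. One slot is always kept free so that every scan terminates. Resizing, merging, and the buffered quotient filter and cascade filter are omitted.

```python
import hashlib


class QuotientFilter:
    def __init__(self, q: int, r: int) -> None:
        self.q, self.r, self.m = q, r, 1 << q  # 2**q slots, r-bit remainders
        self.rem = [0] * self.m
        self.occ = [False] * self.m      # slot is the home of some stored fingerprint
        self.cont = [False] * self.m     # entry continues the run on its left
        self.shifted = [False] * self.m  # entry is not in its home slot
        self.count = 0

    def _fingerprint(self, key: str):
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        f = int.from_bytes(digest, "big") & ((1 << (self.q + self.r)) - 1)
        return f >> self.r, f & ((1 << self.r) - 1)  # (quotient, remainder)

    def _next(self, i: int) -> int:
        return (i + 1) % self.m  # the table wraps around

    def _is_empty(self, i: int) -> bool:
        return not (self.occ[i] or self.cont[i] or self.shifted[i])

    def _run_start(self, fq: int) -> int:
        # Walk left to the cluster start, then right one run per occupied home slot.
        b = fq
        while self.shifted[b]:
            b = (b - 1) % self.m
        s = b
        while b != fq:
            s = self._next(s)
            while self.cont[s]:
                s = self._next(s)
            b = self._next(b)
            while not self.occ[b]:
                b = self._next(b)
        return s

    def _find(self, fq: int, fr: int):
        # Return (run start, slot holding fr), or None if fr is not in fq's run.
        if not self.occ[fq]:
            return None
        start = s = self._run_start(fq)
        while self.rem[s] != fr:
            s = self._next(s)
            if not self.cont[s]:
                return None
        return start, s

    def may_contain(self, key: str) -> bool:
        return self._find(*self._fingerprint(key)) is not None

    def insert(self, key: str) -> None:
        if self.count >= self.m - 1:  # keep one empty slot so every scan stops
            raise OverflowError("quotient filter is full")
        fq, fr = self._fingerprint(key)
        self.count += 1
        if self._is_empty(fq):
            self.rem[fq], self.occ[fq] = fr, True
            return
        had_run, self.occ[fq] = self.occ[fq], True
        start = s = self._run_start(fq)
        if had_run:  # keep each run sorted by remainder
            while self.rem[s] <= fr:
                s = self._next(s)
                if not self.cont[s]:
                    break
        cur = (fr, s != start, s != fq)  # (remainder, continuation, shifted)
        old_head_moves = had_run and s == start
        while True:  # write at s, pushing later entries of the cluster right
            was_empty = self._is_empty(s)
            nxt = (self.rem[s], self.cont[s] or old_head_moves, True)
            self.rem[s], self.cont[s], self.shifted[s] = cur
            if was_empty:
                return
            cur, old_head_moves, s = nxt, False, self._next(s)

    def delete(self, key: str) -> bool:
        fq, fr = self._fingerprint(key)
        found = self._find(fq, fr)
        if found is None:
            return False
        start, s = found
        promote = s == start and self.cont[self._next(s)]  # next entry becomes head
        if s == start and not promote:
            self.occ[fq] = False  # fq's run is now empty
        q, i = fq, s  # q tracks the home slot of the entry being moved
        while self.shifted[self._next(i)]:  # pull shifted entries one slot left
            j = self._next(i)
            if not self.cont[j]:  # j heads the run of the next occupied home slot
                q = self._next(q)
                while not self.occ[q]:
                    q = self._next(q)
            self.rem[i], self.cont[i] = self.rem[j], self.cont[j] and not promote
            self.shifted[i] = i != q
            promote, i = False, j
        self.rem[i], self.cont[i], self.shifted[i] = 0, False, False
        self.count -= 1
        return True
```

For example, after `qf = QuotientFilter(q=10, r=8)` and `qf.insert("apple")`, `qf.may_contain("apple")` returns `True`. As with a Bloom filter, it can also return `True` for a key that was never inserted if that key's fingerprint matches a stored one. Because `delete` removes one matching fingerprint, it should only be called for keys known to be present. Otherwise it could remove another key's entry.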
