Ribbon filter: practically smaller than Bloom and Xor

03/03/2021
by Peter C. Dillinger, et al.

Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate f ∈ (0,1] is known to require ≥ log_2(1/f) bits per key. At least for larger f ≥ 2^-4, existing practical filters require a space overhead of at least 20% relative to this lower bound. We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger f ≥ 2^-7. In many cases, Ribbon is faster than existing filters for the same space overhead, or can achieve space overhead below 10%. A Ribbon design with load balancing can even achieve space overheads below 1%. A Ribbon filter resembles an Xor filter modified to maximize locality and is constructed by solving a band-like linear system over Boolean variables. In previous work, Dietzfelbinger and Walzer describe this linear system and an efficient Gaussian solver. We present and analyze a faster, more adaptable solving process we call "Rapid Incremental Boolean Banding ON the fly," which resembles hash table construction. We also present and analyze an attractive Ribbon variant based on making the linear system homogeneous, and describe several more practical enhancements.
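
The banding idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name ToyRibbonFilter, the BLAKE2b-based hashing, the parameter defaults (8 fingerprint bits, band width 64, 20% extra slots), and the retry-with-new-seed policy are all assumptions made for illustration. Each key hashes to a start slot, a band of coefficient bits, and a fingerprint. Rows are inserted one at a time into a slot table, much like hash table insertion, and a back-substitution pass then yields one fingerprint-sized value per slot.

```python
import hashlib


class ToyRibbonFilter:
    """Toy static filter built by on-the-fly elimination of a banded XOR system."""

    def __init__(self, keys, fp_bits=8, width=64, overhead=0.2, max_seeds=100):
        keys = list(keys)
        self.fp_bits, self.width = fp_bits, width
        # Number of slots: (1 + overhead) slots per key (illustrative choice).
        self.m = max(width, int(len(keys) * (1 + overhead)) + 1)
        for seed in range(max_seeds):
            self.seed = seed
            rows = self._band(keys)
            if rows is not None:
                self.solution = self._back_substitute(*rows)
                return
        raise RuntimeError("construction failed for every seed tried")

    def _hash(self, key):
        # Assumed hashing scheme: one 128-bit digest split into three fields.
        digest = hashlib.blake2b(
            key.encode(), digest_size=16, salt=self.seed.to_bytes(8, "little")
        ).digest()
        h = int.from_bytes(digest, "little")
        start = (h & 0xFFFFFFFF) % (self.m - self.width + 1)
        coeff = ((h >> 32) & ((1 << self.width) - 1)) | 1  # lowest bit forced to 1
        fingerprint = (h >> 96) & ((1 << self.fp_bits) - 1)
        return start, coeff, fingerprint

    def _band(self, keys):
        coeffs = [0] * self.m   # coeffs[i] has bit 0 set whenever slot i is used
        results = [0] * self.m
        for key in keys:
            s, c, r = self._hash(key)
            while True:
                if coeffs[s] == 0:          # free slot: store the row here
                    coeffs[s], results[s] = c, r
                    break
                c ^= coeffs[s]              # eliminate the leading coefficient
                r ^= results[s]
                if c == 0:
                    if r != 0:
                        return None         # inconsistent system: caller retries
                    break                   # redundant row: nothing to store
                shift = (c & -c).bit_length() - 1  # count of trailing zero bits
                s += shift
                c >>= shift
        return coeffs, results

    @staticmethod
    def _dot(z, start, coeff):
        # XOR together z[start + j] for every set bit j of coeff.
        acc, j = 0, start
        while coeff:
            if coeff & 1:
                acc ^= z[j]
            coeff >>= 1
            j += 1
        return acc

    def _back_substitute(self, coeffs, results):
        z = [0] * self.m
        for i in range(self.m - 1, -1, -1):
            if coeffs[i]:
                # z[i] is still 0 here, so the dot product covers only later slots.
                z[i] = results[i] ^ self._dot(z, i, coeffs[i])
        return z

    def __contains__(self, key):
        s, c, r = self._hash(key)
        return self._dot(self.solution, s, c) == r
```

With these assumed defaults the table stores 8 bits in each of roughly 1.2 slots per key, i.e. about 20% more than the log_2(1/f) = 8 bits per key lower bound for f = 2^-8. Keys that were inserted always test positive; other strings test positive with probability about 2^-8.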

research
03/31/2022

Prefix Filter: Practically and Theoretically Better Than Bloom

Many applications of approximate membership query data structures, or fi...
research
09/04/2021

Fast Succinct Retrieval and Approximate Membership using Ribbon

A retrieval data structure for a static function f:S→{0,1}^r supports qu...
research
06/30/2022

Proteus: A Self-Designing Range Filter

We introduce Proteus, a novel self-designing approximate range filter, w...
research
08/13/2019

On Occupancy Moments and Bloom Filter Efficiency

Two multivariate committee distributions are shown to belong to Berg's f...
research
11/19/2019

Concurrent Expandable AMQs on the Basis of Quotient Filters

A quotient filter is a cache efficient AMQ data structure. Depending on ...
research
09/14/2022

Adversarial Correctness and Privacy for Probabilistic Data Structures

We study the security of Probabilistic Data Structures (PDS) for handlin...
research
01/07/2019

Bloom Multifilters for Multiple Set Matching

Bloom filter is a space-efficient probabilistic data structure for check...
