Ribbon filter: practically smaller than Bloom and Xor

by Peter C. Dillinger, et al.

Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate f ∈ (0,1] is known to require ≥ log_2(1/f) bits per key. At least for larger f ≥ 2^-4, existing practical filters require a space overhead of at least 20% over this lower bound. We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger f ≥ 2^-7. In many cases, Ribbon is faster than existing filters for the same space overhead, or can achieve space overhead below 10%. A Ribbon design with load balancing can even achieve space overheads below 1%. A Ribbon filter resembles an Xor filter modified to maximize locality and is constructed by solving a band-like linear system over Boolean variables. In previous work, Dietzfelbinger and Walzer describe this linear system and an efficient Gaussian solver. We present and analyze a faster, more adaptable solving process we call "Rapid Incremental Boolean Banding ON the fly," which resembles hash table construction. We also present and analyze an attractive Ribbon variant based on making the linear system homogeneous, and describe several more practical enhancements.
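To make the "band-like linear system" and the hash-table-like construction more concrete, here is a minimal Python sketch of the general idea. It is an illustration under stated assumptions, not the authors' implementation. The names `RibbonSketch`, `_hash`, `_band` and `_back_substitute`, the use of BLAKE2b as the hash function, and the defaults (`r=8` fingerprint bits, band width `w=64`, `overhead=0.25`) are all hypothetical choices made for readability. Each key contributes one equation over GF(2) whose nonzero coefficients lie within `w` consecutive slots. Equations are inserted one at a time: if the slot at the equation's leading position is free, the equation is stored there, otherwise it is XORed with the stored one and moved forward, much like probing in a hash table.

```python
import hashlib
import math


def _hash(key: bytes, seed: int, m: int, w: int, r: int):
    """Map a key to (start slot, w-bit coefficient row, r-bit fingerprint)."""
    d = hashlib.blake2b(seed.to_bytes(8, "little") + key, digest_size=24).digest()
    start = int.from_bytes(d[0:8], "little") % (m - w + 1)
    # Force the lowest coefficient bit to 1 so every row begins exactly at `start`.
    coeffs = (int.from_bytes(d[8:16], "little") & ((1 << w) - 1)) | 1
    fingerprint = int.from_bytes(d[16:24], "little") & ((1 << r) - 1)
    return start, coeffs, fingerprint


def _band(keys, seed, m, w, r):
    """Insert one equation per key; return (coeff_rows, result_rows), or None on failure."""
    coeff_rows = [0] * m
    result_rows = [0] * m
    for key in keys:
        i, c, b = _hash(key, seed, m, w, r)
        while True:
            if coeff_rows[i] == 0:           # free slot: store the equation here
                coeff_rows[i], result_rows[i] = c, b
                break
            c ^= coeff_rows[i]               # eliminate the leading coefficient
            b ^= result_rows[i]
            if c == 0:
                if b != 0:
                    return None              # inconsistent equation: this seed fails
                break                        # redundant equation: nothing to store
            shift = (c & -c).bit_length() - 1  # number of trailing zero bits
            c >>= shift
            i += shift
    return coeff_rows, result_rows


def _back_substitute(coeff_rows, result_rows):
    """Solve the banded system from the last slot to the first; free slots become 0."""
    m = len(coeff_rows)
    solution = [0] * m
    for i in range(m - 1, -1, -1):
        acc = result_rows[i]
        c = coeff_rows[i] >> 1               # remaining bits refer to slots i+1, i+2, ...
        j = i + 1
        while c:
            if c & 1:
                acc ^= solution[j]
            c >>= 1
            j += 1
        solution[i] = acc
    return solution


class RibbonSketch:
    """Static filter sketch with false positive rate of about 2**-r (assumed parameters)."""

    def __init__(self, keys, r=8, w=64, overhead=0.25, max_seeds=50):
        keys = list(keys)
        self.r, self.w = r, w
        self.m = math.ceil((1 + overhead) * len(keys)) + w
        for seed in range(max_seeds):        # retry with a fresh seed if banding fails
            banded = _band(keys, seed, self.m, w, r)
            if banded is not None:
                self.seed = seed
                self.solution = _back_substitute(*banded)
                return
        raise RuntimeError("construction failed for every seed tried")

    def __contains__(self, key: bytes) -> bool:
        start, c, fingerprint = _hash(key, self.seed, self.m, self.w, self.r)
        acc, j = 0, start
        while c:                             # XOR the solution rows selected by c
            if c & 1:
                acc ^= self.solution[j]
            c >>= 1
            j += 1
        return acc == fingerprint
```

With this sketch, `RibbonSketch(keys)` takes an iterable of `bytes` keys and `key in filt` performs a query. The structure stores `m * r` bits for `n` keys, so its space overhead relative to the log_2(1/f) = r bits-per-key bound is about `m / n - 1`. The default `overhead` here is deliberately generous so that construction almost always succeeds on small inputs; it does not reflect the tuned configurations or the overhead figures reported in the abstract.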





Prefix Filter: Practically and Theoretically Better Than Bloom

Many applications of approximate membership query data structures, or fi...

Fast Succinct Retrieval and Approximate Membership using Ribbon

A retrieval data structure for a static function f:S→{0,1}^r supports qu...

Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

The Bloom filter provides fast approximate set membership while using li...

On Occupancy Moments and Bloom Filter Efficiency

Two multivariate committee distributions are shown to belong to Berg's f...

Concurrent Expandable AMQs on the Basis of Quotient Filters

A quotient filter is a cache efficient AMQ data structure. Depending on ...

Multiple Set Matching and Pre-Filtering with Bloom Multifilters

Bloom filter is a space-efficient probabilistic data structure for check...

Bloom Multifilters for Multiple Set Matching

Bloom filter is a space-efficient probabilistic data structure for check...