Stretching Your Data With Taffy Filters

09/04/2021
by Jim Apple, et al.

Popular approximate membership query structures such as Bloom filters and cuckoo filters are widely used in databases, security, and networking. These structures represent sets approximately and support at least two operations: insert and lookup. Lookup always returns true on elements inserted into the structure; it also returns true with some probability 0 < ε < 1 on elements not inserted into the structure. These latter elements are called false positives. In exchange for tolerating these false positives, filters can be much smaller than hash tables that represent the same set. However, unlike hash tables, cuckoo filters and Bloom filters must be initialized with the intended number of inserts to be performed and cannot grow larger: inserts beyond this number fail or significantly increase the false positive probability. This paper presents designs and implementations of filters that can grow without inserts failing and without meaningfully increasing the false positive probability, even if the filters are created with a small initial size. The resulting code is available on GitHub under a permissive open source license.
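
The fixed-capacity limitation described in the abstract can be illustrated with a minimal sketch of a classic Bloom filter. This is not the paper's code and not a taffy filter; the names `BloomFilter`, `measured_fpp`, and `demo`, the SHA-256 double-hashing scheme, and the example capacity and target ε are all assumptions chosen for illustration. The sketch sizes the filter for a stated number of inserts, then compares the measured false positive rate at that capacity with the rate after inserting ten times as many keys.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative fixed-size Bloom filter, sized up front for `capacity` inserts."""

    def __init__(self, capacity: int, fpp: float) -> None:
        # Textbook sizing: m = -n ln(eps) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (an assumption of this sketch): derive k bit
        # positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def lookup(self, item: str) -> bool:
        # True for every inserted item; sometimes true for absent items.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def measured_fpp(filt: BloomFilter, trials: int = 20000) -> float:
    """Fraction of never-inserted keys for which lookup returns true."""
    hits = sum(filt.lookup(f"absent-{i}") for i in range(trials))
    return hits / trials


def demo(capacity: int = 1000, fpp: float = 0.01):
    """Return (no_false_negatives, fpp_at_capacity, fpp_when_overfilled)."""
    filt = BloomFilter(capacity, fpp)
    for i in range(capacity):
        filt.insert(f"key-{i}")
    no_false_negatives = all(filt.lookup(f"key-{i}") for i in range(capacity))
    at_capacity = measured_fpp(filt)
    # Insert ten times the intended number of keys without growing the filter.
    for i in range(capacity, 10 * capacity):
        filt.insert(f"key-{i}")
    overfilled = measured_fpp(filt)
    return no_false_negatives, at_capacity, overfilled
```

Calling `demo()` should show that lookups on inserted keys always succeed, that the measured false positive rate stays near the target while the filter holds no more than its intended number of keys, and that the rate climbs toward 1 once the filter is heavily overfilled. That degradation is the behavior the growable filters in the paper are designed to avoid.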


