Multiset Synchronization with Counting Cuckoo Filters

03/08/2020
by Shangsen Li, et al.

Set synchronization is a fundamental task in distributed applications and implementations. Existing methods that synchronize simple sets are mainly based on compact data structures such as the Bloom filter and its variants. However, these methods cannot synchronize a pair of multisets, in which an element may appear multiple times. To this end, we propose to leverage the counting cuckoo filter (CCF), a novel variant of the cuckoo filter, to represent and then synchronize a pair of multisets. The cuckoo filter (CF) is a minimized hash table that uses cuckoo hashing to resolve collisions. CF has an array of buckets, each of which has multiple slots to store element fingerprints. Building on CF, CCF extends each slot into two fields: a fingerprint field and a counter field. The fingerprint field records the fingerprint of the element stored in the slot, while the counter field records the multiplicity of that element. With this design, CCF can represent any multiset. We further propose query-based and decoding-based methods that, once the CCFs representing the local multisets have been generated and exchanged, identify the elements that differ between the given multisets. Comprehensive evaluation results indicate that CCF outperforms the counting Bloom filter (CBF) for multiset synchronization in terms of both synchronization accuracy and space efficiency, at the cost of slightly higher time consumption.
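The slot layout described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the class name, the hash construction (SHA-256 with the standard partial-key XOR trick), the parameter defaults, and the `query_diff` helper are all assumptions made for illustration. In particular, `query_diff` is only one plausible reading of a "query-based" comparison, in which a host looks up each of its local elements in the CCF received from its peer and compares multiplicities.

```python
import hashlib
import random
from collections import Counter


class CountingCuckooFilter:
    """Toy CCF: every slot holds a [fingerprint, counter] pair (illustrative only)."""

    def __init__(self, num_buckets=64, slots_per_bucket=4, fp_bits=16, max_kicks=500):
        # A power-of-two bucket count keeps the XOR-derived index in range.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.slots_per_bucket = slots_per_bucket
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(0)  # fixed seed so evictions are repeatable

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return self._hash(b"fp:" + item.encode()) % (1 << self.fp_bits)

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: the other candidate bucket depends only
        # on the current index and the fingerprint, so applying it twice
        # returns to the starting bucket.
        return index ^ (self._hash(fp.to_bytes(8, "big")) % self.num_buckets)

    def _candidates(self, item):
        fp = self._fingerprint(item)
        i1 = self._hash(b"idx:" + item.encode()) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item, count=1):
        fp, i1, i2 = self._candidates(item)
        # If the fingerprint is already stored, only its counter changes.
        for i in (i1, i2):
            for slot in self.buckets[i]:
                if slot[0] == fp:
                    slot[1] += count
                    return True
        # Otherwise take a free slot in either candidate bucket.
        for i in (i1, i2):
            if len(self.buckets[i]) < self.slots_per_bucket:
                self.buckets[i].append([fp, count])
                return True
        # Both buckets are full: evict a victim and relocate it, cuckoo style.
        i, entry = self._rng.choice((i1, i2)), [fp, count]
        for _ in range(self.max_kicks):
            j = self._rng.randrange(self.slots_per_bucket)
            entry, self.buckets[i][j] = self.buckets[i][j], entry
            i = self._alt_index(i, entry[0])
            if len(self.buckets[i]) < self.slots_per_bucket:
                self.buckets[i].append(entry)
                return True
        return False  # table too full; the last evicted entry is dropped

    def count(self, item):
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            for slot in self.buckets[i]:
                if slot[0] == fp:
                    return slot[1]
        return 0


def build_ccf(multiset):
    """Represent a Counter-style multiset as a CCF (hypothetical helper)."""
    ccf = CountingCuckooFilter()
    for item, multiplicity in multiset.items():
        if not ccf.insert(item, multiplicity):
            raise RuntimeError("filter is full; a larger table is needed")
    return ccf


def query_diff(local, remote_ccf):
    """Local multiplicity minus the peer's reported multiplicity, where nonzero."""
    diff = {}
    for item, multiplicity in local.items():
        delta = multiplicity - remote_ccf.count(item)
        if delta != 0:
            diff[item] = delta
    return diff
```

In this sketch each host would call `build_ccf` on its own multiset, send the result to its peer, and run `query_diff` against the filter it receives. Because only fingerprints are stored, two distinct elements that share a fingerprint and a bucket pair are indistinguishable, so the reported multiplicities are approximate in the same way that cuckoo filter membership answers are.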


research
10/27/2022

In-stream Probabilistic Cardinality Estimation for Bloom Filters

The amount of data coming from different sources such as IoT-sensors, so...
research
08/28/2019

Bloom filter variants for multiple sets: a comparative assessment

In this paper we compare two probabilistic data structures for associati...
research
06/06/2021

countBF: A General-purpose High Accuracy and Space Efficient Counting Bloom Filter

Bloom Filter is a probabilistic data structure for the membership query,...
research
08/01/2012

Don't Thrash: How to Cache Your Hash on Flash

This paper presents new alternatives to the well-known Bloom filter data...
research
11/19/2019

Concurrent Expandable AMQs on the Basis of Quotient Filters

A quotient filter is a cache efficient AMQ data structure. Depending on ...
research
01/29/2023

Fast Correlation Function Calculator – A high-performance pair counting toolkit

Context. A novel high-performance exact pair counting toolkit called Fas...
research
03/01/1998

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

This paper introduces new algorithms and data structures for quick count...
