Quotient Hash Tables - Efficiently Detecting Duplicates in Streaming Data

01/14/2019
by Rémi Géraud, et al.

This article presents the Quotient Hash Table (QHT), a new data structure for duplicate detection in unbounded streams. QHTs stem from a corrected analysis of streaming quotient filters (SQFs), which yields a 33% reduction in memory usage at equal performance. We provide a new and thorough analysis of both algorithms, with results that are of interest for other existing constructions. We also introduce an optimised version of our new data structure, dubbed Queued QHT with Duplicates (QQHTD). Finally, we discuss the effect of adversarial inputs on hash-based duplicate filters similar to QHT.
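To make the general idea of a hash-based duplicate filter concrete, here is a minimal Python sketch. It is not the QHT, SQF, or QQHTD construction from the paper, whose details the abstract does not give. It only assumes a generic design: a fixed number of buckets, each holding a few short fingerprints, with a random slot overwritten when a bucket is full. The class name `FingerprintDuplicateFilter`, its parameters, and the eviction rule are all hypothetical choices made for illustration.

```python
import hashlib
import random


class FingerprintDuplicateFilter:
    """Illustrative fixed-memory duplicate filter (NOT the paper's QHT)."""

    def __init__(self, num_buckets=1024, slots_per_bucket=4,
                 fingerprint_bits=8, seed=0):
        # All sizes are assumed defaults, not values taken from the paper.
        self.num_buckets = num_buckets
        self.slots_per_bucket = slots_per_bucket
        self.fingerprint_bits = fingerprint_bits
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _locate(self, item: bytes):
        # Derive a bucket index and a short fingerprint from one hash value.
        digest = int.from_bytes(hashlib.sha256(item).digest()[:8], "big")
        bucket_index = digest % self.num_buckets
        fingerprint = (digest // self.num_buckets) % (1 << self.fingerprint_bits)
        return bucket_index, fingerprint

    def seen_before(self, item: bytes) -> bool:
        """Report whether item looks like a duplicate, then remember it."""
        index, fingerprint = self._locate(item)
        bucket = self.buckets[index]
        if fingerprint in bucket:
            # May be a false positive if two items share a fingerprint.
            return True
        if len(bucket) < self.slots_per_bucket:
            bucket.append(fingerprint)
        else:
            # Assumed eviction rule: overwrite a random slot. Evictions are
            # what make false negatives possible on long streams.
            bucket[self.rng.randrange(self.slots_per_bucket)] = fingerprint
        return False


if __name__ == "__main__":
    dup_filter = FingerprintDuplicateFilter()
    for element in [b"a", b"b", b"a", b"c", b"b"]:
        print(element, dup_filter.seen_before(element))
```

Memory use in this sketch is bounded by `num_buckets * slots_per_bucket * fingerprint_bits` bits regardless of stream length, which is the kind of budget the abstract's memory comparison refers to. Because the bucket and fingerprint come from a public hash, an input chosen to collide in one bucket could force evictions, which hints at why adversarial inputs matter for filters of this family.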


research
08/05/2023

DiCuPIT: Distributed Cuckoo Filter-based Pending Interest Table

Named data networking is one of the recommended architectures for the fu...
research
05/23/2018

Optimal Hashing in External Memory

Hash tables are a ubiquitous class of dictionary data structures. Howeve...
research
04/19/2023

Efficient implementation of sets and multisets in R using hash tables

The package hset for the R language contains an implementation of a S4 c...
research
07/05/2019

HashGraph – Scalable Hash Tables Using A Sparse Graph Data Structure

Hash tables are ubiquitous and used in a wide range of applications for ...
research
01/24/2019

Dolha - an Efficient and Exact Data Structure for Streaming Graphs

A streaming graph is a graph formed by a sequence of incoming edges with...
research
05/19/2018

Do You Like What I Like? Similarity Estimation in Proximity-based Mobile Social Networks

While existing social networking services tend to connect people who kno...
research
04/17/2020

Reducing Commutativity Verification to Reachability with Differencing Abstractions

Commutativity of data structure methods is of ongoing interest, with roo...
