Approaching Optimal Duplicate Detection in a Sliding Window

05/10/2020
by Rémi Géraud-Stewart, et al.

Duplicate detection is the problem of identifying whether a given item has previously appeared in a (possibly infinite) stream of data when only a limited amount of memory is available. Unfortunately, the infinite stream setting is ill-posed, and the error rates of duplicate detection filters turn out to be heavily constrained: consequently, such filters appear to provide no asymptotic advantage over a biased coin toss [8]. In this paper we formalize the sliding window setting introduced by [13,16], and show that a perfect (zero-error) solution is achievable up to a maximal window size w_max. Above this threshold, we show that some existing duplicate detection filters (designed for the non-windowed setting) perform better than those targeting the windowed problem. Finally, we introduce a "queuing construction" that improves the performance of some duplicate detection filters in the windowed setting. We also analyse the security of our filters in an adversarial setting.
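
To make the windowed setting concrete, the sketch below shows one straightforward way to get zero error: store the last w items explicitly and report a duplicate only if the incoming item is among them. This is an illustrative assumption, not necessarily the construction analysed in the paper; the class name ExactWindowDetector and its insert method are hypothetical. Because memory grows linearly with w, an exact approach of this kind only fits while the window stays small relative to the available memory, which is consistent with a zero-error solution being usable only up to some w_max.

```python
from collections import Counter, deque
from typing import Deque, Hashable


class ExactWindowDetector:
    """Hypothetical zero-error duplicate detector over the last `w` items.

    The window is stored explicitly (deque + multiplicity counts), so
    memory is O(w) items. This is an illustrative sketch, not the
    paper's construction.
    """

    def __init__(self, w: int) -> None:
        if w < 1:
            raise ValueError("window size must be positive")
        self.w = w
        self.window: Deque[Hashable] = deque()  # last w items, oldest first
        self.counts: Counter = Counter()        # multiplicity of each item in the window

    def insert(self, item: Hashable) -> bool:
        """Return True if `item` occurred among the previous `w` items, then record it."""
        # Counter returns 0 for missing keys without inserting them.
        seen = self.counts[item] > 0
        self.window.append(item)
        self.counts[item] += 1
        # Evict the oldest item once the window exceeds w entries.
        if len(self.window) > self.w:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return seen
```

For example, with w = 3 the stream a, b, c, a, d, e, f, a flags only the fourth item as a duplicate. By the time the final a arrives, the earlier a has slid out of the window, so it is not reported.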

research
09/24/2021

k-Center Clustering with Outliers in the Sliding-Window Model

The k-center problem for a point set P asks for a collection of k congru...
research
03/15/2021

Smoothness of Schatten Norms and Sliding-Window Matrix Streams

Large matrices are often accessed as a row-order stream. We consider the...
research
09/23/2019

Sliding window property testing for regular languages

We study the problem of recognizing regular languages in a variant of th...
research
12/05/2017

Pay for a Sliding Bloom Filter and Get Counting, Distinct Elements, and Entropy for Free

For many networking applications, recent data is more significant than o...
research
02/21/2018

Randomized sliding window algorithms for regular languages

A sliding window algorithm receives a stream of symbols and has to outpu...
research
01/09/2020

Age-Partitioned Bloom Filters

Bloom filters (BF) are widely used for approximate membership queries ov...
research
11/07/2017

SWOOP: Top-k Similarity Joins over Set Streams

We provide efficient support for applications that aim to continuously f...