Distinct Elements in Streams: An Algorithm for the (Text) Book

01/24/2023
by Sourav Chakraborty, et al.

Given a data stream 𝒟 = ⟨a_1, a_2, …, a_m⟩ of m elements where each a_i ∈ [n], the Distinct Elements problem is to estimate the number of distinct elements in 𝒟. Distinct Elements has been a subject of theoretical and empirical investigation over the past four decades, resulting in space-optimal algorithms for it. All the current state-of-the-art algorithms are, however, beyond the reach of an undergraduate textbook owing to their reliance on notions such as pairwise independence and universal hash functions. We present a simple, intuitive, sampling-based, space-efficient algorithm whose description and proof are accessible to undergraduates with knowledge of basic probability theory.
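To make the problem statement concrete, the following Python sketch contrasts an exact count, which stores every distinct element, with one possible sampling-based estimator that keeps only a bounded sample. The abstract does not spell out the algorithm, so everything here is an assumption for illustration: the function names `exact_distinct` and `estimate_distinct`, the `thresh` parameter, and the rule of halving the sampling probability when the sample fills up. In particular, how `thresh` should be chosen for a given accuracy guarantee is not stated in the abstract and is left to the caller.

```python
import random


def exact_distinct(stream):
    """Baseline: exact number of distinct elements, using memory linear in that number."""
    return len(set(stream))


def estimate_distinct(stream, thresh, rng=None):
    """Sampling-based estimate using a sample of fewer than `thresh` elements.

    Hypothetical sketch: `thresh` and the halving rule are assumptions,
    not details taken from the abstract.
    Returns None if the sample is still full after one thinning round.
    """
    rng = rng or random.Random()
    p = 1.0          # current sampling probability
    sample = set()   # elements currently kept
    for a in stream:
        # Forget any earlier decision about `a`, then re-sample it with probability p,
        # so only the most recent occurrence of each element matters.
        sample.discard(a)
        if rng.random() < p:
            sample.add(a)
        if len(sample) >= thresh:
            # Sample is full: keep each element with probability 1/2 and halve p.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2
            if len(sample) >= thresh:
                return None  # unlucky thinning round; report failure
    # Each surviving element stands for roughly 1/p distinct elements.
    return len(sample) / p


if __name__ == "__main__":
    stream = [random.randrange(5000) for _ in range(100_000)]
    print("exact:   ", exact_distinct(stream))
    print("estimate:", estimate_distinct(stream, thresh=500))
```

When `thresh` exceeds the true number of distinct elements, the sample never fills, `p` stays at 1, and the estimate coincides with the exact count. For smaller `thresh` the result is a random estimate whose spread depends on `thresh`.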


research
∙ 06/08/2023

Analysis of Knuth's Sampling Algorithm D and D'

In this research paper, we address the Distinct Elements estimation prob...
research
∙ 12/21/2021

Optimal Gap Sequences in Shellsort for n≤16 Elements

Optimal gap sequences in Shellsort, defined as gap sequences having the ...
research
∙ 04/05/2018

Optimal streaming and tracking distinct elements with high probability

The distinct elements problem is one of the fundamental problems in stre...
research
∙ 06/11/2021

ExtendedHyperLogLog: Analysis of a new Cardinality Estimator

We discuss the problem of counting distinct elements in a stream. A stre...
research
∙ 10/29/2018

Distinct Sampling on Streaming Data with Near-Duplicates

In this paper we study how to perform distinct sampling in the streaming...
research
∙ 02/21/2019

With Great Speed Come Small Buffers: Space-Bandwidth Tradeoffs for Routing

We consider the Adversarial Queuing Theory (AQT) model, where packet arr...
research
∙ 06/17/2020

Ranking and benchmarking framework for sampling algorithms on synthetic data streams

In the fields of big data, AI, and streaming processing, we work with la...
