Analysis of Knuth's Sampling Algorithm D and D'

06/08/2023
by   Mridul Nandi, et al.

We address the Distinct Elements estimation problem in the context of streaming algorithms: estimating the number of distinct elements in a data stream 𝒜 = (a_1, a_2, …, a_m), where a_i ∈ {1, 2, …, n}. Over the past four decades, the problem has received considerable theoretical and empirical attention, leading to the development of space-optimal algorithms. A recent sampling-based algorithm proposed by Chakraborty et al. [11] has garnered significant interest, including from Donald E. Knuth, who wrote an article on the same topic [6] and named the algorithm CVM. In this paper, we thoroughly examine these algorithms (referred to as CVM1 and CVM2 in [11] and as DonD and DonD' in [6]). We first unify them under the name cutoff-based algorithms, and then provide an approximation and biasedness analysis of them.
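For orientation, the sampling idea behind this family of algorithms can be sketched in a few lines of Python. This is a minimal illustration of the basic scheme as it is commonly presented: keep a bounded random sample of the stream and halve the sampling probability whenever the sample fills up. It is not a reproduction of the exact CVM1, CVM2, DonD, or DonD' variants analyzed in the paper. The names `cvm_estimate`, `thresh`, and `rng` are hypothetical and chosen only for this example.

```python
import random


def cvm_estimate(stream, thresh, rng=None):
    """Estimate the number of distinct elements in `stream`.

    Sketch of a cutoff-style sampler (assumed form, not the paper's
    exact pseudocode): `thresh` bounds the sample size, and `p` is the
    current probability with which an element is kept in the sample.
    """
    rng = rng or random.Random()
    p = 1.0
    sample = set()
    for a in stream:
        # Forget any earlier decision about this element, then re-sample it.
        sample.discard(a)
        if rng.random() < p:
            sample.add(a)
        if len(sample) == thresh:
            # Sample is full: keep each element with probability 1/2
            # and halve the sampling probability for future elements.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2
            if len(sample) == thresh:
                # Extremely unlikely: no element was dropped.
                raise RuntimeError("sampling round failed to shrink the sample")
    return len(sample) / p


# Usage: a stream of 1000 items drawn from only 50 distinct values.
stream = [i % 50 for i in range(1000)]
estimate = cvm_estimate(stream, thresh=100)
```

When the number of distinct elements stays below `thresh`, the sampling probability never drops below 1 and the returned value is the exact count; otherwise the output is a randomized estimate whose accuracy depends on `thresh`.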


research
01/24/2023

Distinct Elements in Streams: An Algorithm for the (Text) Book

Given a data stream 𝒟 = ⟨ a_1, a_2, …, a_m ⟩ of m elements where each a_...
research
04/05/2018

Optimal streaming and tracking distinct elements with high probability

The distinct elements problem is one of the fundamental problems in stre...
research
10/29/2018

Distinct Sampling on Streaming Data with Near-Duplicates

In this paper we study how to perform distinct sampling in the streaming...
research
03/13/2019

Cardinality Estimation in a Virtualized Network Device Using Online Machine Learning

Cardinality estimation algorithms receive a stream of elements, with pos...
research
09/25/2019

Exact confidence interval for generalized Flajolet-Martin algorithms

This paper develops a deep mathematical-statistical approach to analyze a...
research
06/10/2019

Parallel Streaming Random Sampling

This paper investigates parallel random sampling from a potentially-unen...
research
06/15/2021

Learning-based Support Estimation in Sublinear Time

We consider the problem of estimating the number of distinct elements in...
