Analysis of Knuth's Sampling Algorithm D and D'

06/08/2023
by   Mridul Nandi, et al.
0

In this research paper, we address the Distinct Elements estimation problem in the context of streaming algorithms. The problem involves estimating the number of distinct elements in a given data stream 𝒜 = (a_1, a_2,…, a_m), where a_i ∈{1, 2, …, n}. Over the past four decades, the Distinct Elements problem has received considerable attention, theoretically and empirically, leading to the development of space-optimal algorithms. A recent sampling-based algorithm proposed by Chakraborty et al.[11] has garnered significant interest and has even attracted the attention of renowned computer scientist Donald E. Knuth, who wrote an article on the same topic [6] and called the algorithm CVM. In this paper, we thoroughly examine the algorithms (referred to as CVM1, CVM2 in [11] and DonD, DonD' in [6]. We first unify all these algorithms and call them cutoff-based algorithms. Then we provide an approximation and biasedness analysis of these algorithms.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro