The Adaptive sampling revisited

05/21/2018
by Guy Louchard, et al.

The problem of estimating the number n of distinct keys in a large collection of N data items is well known in computer science. A classical algorithm for this task is adaptive sampling (AS). The number n can be estimated by R·2^D, where R is the final bucket (cache) size and D is the final depth at the end of the process. Several new and interesting questions can be asked about AS (some of them were suggested by P. Flajolet and popularized by J. Lumbroso). The distribution of W = R·2^D/n is known; we rederive this distribution in a simpler way. We provide new results on the moments of D and W. We also analyze the distribution of the final cache size R. We consider colored keys: assume that among the n distinct keys, n_C have color C. We show how to estimate p = n_C/n. We also study colored keys with some multiplicity given by some distribution function, and we want to estimate the mean and variance of this distribution. Finally, we consider the case where neither colors nor multiplicities are known. In that case, we want to estimate the related parameters. An appendix is devoted to the case where the hashing function provides bits with probability different from 1/2.
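The estimator R·2^D described above can be illustrated with a minimal Python sketch of adaptive sampling. This is an illustration under stated assumptions, not the authors' implementation: the names `hash_bits` and `adaptive_sampling`, the `capacity` parameter, the use of SHA-256 as the hash function, and the convention of keeping keys whose lowest D hash bits are all zero are choices made here for the example.

```python
import hashlib


def hash_bits(key: str) -> int:
    """Map a key to a 64-bit integer whose bits act as pseudo-random bits.

    Assumption: SHA-256 stands in for the hashing function; any hash that
    yields unbiased bits (probability 1/2 each) would serve.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def adaptive_sampling(keys, capacity=64):
    """Estimate the number of distinct keys in `keys`.

    Returns (estimate, R, D), where R is the final cache size,
    D is the final depth, and estimate = R * 2**D.
    """
    depth = 0
    cache = set()  # hashed values of the sampled distinct keys
    for key in keys:
        h = hash_bits(key)
        # Keep the key only if its lowest `depth` hash bits are all zero.
        if h & ((1 << depth) - 1):
            continue
        cache.add(h)  # duplicates of a key collapse to one entry
        # On overflow, increase the depth and filter the cache again.
        while len(cache) > capacity:
            depth += 1
            mask = (1 << depth) - 1
            cache = {x for x in cache if x & mask == 0}
    return len(cache) * 2 ** depth, len(cache), depth


if __name__ == "__main__":
    # 10,000 distinct keys, each appearing three times (N = 30,000).
    data = [f"key-{i}" for i in range(10_000)] * 3
    estimate, r, d = adaptive_sampling(data, capacity=64)
    print(f"estimate={estimate}, R={r}, D={d}")
```

When the number of distinct keys does not exceed `capacity`, the depth stays at 0 and the estimate is exact. Otherwise each distinct key survives at depth D with probability 2^-D, independently of its multiplicity, which is why R·2^D estimates n rather than N.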
