Exact PPS Sampling with Bounded Sample Size

05/22/2021
by Brian Hentschel, et al.

Probability proportional to size (PPS) sampling schemes with a target sample size aim to produce a sample comprising a specified number n of items while ensuring that each item in the population appears in the sample with a probability proportional to its specified "weight" (also called its "size"). These two objectives, however, cannot always be achieved simultaneously. Existing PPS schemes prioritize control of the sample size, violating the PPS property if necessary. We provide a new PPS scheme that allows a different trade-off: our method enforces the PPS property at all times while ensuring that the sample size never exceeds the target value n. The sample size is exactly equal to n if possible, and otherwise has maximal expected value and minimal variance. Thus we bound the sample size, thereby avoiding storage overflows and helping to control the time required for analytics over the sample, while allowing the user complete control over the sample contents. The method is both simple to implement and efficient, being a one-pass streaming algorithm with an amortized processing time of O(1) per item.
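The abstract does not give the algorithm itself, so the following Python sketch only illustrates the target property. It is offline and is not the authors' one-pass streaming method. It assumes strictly positive weights and picks inclusion probabilities pi_i = c * w_i, using the largest c for which every pi_i <= 1 and the expected sample size sum(pi_i) <= n, namely c = min(n / W, 1 / w_max), where W is the total weight. It then draws a sample with exactly those marginals using ordered systematic sampling. That is a standard technique swapped in here for illustration, and its sample size is always floor(S) or ceil(S), where S = sum(pi_i) <= n. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pps_inclusion_probs(weights: Sequence[float], n: int) -> List[float]:
    """Hypothetical helper: PPS inclusion probabilities pi_i = c * w_i.

    Assumes all weights are > 0. c = min(n / W, 1 / w_max) is the largest
    constant keeping every pi_i <= 1 and the expected size c * W <= n.
    """
    total = sum(weights)
    c = min(n / total, 1.0 / max(weights))
    # Clamp guards against float round-off pushing the heaviest item past 1.
    return [min(1.0, c * w) for w in weights]


def systematic_sample(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Ordered systematic sampling (illustrative, not the paper's method).

    Lay the probabilities end to end on [0, S) and select item i iff some
    point u + k (k an integer, u uniform in [0, 1)) lands in its interval.
    Because each interval has length <= 1, item i is chosen with probability
    exactly probs[i], and the sample size is floor(S) or ceil(S).
    """
    u = rng.random()
    sample: List[int] = []
    cum = 0.0
    for i, p in enumerate(probs):
        lo, hi = cum, cum + p
        k = math.ceil(lo - u)  # smallest integer k with u + k >= lo
        if u + k < hi:
            sample.append(i)
        cum = hi
    return sample
```

For example, with weights [1, 1, 1, 1, 10] and n = 3, the heavy item caps c at 1/10. Its inclusion probability is therefore 1 and the expected sample size is 1.4 rather than 3. Every draw contains the heavy item and has either 1 or 2 items, so the PPS property holds and the size never exceeds n.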

research
06/11/2019

Temporally-Biased Sampling Schemes for Online Model Management

To maintain the accuracy of supervised learning models in the presence o...
research
01/29/2018

Temporally-Biased Sampling for Online Model Management

To maintain the accuracy of supervised learning models in the presence o...
research
02/15/2016

Adversarial Top-K Ranking

We study the top-K ranking problem where the goal is to recover the set ...
research
10/28/2022

The non-significance factor is a simple posterior estimate of the minimum necessary sample size

A researcher is interested in what sample size is needed to get the requ...
research
08/12/2019

Blending of Probability and Non-Probability Samples: Applications to a Survey of Military Caregivers

Probability samples are the preferred method for providing inferences th...
research
10/11/2018

Analysis of Noisy Evolutionary Optimization When Sampling Fails

In noisy evolutionary optimization, sampling is a common strategy to dea...
research
08/02/2022

Revisiting sample size planning for receiver operating characteristic studies: a confidence interval approach with precision and assurance

Objectives: Estimation of areas under receiver operating characteristic ...