WOR and p's: Sketches for ℓ_p-Sampling Without Replacement

07/14/2020
by Edith Cohen, et al.

Weighted sampling is a fundamental tool in data analysis and machine learning pipelines. Samples are used for efficient estimation of statistics or as sparse representations of the data. When weight distributions are skewed, as is often the case in practice, without-replacement (WOR) sampling is much more effective than with-replacement (WR) sampling: it provides a broader representation and higher accuracy for the same number of samples. We design novel composable sketches for WOR ℓ_p sampling, weighted sampling of keys according to a power p∈[0,2] of their frequency (or for signed data, sum of updates). Our sketches have size that grows only linearly with the sample size. Our design is simple and practical, despite intricate analysis, and based on off-the-shelf use of widely implemented heavy hitters sketches such as CountSketch. Our method is the first to provide WOR sampling in the important regime of p>1 and the first to handle signed updates for p>0.
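
To make the target of the abstract concrete, here is a minimal offline sketch of what a WOR ℓ_p sample of size k looks like. It is not the composable sketch itself. It assumes the standard bottom-k construction: each key's net frequency is divided by r^(1/p), where r is an Exp(1) variate derived from a hash of the key, and the k keys with the largest transformed magnitude are kept. The names `key_seed` and `wor_lp_sample` and the `salt` parameter are hypothetical, and the exact aggregation into a dictionary stands in for the heavy hitters sketches (such as CountSketch) that the abstract mentions. The sketch requires p > 0.

```python
import hashlib
import math
from collections import defaultdict


def key_seed(key, salt="demo"):
    """Map a key to a reproducible Exp(1) variate (hypothetical helper).

    Hash-derived randomness means every shard that sees the same key
    draws the same variate.
    """
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    x = int.from_bytes(digest[:8], "big") >> 11   # 53 uniform bits
    u = (x + 1) / (2**53 + 1)                     # strictly inside (0, 1)
    return -math.log(u)


def wor_lp_sample(updates, k, p, salt="demo"):
    """Return up to k distinct keys, sampled without replacement with
    weights |frequency|**p (assumed bottom-k construction, p > 0).

    `updates` is an iterable of (key, delta) pairs; deltas may be negative.
    """
    if p <= 0:
        raise ValueError("this illustration requires p > 0")

    # Exact aggregation: stands in for the sketch-based frequency recovery.
    freq = defaultdict(float)
    for key, delta in updates:
        freq[key] += delta

    scored = []
    for key, value in freq.items():
        if value == 0:
            continue  # keys whose updates cancel carry no weight
        r = key_seed(key, salt)
        scored.append((abs(value) / r ** (1.0 / p), key))

    # Keep the k keys with the largest transformed magnitude.
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


updates = [("a", 5), ("b", 1), ("c", -3), ("d", 2), ("d", -2), ("a", 4)]
sample = wor_lp_sample(updates, k=2, p=2)
```

Because the randomness depends only on the key and the salt, repeated calls on the same updates return the same sample, and a key such as `"d"` whose signed updates sum to zero is never selected.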

research
04/11/2021

Simple, Optimal Algorithms for Random Sampling Without Replacement

Consider the fundamental problem of drawing a simple random sample of si...
research
05/08/2023

Risk-limiting Financial Audits via Weighted Sampling without Replacement

We introduce the notion of a risk-limiting financial auditing (RLFA): gi...
research
07/04/2019

Sampling Sketches for Concave Sublinear Functions of Frequencies

We consider massive distributed datasets that consist of elements modele...
research
02/21/2020

Incremental Sampling Without Replacement for Sequence Models

Sampling is a fundamental technique, and sampling without replacement is...
research
10/24/2020

On Testing of Samplers

Given a set of items ℱ and a weight function 𝚠𝚝: ℱ↦ (0,1), the problem o...
research
06/19/2023

INC: A Scalable Incremental Weighted Sampler

The fundamental problem of weighted sampling involves sampling of satisf...
research
03/14/2019

Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

The well-known Gumbel-Max trick for sampling from a categorical distribu...
