Adaptive Threshold Sampling and Estimation

08/16/2017
by Daniel Ting, et al.

Sampling is a fundamental problem in both computer science and statistics. A number of issues arise when designing a method based on sampling. These include statistical considerations, such as constructing a good sampling design and ensuring there are good, tractable estimators for the quantities of interest, as well as computational considerations, such as designing fast algorithms for streaming data and ensuring the sample fits within memory constraints. Unfortunately, existing sampling methods address all of these issues only in limited scenarios. We develop a framework that addresses them in a broad range of scenarios. In particular, it addresses the problem of drawing and using samples under a memory budget constraint. This problem can be challenging because the memory budget forces samples to be drawn non-independently and, consequently, makes the resulting estimators difficult to compute. At the core of the framework is the notion of a data-adaptive thresholding scheme, in which the threshold effectively allows one to treat the non-independent sample as if it were drawn independently. We provide sufficient conditions for a thresholding scheme to allow this, along with ways to build and compose such schemes. Furthermore, we provide fast algorithms for sampling efficiently under these thresholding schemes.
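
To make the idea of a data-adaptive threshold concrete, the following is a minimal sketch of one well-known scheme of this kind, priority (bottom-k) sampling under a budget of k items. It is not taken from the paper. The function names `bottom_k_sample` and `ht_sum`, the priority rule `U / w`, and the choice of the (k+1)-th smallest priority as the threshold are assumptions made for illustration. Each item is kept if its priority falls below the threshold, and the estimator then treats every kept item as if it had been included independently with probability min(1, w * threshold).

```python
import heapq
import random


def bottom_k_sample(stream, k, rng):
    """Keep the k items with the smallest priorities U / w (illustrative sketch).

    `stream` yields (item, weight) pairs with weight > 0.
    Returns (sample, tau), where tau is the (k+1)-th smallest priority,
    used here as the data-adaptive threshold (an assumption of this sketch).
    """
    heap = []  # max-heap (negated priorities) holding the k+1 smallest priorities
    for idx, (item, w) in enumerate(stream):
        priority = rng.random() / w
        entry = (-priority, idx, item, w)  # idx breaks ties without comparing items
        if len(heap) < k + 1:
            heapq.heappush(heap, entry)
        elif priority < -heap[0][0]:
            heapq.heapreplace(heap, entry)
    if len(heap) <= k:
        # The budget was never exceeded: every item is kept with probability 1.
        return [(item, w) for _, _, item, w in heap], float("inf")
    neg_tau, _, _, _ = heapq.heappop(heap)  # largest of the k+1 smallest priorities
    return [(item, w) for _, _, item, w in heap], -neg_tau


def ht_sum(sample, tau):
    """Horvitz-Thompson style estimate of the total weight.

    Treats each sampled item as if it were included independently
    with probability min(1, w * tau).
    """
    return sum(w / min(1.0, w * tau) for _, w in sample)


if __name__ == "__main__":
    data = [(i, float(i)) for i in range(1, 1001)]
    sample, tau = bottom_k_sample(data, k=50, rng=random.Random(0))
    print(len(sample), ht_sum(sample, tau), sum(w for _, w in data))
```

The heap holds at most k + 1 entries, so memory use stays within the budget regardless of the stream length, and each arriving item costs O(log k) time in the worst case.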


