
Adaptive Threshold Sampling and Estimation

by Daniel Ting, et al.

Sampling is a fundamental problem in both computer science and statistics. A number of issues arise when designing a method based on sampling. These include statistical considerations, such as constructing a good sampling design and ensuring there are good, tractable estimators for the quantities of interest, as well as computational considerations, such as designing fast algorithms for streaming data and ensuring the sample fits within memory constraints. Unfortunately, existing sampling methods address all of these issues only in limited scenarios. We develop a framework that can be used to address these issues in a broad range of scenarios. In particular, it addresses the problem of drawing and using samples under a memory budget constraint. This problem can be challenging because the memory budget forces samples to be drawn non-independently, which in turn makes the resulting estimators difficult to compute. At the core of the framework is the notion of a data-adaptive thresholding scheme, in which the threshold effectively allows one to treat the non-independent sample as if it were drawn independently. We provide sufficient conditions for a thresholding scheme to allow this, and we provide ways to build and compose such schemes. Furthermore, we provide fast algorithms to efficiently sample under these thresholding schemes.
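
To make the idea of a data-adaptive threshold concrete, here is a minimal sketch of priority (bottom-k) sampling, a well-known scheme of this kind. It is offered as an illustration under stated assumptions, not as the authors' algorithm. Each item gets a priority U/w with U uniform on [0, 1), the k smallest priorities are kept, and the (k+1)-th smallest priority serves as the threshold. The estimator then treats each kept item as if it had been included independently with probability min(1, w * threshold). The function names, the `(key, weight)` stream format, and the requirement that weights are strictly positive are all assumptions made for this example.

```python
import heapq
import random


def bottom_k_sample(stream, k, rng):
    """Keep the k items with the smallest priority U / weight.

    Returns (sample, threshold), where the threshold is the (k+1)-th
    smallest priority seen. Assumes every weight is strictly positive.
    """
    heap = []  # max-heap (negated priorities) of the k+1 smallest priorities
    for key, weight in stream:
        priority = rng.random() / weight
        entry = (-priority, key, weight)
        if len(heap) < k + 1:
            heapq.heappush(heap, entry)
        elif priority < -heap[0][0]:
            heapq.heapreplace(heap, entry)
    if len(heap) <= k:
        # The whole stream fits in the budget: every item is kept.
        return [(key, w) for _, key, w in heap], float("inf")
    threshold = -heap[0][0]
    sample = [(key, w) for neg_p, key, w in heap if -neg_p < threshold]
    return sample, threshold


def estimate_total(sample, threshold):
    """Horvitz-Thompson style estimate of the total weight of the stream.

    Treats the threshold as fixed, so an item with weight w is included
    with pseudo-probability P(U / w < threshold) = min(1, w * threshold).
    """
    return sum(w / min(1.0, w * threshold) for _, w in sample)


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [(i, 1 + (i % 10)) for i in range(1000)]
    sample, threshold = bottom_k_sample(stream, k=50, rng=rng)
    print(len(sample), estimate_total(sample, threshold))
```

The sample is not independent, because keeping one item can push another out of the budget, yet the estimator is computed exactly as it would be for independent sampling at a fixed threshold. A heap of size k+1 keeps memory bounded regardless of the stream length.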



