The Sample Complexity of Search over Multiple Populations

09/06/2012
by   Matthew L. Malloy, et al.

This paper studies the sample complexity of searching over multiple populations. We consider a large number of populations, each corresponding to either distribution P0 or P1. The goal of the search problem studied here is to find one population corresponding to distribution P1 with as few samples as possible. The main contribution is to quantify the number of samples needed to correctly find one such population. We consider two general approaches: non-adaptive sampling methods, which sample each population a predetermined number of times until a population following P1 is found, and adaptive sampling methods, which employ sequential sampling schemes for each population. We first derive a lower bound on the number of samples required by any sampling scheme. We then consider an adaptive procedure consisting of a series of sequential probability ratio tests, and show it comes within a constant factor of the lower bound. We give explicit expressions for this constant when samples of the populations follow Gaussian and Bernoulli distributions. An alternative adaptive scheme is discussed which does not require full knowledge of P1, and comes within a constant factor of the optimal scheme. For comparison, a lower bound on the sampling requirements of any non-adaptive scheme is presented.
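The adaptive procedure described above, a series of sequential probability ratio tests run one population at a time, can be sketched as follows. This is a minimal illustration, not the authors' exact procedure. It assumes unit-variance Gaussian samples with P0 = N(0, 1) and P1 = N(mu, 1). The function name `sprt_search`, the `samplers` argument, and the `lower` and `upper` thresholds are hypothetical choices made for this example; the abstract does not specify how the thresholds are set.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def sprt_search(
    samplers: Sequence[Callable[[], float]],
    mu: float,
    lower: float,
    upper: float,
    max_samples: int = 1_000_000,
) -> Tuple[Optional[int], int]:
    """Test populations one at a time with a sequential probability ratio test.

    Assumes (for illustration only) P0 = N(0, 1) and P1 = N(mu, 1).
    Returns (index of the population declared to follow P1, total samples used),
    or (None, total samples used) if no population is accepted.
    """
    total = 0
    for index, draw in enumerate(samplers):
        llr = 0.0  # running log-likelihood ratio log(P1 / P0) for this population
        while lower < llr < upper:
            if total >= max_samples:
                return None, total
            x = draw()
            total += 1
            # Log-likelihood ratio increment for N(mu, 1) versus N(0, 1).
            llr += mu * x - 0.5 * mu * mu
        if llr >= upper:
            return index, total  # accept this population as following P1
        # Otherwise llr <= lower: discard this population and move to the next.
    return None, total


if __name__ == "__main__":
    rng = random.Random(0)
    mu = 1.0
    # Hypothetical setup: 50 populations, each following P1 with probability 0.1.
    means = [mu if rng.random() < 0.1 else 0.0 for _ in range(50)]
    samplers = [lambda m=m: rng.gauss(m, 1.0) for m in means]
    found, used = sprt_search(samplers, mu=mu, lower=-2.0, upper=5.0)
    print("declared index:", found, "samples used:", used)
```

Each test stops as soon as the running log-likelihood ratio leaves the interval between the two thresholds, so populations following P0 tend to be discarded after few samples while more samples are spent on the population that is finally accepted. A non-adaptive scheme would instead draw the same fixed number of samples from every population it examines.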
