Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

10/07/2015
by Wesley Cowan, et al.

We consider the classical problem of a controller activating (or sampling) sequentially from a finite number N ≥ 2 of populations, each specified by an unknown distribution. At each time n = 1, 2, ... over some time horizon, the controller selects a population to sample, with the goal of sampling from a population that optimizes some "score" function of its distribution, e.g., maximizing the expected sum of outcomes or minimizing variability. We define a class of Uniformly Fast (UF) sampling policies and show, under mild regularity conditions, that there is an asymptotic lower bound on the expected total number of sub-optimal population activations. We then provide sufficient conditions under which an upper confidence bound (UCB) policy is UF and asymptotically optimal, in the sense that it attains this lower bound. Explicit solutions are provided for several examples of interest, including general score functionals on unconstrained Pareto distributions (of potentially infinite mean) and uniform distributions of unknown support. Additional results on bandits of Normal distributions are also provided.
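To make the sampling loop concrete, the following is a minimal sketch of an index policy of the UCB type. It is not the policy analyzed in the paper: it uses the well-known UCB1 index (sample mean plus sqrt(2 ln n / T_i)), takes the score of a population to be its mean, and assumes outcomes lie in [0, 1]. The function name `ucb1` and the `samplers` argument are hypothetical names chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def ucb1(samplers: Sequence[Callable[[], float]], horizon: int) -> List[int]:
    """Run the UCB1 index policy for `horizon` rounds.

    `samplers[i]()` draws one outcome from population i (assumed in [0, 1]).
    Returns the number of times each population was sampled.
    """
    k = len(samplers)
    counts = [0] * k    # T_i(n): activations of population i so far
    sums = [0.0] * k    # running sum of outcomes from population i

    for n in range(1, horizon + 1):
        if n <= k:
            # Initialization: sample every population once.
            arm = n - 1
        else:
            # Index = estimated score (here the sample mean) plus an
            # exploration bonus that shrinks as the population is sampled.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(n) / counts[i]),
            )
        sums[arm] += samplers[arm]()
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical Bernoulli populations with success rates 0.2, 0.5, 0.8.
    arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
    print(ucb1(arms, horizon=5000))
```

In this sketch the counts of the sub-optimal populations are the quantities whose expected growth the lower bound constrains. Handling a different score, such as variability, would mean replacing both the sample-mean estimate and the exploration bonus with ones suited to that score.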

research
05/08/2015

An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

Consider the problem of a controller sampling sequentially from a finite...
research
09/09/2015

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

We develop asymptotically optimal policies for the multi armed bandit (M...
research
09/06/2012

The Sample Complexity of Search over Multiple Populations

This paper studies the sample complexity of searching over multiple popu...
research
01/19/2012

Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint

We consider the problem of sequential sampling from a finite number of i...
research
04/22/2015

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

Consider the problem of sampling sequentially from a finite number of N ...
research
02/28/2023

Asymptotically Optimal Thompson Sampling Based Policy for the Uniform Bandits and the Gaussian Bandits

Thompson sampling (TS) for the parametric stochastic multi-armed bandits...
research
08/12/2012

How to sample if you must: on optimal functional sampling

We examine a fundamental problem that models various active sampling set...
