An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

05/08/2015
by Wesley Cowan, et al.

Consider the problem of a controller sampling sequentially from a finite number N ≥ 2 of populations, specified by random variables X^i_k, i = 1, ..., N, k = 1, 2, ..., where X^i_k denotes the outcome from population i the k-th time it is sampled. It is assumed that for each fixed i, {X^i_k}_{k ≥ 1} is a sequence of i.i.d. uniform random variables over some interval [a_i, b_i], with the support parameters a_i and b_i unknown to the controller. The objective is to find a policy π that decides, based on the available data, which of the N populations to sample at each time n = 1, 2, ..., so as to maximize the expected sum of the outcomes of n samples or, equivalently, to minimize the regret due to lack of information about the parameters {a_i} and {b_i}. In this paper, we present a simple inflated sample mean (ISM) type policy that is asymptotically optimal, in the sense that its regret achieves the asymptotic lower bound of Burnetas and Katehakis (1996). Additionally, finite-horizon regret bounds are given.
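To make the setup concrete, here is a minimal Python sketch of the sampling problem and its regret, offered under stated assumptions. Population i is uniform on [a_i, b_i]. The policy sees only outcomes and tracks each population's sample minimum and maximum as estimates of a_i and b_i. The regret of one run is measured as Σ_i (μ* − μ_i) T_i(n), where:

* μ_i = (a_i + b_i)/2 is the mean of population i;
* μ* = max_i μ_i;
* T_i(n) is the number of times population i was sampled in the first n steps.

The expectation of this quantity equals n μ* minus the expected sum of outcomes. The index rule below is hypothetical and chosen only to illustrate an inflated-estimate policy. It adds half the sample range, scaled by n^{1/(t−2)}, to the sample minimum, and it is applied after three warm-up samples per population. The same holds for the names `simulate_uniform_bandit` and `INIT_SAMPLES`. Neither the rule nor the names are claimed to be the exact ISM policy of the paper, which is specified in the full text.

```python
import random


def simulate_uniform_bandit(supports, horizon, seed=0):
    """Simulate a hypothetical inflated-estimate index policy on uniform arms.

    supports: list of (a_i, b_i) pairs, used only to generate outcomes;
    the index never reads them. Returns (counts, regret), where
    counts[i] = T_i(horizon) and regret = sum_i (mu_star - mu_i) * T_i(horizon)
    for this single sample path.
    """
    rng = random.Random(seed)
    n_arms = len(supports)
    mins = [float("inf")] * n_arms   # sample minimum, estimate of a_i
    maxs = [float("-inf")] * n_arms  # sample maximum, estimate of b_i
    counts = [0] * n_arms            # T_i(n): times arm i has been sampled

    def pull(i):
        # Draw X^i_k ~ Uniform[a_i, b_i] and update the support estimates.
        a, b = supports[i]
        x = rng.uniform(a, b)
        mins[i] = min(mins[i], x)
        maxs[i] = max(maxs[i], x)
        counts[i] += 1

    def index(i, n):
        # Hypothetical index: sample min plus half the sample range,
        # inflated by n ** (1 / (t - 2)); requires t >= 3 samples.
        t = counts[i]
        return mins[i] + 0.5 * (maxs[i] - mins[i]) * n ** (1.0 / (t - 2))

    INIT_SAMPLES = 3  # hypothetical warm-up so that t - 2 >= 1 for every arm
    for n in range(1, horizon + 1):
        if n <= INIT_SAMPLES * n_arms:
            pull((n - 1) % n_arms)  # round-robin warm-up
        else:
            pull(max(range(n_arms), key=lambda i: index(i, n)))

    means = [(a + b) / 2 for a, b in supports]
    mu_star = max(means)
    regret = sum((mu_star - mu) * t for mu, t in zip(means, counts))
    return counts, regret


if __name__ == "__main__":
    counts, regret = simulate_uniform_bandit(
        [(0.0, 1.0), (0.2, 1.4), (0.5, 0.9)], horizon=10_000
    )
    print(counts, round(regret, 2))
```

In the example run, the second population has the largest mean (0.8) and should receive the large majority of the 10,000 samples. Averaging the returned regret over many seeds estimates the expected regret. For a heavily sampled population the inflation factor approaches 1, so the index approaches the sample midpoint (â_i + b̂_i)/2. Rarely sampled populations keep a large inflation and are therefore revisited occasionally.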
