Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret

05/12/2015
by Wesley Cowan et al.

The purpose of this paper is to provide further insight into the structure of the sequential allocation ("stochastic multi-armed bandit", or MAB) problem by establishing probability-one finite-horizon bounds and convergence rates for the sample (or "pseudo") regret associated with two simple classes of allocation policies π. For any slowly increasing function g, subject to mild regularity constraints, we construct two policies (the g-Forcing policy and the g-Inflated Sample Mean policy) that achieve a measure of regret of order O(g(n)) almost surely as n → ∞, bounded from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function g effectively controls the "exploration" side of the classical "exploration/exploitation" tradeoff.
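To make the role of g concrete, here is a minimal Python sketch of the two policy classes on a finite set of arms. The forcing rule (force any arm sampled fewer than g(n) times), the inflation term sqrt(g(n)/counts), and all names in the code are illustrative assumptions, not the paper's exact constructions; the sketch only conveys how g meters the amount of forced exploration.

```python
import numpy as np

def g_forcing(arms, horizon, g):
    """Sketch of a g-Forcing policy: whenever some arm has been sampled
    fewer than g(n) times by round n, force-sample the least-sampled such
    arm; otherwise play the arm with the highest sample mean.  Forced
    exploration per arm is then O(g(n)), in the spirit of O(g(n)) regret."""
    K = len(arms)
    counts = np.zeros(K, dtype=int)
    means = np.zeros(K)
    total = 0.0
    for n in range(1, horizon + 1):
        under = np.flatnonzero(counts < g(n))    # under-explored arms
        if under.size:
            a = under[np.argmin(counts[under])]  # forced exploration
        else:
            a = int(np.argmax(means))            # greedy exploitation
        r = arms[a]()
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]   # running sample mean
        total += r
    return total, counts

def g_ism(arms, horizon, g):
    """Sketch of a g-Inflated Sample Mean policy: after one pull per arm,
    always play the arm maximizing its sample mean plus an inflation term.
    The form sqrt(g(n)/counts) is an assumed illustration; the paper's
    exact inflation may differ."""
    K = len(arms)
    counts = np.zeros(K, dtype=int)
    means = np.zeros(K)
    total = 0.0
    for n in range(1, horizon + 1):
        if n <= K:
            a = n - 1                            # initial pull of each arm
        else:
            a = int(np.argmax(means + np.sqrt(g(n) / counts)))
        r = arms[a]()
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
        total += r
    return total, counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two Bernoulli arms with means 0.4 and 0.6; g(n) = log(n + 1) grows
    # slowly, so only O(log n) rounds per arm go to forced exploration.
    arms = [lambda: float(rng.random() < 0.4),
            lambda: float(rng.random() < 0.6)]
    g = lambda n: np.log(n + 1.0)
    for policy in (g_forcing, g_ism):
        total, counts = policy(arms, horizon=10_000, g=g)
        print(policy.__name__, "reward:", total, "pulls:", counts)
```

A slower-growing g shifts the balance toward exploitation; the point of the constructions in the paper is that, under mild regularity conditions, any such slowly growing g suffices for O(g(n)) regret almost surely.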

