Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

12/16/2011
by Antoine Salomon, et al.

This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, establishes a logarithmic lower bound on the regret of any consistent policy. We relax the notion of consistency and exhibit a generalisation of the logarithmic bound. We also show that no logarithmic bound holds in the general case of Hannan consistency. To obtain these results, we study variants of popular Upper Confidence Bound (UCB) policies. As a by-product, we prove that it is impossible to design an adaptive policy that would select the better of two algorithms by taking advantage of the properties of the environment.
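
For context (these standard formulas are recalled here and are not quoted from the paper), the bound referred to above is the Lai-Robbins lower bound, in the form extended by Burnetas and Katehakis: for any consistent policy, the expected regret after n rounds satisfies

\[
\liminf_{n \to \infty} \frac{\mathbb{E}[R_n]}{\log n} \;\ge\; \sum_{a \,:\, \Delta_a > 0} \frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a, \mu^*)},
\]

where \Delta_a is the gap between the mean of arm a and the optimal mean \mu^*, and \mathcal{K}_{\inf}(\nu_a, \mu^*) is the smallest Kullback-Leibler divergence between the reward distribution \nu_a and any distribution in the model whose mean exceeds \mu^*. The UCB policies mentioned in the abstract are index policies that, at round t+1, pull an arm maximising

\[
\hat{\mu}_a(t) + \sqrt{\frac{\alpha \log t}{N_a(t)}},
\]

where \hat{\mu}_a(t) is the empirical mean reward of arm a, N_a(t) is the number of times it has been pulled so far, and \alpha > 0 is an exploration parameter; variants of this form, with different choices of the exploration term, are the kind of policies studied in the paper.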

Related research

02/06/2013  Bounded regret in stochastic multi-armed bandits
We study the stochastic multi-armed bandit problem when one knows the va...

05/12/2018  Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
In this paper we consider the dynamic assortment selection problem under...

07/20/2020  Filtered Poisson Process Bandit on a Continuum
We consider a version of the continuum armed bandit where an action indu...

05/12/2015  Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret
The purpose of this paper is to provide further understanding into the s...

06/22/2020  Bandit algorithms: Letting go of logarithmic regret for statistical robustness
We study regret minimization in a stochastic multi-armed bandit setting ...

10/04/2018  Adaptive Policies for Perimeter Surveillance Problems
Maximising the detection of intrusions is a fundamental and often critic...

04/26/2017  Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks
We study the stochastic multi-armed bandit (MAB) problem in the presence...
