A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

03/10/2023
by Dorian Baudry, et al.

In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists of checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions in order to prove a logarithmic regret. As a direct application, we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the distributions, including single-parameter exponential families, Gaussian distributions, bounded distributions, and distributions satisfying some conditions on their moments. In particular, we prove that MED is asymptotically optimal for all these models, and we also provide a simple regret analysis of some TS algorithms for which optimality is already known. We then further illustrate the interest of our approach by analyzing a new Non-Parametric TS algorithm (h-NPTS), adapted to some families of unbounded reward distributions with a bounded h-moment. This model can, for instance, capture some non-parametric families of distributions whose variance is upper bounded by a known constant.
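
To make the notion of a "sampling probability of each arm" concrete, the following is a minimal sketch of an MED-style randomized rule for Bernoulli rewards. It assumes the commonly used form in which arm k is drawn with probability proportional to exp(-N_k * KL(mean_k, best_mean)); the exact normalization and divergence used in the paper may differ. The function names (bernoulli_kl, med_probabilities, run_med) and the one-pull-per-arm initialization are illustrative assumptions, not taken from the paper.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def med_probabilities(counts, means):
    """Assumed MED-style rule: weight_k = exp(-N_k * KL(mean_k, best empirical mean))."""
    best = max(means)
    weights = [math.exp(-n * bernoulli_kl(m, best)) for n, m in zip(counts, means)]
    total = sum(weights)  # always >= 1, since the empirical best arm has weight 1
    return [w / total for w in weights]


def run_med(true_means, horizon, seed=0):
    """Simulate the sketch on Bernoulli arms; returns the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(arm):
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Illustrative initialization: pull every arm once.
    for arm in range(k):
        pull(arm)

    for _ in range(horizon - k):
        means = [s / n for s, n in zip(sums, counts)]
        probs = med_probabilities(counts, means)
        arm = rng.choices(range(k), weights=probs)[0]
        pull(arm)

    return counts


if __name__ == "__main__":
    print(run_med([0.3, 0.5, 0.7], horizon=2000))
```

Under this rule, an arm whose empirical mean ties the best one keeps weight 1, while a clearly worse arm sees its probability decay exponentially in its number of pulls, which is the kind of behavior a logarithmic regret bound relies on.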

research
02/17/2017

Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits

Recent work on follow the perturbed leader (FTPL) algorithms for the adv...
research
10/27/2020

Sub-sampling for Efficient Non-Parametric Bandit Exploration

In this paper we propose the first multi-armed bandit algorithm based on...
research
11/18/2021

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits

The stochastic multi-arm bandit problem has been extensively studied und...
research
06/13/2022

Top Two Algorithms Revisited

Top Two algorithms arose as an adaptation of Thompson sampling to best a...
research
05/24/2017

Boundary Crossing Probabilities for General Exponential Families

We consider parametric exponential families of dimension K on the real l...
research
09/12/2023

Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

Thompson sampling (TS) is one of the most popular and earliest algorithm...
research
06/11/2020

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

We investigate stochastic combinatorial multi-armed bandit with semi-ban...
