Algorithms for multi-armed bandit problems

02/25/2014
by Volodymyr Kuleshov, et al.

Although many algorithms for the multi-armed bandit problem are well understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important observations can be made from our results. Firstly, simple heuristics such as epsilon-greedy and Boltzmann exploration outperform theoretically sound algorithms in most settings by a significant margin. Secondly, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identifies, for each algorithm, the settings where it performs well and the settings where it performs poorly. Thirdly, the algorithms' performance relative to each other is affected only by the number of bandit arms and the variance of the rewards. This finding may guide the design of subsequent empirical evaluations. In the second part of the paper, we turn our attention to an important application area of bandit algorithms: clinical trials. Although the design of clinical trials has been one of the principal practical problems motivating research on multi-armed bandits, bandit algorithms have never been evaluated as potential treatment allocation strategies. Using data from a real study, we simulate the outcome that a 2001-2002 clinical trial would have had if bandit algorithms had been used to allocate patients to treatments. We find that an adaptive trial would have successfully treated at least 50% more patients, while at the same time reducing the number of adverse effects and increasing patient retention. At the end of the trial, the best treatment could still have been identified with a high level of statistical confidence. Our findings demonstrate that bandit algorithms are attractive alternatives to current adaptive treatment allocation strategies.
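To make the two heuristics named in the abstract concrete, here is a minimal sketch of epsilon-greedy and Boltzmann (softmax) arm selection, together with the standard incremental update of the empirical mean reward. The function names, the `epsilon` and `tau` parameter defaults, and the list-based representation of the value estimates are illustrative choices, not taken from the paper.

```python
import math
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """With probability epsilon pull a uniformly random arm,
    otherwise pull the arm with the highest estimated reward."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def boltzmann(estimates, tau=0.5):
    """Sample an arm with probability proportional to exp(Q_a / tau);
    lower tau concentrates the choice on the best-looking arm."""
    weights = [math.exp(q / tau) for q in estimates]
    return random.choices(range(len(estimates)), weights=weights)[0]

def update(estimates, counts, arm, reward):
    """Incrementally update the empirical mean reward of the pulled arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

With `epsilon = 0` the first rule degenerates to pure greedy selection; the abstract's finding is that modest amounts of undirected exploration of this kind are surprisingly competitive in practice.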

Related research

01/20/2023
Multi armed bandits and quantum channel oracles
Multi armed bandits are one of the theoretical pillars of reinforcement ...

11/03/2019
Bayesian adaptive N-of-1 trials for estimating population and individual treatment effects
This article presents a novel adaptive design algorithm that can be used...

05/19/2022
Adaptive Experiments and a Rigorous Framework for Type I Error Verification and Computational Experiment Design
This PhD thesis covers breakthroughs in several areas of adaptive experi...

05/08/2022
Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
When using multi-armed bandit algorithms, the potential impact of missin...

03/04/2019
Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials
The stochastic multi-armed bandit problem is a well-known model for stud...

01/04/2021
State of the art on the application of multi-armed bandits (Etat de l'art sur l'application des bandits multi-bras)
The multi-armed bandit offers the advantage of learning and exploiting the alre...

01/03/2023
Computing the Performance of A New Adaptive Sampling Algorithm Based on The Gittins Index in Experiments with Exponential Rewards
Designing experiments often requires balancing between learning about th...
