Some performance considerations when using multi-armed bandit algorithms in the presence of missing data

05/08/2022
by Xijin Chen, et al.

When using multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, the simplest approach is to ignore missing outcomes and continue to sample following the bandit algorithm. We investigate the impact of missing data on several bandit algorithms via a simulation study, assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes, although our results apply to other applications of bandit algorithms where missing data are expected to occur. We assess the resulting operating characteristics, including the expected reward (i.e., the allocation results), under different probabilities of missingness in each arm. The key finding of our work is that, under the simplest strategy of ignoring missing data, the impact on the performance of multi-armed bandit strategies varies according to how they balance the exploration-exploitation trade-off. Algorithms geared towards exploration continue to assign samples to the arm with more missing responses, and this arm is perceived by the algorithm as the superior arm. By contrast, algorithms geared towards exploitation do the opposite and stop assigning samples to the arm with more missing responses. Furthermore, for algorithms that focus more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
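To make the ignore-versus-impute comparison concrete, the sketch below (not the authors' code; the function name simulate and the parameters p_reward and p_observed are illustrative) simulates a two-armed Bernoulli bandit run with Thompson sampling as a representative exploration-leaning algorithm. Each outcome is observed only with an arm-specific probability, and missing outcomes are either ignored or replaced by the arm's observed mean.

# Hypothetical sketch, not from the paper: two-armed Bernoulli bandit with
# Thompson sampling under missing-at-random rewards. Missing outcomes are
# either ignored or imputed with the arm's observed mean.
import numpy as np

def simulate(p_reward, p_observed, n_patients=200, strategy="ignore", seed=0):
    """Allocate n_patients with Thompson sampling.

    p_reward   : true success probability of each arm, e.g. (0.3, 0.5)
    p_observed : probability that each arm's outcome is actually observed
    strategy   : "ignore" -> missing outcomes never update the posterior
                 "impute" -> missing outcomes update it with the arm's mean
    """
    rng = np.random.default_rng(seed)
    alpha = np.ones(2)            # Beta posterior: successes + 1
    beta = np.ones(2)             # Beta posterior: failures + 1
    obs_success = np.zeros(2)     # observed successes per arm
    obs_total = np.zeros(2)       # observed outcomes per arm
    allocations = np.zeros(2, dtype=int)

    for _ in range(n_patients):
        # Thompson sampling: draw from each posterior, allocate to the larger draw
        arm = int(np.argmax(rng.beta(alpha, beta)))
        allocations[arm] += 1
        reward = rng.random() < p_reward[arm]
        observed = rng.random() < p_observed[arm]

        if observed:
            alpha[arm] += reward
            beta[arm] += 1 - reward
            obs_success[arm] += reward
            obs_total[arm] += 1
        elif strategy == "impute" and obs_total[arm] > 0:
            # Mean imputation: update with fractional pseudo-counts equal to
            # the arm's observed success rate
            mean = obs_success[arm] / obs_total[arm]
            alpha[arm] += mean
            beta[arm] += 1 - mean
        # strategy == "ignore": a missing outcome leaves the posterior unchanged

    return allocations

if __name__ == "__main__":
    # Illustrative setting: arm 1 is superior but has more missing responses
    for strategy in ("ignore", "impute"):
        alloc = simulate(p_reward=(0.3, 0.5), p_observed=(0.9, 0.5),
                         strategy=strategy)
        print(strategy, "allocations per arm:", alloc)

Thompson sampling is used here only as a representative algorithm; a more exploitation-oriented rule (e.g., always allocating to the arm with the higher posterior mean) can be swapped into the same harness to contrast the behaviours described above. The exact allocation numbers depend on the seed and the chosen probabilities.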


Related research

03/19/2022 · Thompson Sampling on Asymmetric α-Stable Bandits
In algorithm optimization in reinforcement learning, how to deal with th...

02/25/2014 · Algorithms for multi-armed bandit problems
Although many algorithms for the multi-armed bandit problem are well-und...

08/05/2017 · Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems
The multi-armed bandit problem forms the foundation for solving a wide r...

03/29/2017 · Bandit-Based Model Selection for Deformable Object Manipulation
We present a novel approach to deformable object manipulation that does ...

06/05/2021 · Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?
Multi-armed Bandit (MAB) algorithms identify the best arm among multiple...
