Multi-Armed Bandits with Dependent Arms

10/13/2020
by Rahul Singh, et al.

We study a variant of the classical multi-armed bandit problem (MABP) which we call Multi-Armed Bandits with Dependent Arms. More specifically, multiple arms are grouped together to form a cluster, and the reward distributions of arms belonging to the same cluster are known functions of an unknown parameter that is a characteristic of the cluster. Thus, pulling an arm i not only reveals information about its own reward distribution, but also about all arms that share a cluster with arm i. This "correlation" among the arms complicates the exploration-exploitation trade-off encountered in the MABP, because the observation dependencies allow us to simultaneously test multiple hypotheses regarding the optimality of an arm. We develop learning algorithms based on the UCB principle that utilize these additional side observations appropriately while performing the exploration-exploitation trade-off. We show that the regret of our algorithms grows as O(K log T), where K is the number of clusters. In contrast, for an algorithm such as vanilla UCB, which is optimal for the classical MABP but does not utilize these dependencies, the regret scales as O(M log T), where M is the number of arms.
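
To make the idea concrete, here is a minimal, hypothetical sketch of a cluster-aware UCB in Python. The abstract does not specify the algorithm's details, so everything below is an illustrative assumption: we posit a simple known-function family in which arm i in cluster c has Gaussian reward with mean coeff[i] * theta_c, for a known coefficient coeff[i] and an unknown cluster parameter theta_c. Every pull inside a cluster then tightens a single least-squares estimate of that cluster's theta_c, which is why confidence intervals, and hence exploration cost, concentrate per cluster rather than per arm. The class name ClusterUCB and the confidence-radius constant are likewise illustrative, not taken from the paper.

```python
import math
import random

class ClusterUCB:
    """Sketch of a UCB-style bandit that exploits cluster structure.

    Illustrative model (an assumption, not the paper's exact setting):
    arm i in cluster c has Gaussian reward with mean coeff[i] * theta_c,
    where coeff[i] is known and theta_c is the unknown cluster parameter.
    Any pull inside a cluster refines the estimate of that cluster's
    theta_c, so exploration cost scales with the number of clusters K,
    not the number of arms M.
    """

    def __init__(self, clusters, coeffs):
        # clusters: list of lists of arm indices; coeffs: known per-arm coefficients
        self.clusters = clusters
        self.coeffs = coeffs
        self.cluster_of = {i: c for c, arms in enumerate(clusters) for i in arms}
        # Sufficient statistics per cluster for the least-squares estimate of theta_c
        self.sxx = [0.0] * len(clusters)   # sum of coeff^2 over pulls in the cluster
        self.sxy = [0.0] * len(clusters)   # sum of coeff * reward over those pulls
        self.t = 0

    def select_arm(self):
        self.t += 1
        best_arm, best_ucb = None, -math.inf
        for c, arms in enumerate(self.clusters):
            if self.sxx[c] == 0.0:          # unexplored cluster: pull any arm once
                return arms[0]
            theta_hat = self.sxy[c] / self.sxx[c]
            # Confidence radius on theta_c (hypothetical constant; tune in practice)
            radius = math.sqrt(2.0 * math.log(self.t) / self.sxx[c])
            for i in arms:
                a = self.coeffs[i]
                ucb = a * theta_hat + abs(a) * radius   # optimistic mean estimate
                if ucb > best_ucb:
                    best_arm, best_ucb = i, ucb
        return best_arm

    def update(self, arm, reward):
        c = self.cluster_of[arm]
        a = self.coeffs[arm]
        self.sxx[c] += a * a
        self.sxy[c] += a * reward


if __name__ == "__main__":
    # Two clusters of three arms each; the true thetas are unknown to the learner.
    clusters = [[0, 1, 2], [3, 4, 5]]
    coeffs = [0.5, 1.0, 1.5, 0.4, 0.8, 1.2]
    true_theta = [0.6, 0.3]
    bandit = ClusterUCB(clusters, coeffs)
    for _ in range(5000):
        i = bandit.select_arm()
        mu = coeffs[i] * true_theta[bandit.cluster_of[i]]
        bandit.update(i, random.gauss(mu, 1.0))
```

Under this model only K parameters need to be learned, so each suboptimal cluster is pulled O(log T) times, matching the O(K log T) regret scaling quoted above; vanilla UCB, which keeps an independent mean estimate per arm, would instead pay O(M log T).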


Related research:

Correlated Multi-armed Bandits with a Latent Random Source (08/17/2018)
We consider a novel multi-armed bandit framework where the rewards obtai...

Optimal Algorithms for Latent Bandits with Cluster Structure (01/17/2023)
We consider the problem of latent bandits with cluster structure where t...

Robustness Guarantees for Mode Estimation with an Application to Bandits (03/05/2020)
Mode estimation is a classical problem in statistics with a wide range o...

Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits (08/10/2022)
Conducting randomized experiments in education settings raises the quest...

Ballooning Multi-Armed Bandits (01/24/2020)
In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a n...

BelMan: Bayesian Bandits on the Belief-Reward Manifold (05/04/2018)
We propose a generic, Bayesian, information geometric approach to the ex...

Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian? (06/05/2021)
Multi-armed Bandit (MAB) algorithms identify the best arm among multiple...
