Exploiting Correlation in Finite-Armed Structured Bandits

10/18/2018
by Samarth Gupta, et al.

We consider a correlated multi-armed bandit problem in which the rewards of different arms are correlated through a hidden parameter. Our approach exploits this correlation to identify some arms as sub-optimal and pulls them only O(1) times. This results in a significant reduction in cumulative regret; in fact, our algorithm achieves bounded (i.e., O(1)) regret whenever possible. Explicit conditions under which bounded regret is achievable are also provided by analyzing regret lower bounds. We propose several variants of our approach that generalize classical bandit algorithms such as UCB, Thompson Sampling, and KL-UCB to the structured bandit setting, and we empirically demonstrate their superiority via simulations.
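
To make the mechanism concrete, below is a minimal Python sketch of a UCB-style algorithm for the structured setting the abstract describes, where each arm k's mean reward is a known function mu[k](theta) of a hidden parameter theta. The sketch maintains confidence intervals on the empirical means, keeps only the parameter values consistent with them, marks arms that are optimal for no plausible theta as non-competitive (these end up being pulled only O(1) times), and runs UCB on the rest. This is an illustrative sketch of the idea, not the paper's exact procedure; the class name StructuredUCB, the discretized theta_grid, and the confidence radius are assumptions made for illustration.

    import numpy as np

    class StructuredUCB:
        """Hedged sketch of a UCB variant for structured bandits.

        Assumes arm k's mean reward is mu[k](theta*) for a hidden theta*,
        with the functions mu[k] known to the learner. Arms not optimal
        for any plausible theta are skipped ("non-competitive"), which is
        the mechanism behind the O(1) pulls described in the abstract.
        """

        def __init__(self, mu, theta_grid):
            self.mu = mu                  # list of callables: mu[k](theta)
            self.theta_grid = theta_grid  # discretized candidate parameters
            self.K = len(mu)
            self.counts = np.zeros(self.K)
            self.sums = np.zeros(self.K)

        def select_arm(self, t):
            # Pull each arm once to initialize the estimates.
            if np.any(self.counts == 0):
                return int(np.argmin(self.counts))
            means = self.sums / self.counts
            radius = np.sqrt(2 * np.log(t) / self.counts)
            # Plausible parameters: consistent with every confidence interval.
            plausible = [th for th in self.theta_grid
                         if all(abs(self.mu[k](th) - means[k]) <= radius[k]
                                for k in range(self.K))]
            if not plausible:  # misspecified model; fall back to all candidates
                plausible = list(self.theta_grid)
            # Competitive arms: optimal for at least one plausible theta.
            competitive = {int(np.argmax([self.mu[k](th)
                                          for k in range(self.K)]))
                           for th in plausible}
            ucb = means + radius
            return max(competitive, key=lambda k: ucb[k])

        def update(self, arm, reward):
            self.counts[arm] += 1
            self.sums[arm] += reward

For example, with two arms mu = [lambda th: th, lambda th: 1 - th] and theta_grid = np.linspace(0, 1, 101), the confidence set for theta quickly rules out one arm as optimal, so that arm stops being pulled; an unstructured UCB would instead keep exploring it at a logarithmic rate.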
