Hellinger KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

09/14/2020
by Arghyadip Roy, et al.

In the regret-based formulation of multi-armed bandit (MAB) problems, much of the literature, with rare exceptions, focuses on arms with i.i.d. rewards. In this paper, we consider the problem of obtaining regret guarantees for MAB problems in which the rewards of each arm form a Markov chain that may not belong to a single-parameter exponential family. Achieving logarithmic regret in such problems is not difficult: a variation of standard KL-UCB does the job. However, the constants obtained from such an analysis are poor, for the following reason: i.i.d. rewards are a special case of Markov rewards, and it is difficult to design an algorithm that works well independently of whether the underlying model is truly Markovian or i.i.d. To overcome this issue, we introduce a novel algorithm that identifies whether the rewards from each arm are truly Markovian or i.i.d. using a Hellinger distance-based test. Our algorithm then switches from standard KL-UCB to a specialized version of KL-UCB when it determines that an arm's rewards are Markovian, resulting in low regret in both the i.i.d. and Markovian settings.
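To make the two ingredients mentioned in the abstract concrete, the following is a minimal Python sketch of (i) a Bernoulli KL-UCB index computed by bisection and (ii) a Hellinger-distance check of whether an arm's reward stream looks i.i.d. or Markovian. The function names, the bisection depth, the decision threshold, and the particular test (comparing empirical conditional next-reward distributions against the unconditional one) are illustrative assumptions, not the paper's exact algorithm; it also assumes rewards take finitely many values in [0, 1].

import math

def bernoulli_kl(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q), with clamping for stability.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0):
    # Largest q in [mean, 1] with pulls * kl(mean, q) <= log t + c * log log t,
    # found by bisection (standard KL-UCB upper confidence index).
    if pulls == 0:
        return 1.0
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean, 1.0
    for _ in range(32):
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def hellinger(p, q):
    # Hellinger distance between two discrete distributions given as dicts value -> probability.
    support = set(p) | set(q)
    s = sum((math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2 for x in support)
    return math.sqrt(s / 2.0)

def looks_markovian(rewards, threshold=0.1):
    # Heuristic i.i.d.-vs-Markov check (an assumption, not the paper's exact test):
    # compare the empirical distribution of the next reward conditioned on the previous
    # reward against the unconditional empirical distribution; a large Hellinger
    # distance for some conditioning value suggests Markovian dependence.
    if len(rewards) < 2:
        return False
    marginal, conditional = {}, {}
    for prev, curr in zip(rewards[:-1], rewards[1:]):
        marginal[curr] = marginal.get(curr, 0) + 1
        conditional.setdefault(prev, {})
        conditional[prev][curr] = conditional[prev].get(curr, 0) + 1
    n = sum(marginal.values())
    marginal = {x: cnt / n for x, cnt in marginal.items()}
    for prev, counts in conditional.items():
        m = sum(counts.values())
        cond = {x: cnt / m for x, cnt in counts.items()}
        if hellinger(cond, marginal) > threshold:
            return True
    return False

One plausible way to wire the switch: at each decision point, an arm whose reward history fails the test (looks_markovian returns False) is scored with the standard KL-UCB index above, while an arm that passes it is scored with a Markov-chain-specific KL-UCB variant built from estimated transition probabilities; in practice the threshold would shrink with the sample size so that the test becomes reliable as more rewards are observed.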
