On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits

07/20/2016
by   James Edwards, et al.
0

The Knowledge Gradient (KG) policy was originally proposed for online ranking and selection problems but has recently been adapted for use in online decision making in general and multi-armed bandit problems (MABs) in particular. We study its use in a class of exponential family MABs and identify weaknesses, including a propensity to take actions which are dominated with respect to both exploitation and exploration. We propose variants of KG which avoid such errors. These new policies include an index heuristic which deploys a KG approach to develop an approximation to the Gittins index. A numerical study shows this policy to perform well over a range of MABs including those for which index policies are not optimal. While KG does not make dominated actions when bandits are Gaussian, it fails to be index consistent and appears not to enjoy a performance advantage over competitor policies when arms are correlated to compensate for its greater computational demands.

READ FULL TEXT

page 20

page 21

research
07/25/2020

Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits

We consider multi-dimensional Markov decision processes and formulate a ...
research
10/14/2020

Asymptotic Randomised Control with applications to bandits

We consider a general multi-armed bandit problem with correlated (and si...
research
07/28/2020

A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach

Testing is an important part of tackling the COVID-19 pandemic. Availabi...
research
04/09/2012

Knapsack based Optimal Policies for Budget-Limited Multi-Armed Bandits

In budget-limited multi-armed bandit (MAB) problems, the learner's actio...
research
01/04/2018

Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks

In this work we formulate the problem of restless multi-armed bandits wi...
research
10/31/2022

Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits

We study the problem of planning restless multi-armed bandits (RMABs) wi...
research
10/19/2020

DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees

Automating physical database design has remained a long-term interest in...

Please sign up or login with your details

Forgot password? Click here to reset