Whittle Index for A Class of Restless Bandits with Imperfect Observations

08/09/2021
by   Keqin Liu, et al.

We consider a class of restless bandit problems with broad applications in stochastic optimization, reinforcement learning, and operations research. In our model, there are N independent 2-state Markov processes that may be observed and accessed to accrue rewards. Observations are error-prone, i.e., both false alarms and missed detections may occur. Furthermore, the user can observe only a subset of M (M < N) processes at each discrete time. If a process in state 1 is correctly observed, it offers a reward. Due to the partial and imperfect observation model, the system is formulated as a restless multi-armed bandit problem with an information state space of uncountable cardinality. Restless bandit problems with finite state spaces are PSPACE-hard in general. In this paper, we establish a low-complexity algorithm that achieves strong performance for this class of restless bandits. Under certain conditions, we theoretically prove the existence (indexability) of the Whittle index and its equivalence to our algorithm. When those conditions do not hold, numerical experiments show that our algorithm remains near-optimal in general.
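The uncountable information state mentioned above arises because each arm's state is summarized by a continuous belief, the posterior probability that the arm is in state 1, updated by Bayes' rule under the noisy observation channel. The following is a minimal sketch of that belief update for a single arm; all parameter names (`false_alarm`, `miss`, `p01`, `p11`) are illustrative assumptions, not the paper's notation.

```python
def predict(omega, p01, p11):
    """One-step Markov prediction of the belief P(state = 1),
    given transition probabilities p01 = P(0 -> 1), p11 = P(1 -> 1)."""
    return omega * p11 + (1 - omega) * p01

def bayes_update(omega, obs, false_alarm, miss):
    """Condition the belief on a noisy binary observation obs in {0, 1}.
    false_alarm = P(observe 1 | state 0); miss = P(observe 0 | state 1)."""
    if obs == 1:
        num = omega * (1 - miss)                      # true positive
        den = num + (1 - omega) * false_alarm         # or a false alarm
    else:
        num = omega * miss                            # missed detection
        den = num + (1 - omega) * (1 - false_alarm)   # or a true negative
    return num / den

# Example: start from a uniform belief, condition on observing "1"
# through a noisy sensor, then predict the next-slot belief.
omega = bayes_update(0.5, 1, false_alarm=0.1, miss=0.2)
omega = predict(omega, p01=0.2, p11=0.8)
```

Because `omega` takes values in the continuum [0, 1], the resulting belief-state bandit cannot be solved by finite-state dynamic programming, which is what motivates the index policy studied in the paper.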


