PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models

07/06/2023
by Keqin Liu, et al.

In this paper, we consider a general observation model for restless multi-armed bandit problems. The player's actions must rely on a feedback mechanism that is error-prone due to resource constraints or to environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with the partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm performs very well.
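
To illustrate why the belief state space is countable once an initial belief is fixed, here is a minimal Python sketch. It is not the paper's model. It assumes a single two-state Markov arm with a binary, error-prone observation, and the names `p01`, `p11`, `likelihood`, and `reachable_beliefs` are hypothetical. Each belief has only finitely many successors (one passive, one per observation outcome), so the set of beliefs reachable from the initial belief is countable.

```python
def propagate(belief, p01, p11):
    """One-step Markov prediction of P(state = 1)."""
    return belief * p11 + (1.0 - belief) * p01


def bayes_update(belief, obs, likelihood):
    """Posterior P(state = 1 | obs); likelihood[s][o] = P(obs = o | state = s)."""
    num = belief * likelihood[1][obs]
    den = num + (1.0 - belief) * likelihood[0][obs]
    return num / den


def next_belief(belief, active, obs, p01, p11, likelihood):
    """Belief for the next slot: Bayes update if the arm was observed, then predict."""
    if active:
        belief = bayes_update(belief, obs, likelihood)
    return propagate(belief, p01, p11)


def reachable_beliefs(initial, horizon, p01, p11, likelihood):
    """Enumerate beliefs reachable from `initial` within `horizon` steps.

    Values are rounded to 12 decimals so floating-point duplicates merge.
    """
    seen = {round(initial, 12)}
    frontier = set(seen)
    for _ in range(horizon):
        successors = set()
        for b in frontier:
            # Passive arm: no observation, prediction step only.
            successors.add(round(next_belief(b, False, None, p01, p11, likelihood), 12))
            # Active arm: one successor per possible (noisy) observation.
            for obs in (0, 1):
                successors.add(round(next_belief(b, True, obs, p01, p11, likelihood), 12))
        frontier = successors - seen
        seen |= successors
    return sorted(seen)


# Assumed example parameters: transition probabilities and a noisy sensor.
noisy = [[0.9, 0.1], [0.2, 0.8]]
beliefs = reachable_beliefs(0.5, 3, 0.2, 0.8, noisy)
print(len(beliefs), beliefs[:5])
```

The enumeration grows by at most a factor of three per step under these assumptions, which is the kind of structure a finite-state truncation could exploit.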


research
08/09/2021

Whittle Index for A Class of Restless Bandits with Imperfect Observations

We consider a class of restless bandit problems that finds a broad appli...
research
02/12/2021

Uncertainty-of-Information Scheduling: A Restless Multi-armed Bandit Framework

This paper proposes using the uncertainty of information (UoI), measured...
research
01/05/2020

A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits

This paper develops a Hoeffding inequality for the partial sums ∑_k=1^n ...
research
10/14/2020

Asymptotic Randomised Control with applications to bandits

We consider a general multi-armed bandit problem with correlated (and si...
research
12/06/2022

An Index Policy for Minimizing the Uncertainty-of-Information of Markov Sources

This paper focuses on the information freshness of finite-state Markov s...
research
07/13/2021

Markov Game with Switching Costs

We study a general Markov game with metric switching costs: in each roun...
research
04/29/2020

Whittle index based Q-learning for restless bandits with average reward

A novel reinforcement learning algorithm is introduced for multiarmed re...
