
Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits

by Rahul Meshram, et al.

Restless multi-armed bandits with partially observable states have applications in communication systems, age of information, and recommendation systems. In this paper, we study multi-state partially observable restless bandit models. We consider three models that differ in the information observable to the decision maker: 1) no information is observable from the actions on a bandit; 2) perfect state information is observable for only one action on a bandit, and there is a fixed restart state, i.e., transitions occur from all other states to that state; 3) perfect state information is available to the decision maker for both actions on a bandit, and there are two restart states, one for each action. We develop structural properties of these models. We also show that a threshold-type policy is optimal and establish indexability for models 2 and 3. We present a Monte Carlo (MC) rollout policy and use it for Whittle index computation in model 2. We obtain a concentration bound on the value function in terms of the horizon length and the number of trajectories for the MC rollout policy. We derive an explicit index formula for model 3. Finally, we describe the Monte Carlo rollout policy for model 1, where indexability is difficult to show. We present numerical examples using the myopic policy, the Monte Carlo rollout policy, and the Whittle index policy. We observe that the Monte Carlo rollout policy is competitive with the myopic policy.
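
To make the idea of a Monte Carlo rollout estimate concrete, here is a minimal Python sketch for a single arm under the setting of model 1 (no state information observed). It is not the authors' algorithm: the function names, the myopic base policy, the discount factor `beta`, and the belief update (belief multiplied by the transition matrix of the chosen action) are all assumptions made for illustration. The estimate averages discounted returns over `n_traj` simulated trajectories of length `horizon`, which are the two quantities the abstract's concentration bound refers to.

```python
import random

def update_belief(belief, action, P):
    # Assumed model-1 update: with no observation, the belief is
    # propagated through the transition matrix of the chosen action.
    n = len(belief)
    return [sum(belief[i] * P[action][i][j] for i in range(n)) for j in range(n)]

def myopic_action(belief, rewards):
    # Hypothetical base policy: pick the action with the highest
    # expected immediate reward under the current belief.
    return max(range(len(rewards)),
               key=lambda a: sum(b * r for b, r in zip(belief, rewards[a])))

def rollout_value(belief, first_action, P, rewards, horizon, n_traj, beta, rng):
    # Monte Carlo estimate of the discounted return obtained by playing
    # first_action now and following the base policy afterwards.
    n = len(belief)
    total = 0.0
    for _ in range(n_traj):
        # Sample a hidden starting state from the belief.
        state = rng.choices(range(n), weights=belief)[0]
        b, action = list(belief), first_action
        ret, disc = 0.0, 1.0
        for _ in range(horizon):
            ret += disc * rewards[action][state]
            # The hidden state evolves; the decision maker only tracks b.
            state = rng.choices(range(n), weights=P[action][state])[0]
            b = update_belief(b, action, P)
            action = myopic_action(b, rewards)
            disc *= beta
        total += ret
    return total / n_traj
```

A caller would compare `rollout_value(belief, 0, ...)` with `rollout_value(belief, 1, ...)` and play the action with the larger estimate. Here `P[a][i][j]` is the probability of moving from state `i` to state `j` under action `a`, and `rewards[a][i]` is the reward for action `a` in state `i`; both layouts are choices made for this sketch.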




Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits

We consider multi-dimensional Markov decision processes and formulate a ...

Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior

We model online recommendation systems using the hidden Markov multi-sta...

Bandits with Partially Observable Offline Data

We study linear contextual bandits with access to a large, partially obs...

Rule-based Shielding for Partially Observable Monte-Carlo Planning

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online a...

Sequential Monte Carlo Bandits

In this paper we propose a flexible and efficient framework for handling...

Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks

In this work we formulate the problem of restless multi-armed bandits wi...

Sequential Decision Making under Uncertainty with Dynamic Resource Constraints

This paper studies a class of constrained restless multi-armed bandits. ...