Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits

07/30/2021
by Rahul Meshram, et al.

Restless multi-armed bandits with partially observable states have applications in communication systems, age of information, and recommendation systems. In this paper, we study multi-state partially observable restless bandit models. We consider three different models based on the information observable to the decision maker: 1) no information is observable from the actions on a bandit; 2) perfect state information is observable for only one action on a bandit, and there is a fixed restart state, i.e., transitions occur from all other states to that state; 3) perfect state information is available to the decision maker for both actions on a bandit, and there are two restart states, one for each action. We develop structural properties of these models. We also establish a threshold-type policy and indexability for models 2 and 3. We present a Monte Carlo (MC) rollout policy and use it to compute the Whittle index for model 2. We obtain a concentration bound on the value function in terms of the horizon length and the number of trajectories used by the MC rollout policy. We derive an explicit index formula for model 3. Finally, we describe an MC rollout policy for model 1, for which indexability is difficult to show. We present numerical examples using the myopic policy, the MC rollout policy, and the Whittle index policy, and observe that the MC rollout policy is competitive with the myopic policy.
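To make the rollout idea concrete, here is a minimal Python sketch of an MC rollout policy and a rollout-based Whittle index search for a single partially observable arm. Everything specific in it is an assumption for illustration, not the paper's method: the 3-state transition matrices `P`, the reward vectors `R`, the myopic base policy, the subsidy `lam` paid for the passive action, and the helper names `step`, `myopic`, `rollout_q`, `rollout_action` and `whittle_index` are all hypothetical. Loosely in the spirit of model 2, the active action is assumed to reveal the current state and restart the arm to state 0, while the passive action reveals nothing, so the belief is simply propagated through the passive transition matrix.

```python
import numpy as np

# Hypothetical 3-state arm; all numbers are illustrative assumptions.
P = {
    0: np.array([[0.7, 0.2, 0.1],    # passive action: state drifts, unobserved
                 [0.3, 0.5, 0.2],
                 [0.1, 0.3, 0.6]]),
    1: np.array([[1.0, 0.0, 0.0],    # active action: arm restarts to state 0
                 [1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]]),
}
R = {0: np.zeros(3), 1: np.array([0.2, 0.5, 0.9])}  # reward per (action, state)


def step(belief, a, rng, lam):
    """Simulate one slot from a belief; return (reward, next belief)."""
    if a == 0:
        # Passive: nothing observed, subsidy lam earned, belief pushed through P[0].
        return lam + belief @ R[0], belief @ P[0]
    # Active: the current state is sampled from the belief and observed exactly.
    s = rng.choice(len(belief), p=belief)
    return R[1][s], P[1][s].copy()


def myopic(belief, lam):
    """Base policy: pick the action with the larger immediate expected reward."""
    return int(belief @ R[1] > lam + belief @ R[0])


def rollout_q(belief, a, lam, H=20, L=100, beta=0.9, seed=0):
    """MC estimate of Q(belief, a): take action a, then follow the myopic base
    policy for the remaining H-1 slots; average the discounted return over L
    trajectories. A fixed seed keeps repeated calls reproducible."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(L):
        b, act, ret = belief, a, 0.0
        for t in range(H):
            r, b = step(b, act, rng, lam)
            ret += beta**t * r
            act = myopic(b, lam)
        total += ret
    return total / L


def rollout_action(belief, lam=0.0, **kw):
    """MC rollout policy: play the action with the larger estimated Q-value."""
    return int(rollout_q(belief, 1, lam, **kw) > rollout_q(belief, 0, lam, **kw))


def whittle_index(belief, lo=-1.0, hi=2.0, tol=1e-2, **kw):
    """Bisection on the passive subsidy lam for the indifference point
    Q(belief, 1) = Q(belief, 0). Assumes indexability, i.e. the advantage of
    the active action decreases as lam grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rollout_q(belief, 1, mid, **kw) > rollout_q(belief, 0, mid, **kw):
            lo = mid  # active still preferred: the index exceeds mid
        else:
            hi = mid  # passive preferred: the index is at most mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    b = np.array([0.0, 0.0, 1.0])  # belief concentrated on state 2
    print("myopic:", myopic(b, 0.0), "rollout:", rollout_action(b))
    print("approx. Whittle index:", whittle_index(b))
```

In this sketch, raising `L` reduces the Monte Carlo error of each Q estimate, and raising `H` shrinks the truncation error, which for per-slot rewards bounded in magnitude by R_max is at most β^H R_max/(1-β). This is only the generic trade-off, not the paper's concentration bound. The bisection step is meaningful only when the arm is indexable; where indexability is hard to show, as the abstract notes for model 1, `rollout_action` can still be used directly as a policy without computing an index.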
