Sequential Decision Making under Uncertainty with Dynamic Resource Constraints

by Kesav Kaza, et al.

This paper studies a class of constrained restless multi-armed bandits in which the constraints take the form of time-varying availability of arms; this variation can be either stochastic or semi-deterministic. A fixed number of arms can be played in each decision interval, and playing an arm yields a state-dependent reward. The current states of the arms are only partially observable, through binary feedback signals from the arms that are played, while the current availability of arms is fully observable. The objective is to maximize the long-term cumulative reward. The uncertainty about future arm availability, combined with partial state information, makes this objective challenging. The optimization problem is analyzed using Whittle's index policy. To this end, a constrained restless single-armed bandit is studied: it is shown to admit a threshold-type optimal policy and to be indexable. An algorithm for computing Whittle's index is presented. Further, upper bounds on the value function are derived in order to estimate the degree of sub-optimality of various solutions. A simulation study compares the performance of the Whittle's index, modified Whittle's index, and myopic policies.
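The abstract refers to computing Whittle's index, which assigns each arm state a subsidy value at which playing and resting the arm become equally attractive. As a rough illustration only (not the paper's algorithm, which handles partial observability and availability constraints), the following sketch computes the index for a fully observed two-state restless arm by binary search on the passive subsidy; all transition matrices, rewards, and parameter names are illustrative assumptions.

```python
import numpy as np

def optimal_actions(P_active, P_passive, r_active, subsidy,
                    discount=0.9, tol=1e-8):
    """Solve the single-armed bandit with a passive subsidy via value
    iteration; return the optimal action per state (0=passive, 1=active)."""
    V = np.zeros(len(r_active))
    while True:
        q_passive = subsidy + discount * (P_passive @ V)
        q_active = r_active + discount * (P_active @ V)
        V_new = np.maximum(q_passive, q_active)
        if np.max(np.abs(V_new - V)) < tol:
            return (q_active > q_passive).astype(int)
        V = V_new

def whittle_index(P_active, P_passive, r_active, state,
                  lo=-10.0, hi=10.0, iters=50):
    """Binary-search the subsidy at which 'state' switches from active
    to passive; under indexability this crossing point is the index."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if optimal_actions(P_active, P_passive, r_active, mid)[state] == 1:
            lo = mid   # still worth playing: index lies above mid
        else:
            hi = mid   # resting preferred: index lies below mid
    return 0.5 * (lo + hi)

# Illustrative two-state arm: state 1 is rewarding, playing speeds up
# reaching it from state 0.
P_a = np.array([[0.7, 0.3], [0.4, 0.6]])
P_p = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([0.0, 1.0])
w0 = whittle_index(P_a, P_p, r, 0)
w1 = whittle_index(P_a, P_p, r, 1)
```

Under an index policy, the controller would then play, among the currently available arms, those with the largest index values.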






