Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

by Aditya Mate et al.

Restless Multi-Armed Bandits (RMABs) are a popular model for limited-resource allocation problems and have recently been employed for health monitoring and intervention planning. However, existing approaches fail to account for the arrival of new patients and the departure of enrolled patients from a treatment program. To address this challenge, we formulate a streaming bandit (S-RMAB) framework, a generalization of RMABs in which heterogeneous arms arrive and depart under possibly random streams. We propose a new and scalable approach to computing index-based solutions. We begin by proving that index values decrease for short residual lifetimes, a phenomenon we call index decay. We then provide algorithms designed to capture index decay without solving the costly finite-horizon problem, thereby lowering the computational complexity compared to existing methods. We evaluate our approach via simulations on real-world data from a tuberculosis intervention planning task as well as multiple synthetic domains. Our algorithms achieve an over 150x speed-up over existing methods in these tasks without loss in performance, and these findings are robust across domains.
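The index-decay phenomenon can be illustrated with a small finite-horizon index computation. The sketch below is not the paper's algorithm; it assumes a hypothetical two-state arm (state 1 = the "good" state, reward equal to the state) with illustrative transition probabilities `p_passive`/`p_active`, and finds, by bisection, the subsidy for passivity that equalises active and passive Q-values — a Whittle-style index. The resulting index shrinks as the residual horizon gets shorter.

```python
def whittle_index(s, horizon, p_passive, p_active, lo=-1.0, hi=2.0):
    """Finite-horizon Whittle-style index for a two-state arm: the subsidy m
    for staying passive that equalises active and passive Q-values in state s
    with `horizon` steps to go, found by bisection."""
    def q_values(m):
        V = [0.0, 0.0]                      # value with 0 steps remaining
        for _ in range(horizon - 1):        # backward induction under subsidy m
            V = [max(st + m + p_passive[st] * V[1] + (1 - p_passive[st]) * V[0],
                     st + p_active[st] * V[1] + (1 - p_active[st]) * V[0])
                 for st in (0, 1)]
        qp = s + m + p_passive[s] * V[1] + (1 - p_passive[s]) * V[0]
        qa = s + p_active[s] * V[1] + (1 - p_active[s]) * V[0]
        return qp, qa

    for _ in range(60):                     # bisection on the subsidy
        m = (lo + hi) / 2
        qp, qa = q_values(m)
        if qp < qa:
            lo = m                          # subsidy too small: acting still preferred
        else:
            hi = m
    return (lo + hi) / 2

# Illustrative transition probabilities: P(next state = 1 | state, action).
p_passive = [0.1, 0.6]
p_active  = [0.4, 0.9]   # acting raises the chance of the good state

# Index of the bad state for residual horizons T = 1..5: it decays toward 0
# as the horizon shrinks (at T = 1 there is no future left to influence).
indices = [whittle_index(0, T, p_passive, p_active) for T in range(1, 6)]
```

This is why algorithms that ignore residual lifetime over-estimate the value of pulling arms that are about to leave the program: the index computed at the full horizon stays large, while the true finite-horizon index has already decayed.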




Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

I analyse the frequentist regret of the famous Gittins index strategy fo...

An Index-based Deterministic Asymptotically Optimal Algorithm for Constrained Multi-armed Bandit Problems

For the model of constrained multi-armed bandit, we show that by constru...

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

We consider nonstationary multi-armed bandit problems where the model pa...

Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits

We study a finite-horizon restless multi-armed bandit problem with multi...

Networked Restless Multi-Armed Bandits for Mobile Interventions

Motivated by a broad class of mobile intervention problems, we propose a...

On Bayesian index policies for sequential resource allocation

This paper is about index policies for minimizing (frequentist) regret i...

Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

In many public health settings, it is important for patients to adhere t...