Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits

10/31/2022
by Abheek Ghosh, et al.

We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems, with applications such as multi-channel communication, monitoring and machine maintenance, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality under certain conditions. In this work, we first show that Whittle index policies can fail in simple and practically relevant RMAB settings, even when the RMABs are indexable. We discuss why the optimality guarantees fail and why asymptotic optimality may not translate well to practically relevant planning horizons. We then propose an alternative planning algorithm based on the mean-field method, which provably and efficiently obtains near-optimal policies when the number of arms is large, without the stringent structural assumptions that Whittle index policies require. Our approach builds on ideas from existing mean-field research with several improvements: (a) it is hyper-parameter free, and our improved non-asymptotic analysis has a tighter polynomial dependence on known problem parameters; (b) we prove high-probability bounds showing that the policy's reward is reliable; and (c) we give matching sub-optimality lower bounds for the algorithm with respect to the number of arms, demonstrating the tightness of our bounds. Our extensive experimental analysis shows that the mean-field approach matches or outperforms other baselines.
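
The abstract names the mean-field method but does not specify it, so the following is a minimal sketch of the standard fluid/LP relaxation that mean-field RMAB planners are typically built on, not the paper's actual algorithm. All concrete symbols here (the kernels P, rewards r, action costs c, the per-arm budget fraction budget_frac, and the function name mean_field_policy) are illustrative assumptions. The idea: solve a single-arm linear program in which the hard per-round budget is relaxed to a constraint on the expected per-arm action cost, then read a randomized per-state policy off the optimal occupancy measure.

```python
# Hypothetical sketch of a mean-field (fluid LP) planner for a homogeneous
# average-reward RMAB. Symbol names and the formulation are assumptions made
# for illustration; they are not taken from the paper.
import numpy as np
from scipy.optimize import linprog

def mean_field_policy(P, r, c, budget_frac):
    """Solve the single-arm LP relaxation of an average-reward RMAB.

    P: (A, S, S) transition kernels, P[a, s, s'] = Pr(s' | s, a)
    r: (S, A) per-step rewards
    c: (A,) action costs (e.g. c = [0, 1] for activate-or-not)
    budget_frac: allowed expected action cost per arm per round (B / N)

    Returns a randomized state -> action policy pi of shape (S, A),
    derived from the optimal occupancy measure mu.
    """
    P, r, c = (np.asarray(x, dtype=float) for x in (P, r, c))
    A, S, _ = P.shape

    # Decision variable: mu[s, a], flattened so index (s, a) -> s * A + a.
    # Objective: maximize sum_{s,a} mu[s,a] * r[s,a]  (linprog minimizes).
    obj = -r.reshape(-1)

    # Flow balance for each state s':
    #   sum_a mu[s', a] - sum_{s,a} mu[s, a] * P[a, s, s'] = 0.
    A_eq = np.zeros((S + 1, S * A))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = (s == sp) - P[a, s, sp]
    # Normalization: the occupancy measure sums to 1.
    A_eq[S, :] = 1.0
    b_eq = np.zeros(S + 1)
    b_eq[S] = 1.0

    # Relaxed budget: expected per-arm action cost at most budget_frac.
    A_ub = np.tile(c, S)[None, :]
    b_ub = np.array([budget_frac])

    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(f"LP failed: {res.message}")
    mu = res.x.reshape(S, A)

    # Randomized policy: pi(a | s) = mu[s, a] / sum_a mu[s, a],
    # with a uniform fallback for states the LP never visits.
    row = mu.sum(axis=1, keepdims=True)
    return np.where(row > 0, mu / np.maximum(row, 1e-12), 1.0 / A)
```

At deployment over N arms, each arm would sample an action from pi(. | s) and a ranking or rounding step would enforce the hard per-round budget. Non-asymptotic analyses of the kind the abstract describes quantify how fast the reward of such a policy concentrates around the LP value as N grows.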

Related research

- 03/29/2022: Near-optimality for infinite-horizon restless bandits with many arms
  Restless bandits are an important class of problems with applications in...
- 10/04/2022: Reproducible Bandits
  In this paper, we introduce the notion of reproducible policies in the c...
- 07/25/2021: Restless Bandits with Many Arms: Beating the Central Limit Theorem
  We consider finite-horizon restless bandits with multiple pulls per peri...
- 07/20/2016: On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits
  The Knowledge Gradient (KG) policy was originally proposed for online ra...
- 09/07/2018: When Hashing Met Matching: Efficient Search for Potential Matches in Ride Sharing
  We study the problem of matching rides in a ride sharing platform. Such ...
- 02/03/2023: Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits
  In the stochastic multi-armed bandit problem, a randomized probability m...
- 01/04/2018: Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks
  In this work we formulate the problem of restless multi-armed bandits wi...
