Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

05/31/2023
by   Yige Hong, et al.
0

We study the infinite-horizon restless bandit problem with the average reward criterion, under both discrete-time and continuous-time settings. A fundamental question is how to design computationally efficient policies that achieve a diminishing optimality gap as the number of arms, N, grows large. Existing results on asymptotical optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework that converts any single-armed policy into a policy for the original N-armed problem. This is accomplished by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an O(1/√(N)) optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that do not satisfy UGAP. More notably, in the continuous-time setting, our result does not require any additional assumptions beyond the standard unichain condition. In both settings, we establish the first asymptotic optimality result that does not require UGAP.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/20/2018

A General Framework of Multi-Armed Bandit Processes by Arm Switch Restrictions

This paper proposes a general framework of multi-armed bandit (MAB) proc...
research
08/20/2018

A General Framework of Multi-Armed Bandit Processes by Switching Restrictions

This paper proposes a general framework of multi-armed bandit (MAB) proc...
research
03/29/2022

Near-optimality for infinite-horizon restless bandits with many arms

Restless bandits are an important class of problems with applications in...
research
11/01/2022

Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems

We study the global linear convergence of policy gradient (PG) methods f...
research
08/19/2022

Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits

We study the Improving Multi-Armed Bandit (IMAB) problem, where the rewa...
research
07/25/2021

Restless Bandits with Many Arms: Beating the Central Limit Theorem

We consider finite-horizon restless bandits with multiple pulls per peri...

Please sign up or login with your details

Forgot password? Click here to reset