Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning

by Jackson A. Killian, et al.

We introduce Robust Restless Bandits, a challenging generalization of restless multi-armed bandits (RMABs). RMABs have been widely studied for intervention planning with limited resources. However, most works make the unrealistic assumption that the transition dynamics are known perfectly, which restricts the applicability of existing methods to real-world scenarios. To make RMABs more useful in settings with uncertain dynamics: (i) we introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties; (ii) we develop a double oracle algorithm for solving Robust RMABs and demonstrate its effectiveness on three experimental domains; (iii) to enable our double oracle approach, we introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs. RMABPPO hinges on learning an auxiliary "λ-network" that allows each arm's learning to decouple, greatly reducing the sample complexity required for training; (iv) under minimax regret, the adversary in the double oracle approach is notoriously difficult to implement due to non-stationarity. To address this, we formulate the adversary oracle as a multi-agent reinforcement learning problem and solve it with a multi-agent extension of RMABPPO, which may be of independent interest as the first known algorithm for this setting. Code is available at
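To make the minimax regret objective concrete, here is a minimal sketch over a small, made-up reward table. It is not the paper's double oracle algorithm or RMABPPO: it assumes a finite set of candidate policies and a finite set of scenarios (for example, a few transition settings sampled from the uncertainty intervals), and it only considers pure strategies. The function names and the numbers in `rewards` are hypothetical.

```python
def regret_table(rewards):
    """rewards[i][j] is the return of policy i under scenario j.

    The regret of policy i in scenario j is the gap to the best
    policy for that same scenario.
    """
    n_scenarios = len(rewards[0])
    best = [max(row[j] for row in rewards) for j in range(n_scenarios)]
    return [[best[j] - row[j] for j in range(n_scenarios)] for row in rewards]


def minimax_regret_policy(rewards):
    """Return (policy index, worst-case regret) minimizing the maximum regret."""
    regrets = regret_table(rewards)
    worst = [max(row) for row in regrets]  # the adversary picks the worst scenario
    idx = min(range(len(worst)), key=worst.__getitem__)
    return idx, worst[idx]


# Hypothetical returns: 3 candidate policies (rows) x 3 scenarios (columns).
rewards = [
    [10.0, 2.0, 6.0],
    [7.0, 5.0, 6.0],
    [4.0, 6.0, 5.0],
]

policy, regret = minimax_regret_policy(rewards)
print(policy, regret)
```

In this toy table the second policy (index 1) is never the best in the first two scenarios, yet it has the smallest worst-case gap to the best policy, which is the trade-off a minimax regret objective encodes. The setting in the abstract differs in that the uncertainty sets are continuous intervals and both sides are handled by learned oracles rather than by enumeration.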

