Whittle index based Q-learning for restless bandits with average reward

04/29/2020
by Konstantin Avrachenkov, et al.

A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, combining Q-learning with the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. A rigorous convergence analysis is provided, and numerical experiments show excellent empirical performance of the proposed scheme.
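
For readers unfamiliar with the Whittle index policy named in the abstract, the following is a minimal sketch of its decision rule only: given one index value per arm (evaluated at that arm's current state), activate the arms with the largest indices. The function name, the `budget` parameter, and the example index values are hypothetical, chosen for illustration. This is not the paper's learning algorithm, which is concerned with learning the indices via Q-learning.

```python
from typing import List, Sequence


def whittle_index_policy(indices: Sequence[float], budget: int) -> List[int]:
    """Return the arms to activate under an index policy.

    indices: one index value per arm, evaluated at the arm's current state
             (assumed to be given; learning them is out of scope here).
    budget:  number of arms that may be active at each time step.
    """
    if not 0 <= budget <= len(indices):
        raise ValueError("budget must be between 0 and the number of arms")
    # Rank arms by index, largest first; ties keep the lower arm number first.
    ranked = sorted(range(len(indices)), key=lambda arm: indices[arm], reverse=True)
    # Activate the top `budget` arms; return them in ascending arm order.
    return sorted(ranked[:budget])


# Hypothetical index values for four arms, with two arms allowed to be active.
active_arms = whittle_index_policy([0.2, 1.5, -0.3, 0.9], budget=2)
print(active_arms)
```

Because the rule needs only one scalar per arm, the decision decouples across arms. That structure is what the abstract refers to as reducing the search space.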

research
10/08/2020

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Modifying the reward-biased maximum likelihood method originally propose...
research
10/04/2019

Discounted Reinforcement Learning is Not an Optimization Problem

Discounted reinforcement learning is fundamentally incompatible with fun...
research
02/07/2022

On learning Whittle index policy for restless bandits with scalable regret

Reinforcement learning is an attractive approach to learn good resource ...
research
10/05/2021

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

Whittle index policy is a powerful tool to obtain asymptotically optimal...
research
04/07/2023

Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

We extend the provably convergent Full Gradient DQN algorithm for discou...
research
12/09/2005

Evolving Stochastic Learning Algorithm Based on Tsallis Entropic Index

In this paper, inspired from our previous algorithm, which was based on ...
research
07/06/2023

PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models

In this paper, we consider a general observation model for restless mult...
