Restless Multi-Armed Bandits under Exogenous Global Markov Process

02/28/2022
by   Tomer Gafni, et al.
0

We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics, where an unknown exogenous global Markov process governs the rewards distribution of each arm. Under each global state, the rewards process of each arm evolves according to an unknown Markovian rule, which is non-identical among different arms. At each time, a player chooses an arm out of N arms to play, and receives a random reward from a finite set of reward states. The arms are restless, that is, their local state evolves regardless of the player's actions. The objective is an arm-selection policy that minimizes the regret, defined as the reward loss with respect to a player that knows the dynamics of the problem, and plays at each time t the arm that maximizes the expected immediate value. We develop the Learning under Exogenous Markov Process (LEMP) algorithm, that achieves a logarithmic regret order with time, and a finite-sample bound on the regret is established. Simulation results support the theoretical study and demonstrate strong performances of LEMP.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/17/2021

Learning in Restless Bandits under Exogenous Global Markov Process

We consider an extension to the restless multi-armed bandit (RMAB) probl...
research
06/19/2019

Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules

We consider a class of restless multi-armed bandit (RMAB) problems with ...
research
09/06/2022

Multi-Armed Bandits with Self-Information Rewards

This paper introduces the informational multi-armed bandit (IMAB) model ...
research
11/13/2020

Rebounding Bandits for Modeling Satiation Effects

Psychological research shows that enjoyment of many goods is subject to ...
research
09/20/2021

Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits

We study a finite-horizon restless multi-armed bandit problem with multi...
research
05/16/2020

Learning and Optimization with Seasonal Patterns

Seasonality is a common form of non-stationary patterns in the business ...
research
09/12/2012

Regret Bounds for Restless Markov Bandits

We consider the restless Markov bandit problem, in which the state of ea...

Please sign up or login with your details

Forgot password? Click here to reset