Restless Hidden Markov Bandits with Linear Rewards

10/22/2019
by Michal Yemini et al.

This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem, the reward received by the decision maker is a random linear function that depends on the arm selected and a hidden state. In contrast to previous works on Markovian bandits, we do not assume that the decision maker receives information regarding the state of the system; instead, it must infer the state from its actions and the received rewards. Surprisingly, we can still maintain logarithmic regret in the case of a polyhedral action set. Furthermore, the regret does not depend on the number of extreme points of the action space.
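To make the setting concrete, below is a minimal simulation sketch of the environment the abstract describes, assuming a reward of the form r_t = <a_t, theta(s_t)> plus noise, where s_t is a hidden Markov state that evolves at every step regardless of the chosen action (the "restless" property). The transition matrix, dimensions, noise level, and toy action set are illustrative placeholders, not taken from the paper, and this sketch implements only the environment, not the authors' algorithm.

```python
import numpy as np

# Sketch of a restless hidden Markov bandit environment with linear
# rewards. The learner picks a point of a polyhedral action set (here
# represented by its extreme points) and observes only a scalar reward;
# the Markov state is never revealed. All parameters are illustrative.

rng = np.random.default_rng(0)

n_states = 2   # number of hidden Markov states (assumption)
dim = 3        # dimension of the action/parameter vectors (assumption)

# Hidden-state transition matrix: the state evolves at every step,
# independently of the action taken (restless dynamics).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# One unknown reward parameter vector per hidden state.
theta = rng.normal(size=(n_states, dim))

# Extreme points of a toy polyhedral action set (the unit cross-polytope).
extreme_points = np.vstack([np.eye(dim), -np.eye(dim)])

state = 0
for t in range(5):
    # A real algorithm would choose the action from a belief over the
    # hidden state inferred from past actions and rewards; here we just
    # pick an extreme point uniformly at random as a placeholder policy.
    a = extreme_points[rng.integers(len(extreme_points))]

    # Reward: a noisy linear function of the action whose slope is
    # determined by the current (hidden) state.
    r = a @ theta[state] + rng.normal(scale=0.1)
    print(f"t={t} action={a} reward={r:.3f}")

    # The hidden state makes a Markov transition, unobserved by the learner.
    state = rng.choice(n_states, p=P[state])
```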


Related research:

12/17/2021
Learning in Restless Bandits under Exogenous Global Markov Process
We consider an extension to the restless multi-armed bandit (RMAB) probl...

09/07/2021
Online Learning for Cooperative Multi-Player Multi-Armed Bandits
We introduce a framework for decentralized online learning for multi-arm...

06/26/2019
Orthogonal Projection in Linear Bandits
The expected reward in a linear stochastic bandit model is an unknown li...

11/16/2022
Dynamical Linear Bandits
In many real-world sequential decision-making problems, an action does n...

02/27/2023
Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics
Consider a decision-maker that can pick one out of K actions to control ...

10/18/2022
Contextual bandits with concave rewards, and an application to fair ranking
We consider Contextual Bandits with Concave Rewards (CBCR), a multi-obje...

10/22/2021
Break your Bandit Routine with LSD Rewards: a Last Switch Dependent Analysis of Satiation and Seasonality
Motivated by the fact that humans like some level of unpredictability or...
