Restless Hidden Markov Bandits with Linear Rewards
This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem the reward received by the decision maker is a random linear function which depends on the arm selected and a hidden state. In contrast to previous works on Markovian bandits, we do not assume that the decision maker receives information regarding the state of the system, but has to infer it based on its actions and the received reward. Surprisingly, we can still maintain logarithmic regret in the case of polyhedral action set. Furthermore, the regret does not depend on the number of extreme points in the action space.
READ FULL TEXT