Information-Gathering in Latent Bandits

07/08/2022
by Alexander Galozy, et al.

In the latent bandit problem, the learner has access to reward distributions and, for the non-stationary variant, transition models of the environment. The reward distributions are conditioned on the arm and an unknown latent state. The goal is to use the reward history to identify the latent state, allowing for the optimal choice of arms in the future. The latent bandit setting lends itself to many practical applications, such as recommender and decision support systems, where rich data allows offline estimation of environment models while online learning remains a critical component. Previous solutions in this setting always choose the highest-reward arm according to the agent's beliefs about the state, without explicitly considering the value of information-gathering arms. Such information-gathering arms do not necessarily provide the highest reward and thus may never be chosen by an agent that always selects the apparently best arm. In this paper, we present a method for information-gathering in latent bandits. Given particular reward structures and transition matrices, we show that always choosing the arm that appears best under the agent's current beliefs about the state incurs higher regret than necessary. Furthermore, we show that by choosing arms carefully, we obtain an improved estimate of the state distribution and thus lower cumulative regret through better arm choices in the future. We evaluate our method on both synthetic and real-world data sets, showing significant improvement in regret over state-of-the-art methods.
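To make the setting concrete, below is a minimal, self-contained sketch, not the authors' algorithm, of Bayesian belief tracking over latent states with known per-state reward means and Gaussian reward noise. The reward table, the noise level, the entropy-reduction bonus, and the 0.5 trade-off weight are all illustrative assumptions; the sketch only shows why a purely greedy choice can ignore arms that help identify the latent state.

```python
# Illustrative sketch (assumptions labeled below), not the paper's method:
# Bayesian belief tracking over latent states in a stationary latent bandit,
# contrasting a greedy policy with a simple information-gathering heuristic.
import numpy as np

rng = np.random.default_rng(0)

# Assumed known reward model: mean reward of each arm conditioned on the
# latent state; rows are latent states, columns are arms.
reward_means = np.array([
    [0.6, 0.8, 0.2],   # latent state 0
    [0.6, 0.2, 0.8],   # latent state 1
])
n_states, n_arms = reward_means.shape
noise_sd = 0.1                                  # assumed known Gaussian noise

true_state = 0                                  # hidden from the learner
belief = np.full(n_states, 1.0 / n_states)      # uniform prior over latent states

def pull(arm):
    """Sample a noisy reward from the hidden true latent state."""
    return rng.normal(reward_means[true_state, arm], noise_sd)

def update_belief(belief, arm, reward):
    """Bayes update: reweight each state by the likelihood of the observed reward."""
    lik = np.exp(-0.5 * ((reward - reward_means[:, arm]) / noise_sd) ** 2)
    posterior = belief * lik
    return posterior / posterior.sum()

def greedy_arm(belief):
    """Arm with the highest expected reward under the current belief."""
    return int(np.argmax(belief @ reward_means))

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def expected_info_gain(belief, arm, n_samples=200):
    """Monte-Carlo estimate of the expected reduction in belief entropy."""
    gains = []
    for _ in range(n_samples):
        s = rng.choice(n_states, p=belief)              # imagined latent state
        r = rng.normal(reward_means[s, arm], noise_sd)  # imagined reward
        gains.append(entropy(belief) - entropy(update_belief(belief, arm, r)))
    return float(np.mean(gains))

# One decision round. Under the uniform belief, the greedy arm (arm 0) has the
# highest immediate expected reward but identical means in every state, so it
# reveals nothing about the latent state; the information-aware score instead
# prefers an arm that helps identify the state.
expected_rewards = belief @ reward_means
bonus = np.array([expected_info_gain(belief, a) for a in range(n_arms)])
scores = expected_rewards + 0.5 * bonus         # 0.5 trade-off weight is arbitrary
print("greedy arm:", greedy_arm(belief), "| info-aware arm:", int(np.argmax(scores)))

arm = int(np.argmax(scores))
belief = update_belief(belief, arm, pull(arm))
print("updated belief over latent states:", np.round(belief, 3))
```

Running the sketch, the greedy rule keeps selecting arm 0 and its belief never sharpens, while the information-aware rule pays a small immediate cost to identify the state and can then commit to the truly best arm, which is the trade-off the abstract describes.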


Related research

Latent Bandits Revisited (06/15/2020)
A latent bandit problem is one in which the learning agent knows the arm...

Contextual Bandits with Latent Confounders: An NMF Approach (06/01/2016)
Motivated by online recommendation and advertising systems, we consider ...

BelMan: Bayesian Bandits on the Belief-Reward Manifold (05/04/2018)
We propose a generic, Bayesian, information geometric approach to the ex...

Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules (06/19/2019)
We consider a class of restless multi-armed bandit (RMAB) problems with ...

Stochastic Rising Bandits (12/07/2022)
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e...

Regret Bounds for Restless Markov Bandits (09/12/2012)
We consider the restless Markov bandit problem, in which the state of ea...

Combining Reward Information from Multiple Sources (03/22/2021)
Given two sources of evidence about a latent variable, one can combine t...
