Multi-Armed Bandits with Self-Information Rewards

09/06/2022
by Nir Weinberger et al.

This paper introduces the informational multi-armed bandit (IMAB) model in which at each round, a player chooses an arm, observes a symbol, and receives an unobserved reward in the form of the symbol's self-information. Thus, the expected reward of an arm is the Shannon entropy of the probability mass function of the source that generates its symbols. The player aims to maximize the expected total reward associated with the entropy values of the arms played. Under the assumption that the alphabet size is known, two UCB-based algorithms are proposed for the IMAB model which consider the biases of the plug-in entropy estimator. The first algorithm optimistically corrects the bias term in the entropy estimation. The second algorithm relies on data-dependent confidence intervals that adapt to sources with small entropy values. Performance guarantees are provided by upper bounding the expected regret of each of the algorithms. Furthermore, in the Bernoulli case, the asymptotic behavior of these algorithms is compared to the Lai-Robbins lower bound for the pseudo regret. Additionally, under the assumption that the exact alphabet size is unknown, and instead the player only knows a loose upper bound on it, a UCB-based algorithm is proposed, in which the player aims to reduce the regret caused by the unknown alphabet size in a finite time regime. Numerical results illustrating the expected regret of the algorithms presented in the paper are provided.
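To make the setup concrete, below is a minimal sketch of a UCB-style player for the IMAB model. It is not the paper's exact algorithm: the plug-in entropy estimator and the Miller-Madow bias term (K-1)/(2n) are standard, but the confidence radius `log(K) * sqrt(2 * log(t) / n)` is a hypothetical choice for illustration, and the source distributions and helper names (`imab_ucb`, `plugin_entropy`) are invented for this sketch.

```python
import math
import random
from collections import Counter

def plugin_entropy(counts, n):
    """Plug-in (maximum-likelihood) entropy estimate, in nats."""
    return -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)

def imab_ucb(sources, K, horizon, seed=0):
    """UCB sketch for the informational bandit.

    sources: list of pmfs (dict symbol -> probability), one per arm.
    K: known alphabet-size bound. The unobserved reward of a pull is the
    drawn symbol's self-information, so an arm's mean reward is its entropy.
    """
    rng = random.Random(seed)
    num_arms = len(sources)
    counts = [Counter() for _ in range(num_arms)]
    pulls = [0] * num_arms

    def draw(arm):
        r, acc = rng.random(), 0.0
        for symbol, p in sources[arm].items():
            acc += p
            if r <= acc:
                return symbol
        return symbol  # guard against floating-point round-off

    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1  # pull each arm once to initialize
        else:
            def index(a):
                n = pulls[a]
                bias = (K - 1) / (2 * n)  # Miller-Madow bias correction
                # Hypothetical confidence radius, not the paper's:
                conf = math.log(K) * math.sqrt(2 * math.log(t) / n)
                return plugin_entropy(counts[a], n) + bias + conf
            arm = max(range(num_arms), key=index)
        counts[arm][draw(arm)] += 1
        pulls[arm] += 1
    return pulls

# A fair coin (entropy ~0.69 nats) vs. a heavily biased coin (~0.056 nats):
pulls = imab_ucb([{0: 0.5, 1: 0.5}, {0: 0.99, 1: 0.01}], K=2, horizon=2000)
```

Since the player's goal is to accumulate entropy, the high-entropy arm should dominate the pull counts once the confidence intervals separate the two arms.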


Related research:

- 02/28/2022: Restless Multi-Armed Bandits under Exogenous Global Markov Process
  "We consider an extension to the restless multi-armed bandit (RMAB) probl..."
- 06/12/2023: Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals
  "We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where..."
- 11/05/2018: Multi-armed Bandits with Compensation
  "We propose and study the known-compensation multi-arm bandit (KCMAB) pro..."
- 03/24/2021: Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information
  "We propose a novel algorithm for multi-player multi-armed bandits withou..."
- 12/12/2018: On Distributed Multi-player Multiarmed Bandit Problems in Abruptly Changing Environment
  "We study the multi-player stochastic multiarmed bandit (MAB) problem in ..."
- 11/17/2016: Unimodal Thompson Sampling for Graph-Structured Arms
  "We study, to the best of our knowledge, the first Bayesian algorithm for..."
- 12/13/2021: Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits
  "We consider the upper confidence bound strategy for Gaussian multi-armed..."
