BelMan: Bayesian Bandits on the Belief--Reward Manifold

05/04/2018
by Debabrota Basu, et al.

We propose a generic, Bayesian, information-geometric approach to the exploration--exploitation trade-off in multi-armed bandit problems. Our approach, BelMan, uniformly supports pure exploration, exploration--exploitation, and two-phase bandit problems. Knowledge about the bandit arms and their reward distributions is summarised by the barycentre of the arms' joint belief-reward distributions, the pseudobelief-reward, within the belief-reward manifold. BelMan alternates information projection and reverse information projection: the pseudobelief-reward is projected onto the belief-rewards to choose the arm to play, and the resulting belief-rewards are projected back onto the pseudobelief-reward. It introduces a mechanism that infuses an exploitative bias by means of a focal distribution, i.e., a reward distribution that gradually concentrates on higher rewards. Comparative performance evaluation against state-of-the-art algorithms shows that BelMan is not only competitive but can also outperform other approaches in specific setups, for instance those involving many arms and continuous rewards.
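The abstract describes BelMan only at a high level, so the following is a minimal, illustrative sketch of the alternating structure for Bernoulli arms with Beta beliefs, not the authors' implementation. Every concrete choice here is an assumption made for brevity: the pseudobelief is approximated by averaging the Beta parameters across arms, the focal bias is a log(1 + t) tilt of the pseudobelief toward higher rewards, and the arm to play is the one whose belief is closest in KL divergence to the tilted pseudobelief. The names belman_sketch and kl_beta are hypothetical.

```python
# BelMan-flavoured sketch for Bernoulli arms (simplifying assumptions, not the
# paper's exact algorithm): Beta beliefs, a parameter-averaged pseudobelief,
# a log(1 + t) "focal" tilt towards higher rewards, and arm selection by
# minimising KL(tilted pseudobelief || arm belief).
import numpy as np
from scipy.special import betaln, digamma


def kl_beta(a1, b1, a2, b2):
    """KL divergence KL(Beta(a1, b1) || Beta(a2, b2)); broadcasts over arrays."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))


def belman_sketch(true_means, horizon=2000, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_means)
    a = np.ones(k)  # Beta(a, b) posterior parameters, one pair per arm
    b = np.ones(k)
    for t in range(1, horizon + 1):
        # Stand-in for the reverse projection: summarise all arm beliefs by a
        # single pseudobelief, here simply the average of the Beta parameters.
        a_bar, b_bar = a.mean(), b.mean()
        # Stand-in for the focal distribution: tilt the pseudobelief towards
        # higher rewards as t grows, so exploitation gradually takes over.
        tilt = np.log(1.0 + t)
        a_foc, b_foc = a_bar + tilt, b_bar
        # Stand-in for the arm-selection projection: play the arm whose belief
        # is closest, in KL, to the tilted pseudobelief.
        scores = kl_beta(a_foc, b_foc, a, b)
        arm = int(np.argmin(scores))
        reward = rng.binomial(1, true_means[arm])
        a[arm] += reward       # Bayesian update of the played arm's belief
        b[arm] += 1 - reward
    return a / (a + b)         # posterior means after the run


if __name__ == "__main__":
    # With these means the 0.8 arm is typically pulled the most.
    print(belman_sketch([0.2, 0.5, 0.8]))
```

In the paper the pseudobelief-reward is the KL barycentre on the belief-reward manifold and the two steps are exact I- and reverse-I-projections; the parameter averaging and the log(1 + t) tilt above are only stand-ins chosen to keep the sketch short.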
