Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling

09/10/2017
by Iñigo Urteaga, et al.

Reinforcement learning studies how to balance exploration and exploitation in real-world systems, optimizing interactions with the world while simultaneously learning how the world works. One general framework for such learning is the multi-armed bandit setting, in which sequential interactions are independent and identically distributed, and the related contextual bandit case, in which the reward distribution depends on additional information, or 'context', presented with each interaction. Thompson sampling, though introduced in the 1930s, has recently been shown to perform well and to enjoy provable optimality properties, while at the same time permitting generative, interpretable modeling. In a Bayesian setting, prior knowledge is incorporated and the computed posteriors naturally capture the full state of knowledge. In several application domains, for example in health and medicine, each interaction with the world can be expensive and invasive, whereas drawing samples from the model is relatively inexpensive. Exploiting this viewpoint, we develop a double-sampling technique driven by the uncertainty in the learning process. The proposed algorithm makes no distributional assumptions and is applicable to complex reward distributions, as long as Bayesian posterior updates are computable. We empirically show that it outperforms Thompson sampling, in the sense of regret, in two classical illustrative cases: the multi-armed bandit problem with and without context.
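To make the Bayesian baseline concrete, here is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit with conjugate Beta(1,1) priors, the classical setting against which the abstract's double-sampling method is compared. This illustrates only the standard Thompson sampling baseline, not the paper's proposed algorithm; the function name and parameters are our own:

```python
import numpy as np

def thompson_sampling_bernoulli(true_probs, n_rounds=1000, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1,1) priors per arm.

    Each round: draw one plausible mean reward per arm from its Beta
    posterior, play the arm whose draw is largest, then update that arm's
    posterior with the observed 0/1 reward.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    alpha = np.ones(n_arms)  # successes + 1 (Beta posterior parameter)
    beta = np.ones(n_arms)   # failures + 1
    total_reward = 0.0
    for _ in range(n_rounds):
        theta = rng.beta(alpha, beta)      # posterior sample per arm
        arm = int(np.argmax(theta))        # exploit the sampled beliefs
        reward = rng.random() < true_probs[arm]
        alpha[arm] += reward               # conjugate Bayesian update
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta
```

Because exploration is driven by posterior uncertainty, pulls concentrate on the best arm as its posterior sharpens, which is the property the paper's double-sampling variant aims to improve upon in terms of regret.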


Related research

- Variational inference for the multi-armed contextual bandit (09/10/2017): In many biomedical, science, and engineering problems, one must sequenti...
- Nonparametric Gaussian mixture models for the multi-armed contextual bandit (08/08/2018): The multi-armed bandit is a sequential allocation task where an agent mu...
- Variational Bayesian Optimistic Sampling (10/29/2021): We consider online sequential decision problems where an agent must bala...
- To mock a Mocking bird: Studies in Biomimicry (04/26/2021): This paper dwells on certain novel game-theoretic investigations in bio-...
- Visual Prediction of Priors for Articulated Object Interaction (06/06/2020): Exploration in novel settings can be challenging without prior experienc...
- Bootstrapped Thompson Sampling and Deep Exploration (07/01/2015): This technical note presents a new approach to carrying out the kind of ...
- Thompson Sampling on Symmetric α-Stable Bandits (07/08/2019): Thompson Sampling provides an efficient technique to introduce prior kno...
