
Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling

by Iñigo Urteaga, et al. (Columbia University)

Reinforcement learning studies how to balance exploration and exploitation in real-world systems: optimizing interactions with the world while simultaneously learning how the world works. One general framework for such learning is the multi-armed bandit setting, in which sequential interactions are independent and identically distributed, and the related contextual bandit case, in which the reward distribution depends on additional information, or 'context', presented with each interaction. Thompson sampling, though introduced in the 1930s, has recently been shown to perform well and to enjoy provable optimality properties, while at the same time permitting generative, interpretable modeling. In a Bayesian setting, prior knowledge is incorporated and the computed posteriors naturally capture the full state of knowledge. In several application domains, such as health and medicine, each interaction with the world can be expensive and invasive, whereas drawing samples from the model is relatively inexpensive. Exploiting this viewpoint, we develop a double-sampling technique driven by the uncertainty in the learning process. The proposed algorithm makes no distributional assumptions and is applicable to complex reward distributions, as long as Bayesian posterior updates are computable. We empirically show that it outperforms Thompson sampling (in the sense of regret) in two classical illustrative cases: the multi-armed bandit problem with and without context.
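The abstract does not spell out the double-sampling algorithm itself, but the Thompson sampling baseline it compares against is standard. As background, here is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit with Beta posteriors; the function and variable names are illustrative, not taken from the paper.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Thompson sampling for a K-armed Bernoulli bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior
    (i.e., a uniform Beta(1, 1) prior). At every step we draw one
    posterior sample per arm, play the arm with the largest sample,
    and update that arm's posterior with the observed 0/1 reward.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # One posterior draw per arm; the randomness of the draw is
        # what drives exploration when an arm is still uncertain.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Simulate the environment: a Bernoulli reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Over time the posterior of the best arm concentrates, so draws from it dominate and the agent shifts from exploring to exploiting; the paper's double-sampling scheme instead drives that shift explicitly with the uncertainty in the learned posteriors.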

