
(Sequential) Importance Sampling Bandits

by Iñigo Urteaga et al.
Columbia University

The multi-armed bandit (MAB) problem is a sequential allocation task in which the goal is to learn a policy that maximizes long-term payoff while only the reward of the executed action is observed; i.e., optimal decisions are made sequentially while simultaneously learning how the world operates. In the stochastic setting, the reward for each action is generated from an unknown distribution. To decide the next optimal action, one must compute sufficient statistics of this unknown reward distribution, e.g., upper confidence bounds (UCB) or the expectations used in Thompson sampling. Closed-form expressions for these statistics of interest are analytically intractable except in simple cases. We propose to leverage Monte Carlo estimation and, in particular, the flexibility of (sequential) importance sampling (IS) for accurate estimation of the statistics of interest within the MAB problem. IS methods estimate posterior densities or expectations in probabilistic models that are analytically intractable. We first show how IS can be combined with state-of-the-art MAB algorithms (Thompson sampling and Bayes-UCB) for classic (Bernoulli and contextual linear-Gaussian) bandit problems. We then leverage the power of sequential IS to extend the applicability of these algorithms beyond the classic settings and tackle additional useful cases: specifically, the dynamic linear-Gaussian bandit, as well as the static and dynamic logistic bandits. The flexibility of (sequential) importance sampling proves fundamental for obtaining efficient estimates of the key sufficient statistics in these challenging scenarios.
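To make the idea concrete, here is a minimal sketch (not the paper's implementation) of IS-based Thompson sampling for a Bernoulli bandit: each arm's posterior draw is approximated by weighting proposal samples from the uniform prior with the Bernoulli likelihood of the observed counts, then resampling one particle (sampling-importance-resampling). The function names, particle count, horizon, and arm means are illustrative assumptions.

```python
# A minimal sketch of importance-sampling-based Thompson sampling for a
# Bernoulli bandit, assuming a Uniform(0,1) prior as the proposal and
# sampling-importance-resampling (SIR) for approximate posterior draws.
# All constants below are illustrative, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

def is_posterior_draw(successes, failures, n_particles=1000):
    """Approximate one posterior draw of an arm's success probability."""
    # Proposal draws from the Uniform(0,1) prior, clipped for log-safety.
    theta = np.clip(rng.uniform(0.0, 1.0, n_particles), 1e-12, 1 - 1e-12)
    # Importance weights: Bernoulli likelihood of the observed counts.
    log_w = successes * np.log(theta) + failures * np.log1p(-theta)
    log_w -= log_w.max()                 # numerical stability
    w = np.exp(log_w)
    w /= w.sum()                         # normalize the weights
    return rng.choice(theta, p=w)        # SIR: resample one particle

def thompson_is(true_means, horizon=2000):
    """Run IS-based Thompson sampling against fixed Bernoulli arms."""
    n_arms = len(true_means)
    succ = np.zeros(n_arms)
    fail = np.zeros(n_arms)
    rewards = []
    for _ in range(horizon):
        # One approximate posterior sample per arm; play the argmax.
        samples = [is_posterior_draw(succ[a], fail[a]) for a in range(n_arms)]
        a = int(np.argmax(samples))
        r = rng.random() < true_means[a]
        succ[a] += r
        fail[a] += 1 - r
        rewards.append(r)
    return np.mean(rewards)

print(thompson_is([0.4, 0.5, 0.7]))  # average reward should approach 0.7
```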
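For the dynamic settings, sequential IS can be sketched as a per-arm bootstrap particle filter: particles are propagated through an assumed transition model, reweighted by the reward likelihood when the arm is played, and resampled to combat weight degeneracy; Thompson sampling then draws one particle per arm. The Gaussian random-walk model, noise levels, and class names below are assumptions for illustration, not the paper's specification.

```python
# A compact sketch of sequential importance sampling (a bootstrap particle
# filter) for Thompson sampling in a dynamic bandit, where each arm's mean
# reward is assumed to follow a Gaussian random walk.
import numpy as np

rng = np.random.default_rng(1)

class ParticleArm:
    """Per-arm particle filter tracking a drifting mean reward."""

    def __init__(self, n_particles=500, sigma_process=0.03, sigma_obs=0.5):
        self.theta = rng.normal(0.0, 1.0, n_particles)    # particles from the prior
        self.w = np.full(n_particles, 1.0 / n_particles)  # importance weights
        self.sp, self.so = sigma_process, sigma_obs

    def propagate(self):
        # Transition step: the latent mean drifts every round, played or not.
        self.theta += rng.normal(0.0, self.sp, self.theta.size)

    def update(self, reward):
        # Reweight by the Gaussian reward likelihood, then resample
        # (sequential importance resampling) to fight weight degeneracy.
        self.w *= np.exp(-0.5 * ((reward - self.theta) / self.so) ** 2)
        self.w /= self.w.sum()
        idx = rng.choice(self.theta.size, size=self.theta.size, p=self.w)
        self.theta = self.theta[idx]
        self.w.fill(1.0 / self.theta.size)

    def posterior_sample(self):
        # Thompson sampling draw: one particle according to its weight.
        return rng.choice(self.theta, p=self.w)

def run(true_paths, sigma_obs=0.5):
    """true_paths: (horizon, n_arms) array of drifting true mean rewards."""
    horizon, n_arms = true_paths.shape
    arms = [ParticleArm(sigma_obs=sigma_obs) for _ in range(n_arms)]
    total = 0.0
    for t in range(horizon):
        for arm in arms:
            arm.propagate()
        a = int(np.argmax([arm.posterior_sample() for arm in arms]))
        reward = true_paths[t, a] + rng.normal(0.0, sigma_obs)
        arms[a].update(reward)
        total += reward
    return total / horizon

# Two arms whose true means drift and eventually cross.
T = 1000
paths = np.stack([np.linspace(1.0, 0.0, T), np.linspace(0.0, 1.0, T)], axis=1)
print(run(paths))
```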




Related Research

Nonparametric Gaussian mixture models for the multi-armed contextual bandit

The multi-armed bandit is a sequential allocation task where an agent mu...

Contextual Bandits with Stochastic Experts

We consider the problem of contextual bandits with stochastic experts, w...

An empirical evaluation of active inference in multi-armed bandits

A key feature of sequential decision making under uncertainty is a need ...

Variational inference for the multi-armed contextual bandit

In many biomedical, science, and engineering problems, one must sequenti...

Debiasing Samples from Online Learning Using Bootstrap

It has been recently shown in the literature that the sample averages fr...

Balanced Off-Policy Evaluation in General Action Spaces

In many practical applications of contextual bandits, online learning is...

GBOSE: Generalized Bandit Orthogonalized Semiparametric Estimation

In sequential decision-making scenarios, i.e., mobile health recommendati...

Code Repositories


Public repository for the work on bandit problems
