Thompson Sampling for Noncompliant Bandits

12/03/2018
by Andrew Stirn, et al.

Thompson sampling, a Bayesian method for balancing exploration and exploitation in bandit problems, has theoretical guarantees and exhibits strong empirical performance in many domains. Traditional Thompson sampling, however, assumes perfect compliance, where an agent's chosen action is treated as the implemented action. This article introduces a stochastic noncompliance model that relaxes this assumption. We prove that any noncompliance in a 2-armed Bernoulli bandit increases existing regret bounds. With our noncompliance model, we derive Thompson sampling variants that explicitly handle both observed and latent noncompliance. With extensive empirical analysis, we demonstrate that our algorithms either match or outperform traditional Thompson sampling in both compliant and noncompliant environments.
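To make the setting concrete, here is a minimal sketch of Beta-Bernoulli Thompson sampling on a 2-armed bandit in which the implemented arm can differ from the chosen arm. The noncompliance model used here (a fixed compliance probability, with the implemented arm observed) and the names `run_bandit`, `comply_prob`, and `credit_implemented` are assumptions made for illustration only. The abstract does not specify the authors' model or algorithms, so this should not be read as their method.

```python
import random


def run_bandit(true_means, comply_prob, horizon, credit_implemented, seed=0):
    """Beta-Bernoulli Thompson sampling with observed noncompliance.

    Assumed (hypothetical) model: with probability `comply_prob` the chosen
    arm is implemented; otherwise a different arm, picked uniformly at
    random, is implemented instead. The implemented arm is observed.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta(1, 1) prior on each arm's mean
    failures = [1] * n_arms
    total_reward = 0

    for _ in range(horizon):
        # Draw one posterior sample per arm and choose the largest.
        samples = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        chosen = max(range(n_arms), key=lambda a: samples[a])

        # Noncompliance: the implemented arm may differ from the chosen one.
        if rng.random() < comply_prob:
            implemented = chosen
        else:
            others = [a for a in range(n_arms) if a != chosen]
            implemented = rng.choice(others)

        # The reward is generated by the arm that was actually implemented.
        reward = 1 if rng.random() < true_means[implemented] else 0
        total_reward += reward

        # Traditional Thompson sampling credits the chosen arm; the
        # noncompliance-aware variant credits the implemented arm.
        updated = implemented if credit_implemented else chosen
        if reward:
            successes[updated] += 1
        else:
            failures[updated] += 1

    posterior_means = [successes[a] / (successes[a] + failures[a])
                       for a in range(n_arms)]
    return total_reward, posterior_means


if __name__ == "__main__":
    for flag in (False, True):
        reward, means = run_bandit([0.2, 0.8], comply_prob=0.7,
                                   horizon=5000, credit_implemented=flag)
        print(f"credit_implemented={flag}: reward={reward}, means={means}")
```

When `comply_prob` is 1.0 the two update rules coincide, which matches the perfect-compliance assumption of traditional Thompson sampling. Below 1.0, crediting the chosen arm mixes rewards from both arms into one posterior, whereas crediting the implemented arm keeps each posterior tied to the arm that produced the reward.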


