Double Thompson Sampling for Dueling Bandits

04/25/2016
by Huasen Wu, et al.

In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit problems. As indicated by its name, D-TS selects both the first and the second candidates according to Thompson Sampling. Specifically, D-TS maintains a posterior distribution for the preference matrix, and chooses the pair of arms for comparison by sampling twice from the posterior distribution. This simple algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as a special case. For general Copeland dueling bandits, we show that D-TS achieves O(K^2 log T) regret. For Condorcet dueling bandits, we further simplify the D-TS algorithm and show that the simplified D-TS algorithm achieves O(K log T + K^2 log log T) regret. Simulation results based on both synthetic and real-world data demonstrate the efficiency of the proposed D-TS algorithm.
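To make the two-phase selection concrete, below is a minimal Python sketch of one D-TS round under standard assumptions: Beta posteriors on the pairwise win counts, an RUCB-style confidence bound to prune the first candidate, and two independent posterior draws for the two candidates. The function name d_ts_step, the count matrix B, and the exploration parameter alpha are illustrative choices for this sketch, not the paper's reference implementation.

```python
import numpy as np

def d_ts_step(B, t, alpha=0.51, rng=None):
    """One round of a Double Thompson Sampling sketch for Copeland dueling bandits.

    B[i, j] counts the past duels in which arm i beat arm j; t is the round index.
    Returns the pair (first, second) of arm indices to compare this round.
    """
    rng = rng if rng is not None else np.random.default_rng()
    K = B.shape[0]
    N = B + B.T  # total number of duels for each pair

    # RUCB-style confidence bounds on the preference probabilities p_ij.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = np.where(N > 0, B / N, 0.5)
        radius = np.where(N > 0, np.sqrt(alpha * np.log(t + 1) / N), 1.0)
    U = np.clip(mean + radius, 0.0, 1.0)
    L = np.clip(mean - radius, 0.0, 1.0)
    np.fill_diagonal(U, 0.5)
    np.fill_diagonal(L, 0.5)

    # Phase 1: restrict to arms whose upper-bound Copeland score is maximal,
    # then pick the first candidate by one posterior sample of the whole matrix.
    copeland_ucb = (U > 0.5).sum(axis=1)
    candidates = np.flatnonzero(copeland_ucb == copeland_ucb.max())
    theta = np.full((K, K), 0.5)
    iu = np.triu_indices(K, k=1)
    theta[iu] = rng.beta(B[iu] + 1, B.T[iu] + 1)   # sample p_ij for i < j
    theta[(iu[1], iu[0])] = 1.0 - theta[iu]        # enforce p_ji = 1 - p_ij
    score = (theta > 0.5).sum(axis=1)
    best = candidates[score[candidates] == score[candidates].max()]
    first = rng.choice(best)                       # break ties uniformly at random

    # Phase 2: resample p_{i, first} and pick the strongest arm among those
    # that could still beat the first candidate (lower confidence bound <= 1/2).
    theta2 = rng.beta(B[:, first] + 1, B[first, :] + 1)
    theta2[first] = 0.5
    feasible = np.flatnonzero(L[:, first] <= 0.5)  # always contains `first` itself
    second = feasible[np.argmax(theta2[feasible])]
    return first, second
```

After each duel, the caller would update the count matrix with B[winner, loser] += 1, so the confidence bounds and posteriors tighten over time.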



Related research

06/22/2022 · Langevin Monte Carlo for Contextual Bandits
We study the efficiency of Thompson sampling for contextual bandits. Exi...

06/15/2021 · Thompson Sampling for Unimodal Bandits
In this paper, we propose a Thompson Sampling algorithm for unimodal ban...

05/18/2018 · PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits
We address the problem of regret minimization in logistic contextual ban...

07/09/2020 · Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
We propose a novel framework for structured bandits, which we call an in...

05/30/2022 · Generalizing Hierarchical Bayesian Bandits
A contextual bandit is a popular and practical framework for online lear...

02/02/2022 · Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts
Contextual bandits are widely-used in the study of learning-based contro...

10/15/2020 · Double-Linear Thompson Sampling for Context-Attentive Bandits
In this paper, we analyze and extend an online learning framework known ...
