Dueling Bandits: From Two-dueling to Multi-dueling

11/16/2022
by Yihan Du, et al.

We study a general multi-dueling bandit problem, in which an agent compares multiple options simultaneously and aims to minimize the regret incurred by selecting suboptimal arms. This setting generalizes the traditional two-dueling bandit problem and captures many real-world applications that involve subjective feedback on multiple options. We start with the two-dueling setting and propose two efficient algorithms, DoublerBAI and MultiSBM-Feedback. DoublerBAI provides a generic schema for translating known best arm identification results to the dueling bandit problem, and achieves a regret bound of O(ln T). MultiSBM-Feedback not only attains an optimal O(ln T) regret, but also reduces the leading constant by almost half compared to benchmark results. We then turn to the general multi-dueling case and develop an efficient algorithm, MultiRUCB. Using a novel finite-time regret analysis for the general multi-dueling bandit problem, we show that MultiRUCB also achieves an O(ln T) regret bound, and that the bound tightens as the capacity of the comparison set increases. On both synthetic and real-world datasets, we empirically demonstrate that our algorithms outperform existing algorithms.
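To make the multi-dueling setting concrete, here is a minimal simulation sketch of an RUCB-style loop: maintain pairwise win counts, form optimistic upper confidence bounds on each preference probability, and compare a set of the most promising arms each round. The set-selection rule, parameter values, and the toy preference matrix below are illustrative assumptions, not the paper's exact MultiRUCB algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

K, m, T, alpha = 5, 3, 2000, 0.51  # arms, comparison-set size, horizon, exploration rate
# Toy pairwise preference matrix P[i, j] = Pr(arm i beats arm j);
# arm 0 is the Condorcet winner in this hypothetical instance.
P = np.full((K, K), 0.5)
for i in range(K):
    for j in range(i + 1, K):
        P[i, j] = 0.5 + 0.08 * (j - i)
        P[j, i] = 1.0 - P[i, j]

wins = np.zeros((K, K))  # wins[i, j]: number of times arm i beat arm j

for t in range(1, T + 1):
    n = wins + wins.T  # total duels per pair
    with np.errstate(divide="ignore", invalid="ignore"):
        ucb = wins / n + np.sqrt(alpha * np.log(t) / n)
    ucb[np.isnan(ucb)] = 1.0        # never-compared pairs are fully optimistic
    np.fill_diagonal(ucb, 0.5)
    # Illustrative set selection: rank arms by their worst-case UCB
    # against any rival, and duel the top m arms against each other.
    scores = ucb.min(axis=1)
    S = np.argsort(scores)[-m:]
    # Environment feedback: every pair in the chosen set duels once.
    for a in S:
        for b in S:
            if a < b:
                if rng.random() < P[a, b]:
                    wins[a, b] += 1
                else:
                    wins[b, a] += 1

counts = (wins + wins.T).sum(axis=1)
print("most compared arm:", int(np.argmax(counts)))
```

Each round produces m(m-1)/2 duels, so larger comparison sets yield more pairwise feedback per round, which is the intuition behind the bound tightening as the set capacity grows.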


Related research

05/05/2016 · Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
We study the K-armed dueling bandit problem, a variation of the standard...

11/25/2019 · Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays
We investigate the adversarial bandit problem with multiple plays under ...

09/30/2021 · Adapting Bandit Algorithms for Settings with Sequentially Available Arms
Although the classical version of the Multi-Armed Bandits (MAB) framewor...

07/05/2018 · Contextual Bandits under Delayed Feedback
Delayed feedback is a ubiquitous problem in many industrial systems emp...

06/09/2020 · Stochastic matrix games with bandit feedback
We study a version of the classical zero-sum matrix game with unknown pa...

05/12/2023 · High Accuracy and Low Regret for User-Cold-Start Using Latent Bandits
We develop a novel latent-bandit algorithm for tackling the cold-start p...

10/11/2022 · Trading Off Resource Budgets for Improved Regret Bounds
In this work we consider a variant of adversarial online learning where ...
