Simple Algorithms for Dueling Bandits

06/18/2019
by Tyler Lekang, et al.

In this paper, we present simple algorithms for Dueling Bandits. We prove that the algorithms achieve regret bounds over a time horizon T of order O(T^ρ) with 1/2 ≤ ρ ≤ 3/4, and, importantly, these bounds do not depend on the preference gap Δ between actions. Dueling Bandits is an important extension of the Multi-Armed Bandit problem in which the algorithm must select two actions at a time and receives only binary feedback on the outcome of the duel. This models comparisons in which a rater can only provide yes/no or better/worse responses. We compare our simple algorithms to the current state-of-the-art Dueling Bandits algorithms, ISS and DTS, discussing complexity and regret upper bounds, and we conduct experiments on synthetic data that demonstrate regret performance which in some cases surpasses the state-of-the-art.
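To make the interaction protocol concrete, below is a minimal Python sketch of a dueling-bandit loop: at each round the learner selects a pair of actions and observes only the binary outcome of their duel. The Bradley-Terry preference matrix P, the horizon T, and the number of actions K are illustrative assumptions, and the uniform-random pair selection is a placeholder baseline, not the paper's algorithms or ISS/DTS.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5       # number of actions (arms); an assumption for this sketch
T = 10_000  # time horizon; an assumption for this sketch

# Hypothetical Bradley-Terry preference model: P[i, j] is the probability
# that action i beats action j in a duel. This stands in for the unknown
# preference structure the learner faces.
utilities = rng.uniform(size=K)
P = 1.0 / (1.0 + np.exp(-(utilities[:, None] - utilities[None, :])))

wins = np.zeros((K, K))   # wins[i, j]: times action i has beaten action j
plays = np.zeros((K, K))  # plays[i, j]: times actions i and j have dueled

for t in range(T):
    # Placeholder pair selection: duel two distinct actions chosen uniformly
    # at random. A real dueling-bandit algorithm replaces this rule.
    i, j = rng.choice(K, size=2, replace=False)

    # Binary feedback only: 1 if action i won the duel, 0 otherwise.
    i_won = rng.random() < P[i, j]

    plays[i, j] += 1
    plays[j, i] += 1
    wins[i, j] += i_won
    wins[j, i] += 1 - i_won

# Empirical pairwise preference estimates that a strategy could consume.
with np.errstate(invalid="ignore"):
    P_hat = wins / plays  # diagonal entries are 0/0 = NaN (an arm never duels itself)

print("true best action:     ", utilities.argmax())
print("empirical best action:", np.nanmean(P_hat, axis=1).argmax())
```

The binary feedback in the loop is the defining constraint of the setting: the learner never sees numeric rewards, only which of the two chosen actions won, so any algorithm must build its estimates from pairwise win counts like P_hat above.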

