Combinatorial Pure Exploration of Dueling Bandit

06/23/2020
by Wei Chen, et al.

In this paper, we study combinatorial pure exploration for dueling bandits (CPE-DB). There are multiple candidates for multiple positions, modeled by a bipartite graph. In each round we sample a duel between two candidates on one position and observe which one wins; the goal is to find the best candidate-position matching with high probability after multiple rounds of sampling. CPE-DB adapts the original combinatorial pure exploration for multi-armed bandit (CPE-MAB) problem to the dueling bandit setting. We consider both the Borda winner and the Condorcet winner cases. For the Borda winner, we establish a reduction to the original CPE-MAB setting and design PAC and exact algorithms that achieve both a sample complexity similar to that of the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round. For the Condorcet winner, we first design a fully polynomial time approximation scheme (FPTAS) for the offline problem of finding the Condorcet winner with known winning probabilities, and then use the FPTAS as an oracle to design a novel pure exploration algorithm, CAR-Cond, with sample complexity analysis. CAR-Cond is the first algorithm with polynomial running time per round for identifying the Condorcet winner in CPE-DB.
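To make the Borda winner case concrete, the following is a minimal offline sketch, not the paper's algorithm. The abstract does not define the Borda score, so the sketch assumes a common definition: the score of a candidate on a position is its average probability of beating each other candidate on that same position, and the best matching maximizes the sum of these scores. It also assumes every candidate is eligible for every position and that winning probabilities are known, whereas the paper estimates them from sampled duels. All names (`full_table`, `borda_scores`, `best_matching`) and the probability values are hypothetical, and the brute-force search is for illustration only.

```python
from itertools import permutations


def full_table(half):
    """Complete a duel table using P(b beats a) = 1 - P(a beats b)."""
    table = dict(half)
    for (a, b), p in half.items():
        table[(b, a)] = 1.0 - p
    return table


def borda_scores(candidates, duel_prob):
    """Assumed Borda score: average win probability against each rival.

    duel_prob[position][(a, b)] is P(a beats b on that position).
    Returns a dict keyed by (candidate, position).
    """
    scores = {}
    for pos, table in duel_prob.items():
        for a in candidates:
            rivals = [b for b in candidates if b != a]
            scores[(a, pos)] = sum(table[(a, b)] for b in rivals) / len(rivals)
    return scores


def best_matching(candidates, positions, scores):
    """Brute-force the matching with the largest total score (illustration only)."""
    best, best_value = None, float("-inf")
    for assignment in permutations(candidates, len(positions)):
        value = sum(scores[(c, pos)] for c, pos in zip(assignment, positions))
        if value > best_value:
            best, best_value = dict(zip(positions, assignment)), value
    return best, best_value


# Hypothetical instance: three candidates, two positions.
candidates = ["A", "B", "C"]
positions = [0, 1]
duel_prob = {
    0: full_table({("A", "B"): 0.7, ("A", "C"): 0.6, ("B", "C"): 0.4}),
    1: full_table({("A", "B"): 0.6, ("A", "C"): 0.6, ("B", "C"): 0.55}),
}

scores = borda_scores(candidates, duel_prob)
matching, value = best_matching(candidates, positions, scores)
print(matching, round(value, 3))
```

Under this assumed definition, each (candidate, position) pair behaves like an arm with a fixed mean reward, which is the intuition behind reducing the Borda case to a CPE-MAB instance: dueling the candidate against a uniformly random rival on the same position gives an unbiased sample of its score.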


research
02/27/2019

Polynomial-time Algorithms for Combinatorial Pure Exploration with Full-bandit Feedback

We study the problem of stochastic combinatorial pure exploration (CPE),...
research
06/14/2020

Combinatorial Pure Exploration with Partial or Full-Bandit Linear Feedback

In this paper, we propose the novel model of combinatorial pure explorat...
research
08/20/2023

Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

We study the real-valued combinatorial pure exploration of the multi-arm...
research
06/15/2023

Combinatorial Pure Exploration of Multi-Armed Bandit with a Real Number Action Class

The combinatorial pure exploration (CPE) in the stochastic multi-armed b...
research
05/04/2018

Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications (Extended Version)

We study the Combinatorial Pure Exploration problem with Continuous and ...
research
02/09/2019

Pure Exploration with Multiple Correct Answers

We determine the sample complexity of pure exploration bandit problems w...
research
12/08/2021

A Fast Algorithm for PAC Combinatorial Pure Exploration

We consider the problem of Combinatorial Pure Exploration (CPE), which d...
