Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

by Junpei Komiyama, et al.

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.
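
To make the feedback model in the abstract concrete, the following is a minimal sketch of a dueling bandit environment, not the authors' implementation. The preference matrix `P`, the helper names `duel`, `condorcet_winner`, and `bernoulli_kl`, and the assumption that a single arm beats all others (a Condorcet winner) are all illustrative choices that do not appear in the abstract. The sketch also assumes that the "information divergence" is the KL divergence between Bernoulli distributions.

```python
import math
import random

# Hypothetical 3-arm preference matrix (an assumption for illustration):
# P[i][j] is the probability that arm i wins a comparison against arm j.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]


def duel(i, j, rng):
    """Return True if arm i wins one noisy comparison against arm j.

    This is the only feedback the learner sees: a relative outcome for
    the chosen pair, never an absolute reward for either arm.
    """
    return rng.random() < P[i][j]


def condorcet_winner(pref):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    k = len(pref)
    for i in range(k):
        if all(pref[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < q < 1."""
    def term(a, b):
        # Convention: 0 * log(0 / b) = 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)


if __name__ == "__main__":
    rng = random.Random(0)
    wins = sum(duel(0, 2, rng) for _ in range(1000))
    print("arm 0 beat arm 2 in", wins, "of 1000 duels")
    print("Condorcet winner:", condorcet_winner(P))
    print("KL(P[1][0] || 1/2):", bernoulli_kl(P[1][0], 0.5))
```

The divergence between a pairwise win probability and 1/2 measures how hard it is to tell from comparisons alone which arm of the pair is better. Quantities of this kind are what a divergence-based regret bound is built from.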


Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

We study the K-armed dueling bandit problem, a variation of the standard...

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

Partial monitoring is a general model for sequential learning with limit...

Efficient Algorithms for Stochastic Repeated Second-price Auctions

Developing efficient sequential bidding strategies for repeated auctions...

Optimal Stochastic Nonconvex Optimization with Bandit Feedback

In this paper, we analyze the continuous armed bandit problems for nonco...

Understanding Bandits with Graph Feedback

The bandit problem with graph feedback, proposed in [Mannor and Shamir, ...

Quick Best Action Identification in Linear Bandit Problems

In this paper, we consider a best action identification problem in the s...

Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation

Online ranker evaluation is one of the key challenges in information ret...
