Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

06/08/2015
by Junpei Komiyama, et al.

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.
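
The feedback model described in the abstract can be illustrated with a small simulation. This is only a sketch of the problem setting, not the proposed algorithm. The preference matrix `PREF`, the helper names, and the uniform-random baseline policy are hypothetical. The regret definition used here (average gap of the two chosen arms to a Condorcet winner) is a common convention in the dueling bandit literature and is an assumption, since the abstract does not state it.

```python
import random

# Hypothetical preference matrix: PREF[i][j] = P(arm i beats arm j).
# Arm 0 beats every other arm with probability > 1/2 (a Condorcet winner).
PREF = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]

def duel(pref, i, j, rng):
    """Relative feedback only: return True if arm i wins the comparison with j."""
    return rng.random() < pref[i][j]

def condorcet_winner(pref):
    """Return the arm that beats all others with probability > 1/2, or None."""
    k = len(pref)
    for i in range(k):
        if all(pref[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None

def pair_regret(pref, winner, i, j):
    """Assumed regret of dueling (i, j): mean gap of both arms to the winner."""
    return ((pref[winner][i] - 0.5) + (pref[winner][j] - 0.5)) / 2

def run_uniform(pref, horizon, seed=0):
    """Baseline that picks both arms uniformly at random each round."""
    rng = random.Random(seed)
    k = len(pref)
    winner = condorcet_winner(pref)
    wins = [[0] * k for _ in range(k)]  # wins[i][j]: times i beat j
    total_regret = 0.0
    for _ in range(horizon):
        i, j = rng.randrange(k), rng.randrange(k)
        if duel(pref, i, j, rng):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        total_regret += pair_regret(pref, winner, i, j)
    return total_regret, wins

if __name__ == "__main__":
    regret, _ = run_uniform(PREF, horizon=1000)
    print(f"cumulative regret of the uniform baseline: {regret:.1f}")
```

Under this assumed definition, a policy that never stops exploring, such as the uniform baseline, accumulates regret linearly in the horizon, which is why algorithms with logarithmic regret bounds are of interest.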

05/05/2016

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

We study the K-armed dueling bandit problem, a variation of the standard...
09/30/2015

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

Partial monitoring is a general model for sequential learning with limit...
11/10/2020

Efficient Algorithms for Stochastic Repeated Second-price Auctions

Developing efficient sequential bidding strategies for repeated auctions...
03/30/2021

Optimal Stochastic Nonconvex Optimization with Bandit Feedback

In this paper, we analyze the continuous armed bandit problems for nonco...
05/29/2021

Understanding Bandits with Graph Feedback

The bandit problem with graph feedback, proposed in [Mannor and Shamir, ...
12/02/2018

Quick Best Action Identification in Linear Bandit Problems

In this paper, we consider a best action identification problem in the s...
12/11/2018

Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation

Online ranker evaluation is one of the key challenges in information ret...
