Borda Regret Minimization for Generalized Linear Dueling Bandits

03/15/2023
by Yue Wu, et al.

Dueling bandits are widely used to model preferential feedback, which is prevalent in machine learning applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a new and highly expressive generalized linear dueling bandits model, which covers many existing models. Surprisingly, the Borda regret minimization problem turns out to be difficult: we prove a regret lower bound of order Ω(d^(2/3) T^(2/3)), where d is the dimension of the contextual vectors and T is the time horizon. To match the lower bound, we propose an explore-then-commit type algorithm with a nearly matching regret upper bound of Õ(d^(2/3) T^(2/3)). When the number of items/arms K is small, our algorithm can achieve a smaller regret of Õ((d log K)^(1/3) T^(2/3)) with proper choices of hyperparameters. We also conduct experiments on both synthetic data and a simulated real-world environment, which corroborate our theoretical analysis.
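The two central objects in the abstract, the Borda score and an explore-then-commit strategy, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm. The logistic link with difference features in glm_preferences is one hypothetical instance of a generalized linear preference model. The Borda score of item i is taken to be the probability that i beats an opponent drawn uniformly from all K items (itself included, with P[i][i] = 0.5). The per-round regret of dueling i and j is assumed to be 2B(i*) - B(i) - B(j). The exploration phase estimates Borda scores from raw win counts instead of fitting a GLM parameter. All function and parameter names are hypothetical.

```python
import math
import random


def logistic(z):
    """Logistic link; satisfies logistic(z) + logistic(-z) = 1."""
    return 1.0 / (1.0 + math.exp(-z))


def glm_preferences(features, theta):
    """Hypothetical GLM preference matrix: P[i][j] = logistic(<x_i - x_j, theta>).

    Difference features give P[i][j] + P[j][i] = 1 and P[i][i] = 0.5.
    """
    K = len(features)
    P = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(K):
            z = sum((a - b) * t for a, b, t in zip(features[i], features[j], theta))
            P[i][j] = logistic(z)
    return P


def borda_scores(P):
    """Borda score of item i: probability it beats an opponent drawn uniformly from all K items."""
    K = len(P)
    return [sum(P[i]) / K for i in range(K)]


def explore_then_commit(P, T, T0, rng):
    """Uniform exploration for T0 rounds, then duel the empirical Borda winner against itself.

    Returns (committed item, cumulative Borda regret), where the per-round regret
    is assumed to be 2*B(i*) - B(i) - B(j).
    """
    K = len(P)
    B = borda_scores(P)  # true scores, used only to measure regret
    b_star = max(B)
    wins, plays = [0] * K, [0] * K
    regret = 0.0
    for _ in range(T0):
        # Both items drawn uniformly: i's empirical win rate is an unbiased estimate of B(i).
        i, j = rng.randrange(K), rng.randrange(K)
        wins[i] += int(rng.random() < P[i][j])
        plays[i] += 1
        regret += 2 * b_star - B[i] - B[j]
    est = [wins[k] / plays[k] if plays[k] else 0.0 for k in range(K)]
    best = max(range(K), key=lambda k: est[k])
    # Commit phase: play (best, best) for the remaining T - T0 rounds.
    regret += (T - T0) * 2 * (b_star - B[best])
    return best, regret


if __name__ == "__main__":
    P = glm_preferences([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], theta=[2.0, -1.0])
    T = 10_000
    best, regret = explore_then_commit(P, T, T0=round(T ** (2 / 3)), rng=random.Random(0))
    print("committed item:", best, "cumulative Borda regret:", round(regret, 1))
```

Every exploration round costs regret, while the chance of committing to a suboptimal item shrinks as T0 grows. Choosing T0 roughly on the order of T^(2/3) balances the two, which is the usual source of T^(2/3) rates for explore-then-commit strategies. The sketch does not reproduce the d or log K dependence of the paper's bounds.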

research
02/28/2017

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Contextual bandits are widely used in Internet services from news recomm...
research
04/28/2020

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

In this paper, we study the problem of stochastic linear bandits with fi...
research
06/08/2020

Learning the Truth From Only One Side of the Story

Learning under one-sided feedback (i.e., where examples arrive in an onl...
research
06/06/2020

Contextual Bandits with Side-Observations

We investigate contextual bandits in the presence of side-observations a...
research
04/04/2019

Empirical Bayes Regret Minimization

The prevalent approach to bandit algorithm design is to have a low-regre...
research
07/12/2022

Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

Online learning to rank (OLTR) interactively learns to choose lists of i...
research
09/15/2022

Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits

We propose a novel contextual bandit algorithm for generalized linear re...
