Minimax Regret for Cascading Bandits

03/23/2022
by Daniel Vial, et al.

Cascading bandits model the task of learning to rank K out of L items over n rounds of partial feedback. For this model, the minimax (i.e., gap-free) regret is poorly understood; in particular, the best known lower and upper bounds are Ω(√(nL/K)) and Õ(√(nLK)), respectively. We improve the lower bound to Ω(√(nL)) and show CascadeKL-UCB (which ranks items by their KL-UCB indices) attains it up to log terms. Surprisingly, we also show CascadeUCB1 (which ranks via UCB1) can suffer suboptimal Ω(√(nLK)) regret. This sharply contrasts with standard L-armed bandits, where the corresponding algorithms both achieve the minimax regret √(nL) (up to log terms), and the main advantage of KL-UCB is only to improve constants in the gap-dependent bounds. In essence, this contrast occurs because Pinsker's inequality is tight for hard problems in the L-armed case but loose (by a factor of K) in the cascading case.
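To make the comparison concrete, here is a minimal Python sketch of the two index rules the abstract contrasts, assuming Bernoulli click feedback. Everything below is illustrative rather than taken from the paper: the helper names (kl, ucb1_index, kl_ucb_index, rank_top_k), the exploration constant 1.5, and the plain log(t) confidence level are common textbook choices, not necessarily the paper's exact tuning.

```python
import math

def kl(p, q, eps=1e-12):
    """Bernoulli KL divergence kl(p, q), clamped for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def ucb1_index(mean, pulls, t):
    """UCB1-style index (CascadeUCB1 ranks by indices of this form).
    The constant 1.5 is one common choice of exploration rate."""
    return mean + math.sqrt(1.5 * math.log(t) / pulls)

def kl_ucb_index(mean, pulls, t, iters=30):
    """KL-UCB index (CascadeKL-UCB ranks by indices of this form):
    the largest q >= mean with pulls * kl(mean, q) <= log(t), found by
    bisection, since kl(mean, .) is increasing on [mean, 1]."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def rank_top_k(means, pulls, t, K, index_fn):
    """Shared ranking rule: compute one index per item, recommend the top K.
    Assumes every item has been observed at least once (pulls[i] >= 1)."""
    idx = [index_fn(m, n, t) for m, n in zip(means, pulls)]
    return sorted(range(len(means)), key=lambda i: -idx[i])[:K]

# Example: rank K = 2 out of L = 4 items at round t = 100.
means = [0.10, 0.08, 0.50, 0.45]   # empirical click rates
pulls = [20, 20, 30, 30]           # observation counts
print(rank_top_k(means, pulls, t=100, K=2, index_fn=kl_ucb_index))
print(rank_top_k(means, pulls, t=100, K=2, index_fn=ucb1_index))
```

Both algorithms share the ranking rule; only the index differs. The reason this matters for minimax regret is that the KL-UCB interval adapts to the variance of a Bernoulli mean: for q near p, kl(p, q) ≈ (p − q)²/(2p(1 − p)), so Pinsker's bound kl(p, q) ≥ 2(p − q)² is essentially tight near p = 1/2 but loose by a factor of order K when p is of order 1/K, which is one way to see where the factor-of-K gap described in the abstract can arise.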

Related research

05/14/2018 · KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints
In the context of K-armed stochastic bandits with distribution only assu...

10/03/2014 · Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...

02/01/2019 · An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
We prove a new minimax theorem connecting the worst-case Bayesian regret...

02/10/2015 · Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of K web pages. The user examines...

01/09/2023 · On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces
As noted in the works of <cit.>, it has been mentioned that it is an ope...

03/04/2020 · Taking a hint: How to leverage loss predictors in contextual bandits?
We initiate the study of learning in contextual bandits with the help of...

02/26/2023 · No-Regret Linear Bandits beyond Realizability
We study linear bandits when the underlying reward function is not linea...
