On Learning to Rank Long Sequences with Contextual Bandits

06/07/2021
by Anirban Santara, et al.

Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that considers flexible-length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence bound algorithms for it. Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, result in sharper guarantees than previously available in the literature. We evaluate our algorithms on a number of real-world datasets and show significantly improved empirical performance compared to known cascading bandit baselines.
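The abstract does not spell out the algorithm, but the ingredients it names (a cascade click model, a generalized linear reward, and optimism via upper confidence bounds) fit a familiar template. Below is a minimal, hypothetical Python sketch of that template: logistic GLM-UCB scoring combined with cascade-style partial feedback. The class name CascadingGLMUCB, its methods, and the crude gradient-ascent MLE solver are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class CascadingGLMUCB:
    """Hypothetical sketch of a UCB-style learner for cascading bandits
    with a logistic (generalized linear) click model; not the paper's
    exact algorithm."""

    def __init__(self, dim, list_len, alpha=1.0, lam=1.0):
        self.dim = dim                 # feature dimension d
        self.K = list_len              # items shown per round
        self.alpha = alpha             # width of the confidence bonus
        self.A = lam * np.eye(dim)     # regularized Gram matrix
        self.X, self.y = [], []        # observed (feature, click) pairs

    def _theta_hat(self, iters=50, lr=0.1):
        # Crude gradient ascent for the ridge-regularized logistic MLE.
        theta = np.zeros(self.dim)
        if not self.X:
            return theta
        X, y = np.vstack(self.X), np.array(self.y)
        for _ in range(iters):
            theta += lr * (X.T @ (y - sigmoid(X @ theta)) - theta)
        return theta

    def select(self, features):
        # features: (n_items, dim); rank items by optimistic click score.
        theta = self._theta_hat()
        A_inv = np.linalg.inv(self.A)
        bonus = self.alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", features, A_inv, features))
        return np.argsort(-(features @ theta + bonus))[: self.K]

    def update(self, features, ranked, click_pos):
        # Cascade feedback: items above the click are observed skips, the
        # clicked item is a success, items below the click are unobserved.
        last = click_pos if click_pos is not None else len(ranked) - 1
        for pos, idx in enumerate(ranked[: last + 1]):
            x = features[idx]
            self.A += np.outer(x, x)
            self.X.append(x)
            self.y.append(1.0 if pos == click_pos else 0.0)


if __name__ == "__main__":
    # Toy simulation: a cascade user clicks the first attractive item.
    rng = np.random.default_rng(0)
    theta_star = rng.normal(size=5) / np.sqrt(5)
    bandit = CascadingGLMUCB(dim=5, list_len=3)
    for _ in range(200):
        feats = rng.normal(size=(20, 5))
        ranked = bandit.select(feats)
        click_pos = next((p for p, i in enumerate(ranked)
                          if rng.random() < sigmoid(feats[i] @ theta_star)),
                         None)
        bandit.update(feats, ranked, click_pos)
    print("learned theta:", np.round(bandit._theta_hat(), 2))
```

The key cascade-specific choice is in update: only positions up to and including the click generate observations, which is what distinguishes this feedback model from a full-information ranking setup.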


Related research

03/11/2020: Delay-Adaptive Learning in Generalized Linear Contextual Bandits
In this paper, we consider online learning in generalized linear context...

05/24/2018: New Insights into Bootstrapping for Bandits
We investigate the use of bootstrapping in the bandit setting. We first ...

03/23/2020: Algorithms for Non-Stationary Generalized Linear Bandits
The statistical framework of Generalized Linear Models (GLM) can be appl...

06/16/2021: Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?
We study learning algorithms for the classical Markovian bandit problem ...

03/12/2023: Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback
We study the adversarial online learning problem and create a completely...

05/04/2016: Linear Bandit algorithms using the Bootstrap
This study presents two new algorithms for solving linear stochastic ban...

04/26/2016: Distributed Clustering of Linear Bandits in Peer to Peer Networks
We provide two distributed confidence ball algorithms for solving linear...
