BubbleRank: Safe Online Learning to Rerank

06/15/2018
by   Branislav Kveton, et al.

We study the problem of online learning to re-rank, where users provide feedback to improve the quality of displayed lists. Learning to rank has been traditionally studied in two settings. In the offline setting, rankers are typically learned from relevance labels of judges. These approaches have become the industry standard. However, they lack exploration, and thus are limited by the information content of offline data. In the online setting, an algorithm can propose a list and learn from the feedback on it in a sequential fashion. Bandit algorithms developed for this setting actively experiment, and in this way overcome the biases of offline data. But they also tend to ignore offline data, which results in a high initial cost of exploration. We propose BubbleRank, a bandit algorithm for re-ranking that combines the strengths of both settings. The algorithm starts with an initial base list and improves it gradually by swapping higher-ranked less attractive items for lower-ranked more attractive items. We prove an upper bound on the n-step regret of BubbleRank that degrades gracefully with the quality of the initial base list. Our theoretical findings are supported by extensive numerical experiments on a large real-world click dataset.
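The swapping idea described above can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the independent Bernoulli click simulator, the form of the confidence threshold, and all names (`bubble_rerank`, `attraction`, `delta`) are hypothetical choices made here for the example. Adjacent pairs in the base list are shown in random order until one item is confidently more attractive, and the base list is changed only when the lower-ranked item wins with confidence.

```python
import math
import random
from collections import defaultdict


def bubble_rerank(base_list, attraction, n_steps, delta=0.05, seed=0):
    """Re-rank base_list from simulated clicks (illustrative sketch only)."""
    rng = random.Random(seed)
    ranked = list(base_list)          # working copy of the base list
    score = defaultdict(int)          # score[(i, j)]: clicks on i minus clicks on j
    count = defaultdict(int)          # count[(i, j)]: observations where exactly one was clicked

    def threshold(n_obs):
        # Assumed confidence radius; the exact form is a choice made for this sketch.
        return 2.0 * math.sqrt(n_obs * math.log(1.0 / delta))

    for t in range(n_steps):
        h = t % 2                     # alternate between even and odd adjacent pairs
        shown = list(ranked)
        explored = []
        for k in range(h, len(ranked) - 1, 2):
            i, j = ranked[k], ranked[k + 1]
            # Explore only pairs whose order is not yet confidently known.
            if score[(i, j)] <= threshold(count[(i, j)]):
                if rng.random() < 0.5:
                    shown[k], shown[k + 1] = j, i
                explored.append((i, j))

        # Assumed click model: each shown item is clicked independently.
        clicks = {item: int(rng.random() < attraction[item]) for item in shown}

        for i, j in explored:
            diff = clicks[i] - clicks[j]
            if diff != 0:
                score[(i, j)] += diff
                score[(j, i)] -= diff
                count[(i, j)] += 1
                count[(j, i)] += 1

        # Permanently move a lower-ranked item up once it is confidently better.
        for k in range(len(ranked) - 1):
            i, j = ranked[k], ranked[k + 1]
            if score[(j, i)] > threshold(count[(j, i)]):
                ranked[k], ranked[k + 1] = j, i

    return ranked


if __name__ == "__main__":
    probs = {"a": 0.1, "b": 0.9, "c": 0.5}   # hypothetical attraction probabilities
    print(bubble_rerank(["a", "b", "c"], probs, n_steps=2000))
```

Because only adjacent items are ever exchanged, every displayed list stays close to the current base list, which is the intuition behind the safety property mentioned in the abstract.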


research
05/02/2023

Exploration of Unranked Items in Safe Online Learning to Re-Rank

Bandit algorithms for online learning to rank (OLTR) problems often aim ...
research
11/01/2018

Online Diverse Learning to Rank from Partial-Click Feedback

Learning to rank is an important problem in machine learning and recomme...
research
12/12/2018

Online Learning to Rank with List-level Feedback for Image Filtering

Online learning to rank (OLTR) via implicit feedback has been extensivel...
research
11/13/2018

Community Exploration: From Offline Optimization to Online Learning

We introduce the community exploration problem that has many real-world ...
research
06/09/2023

RankFormer: Listwise Learning-to-Rank Using Listwide Labels

Web applications where users are presented with a limited selection of i...
research
10/11/2018

Offline Comparison of Ranking Functions using Randomized Data

Ranking functions return ranked lists of items, and users often interact...
research
06/06/2022

Offline Evaluation of Ranked Lists using Parametric Estimation of Propensities

Search engines and recommendation systems attempt to continually improve...
