Exploration of Unranked Items in Safe Online Learning to Re-Rank

05/02/2023
by Hiroaki Shiino, et al.

Bandit algorithms for online learning to rank (OLTR) problems often aim to maximize long-term revenue by utilizing user feedback. From a practical point of view, however, such algorithms carry a high risk of hurting the user experience due to their aggressive exploration. Thus, there has been rising demand for safe exploration in recent years. One approach to safe exploration is to gradually enhance the quality of an original ranking that is already guaranteed to be of acceptable quality. In this paper, we propose a safe OLTR algorithm that efficiently exchanges one of the items in the current ranking with an item outside the ranking (i.e., an unranked item) to perform exploration. We optimistically select an unranked item to explore based on Kullback-Leibler upper confidence bounds (KL-UCB) and safely re-rank the items, including the selected one. Through experiments, we demonstrate that the proposed algorithm improves long-term regret over baselines without any safety violations.
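The paper itself gives the full algorithm; as a rough illustration of the optimistic-selection idea, here is a minimal sketch of the KL-UCB index for Bernoulli click feedback and its use to pick an unranked item. The function names (`kl_ucb_index`, `select_unranked`) and the bisection-based solver are assumptions for illustration, not the authors' implementation.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t) + c*log(log(t)).

    Solved by bisection; an unobserved item gets the most optimistic index 1.0.
    """
    if pulls == 0:
        return 1.0
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean, 1.0
    for _ in range(50):  # bisection on the monotone KL constraint
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def select_unranked(stats, t):
    """stats: item -> (empirical click rate, pull count); pick the optimistic item."""
    return max(stats, key=lambda i: kl_ucb_index(stats[i][0], stats[i][1], t))
```

The selected item would then be swapped into the current ranking subject to the safety constraint; that re-ranking step is where the paper's contribution lies and is not reproduced here.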


research
06/15/2018

BubbleRank: Safe Online Learning to Rerank

We study the problem of online learning to re-rank, where users provide ...
research
10/12/2021

Optimizing Ranking Systems Online as Bandits

Ranking system is the core part of modern retrieval and recommender syst...
research
10/05/2018

Online Learning to Rank with Features

We introduce a new model for online ranking in which the click probabili...
research
10/30/2015

CONQUER: Confusion Queried Online Bandit Learning

We present a new recommendation setting for picking out two items from a...
research
05/18/2018

Efficient Exploration of Gradient Space for Online Learning to Rank

Online learning to rank (OL2R) optimizes the utility of returned search ...
research
12/01/2018

Explore-Exploit: A Framework for Interactive and Online Learning

Interactive user interfaces need to continuously evolve based on the int...
research
02/28/2021

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

Online Learning to Rank (OL2R) eliminates the need of explicit relevance...
