Optimizing Ranking Systems Online as Bandits

10/12/2021
by Chang Li, et al.

Ranking systems are the core of modern retrieval and recommender systems, where the goal is to rank candidate items given a user context. Optimizing a ranking system online means that the deployed system serves user requests, e.g., queries in web search, and improves its ranking policy by learning from user interactions, e.g., clicks. Bandits provide a general online learning framework that can be applied to this optimization task. However, due to the unique features of ranking, designing bandit algorithms for ranking system optimization raises several challenges. In this dissertation, we study and propose solutions for four such challenges: effectiveness, safety, non-stationarity, and diversification. First, effectiveness concerns how fast an algorithm learns from interactions. We study the online ranker evaluation task and propose the MergeDTS algorithm to solve it effectively. Second, a deployed algorithm should be safe, meaning that it only displays reasonable content in response to user requests. To solve the safe online learning to rank problem, we propose the BubbleRank algorithm. Third, since users change their preferences constantly, an algorithm should handle non-stationarity. We formulate this non-stationary online learning to rank problem as cascading non-stationary bandits and propose the CascadeDUCB and CascadeSWUCB algorithms to solve it. Finally, the content of ranked lists should be diverse. We consider the result diversification task and propose the CascadeHybrid algorithm, which accounts for both item relevance and result diversification when learning from user interactions.
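The cascading bandit formulation behind CascadeDUCB and CascadeSWUCB extends the stationary cascade model, in which a user scans a ranked list top-down and clicks the first attractive item. As a point of reference, here is a minimal sketch of a stationary CascadeUCB1-style learner; the simulator, parameter values, and function name are illustrative assumptions, not taken from the dissertation:

```python
import math
import random

def cascade_ucb1(true_attract, k, horizon, seed=0):
    """Sketch of a CascadeUCB1-style learner for the cascade click model.

    `true_attract` holds the attraction probabilities, unknown to the
    learner and used only to simulate user clicks.
    """
    rng = random.Random(seed)
    n = len(true_attract)
    pulls = [0] * n  # times each item was examined by the simulated user
    wins = [0] * n   # times each item was clicked when examined

    for t in range(1, horizon + 1):
        # UCB index: empirical mean plus an exploration bonus;
        # unexamined items get +inf so each is tried early.
        def ucb(i):
            if pulls[i] == 0:
                return float("inf")
            return wins[i] / pulls[i] + math.sqrt(1.5 * math.log(t) / pulls[i])

        ranked = sorted(range(n), key=ucb, reverse=True)[:k]

        # Cascade feedback: the user examines items top-down and stops
        # at the first click; items below the click stay unobserved.
        for i in ranked:
            clicked = rng.random() < true_attract[i]
            pulls[i] += 1
            wins[i] += clicked
            if clicked:
                break

    means = [wins[i] / max(pulls[i], 1) for i in range(n)]
    return sorted(range(n), key=lambda i: means[i], reverse=True)[:k], means
```

The non-stationary variants in the dissertation replace the plain empirical means above with discounted (CascadeDUCB) or sliding-window (CascadeSWUCB) statistics, so that stale clicks stop dominating the index once user preferences drift.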


Related research

05/29/2019
Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
Non-stationarity appears in many online applications such as web search ...

05/02/2023
Exploration of Unranked Items in Safe Online Learning to Re-Rank
Bandit algorithms for online learning to rank (OLTR) problems often aim ...

12/01/2020
Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...

08/22/2020
Fatigue-aware Bandits for Dependent Click Models
As recommender systems send a massive amount of content to keep users en...

12/09/2022
Multi-Task Off-Policy Learning from Bandit Feedback
Many practical applications, such as recommender systems and learning to...

01/29/2019
Optimizing Ranking Models in an Online Setting
Online Learning to Rank (OLTR) methods optimize ranking models by direct...

05/18/2012
Online Structured Prediction via Coactive Learning
We propose Coactive Learning as a model of interaction between a learnin...
