Learning Neural Ranking Models Online from Implicit User Feedback

01/17/2022
by   Yiling Jia, et al.

Existing online learning to rank (OL2R) solutions are limited to linear models, which cannot capture possible non-linear relations between queries and documents. In this work, to unleash the power of representation learning in OL2R, we propose to directly learn a neural ranking model from users' implicit feedback (e.g., clicks) collected on the fly. We focus on RankNet and LambdaRank, due to their great empirical success and wide adoption in offline settings, and control the notorious explore-exploit trade-off based on the convergence analysis of neural networks using the neural tangent kernel. Specifically, in each round of result serving, exploration is performed only on document pairs where the predicted rank order between the two documents is uncertain; otherwise, the ranker's predicted order is followed in the result ranking. We prove that under standard assumptions our OL2R solution achieves a gap-dependent regret upper bound of O(log^2(T)), in which the regret is defined as the total number of mis-ordered pairs over T rounds. Comparisons against an extensive set of state-of-the-art OL2R baselines on two public learning to rank benchmark datasets demonstrate the effectiveness of the proposed solution.
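The pairwise exploration rule and the regret measure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the constant confidence `width` (standing in for the neural-tangent-kernel-based bound in the paper), and the random choice among unresolved documents are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def split_pairs(scores, width):
    """Partition document pairs into certain and uncertain ones.

    scores: one ranker score per document (hypothetical input).
    width:  assumed constant confidence width on the pairwise probability;
            a placeholder for a model-derived bound.
    """
    certain, uncertain = [], []
    for i, j in combinations(range(len(scores)), 2):
        p = sigmoid(scores[i] - scores[j])  # RankNet-style P(i above j)
        if p - width > 0.5:
            certain.append((i, j))  # i is confidently above j
        elif p + width < 0.5:
            certain.append((j, i))  # j is confidently above i
        else:
            uncertain.append((i, j))  # order unresolved: explore here
    return certain, uncertain


def rank_with_exploration(scores, width, rng):
    """Follow the predicted order on certain pairs, randomize the rest."""
    certain, _ = split_pairs(scores, width)
    beaten_by = {d: set() for d in range(len(scores))}
    for winner, loser in certain:
        beaten_by[loser].add(winner)
    ranking, remaining = [], set(range(len(scores)))
    while remaining:
        # Candidates: documents with no confident winner still unplaced.
        free = sorted(d for d in remaining if not (beaten_by[d] & remaining))
        pick = rng.choice(free)  # assumption: uniform choice among candidates
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


def misordered_pairs(ranking, relevance):
    """Count pairs where a less relevant document is ranked above a more relevant one."""
    return sum(
        1
        for above, below in combinations(ranking, 2)
        if relevance[above] < relevance[below]
    )


if __name__ == "__main__":
    scores = [2.0, 1.9, -1.0]
    ranking = rank_with_exploration(scores, width=0.1, rng=random.Random(0))
    print(ranking, misordered_pairs(ranking, relevance=[2, 1, 0]))
```

In this toy input, documents 0 and 1 have nearly equal scores, so their order is left open to exploration, while document 2 is confidently placed last. Summing `misordered_pairs` over rounds gives the kind of pair-level regret the abstract refers to, assuming ground-truth relevance labels are available for evaluation.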

research
02/28/2021

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

Online Learning to Rank (OL2R) eliminates the need of explicit relevance...
research
11/01/2021

Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank

Online learning to rank (OL2R) has attracted great research interests in...
research
06/13/2022

Scalable Exploration for Neural Online Learning to Rank with Perturbed Feedback

Deep neural networks (DNNs) demonstrate significant advantages in improv...
research
06/10/2019

Variance Reduction in Gradient Exploration for Online Learning to Rank

Online Learning to Rank (OL2R) algorithms learn from implicit user feedb...
research
09/22/2018

Differentiable Unbiased Online Learning to Rank

Online Learning to Rank (OLTR) methods optimize rankers based on user in...
research
06/08/2020

Learning the Truth From Only One Side of the Story

Learning under one-sided feedback (i.e., where examples arrive in an onl...
research
08/22/2016

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

New ranking algorithms are continually being developed and refined, nece...
