Scalable Exploration for Neural Online Learning to Rank with Perturbed Feedback

by Yiling Jia, et al.

Deep neural networks (DNNs) have demonstrated significant advantages in improving ranking performance in retrieval tasks. Driven by recent technical developments in the optimization and generalization of DNNs, learning a neural ranking model online from its interactions with users has become possible. However, the exploration required for model learning has to be performed in the entire neural network parameter space, which is prohibitively expensive and limits the practical application of such online solutions. In this work, we propose an efficient exploration strategy for online interactive neural ranker learning based on the idea of bootstrapping. Our solution employs an ensemble of ranking models trained with perturbed user click feedback. The proposed method eliminates explicit confidence set construction and its associated computational overhead, which allows online neural rankers to be trained efficiently in practice while retaining theoretical guarantees. Extensive comparisons with an array of state-of-the-art online learning to rank (OL2R) algorithms on two public learning to rank benchmark datasets demonstrate the effectiveness and computational efficiency of the proposed neural OL2R solution.
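The core idea of the abstract, an ensemble of rankers trained on perturbed clicks whose disagreement replaces an explicit confidence set, can be illustrated with a minimal sketch. It is not the authors' implementation: linear ridge-regression scorers stand in for the neural rankers, the perturbation is assumed to be i.i.d. Gaussian noise added to each click label, and the helper names `fit_perturbed_ensemble` and `rank_with_ensemble` are hypothetical. A document pair is treated as "uncertain" (a candidate for exploration) when ensemble members disagree on its order.

```python
import numpy as np


def fit_perturbed_ensemble(X, clicks, n_models=5, noise_std=0.5, reg=1.0, seed=0):
    """Hypothetical helper: fit n_models ridge scorers, each on clicks plus fresh Gaussian noise.

    Linear scorers are an assumption standing in for neural rankers.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = X.T @ X + reg * np.eye(d)  # regularized Gram matrix, shared by all members
    weights = []
    for _ in range(n_models):
        # Each member sees its own perturbed copy of the click labels.
        perturbed = clicks + rng.normal(0.0, noise_std, size=clicks.shape)
        weights.append(np.linalg.solve(A, X.T @ perturbed))
    return np.stack(weights)  # shape (n_models, d)


def rank_with_ensemble(W, X_cand):
    """Hypothetical helper: rank candidates by mean ensemble score and flag uncertain pairs."""
    scores = X_cand @ W.T  # shape (n_docs, n_models)
    n = X_cand.shape[0]
    uncertain = []
    for i in range(n):
        for j in range(i + 1, n):
            diff = scores[i] - scores[j]
            # Members disagree on the order of (i, j): disagreement plays the
            # role that an explicit confidence interval would otherwise play.
            if np.any(diff > 0) and np.any(diff < 0):
                uncertain.append((i, j))
    order = np.argsort(-scores.mean(axis=1), kind="stable").tolist()
    return order, uncertain


# Toy data: feature 0 is always clicked, feature 1 never is.
X = np.tile(np.eye(2), (50, 1))
clicks = np.tile([1.0, 0.0], 50)
W = fit_perturbed_ensemble(X, clicks, noise_std=0.1)
order, uncertain = rank_with_ensemble(W, np.array([[0.0, 1.0], [1.0, 0.0]]))
print(order, uncertain)
```

In this toy run the clicked feature dominates, so every member agrees and no pair is flagged. With fewer or noisier clicks the members would diverge on close pairs, and an online ranker could randomize the relative order of exactly those pairs while keeping confident pairs fixed; how the paper itself explores uncertain pairs is not stated in the abstract.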




PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

Online Learning to Rank (OL2R) eliminates the need of explicit relevance...

Learning Neural Ranking Models Online from Implicit User Feedback

Existing online learning to rank (OL2R) solutions are limited to linear ...

Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank

Online learning to rank (OL2R) has attracted great research interests in...

Efficient Exploration of Gradient Space for Online Learning to Rank

Online learning to rank (OL2R) optimizes the utility of returned search ...

Explore-Exploit: A Framework for Interactive and Online Learning

Interactive user interfaces need to continuously evolve based on the int...

Learning Contextual Bandits Through Perturbed Rewards

Thanks to the power of representation learning, neural contextual bandit...

Variance Reduction in Gradient Exploration for Online Learning to Rank

Online Learning to Rank (OL2R) algorithms learn from implicit user feedb...