Policy-Aware Unbiased Learning to Rank for Top-k Rankings

05/18/2020
by Harrie Oosterhuis, et al.

Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking, so no existing counterfactual LTR method is unbiased for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
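To make the unbiasedness condition concrete, the following is a minimal sketch of how a policy-aware propensity could be computed, assuming a position-based examination model and a logging policy given as an explicit distribution over rankings. The function names, the toy policy, the examination probabilities, and the per-item metric weights are all illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def policy_aware_propensities(policy, position_bias, k):
    """Expected examination probability of each item under a stochastic policy.

    policy: dict mapping a ranking (tuple of item ids) to its probability
            (hypothetical representation of the logging policy).
    position_bias: assumed examination probability per displayed rank.
    k: number of displayed positions; items ranked below k are never examined.
    """
    rho = defaultdict(float)
    for ranking, prob in policy.items():
        for rank, doc in enumerate(ranking[:k]):
            rho[doc] += prob * position_bias[rank]
    return dict(rho)


def satisfies_condition(rho, relevant_docs):
    """True if every relevant item has a non-zero chance of being examined."""
    return all(rho.get(doc, 0.0) > 0.0 for doc in relevant_docs)


def policy_aware_estimate(click_logs, rho, weights):
    """Average of inverse-propensity-weighted metric weights over clicked items.

    click_logs: one list of clicked item ids per logged session.
    weights: assumed per-item metric weight under the ranking being evaluated.
    """
    total = 0.0
    for clicked in click_logs:
        total += sum(weights[doc] / rho[doc] for doc in clicked)
    return total / len(click_logs)


# Toy setup: three items, only the top 2 positions are displayed.
position_bias = [1.0, 0.5]
stochastic = {("a", "b", "c"): 0.5, ("c", "a", "b"): 0.5}
deterministic = {("a", "b", "c"): 1.0}

rho = policy_aware_propensities(stochastic, position_bias, k=2)
rho_det = policy_aware_propensities(deterministic, position_bias, k=2)

ok_stochastic = satisfies_condition(rho, ["a", "b", "c"])
ok_deterministic = satisfies_condition(rho_det, ["a", "b", "c"])

estimate = policy_aware_estimate(
    [["a"], ["c"], []], rho, {"a": 1.0, "b": 1.0, "c": 1.0}
)
```

In this toy setup the deterministic policy never shows item `c` in the top 2, so its propensity is zero and the condition fails, whereas the stochastic policy gives every item a non-zero propensity.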


research
07/24/2020

Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking

Counterfactual evaluation can estimate Click-Through-Rate (CTR) differen...
research
04/22/2022

Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion

Conventional methods for query autocompletion aim to predict which compl...
research
04/30/2018

Counterfactual Learning-to-Rank for Additive Metrics and Deep Models

Implicit feedback (e.g., clicks, dwell times) is an attractive source of...
research
04/22/2022

Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity

Plackett-Luce gradient estimation enables the optimization of stochastic...
research
03/31/2022

Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback

Clicks on rankings suffer from position bias: generally items on lower r...
research
12/09/2020

Learning from User Interactions with Rankings: A Unification of the Field

Ranking systems form the basis for online search engines and recommendat...
research
06/24/2022

Reaching the End of Unbiasedness: Uncovering Implicit Limitations of Click-Based Learning to Rank

Click-based learning to rank (LTR) tackles the mismatch between click fr...
