Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback
Clicks on rankings suffer from position bias: generally items on lower ranks are less likely to be examined - and thus clicked - by users, in spite of their actual preferences between items. The prevalent approach to unbiased click-based Learning-to-Rank (LTR) is based on counterfactual Inverse-Propensity-Scoring (IPS) estimation. Unique about LTR is the fact that standard Doubly-Robust (DR) estimation - which combines IPS with regression predictions - is inapplicable since the treatment variable - indicating whether a user examined an item - cannot be observed in the data. In this paper, we introduce a novel DR estimator that uses the expectation of treatment per rank instead. Our novel DR estimator has more robust unbiasedness conditions than the existing IPS approach, and in addition, provides enormous decreases in variance: our experimental results indicate it requires several orders of magnitude fewer datapoints to converge at optimal performance. For the unbiased LTR field, our DR estimator contributes both increases in state-of-the-art performance and the most robust theoretical guarantees of all known LTR estimators.
READ FULL TEXT