Unbiased Learning to Rank: Counterfactual and Online Approaches
This tutorial covers and contrasts the two main methodologies in unbiased Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been an interest in LTR from user interactions, however, this form of implicit feedback is very biased. In recent years, unbiased LTR methods have been introduced to remove the effect of different types of bias caused by user-behavior in search. For instance, a well addressed type of bias is position bias: the rank at which a document is displayed heavily affects the interactions it receives. Counterfactual LTR methods deal with such types of bias by learning from historical interactions while correcting for the effect of the explicitly modelled biases. Online LTR does not use an explicit user model, in contrast, it learns through an interactive process where randomized results are displayed to the user. Through randomization the effect of different types of bias can be removed from the learning process. Though both methodologies lead to unbiased LTR, their approaches differ considerably, furthermore, so do their theoretical guarantees, empirical results, effects on the user experience during learning, and applicability. Consequently, for practitioners the choice between the two is very substantial. By providing an overview of both approaches and contrasting them, we aim to provide an essential guide to unbiased LTR so as to aid in understanding and choosing between methodologies.
READ FULL TEXT