Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization

04/26/2023
by Shashank Gupta, et al.

Counterfactual learning to rank (CLTR) relies on exposure-based inverse propensity scoring (IPS), an LTR-specific adaptation of IPS that corrects for position bias. While IPS can provide unbiased and consistent estimates, it often suffers from high variance. Especially when little click data is available, this variance can cause CLTR to learn sub-optimal ranking behavior. Consequently, existing CLTR methods carry significant risks, as naively deploying their models can result in very negative user experiences. We introduce a novel risk-aware CLTR method with theoretical guarantees for safe deployment. We apply a novel exposure-based concept of risk regularization to IPS estimation for LTR. Our risk regularization penalizes the mismatch between the ranking behavior of a learned model and that of a given safe model. In this way, it ensures that learned ranking models stay close to a trusted model when there is high uncertainty in IPS estimation, which greatly reduces the risks during deployment. Our experimental results demonstrate the efficacy of the proposed method: it avoids initial periods of bad performance when little data is available, while also maintaining high performance at convergence. For the CLTR field, our novel exposure-based risk minimization method enables practitioners to adopt CLTR methods in a safer manner that mitigates many of the risks attached to previous methods.
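The abstract gives no formulas, so what follows is only a minimal sketch, under our own assumptions, of how an exposure-based IPS estimate and an exposure-based risk penalty could fit together for a single query. The position weights (a position-based exposure model), the chi-square divergence between normalized exposure distributions, the `sqrt(delta * divergence / n)` penalty form, and every function name (`expected_exposure`, `ips_estimate`, `exposure_divergence`, `risk_aware_objective`) are hypothetical illustrations, not the authors' formulation. The click counts in the demo are made up.

```python
import math

# Hypothetical sketch (not the paper's method): one query, position-based
# exposure model, and an assumed chi-square divergence as the risk term.


def expected_exposure(rank_probs, position_weights):
    """Expected exposure of each document.

    rank_probs[d][k] is the probability that the policy places document d
    at rank k; position_weights[k] is the assumed exposure of rank k.
    """
    return [sum(p * a for p, a in zip(row, position_weights)) for row in rank_probs]


def ips_estimate(clicks, n_sessions, exp_new, exp_log):
    """Exposure-based IPS: reweight each document's logged clicks by the ratio
    of its exposure under the new policy to its exposure under the logging
    policy. Requires exp_log > 0 for every document."""
    total = sum(c * (w / r) for c, w, r in zip(clicks, exp_new, exp_log))
    return total / n_sessions


def exposure_divergence(exp_new, exp_log):
    """Chi-square divergence between normalized exposure distributions
    (assumed choice); it is 0 when both policies spread exposure identically."""
    total_new, total_log = sum(exp_new), sum(exp_log)
    p = [w / total_new for w in exp_new]
    q = [r / total_log for r in exp_log]
    d = sum(pi * pi / qi for pi, qi in zip(p, q)) - 1.0
    return max(0.0, d)  # clamp tiny negative floating-point errors


def risk_aware_objective(clicks, n_sessions, exp_new, exp_log, delta=1.0):
    """IPS estimate minus a penalty that grows with the exposure mismatch and
    shrinks as more sessions are logged (hypothetical penalty form)."""
    estimate = ips_estimate(clicks, n_sessions, exp_new, exp_log)
    penalty = math.sqrt(delta * exposure_divergence(exp_new, exp_log) / n_sessions)
    return estimate - penalty


if __name__ == "__main__":
    alpha = [1.0, 0.5, 0.25]  # assumed exposure per rank
    safe = expected_exposure([[1, 0, 0], [0, 1, 0], [0, 0, 1]], alpha)
    new = expected_exposure([[0, 1, 0], [1, 0, 0], [0, 0, 1]], alpha)  # swaps top two
    for clicks, n in [([30, 20, 5], 100), ([3000, 2000, 500], 10000)]:
        print(n,
              round(risk_aware_objective(clicks, n, safe, safe), 3),
              round(risk_aware_objective(clicks, n, new, safe), 3))
    # 100 0.55 0.535
    # 10000 0.55 0.593
```

With these made-up numbers, the unpenalized IPS estimate favors the new ranking (0.6 vs. 0.55) at both data sizes, but after the penalty the safe ranking still scores higher at 100 sessions, while the new ranking wins at 10,000 sessions. This mirrors the behavior the abstract describes: stay close to the trusted model while uncertainty is high, and move away from it as data accumulates.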

Related research

Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank (02/11/2021)
Existing work in counterfactual Learning to Rank (LTR) has focussed on o...

To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions (07/15/2019)
Learning to Rank (LTR) from user interactions is challenging as user fee...

Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking (07/24/2020)
Counterfactual evaluation can estimate Click-Through-Rate (CTR) differen...

Accelerated Convergence for Counterfactual Learning to Rank (05/21/2020)
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model f...

Unifying Online and Counterfactual Learning to Rank (12/08/2020)
Optimizing ranking systems based on user interactions is a well-studied ...

When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank (08/24/2020)
Besides position bias, which has been well-studied, trust bias is anothe...

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior (06/26/2023)
Ranking interfaces are everywhere in online platforms. There is thus an ...