Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

06/26/2023
by Haruka Kiyohara, et al.

Ranking interfaces are ubiquitous in online platforms, so there is ever-growing interest in their Off-Policy Evaluation (OPE), which aims to accurately evaluate the performance of ranking policies using logged data. The de facto standard approach to OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, IPS becomes extremely inaccurate in the ranking setup because its variance explodes under large action spaces. To deal with this problem, previous studies assume either independent or cascade user behavior, yielding ranking-specific variants of IPS. While these estimators reduce the variance to some extent, every existing estimator applies a single, universal behavior assumption to all users, causing excessive bias and variance. This work therefore explores a far more general formulation in which user behavior is diverse and can vary with the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), remains unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased IPS-based estimators. We further develop a data-driven procedure to identify the user behavior model that minimizes the mean squared error (MSE) of AIPS. Extensive experiments demonstrate that the resulting accuracy improvement can be substantial, enabling effective OPE of ranking systems even under diverse user behavior.
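To make the contrast between these behavior assumptions concrete, the sketch below implements the position-wise importance weights that each assumption induces, plus an AIPS-style estimate that picks the assumption per context. It is a minimal illustration in the spirit of the abstract, not the authors' released code: the function names, the simplified log format, and the behavior labels are all assumptions made here for exposition.

import numpy as np

def ips_weights(pi_e, pi_0, behavior):
    # pi_e, pi_0: probabilities of the logged action at each of the K
    # ranking positions under the evaluation / logging policy, shape (K,).
    ratio = pi_e / pi_0
    if behavior == "standard":
        # No assumption: every position is weighted by the full ranking,
        # which keeps the estimate unbiased but suffers high variance.
        return np.full_like(ratio, ratio.prod())
    if behavior == "independent":
        # Independent assumption: each position is examined in isolation,
        # so the reward at position k is weighted by that position alone.
        return ratio
    if behavior == "cascade":
        # Cascade assumption: position k depends only on positions 1..k.
        return np.cumprod(ratio)
    raise ValueError(f"unknown behavior model: {behavior}")

def aips_estimate(logs, behavior_of):
    # AIPS-style estimate: choose the behavior assumption per context.
    # logs: iterable of (context, pi_e, pi_0, rewards) with one reward per
    # position; behavior_of maps a context to one of the labels above
    # (the paper selects this model in a data-driven way to minimize MSE).
    values = [
        (ips_weights(pi_e, pi_0, behavior_of(x)) * rewards).sum()
        for x, pi_e, pi_0, rewards in logs
    ]
    return float(np.mean(values))

Under the "standard" weight, the product over all K positions inflates the variance rapidly as K grows; the "independent" and "cascade" weights shrink it at the cost of bias whenever the assumption is wrong for a given user. An AIPS-style estimator navigates this bias-variance trade-off by letting the assumption vary with the user context.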

Related research

02/03/2022  Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model
In real-world recommender systems and search engines, optimizing ranking...

02/13/2022  Off-Policy Evaluation for Large Action Spaces via Embeddings
Off-policy evaluation (OPE) in contextual bandits has seen rapid adoptio...

11/29/2020  Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
Off-policy evaluation is a key component of reinforcement learning which...

05/16/2016  Off-policy evaluation for slate recommendation
This paper studies the evaluation of policies that recommend an ordered ...

11/25/2022  Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance ...

05/25/2020  Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank
Unbiased CLTR requires click propensities to compensate for the differen...

04/26/2023  Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization
Counterfactual learning to rank (CLTR) relies on exposure-based inverse ...
