Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on an Online Content Delivery Service

09/17/2021
by   Yuta Saito, et al.

Off-policy evaluation (OPE) attempts to estimate the performance of decision-making policies using historical data generated by different policies, without conducting costly online A/B tests. Accurate OPE is essential in domains such as healthcare, marketing, and recommender systems, because deploying a poorly performing policy may harm human lives or destroy the user experience. Many OPE methods with theoretical guarantees have therefore been proposed. One emerging challenge with this trend is that the most suitable estimator differs across application settings, and practitioners often do not know which estimator to use for their specific application and purpose. To identify a suitable estimator among many candidates, we use a data-driven estimator selection procedure for off-policy policy performance estimators as a practical solution. As a proof of concept, we apply the procedure to select the best estimator for evaluating coupon treatment policies on a real-world online content delivery service. In the experiment, we first observe that the most suitable estimator can change with the definition of the outcome variable, so accurate estimator selection is critical in real-world applications of OPE. We then demonstrate that, by utilizing the estimator selection procedure, we can easily find a suitable estimator for each purpose.
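To make the idea concrete, the following is a minimal sketch of a data-driven estimator selection procedure, not the paper's actual implementation. It assumes three standard OPE estimators (inverse probability weighting, the direct method, and doubly robust) and selects the candidate whose estimate has the smallest relative error against a ground-truth on-policy value, such as one observed in a past A/B test of the evaluation policy. All function names and the synthetic data are illustrative.

```python
import numpy as np

def ipw_estimator(rewards, actions, pi_b, pi_e):
    """Inverse probability weighting: reweight logged rewards by pi_e / pi_b."""
    idx = np.arange(len(actions))
    w = pi_e[idx, actions] / pi_b[idx, actions]
    return np.mean(w * rewards)

def dm_estimator(q_hat, pi_e):
    """Direct method: average a fitted reward model q_hat(x, a) under pi_e."""
    return np.mean(np.sum(pi_e * q_hat, axis=1))

def dr_estimator(rewards, actions, pi_b, pi_e, q_hat):
    """Doubly robust: direct method plus an importance-weighted residual correction."""
    idx = np.arange(len(actions))
    w = pi_e[idx, actions] / pi_b[idx, actions]
    return dm_estimator(q_hat, pi_e) + np.mean(w * (rewards - q_hat[idx, actions]))

def select_estimator(estimates, on_policy_value):
    """Pick the candidate with the smallest relative error against the
    ground-truth on-policy value (e.g., from a past A/B test)."""
    rel_err = {name: abs(v - on_policy_value) / abs(on_policy_value)
               for name, v in estimates.items()}
    return min(rel_err, key=rel_err.get)

# Synthetic logged data: a logging policy pi_b chooses among 3 coupon
# treatments; pi_e is the policy whose value we want to estimate.
rng = np.random.default_rng(0)
n, n_actions = 10_000, 3
pi_b = rng.dirichlet(np.ones(n_actions), size=n)
pi_e = rng.dirichlet(np.ones(n_actions), size=n)
actions = np.array([rng.choice(n_actions, p=p) for p in pi_b])
rewards = rng.binomial(1, 0.3 + 0.1 * actions)   # outcome depends on action
q_hat = np.full((n, n_actions), rewards.mean())  # deliberately crude model

estimates = {
    "IPW": ipw_estimator(rewards, actions, pi_b, pi_e),
    "DM": dm_estimator(q_hat, pi_e),
    "DR": dr_estimator(rewards, actions, pi_b, pi_e, q_hat),
}
on_policy_value = 0.40  # hypothetical ground truth observed online
print(select_estimator(estimates, on_policy_value))
```

In practice, the ground-truth value comes from the logged online performance of an already-deployed policy; the estimator selected this way is then trusted to evaluate new candidate policies offline.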


