Evaluating the Robustness of Off-Policy Evaluation

08/31/2021
by Yuta Saito, et al.

Off-policy evaluation (OPE), or offline evaluation in general, estimates the performance of hypothetical policies using only offline log data. It is particularly useful in applications where online interaction is high-stakes and expensive, such as precision medicine and recommender systems. Because many OPE estimators have been proposed and some of them have hyperparameters to be tuned, practitioners face an emerging challenge in selecting and tuning OPE estimators for their specific application. Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult, because the current experimental procedure evaluates and compares estimators on only a narrow set of hyperparameters and evaluation policies, making it hard to know which estimator is safe and reliable to use. In this work, we develop Interpretable Evaluation for Offline Evaluation (IEOE), an experimental procedure for evaluating an OPE estimator's robustness to changes in hyperparameters and/or evaluation policies in an interpretable manner. Using the IEOE procedure, we then perform an extensive evaluation of a wide variety of existing estimators on the Open Bandit Dataset, a large-scale public real-world dataset for OPE. We demonstrate that our procedure can evaluate the estimators' robustness to the hyperparameter choice, helping us avoid unsafe estimators. Finally, we apply IEOE to real-world e-commerce platform data and demonstrate how to use our protocol in practice.
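
As background for the abstract above, here is a minimal, illustrative sketch of the standard inverse probability weighting (IPW, also called inverse propensity scoring) estimator, one of the basic OPE estimators that this line of work builds on. This is not the paper's IEOE procedure or the Open Bandit Pipeline API; the function name, arguments, and toy data are all hypothetical.

```python
import numpy as np

def ipw_estimate(rewards, logging_propensities, eval_policy_probs):
    """Estimate an evaluation policy's value from logged bandit feedback
    via inverse probability weighting (IPW).

    rewards:               observed rewards of the logged actions, shape (n,)
    logging_propensities:  logging policy's probability of each logged action, shape (n,)
    eval_policy_probs:     evaluation policy's probability of each logged action, shape (n,)
    """
    # Re-weight each logged reward by how much more (or less) likely the
    # evaluation policy is to take the logged action than the logging policy was.
    importance_weights = eval_policy_probs / logging_propensities
    return float(np.mean(importance_weights * rewards))

# Hypothetical toy log: five rounds with binary rewards.
rewards = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
logging_propensities = np.array([0.5, 0.25, 0.5, 0.25, 0.5])
eval_policy_probs = np.array([0.8, 0.1, 0.8, 0.1, 0.8])

print(ipw_estimate(rewards, logging_propensities, eval_policy_probs))  # estimated policy value
```

More elaborate estimators layer hyperparameters (for example, weight clipping or switching thresholds) on top of this basic weighting; sensitivity to such choices is exactly what the IEOE procedure is designed to surface.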

Related research

- Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service (09/17/2021)
- Benchmarks for Deep Off-Policy Evaluation (03/30/2021)
- A Large-scale Open Dataset for Bandit Algorithms (08/17/2020)
- Policy-Adaptive Estimator Selection for Off-Policy Evaluation (11/25/2022)
- CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning (11/06/2018)
- Control Variates for Slate Off-Policy Evaluation (06/15/2021)
- Conformal Off-Policy Prediction (06/14/2022)