Theoretical and Experimental Comparison of Off-Policy Evaluation from Dependent Samples

10/08/2020
by Masahiro Kato, et al.

We theoretically and experimentally compare estimators for off-policy evaluation (OPE) from dependent samples obtained via multi-armed bandit (MAB) algorithms. The goal of OPE is to evaluate a new policy using historical data. Because MAB algorithms sequentially update the policy based on past observations, the generated samples are not independent and identically distributed (i.i.d.). To conduct OPE from such dependent samples, special techniques are needed to construct estimators with asymptotic normality. In particular, we focus on a doubly robust (DR) estimator, which combines an inverse probability weighting (IPW) component with an estimator of the conditionally expected outcome. We first summarize existing and new theoretical results for these OPE estimators, and then compare their empirical properties on benchmark datasets against other estimators, such as one based on cross-fitting.
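As a rough illustration of the DR idea described above (a hypothetical sketch, not the paper's implementation), the estimator averages a direct-method term, the evaluation policy's expected value under an outcome model `f_hat`, plus an IPW correction on each logged action. The names `pi_b`, `pi_e`, and `f_hat` are assumed inputs here; in the adaptive setting `pi_b` may differ across rounds:

```python
import numpy as np

def dr_estimate(rewards, actions, pi_b, pi_e, f_hat):
    """Doubly robust OPE value estimate for a K-armed bandit log.

    rewards: (T,)   observed rewards
    actions: (T,)   logged action indices
    pi_b:    (T, K) behavior-policy probabilities at each round
                    (may vary over rounds under a MAB algorithm)
    pi_e:    (T, K) evaluation-policy probabilities
    f_hat:   (T, K) estimated conditional mean rewards for each arm
    """
    T = len(rewards)
    t = np.arange(T)
    # Direct-method term: expected modeled reward under the evaluation policy.
    dm = (pi_e * f_hat).sum(axis=1)
    # IPW correction on the logged action: importance weight times the
    # residual between the observed reward and the outcome model.
    w = pi_e[t, actions] / pi_b[t, actions]
    correction = w * (rewards - f_hat[t, actions])
    return float(np.mean(dm + correction))
```

The "doubly robust" property is visible in the two terms: if `f_hat` is accurate, the correction term vanishes in expectation; if instead `pi_b` is correct, the IPW correction removes the bias of a misspecified `f_hat`.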


