A Practical Guide of Off-Policy Evaluation for Bandit Problems

10/23/2020
by Masahiro Kato, et al.

Off-policy evaluation (OPE) is the problem of estimating the value of a target policy from samples obtained via different policies. Recently, applying OPE methods to bandit problems has garnered attention. To obtain theoretical guarantees for an estimator of the policy value, OPE methods impose various conditions on the target policy and on the behavior policy used to generate the samples. However, existing studies have not carefully discussed the practical situations in which such conditions hold, so a gap remains between theory and practice. This paper aims to bridge that gap. Based on the properties of the evaluation policy, we categorize OPE situations and, among practical applications, focus mainly on best policy selection. For this setting, we propose a meta-algorithm built on existing OPE estimators. We investigate the proposed concepts experimentally using synthetic and open real-world datasets.
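As background for the abstract above, here is a minimal sketch of the standard inverse probability weighting (IPW) estimator that OPE methods of this kind build on, together with the naive way to use it for best policy selection. The logged-data layout, function names, and policy interface here are illustrative assumptions, not the paper's actual algorithm or code.

    # Minimal IPW sketch (illustrative assumptions; not the paper's implementation).
    # V_hat(pi_e) = (1/n) * sum_i [ pi_e(a_i | x_i) / pi_b(a_i | x_i) ] * r_i
    import numpy as np

    def ipw_value(logs, target_policy):
        """Estimate the value of `target_policy` from logged tuples
        (context x, action a, reward r, behavior propensity p_b)."""
        weights = np.array([target_policy(x, a) / p_b for (x, a, r, p_b) in logs])
        rewards = np.array([r for (_, _, r, _) in logs])
        return float(np.mean(weights * rewards))

    def select_best_policy(logs, candidate_policies):
        """Best policy selection: return the index of the candidate policy
        with the highest estimated value, along with all estimates."""
        values = [ipw_value(logs, pi) for pi in candidate_policies]
        return int(np.argmax(values)), values

Under the usual overlap condition (the behavior policy assigns positive probability to every action the target policy can take), the IPW estimator is unbiased; the conditions the abstract refers to concern exactly when assumptions of this kind hold in practice.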


Related research

10/23/2020
Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy
The goal of off-policy evaluation (OPE) is to evaluate a new policy usin...

06/12/2020
Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales
This study addresses the problem of off-policy evaluation (OPE) from dep...

10/08/2020
Theoretical and Experimental Comparison of Off-Policy Evaluation from Dependent Samples
We theoretically and experimentally compare estimators for off-policy ev...

09/30/2021
Adapting Bandit Algorithms for Settings with Sequentially Available Arms
Although the classical version of the Multi-Armed Bandits (MAB) framewor...

02/17/2020
Differentiable Bandit Exploration
We learn bandit policies that maximize the average reward over bandit in...

01/05/2021
Off-Policy Evaluation of Slate Policies under Bayes Risk
We study the problem of off-policy evaluation for slate bandits, for the...

07/03/2021
Supervised Off-Policy Ranking
Off-policy evaluation (OPE) leverages data generated by other policies t...
