A comparative study of counterfactual estimators

04/03/2017
by Thomas Nedelec et al.

We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling, and Normalized Importance Sampling), detailing the different regimes where each is suboptimal. We then exhibit properties that optimal estimators should possess. In the case where examples have been gathered under multiple logging policies, we show that fused estimators dominate the basic ones but can still be improved.
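For reference, here is a minimal numerical sketch (not the authors' code) of the three estimators named in the abstract, evaluated on synthetic logged bandit data. The logging policy pi_0, target policy pi, and Bernoulli reward model below are made-up illustrations, not anything taken from the paper.

```python
import numpy as np

# Hypothetical logged data: actions drawn from a known logging policy pi_0,
# rewards simulated from an action-dependent Bernoulli model (illustration only).
rng = np.random.default_rng(0)
n_actions, n_samples = 5, 10_000

pi_0 = np.array([0.3, 0.25, 0.2, 0.15, 0.1])   # logging policy
pi = np.array([0.1, 0.1, 0.5, 0.2, 0.1])       # target policy to evaluate offline
true_reward = np.array([0.1, 0.2, 0.5, 0.3, 0.4])

actions = rng.choice(n_actions, size=n_samples, p=pi_0)
rewards = rng.binomial(1, true_reward[actions]).astype(float)

# Empirical Average: ignores the policy mismatch; only unbiased when pi == pi_0.
v_ea = rewards.mean()

# Basic Importance Sampling: reweights each reward by pi(a) / pi_0(a);
# unbiased, but high variance when the two policies differ substantially.
weights = pi[actions] / pi_0[actions]
v_ips = np.mean(weights * rewards)

# Normalized (self-normalized) Importance Sampling: divides by the sum of the
# weights instead of n; slightly biased, but typically lower variance.
v_snips = np.sum(weights * rewards) / np.sum(weights)

print(f"Empirical Average:   {v_ea:.4f}")
print(f"Importance Sampling: {v_ips:.4f}")
print(f"Normalized IS:       {v_snips:.4f}")
print(f"True value of pi:    {np.dot(pi, true_reward):.4f}")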
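```

On data like this, the Empirical Average is biased whenever pi differs from pi_0, basic Importance Sampling is unbiased but noisy when the weights pi/pi_0 are large, and the normalized variant trades a small bias for lower variance. These are the kinds of regime-dependent trade-offs the paper examines; the fused estimators it studies for data gathered under multiple logging policies are not sketched here.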


Related research

- Importance Sampling Policy Evaluation with an Estimated Behavior Policy (06/04/2018): In reinforcement learning, off-policy evaluation is the task of using da...
- Offline A/B testing for Recommender Systems (01/22/2018): Before A/B testing online a new version of a recommender system, it is u...
- Optimal Off-Policy Evaluation from Multiple Logging Policies (10/21/2020): We study off-policy evaluation (OPE) from multiple logging policies, eac...
- Multifidelity probability estimation via fusion of estimators (05/07/2019): This paper develops a multifidelity method that enables estimation of fa...
- Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning (06/09/2019): Off-policy evaluation (OPE) in both contextual bandits and reinforcement...
- Optimization Approaches for Counterfactual Risk Minimization with Continuous Actions (04/22/2020): Counterfactual reasoning from logged data has become increasingly import...
- Challenges of sampling and how phylogenetic comparative methods help: With a case study of the Pama-Nyungan laminal contrast (01/01/2022): Phylogenetic comparative methods are new in our field and are shrouded, ...
