Optimal Off-Policy Evaluation from Multiple Logging Policies

10/21/2020
by Nathan Kallus, et al.

We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. Previous work noted that in this setting the ordering of the variances of different importance sampling estimators is instance-dependent, which raises a dilemma as to which importance sampling weights to use. In this paper, we resolve this dilemma by finding the OPE estimator for multiple loggers that has minimum variance in every instance, i.e., the efficient one. In particular, we establish the efficiency bound under stratified sampling and propose an estimator that achieves this bound when given consistent q-estimates. To guard against misspecification of the q-functions, we also show how to choose a control variate from a hypothesis class so as to minimize variance. Extensive experiments demonstrate the benefits of our methods, which efficiently leverage the stratified sampling of off-policy data from multiple loggers.
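To make the weighting dilemma concrete, here is a minimal, self-contained Python sketch (an illustration under toy assumptions, not the paper's code or its exact estimator): it simulates two loggers of unequal size, contrasts per-stratum importance weights pi_e/pi_k with pooled weights taken against the size-weighted mixture of the logging policies, and adds a q-based control variate in the doubly-robust style. All sizes, policies, and names such as q_true and q_hat are hypothetical.

```python
import numpy as np

# Toy problem: finite contexts and actions, two loggers of unequal size.
# Everything here (sizes, policies, reward model) is an illustrative
# assumption, not the paper's setup.
rng = np.random.default_rng(0)
n_ctx, n_act = 5, 3

q_true = rng.uniform(size=(n_ctx, n_act))   # true mean rewards q(x, a)

def random_policy():
    p = rng.uniform(size=(n_ctx, n_act))
    return p / p.sum(axis=1, keepdims=True)

pi_e = random_policy()                                        # evaluation policy
loggers = [(random_policy(), 2000), (random_policy(), 500)]   # (policy pi_k, n_k)

def simulate(pi_b, n):
    x = rng.integers(n_ctx, size=n)
    a = np.array([rng.choice(n_act, p=pi_b[xi]) for xi in x])
    r = q_true[x, a] + 0.1 * rng.standard_normal(n)
    return x, a, r

data = [(pi_b,) + simulate(pi_b, n) for pi_b, n in loggers]
n_total = sum(n for _, n in loggers)

def ips_stratified(data):
    # Weight each sample by pi_e / pi_k of its *own* logger, then combine
    # the per-logger averages in proportion to dataset sizes.
    est = 0.0
    for pi_b, x, a, r in data:
        w = pi_e[x, a] / pi_b[x, a]
        est += (len(r) / n_total) * np.mean(w * r)
    return est

def ips_pooled(data):
    # Weight every sample against the size-weighted *mixture* of the
    # logging policies, ignoring which logger produced it.
    pi_mix = sum((len(x) / n_total) * pi_b for pi_b, x, a, r in data)
    est = 0.0
    for _, x, a, r in data:
        w = pi_e[x, a] / pi_mix[x, a]
        est += (len(r) / n_total) * np.mean(w * r)
    return est

def dr_style(data, q_hat):
    # A q-based control variate on top of the pooled weights; with a
    # consistent q_hat this is the flavor of estimator the abstract says
    # can attain the efficiency bound (the exact construction is in the paper).
    pi_mix = sum((len(x) / n_total) * pi_b for pi_b, x, a, r in data)
    est = 0.0
    for _, x, a, r in data:
        w = pi_e[x, a] / pi_mix[x, a]
        baseline = (pi_e[x] * q_hat[x]).sum(axis=1)  # E_{a~pi_e}[q_hat(x, a)]
        est += (len(r) / n_total) * np.mean(w * (r - q_hat[x, a]) + baseline)
    return est

true_value = (pi_e * q_true).sum(axis=1).mean()  # contexts are uniform here
print(f"true {true_value:.4f}  stratified-IPS {ips_stratified(data):.4f}  "
      f"pooled-IPS {ips_pooled(data):.4f}  DR-style {dr_style(data, q_true):.4f}")
```

Which of the two IPS variants has lower variance depends on the instance (swapping the logger policies or the dataset sizes can flip the ordering), which is exactly the dilemma the paper resolves by characterizing the efficient estimator.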


Related research

09/13/2021 · State Relevance for Off-Policy Evaluation
Importance sampling-based estimators for off-policy evaluation (OPE) are...

10/15/2019 · Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
We establish a connection between the importance sampling estimators typ...

07/26/2017 · Notes on optimal approximations for importance sampling
In this manuscript, we derive optimal conditions for building function a...

04/03/2017 · A comparative study of counterfactual estimators
We provide a comparative study of several widely used off-policy estimat...

03/02/2017 · In Search of an Entity Resolution OASIS: Optimal Asymptotic Sequential Importance Sampling
Entity resolution (ER) presents unique challenges for evaluation methodo...

10/20/2019 · Amortized Rejection Sampling in Universal Probabilistic Programming
Existing approaches to amortized inference in probabilistic programs wit...

01/10/2013 · Policy Improvement for POMDPs Using Normalized Importance Sampling
We present a new method for estimating the expected return of a POMDP fr...
