State Relevance for Off-Policy Evaluation

09/13/2021
by Simon P. Shen, et al.

Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces variance by strategically omitting likelihood ratios associated with certain states. We formalize the conditions under which OSIRIS is unbiased and has lower variance than ordinary importance sampling, and we demonstrate these properties empirically.
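
As a rough illustration of the idea (a minimal sketch, not the authors' implementation), ordinary trajectory-wise importance sampling multiplies the likelihood ratio pi_e(a|s) / pi_b(a|s) at every timestep, whereas an OSIRIS-style estimator simply skips the ratios at timesteps whose states are judged irrelevant to the return. The `is_relevant` predicate below is a hypothetical stand-in for whatever relevance criterion is chosen.

```python
import numpy as np

def ordinary_is_estimate(trajectories, pi_e, pi_b):
    """Ordinary trajectory-wise importance sampling estimate of the
    evaluation policy's expected return from behavior-policy data.
    Each trajectory is a list of (state, action, reward) tuples."""
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for s, a, r in traj:
            weight *= pi_e(a, s) / pi_b(a, s)  # per-step likelihood ratio
            ret += r
        estimates.append(weight * ret)
    return np.mean(estimates)

def osiris_style_estimate(trajectories, pi_e, pi_b, is_relevant):
    """OSIRIS-style variant: omit the likelihood ratios at states judged
    irrelevant to the return (is_relevant is a hypothetical criterion)."""
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for s, a, r in traj:
            if is_relevant(s):  # keep the ratio only at relevant states
                weight *= pi_e(a, s) / pi_b(a, s)
            ret += r
        estimates.append(weight * ret)
    return np.mean(estimates)
```

Dropping ratios shortens the product of likelihood ratios and therefore tends to shrink the variance of the importance weights; as the abstract notes, unbiasedness holds only under conditions on which states may be omitted, which the paper formalizes.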

Related research

06/04/2018 · Importance Sampling Policy Evaluation with an Estimated Behavior Policy
In reinforcement learning, off-policy evaluation is the task of using da...

12/07/2022 · Low Variance Off-policy Evaluation with State-based Importance Sampling
In off-policy reinforcement learning, a behaviour policy performs explor...

10/15/2019 · Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
We establish a connection between the importance sampling estimators typ...

10/21/2020 · Optimal Off-Policy Evaluation from Multiple Logging Policies
We study off-policy evaluation (OPE) from multiple logging policies, eac...

05/07/2019 · Multifidelity probability estimation via fusion of estimators
This paper develops a multifidelity method that enables estimation of fa...

10/20/2019 · From Importance Sampling to Doubly Robust Policy Gradient
We show that policy gradient (PG) and its variance reduction variants ca...

01/10/2013 · Policy Improvement for POMDPs Using Normalized Importance Sampling
We present a new method for estimating the expected return of a POMDP fr...
