Off-Policy Evaluation in Partially Observable Environments

09/09/2019
by Guy Tennenholtz, et al.

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to the procedure we provided for general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, importance sampling, and compare with our result on synthetic medical data.
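
For readers unfamiliar with the importance sampling baseline named above, the following is a minimal sketch of a trajectory-wise importance sampling estimator, not the procedure proposed in this work. The function name, the trajectory layout (lists of observation, action, reward tuples), and the policy signatures are assumptions made for illustration. The sketch weights each return by policy probabilities conditioned on observations only; when the behavior policy actually depended on unobserved state, this is the setting in which the abstract warns that the estimate can be biased.

```python
import numpy as np

def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Trajectory-wise importance sampling estimate of an evaluation policy's value.

    trajectories: list of episodes, each a list of (observation, action, reward).
    pi_e, pi_b:   hypothetical callables (action, observation) -> probability for
                  the evaluation and behavior policies, respectively.
    gamma:        discount factor.
    """
    estimates = []
    for episode in trajectories:
        weight = 1.0        # cumulative product of likelihood ratios
        discounted = 0.0    # discounted return of the logged episode
        for t, (obs, action, reward) in enumerate(episode):
            weight *= pi_e(action, obs) / pi_b(action, obs)
            discounted += (gamma ** t) * reward
        estimates.append(weight * discounted)
    return float(np.mean(estimates))


# Toy one-step example: uniform behavior policy over two actions,
# evaluation policy that always picks action 1.
logged = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
uniform = lambda a, o: 0.5
always_one = lambda a, o: 1.0 if a == 1 else 0.0
value = importance_sampling_estimate(logged, always_one, uniform)
```

In this toy example the behavior policy depends on nothing hidden, so the weights are valid; the sketch says nothing about how large the error becomes once unobserved variables influence the logged actions.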


research
11/12/2021

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes

We consider off-policy evaluation (OPE) in Partially Observable Markov D...
research
09/22/2021

A Spectral Approach to Off-Policy Evaluation for POMDPs

We consider off-policy evaluation (OPE) in Partially Observable Markov D...
research
05/14/2019

Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models

We introduce an off-policy evaluation procedure for highlighting episode...
research
12/10/2021

Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning

This paper proposes a new sequential model learning architecture to solv...
research
04/13/2023

CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments

Robots operating in real-world environments must reason about possible o...
research
01/13/2020

POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning

Many medical decision-making settings can be framed as partially observe...
research
07/26/2022

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

We study off-policy evaluation (OPE) for partially observable MDPs (POMD...
