Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models

09/21/2022
by   Rui Miao, et al.
0

We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states. Motivated by the recently proposed proximal causal inference framework, we develop a non-parametric identification result for estimating the policy value via a sequence of so-called V-bridge functions with the help of time-dependent proxy variables. We then develop a fitted-Q-evaluation-type algorithm to estimate V-bridge functions recursively, where a non-parametric instrumental variable (NPIV) problem is solved at each step. By analyzing this challenging sequential NPIV problem, we establish the finite-sample error bounds for estimating the V-bridge functions and accordingly that for evaluating the policy value, in terms of the sample size, length of horizon and so-called (local) measure of ill-posedness at each step. To the best of our knowledge, this is the first finite-sample error bound for OPE in POMDPs under non-parametric models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/26/2023

A Policy Gradient Method for Confounded POMDPs

In this paper, we propose a policy gradient method for confounded partia...
research
11/12/2021

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes

We consider off-policy evaluation (OPE) in Partially Observable Markov D...
research
10/19/2021

Stateful Offline Contextual Policy Evaluation and Learning

We study off-policy evaluation and learning from sequential data in a st...
research
10/28/2021

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

In applications of offline reinforcement learning to observational data,...
research
05/14/2019

Combining Parametric and Nonparametric Models for Off-Policy Evaluation

We consider a model-based approach to perform batch off-policy evaluatio...
research
09/22/2021

A Spectral Approach to Off-Policy Evaluation for POMDPs

We consider off-policy evaluation (OPE) in Partially Observable Markov D...
research
07/26/2022

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

We study off-policy evaluation (OPE) for partially observable MDPs (POMD...

Please sign up or login with your details

Forgot password? Click here to reset