Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

02/22/2022
by Chengchun Shi, et al.

This paper is concerned with constructing a confidence interval for a target policy's value offline, based on pre-collected observational data in infinite-horizon settings. Most existing works assume that no unmeasured variables confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provide rigorous uncertainty quantification. Our method is justified by theoretical results and by experiments on simulated data and real datasets obtained from ridesharing companies.

research
12/29/2022

An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Off-policy evaluation (OPE) is a method for estimating the return of a t...
research
05/10/2021

Deeply-Debiased Off-Policy Interval Estimation

Off-policy evaluation learns a target policy's value with a historical d...
research
06/14/2022

Conformal Off-Policy Prediction

Off-policy evaluation is critical in a number of applications where new ...
research
07/27/2020

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

Off-policy evaluation (OPE) in reinforcement learning is an important pr...
research
11/09/2020

Robust Batch Policy Learning in Markov Decision Processes

We study the sequential decision making problem in Markov decision proce...
research
02/19/2021

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

In offline reinforcement learning (RL) an optimal policy is learnt solel...
research
08/01/2019

Mapping the uncertainty of 19th century West African slave origins using a Markov decision process model

The advent of modern computers has added an increased emphasis on channe...
