Model-Free and Model-Based Policy Evaluation when Causality is Uncertain

04/02/2022
by   David Bruns-Smith, et al.
When decision-makers can directly intervene, policy evaluation algorithms give valid causal estimates. In off-policy evaluation (OPE), there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy. These "confounders" will introduce spurious correlations and naive estimates for a new policy will be biased. We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons when confounders are drawn iid each period. We demonstrate that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics. Finally, we show that when unobserved confounders are persistent over time, OPE is far more difficult and existing techniques produce extremely conservative bounds.
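The model-based bound described in the abstract can be illustrated with a minimal sketch (not the authors' code): robust value iteration for a tabular MDP, where each Bellman backup minimizes over an L1 uncertainty set of radius `eps` around the nominal transition probabilities. The fixed point is then a worst-case lower bound on the target policy's value; with `eps = 0` it recovers the nominal (non-robust) evaluation. The function names, the L1 uncertainty set, and all parameters here are illustrative assumptions.

```python
import numpy as np

def worst_case_expectation(p, v, eps):
    """Minimize q @ v over distributions q with ||q - p||_1 <= eps.

    Closed-form: shift up to eps/2 probability mass onto the
    lowest-value next state, taking it from the highest-value states.
    (Illustrative L1 uncertainty set, not the paper's exact construction.)
    """
    q = p.astype(float).copy()
    order = np.argsort(v)               # next states sorted by value, ascending
    lo = order[0]                       # worst next state
    add = min(eps / 2.0, 1.0 - q[lo])   # mass we can legally shift onto it
    q[lo] += add
    rem = add
    for s in order[::-1]:               # remove mass from the best states first
        if s == lo:
            continue
        take = min(rem, q[s])
        q[s] -= take
        rem -= take
        if rem <= 1e-12:
            break
    return float(q @ v)

def robust_policy_value(P, R, pi, eps, gamma=0.9, iters=300):
    """Worst-case value of policy pi under an L1 ball of radius eps
    around the nominal transition model P (shape S x A x S)."""
    S, A = R.shape
    v = np.zeros(S)
    for _ in range(iters):
        v_new = np.zeros(S)
        for s in range(S):
            for a in range(A):
                w = worst_case_expectation(P[s, a], v, eps)
                v_new[s] += pi[s, a] * (R[s, a] + gamma * w)
        v = v_new
    return v
```

For any `eps > 0` the robust value is pointwise no larger than the nominal value, which is exactly the "sharper lower bound" role the model-based approach plays: domain knowledge enters through how tightly `eps` can be set.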


Related research:

- Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding (03/12/2020)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation (03/02/2023)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning (02/11/2020)
- Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders (02/01/2023)
- Optimal Recovery for Causal Inference (08/13/2022)
- Representation Balancing MDPs for Off-Policy Policy Evaluation (05/23/2018)
- Offline Recommender System Evaluation under Unobserved Confounding (09/08/2023)
