Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders

02/01/2023
by David Bruns-Smith, et al.

Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce, where online experimentation is costly, dangerous, or unethical and where the true model is unknown. However, most methods assume that all covariates used in the behavior policy's action decisions are observed. This assumption is untestable and may be incorrect. We study robust policy evaluation and policy optimization in the presence of unobserved confounders. We assume that the extent of possible unobserved confounding can be bounded by a sensitivity model and that the unobserved confounders are sequentially exogenous. We propose and analyze an (orthogonalized) robust fitted-Q-iteration that uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function. Our algorithm enjoys the computational ease of fitted-Q-iteration and the statistical improvements of orthogonalization (reduced dependence on quantile estimation error). We provide sample complexity bounds and insights, and we demonstrate the method's effectiveness in simulations.
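
The abstract does not state the paper's closed form, so the following is only a minimal sketch of the general idea, under several assumptions that are ours and not the authors': states and actions are tabular, the target policy is deterministic, and the sensitivity model is replaced by a generic bounded density-ratio set in which per-sample weights w satisfy a ≤ w ≤ b and average to one (a and b are hypothetical parameters; the paper's sensitivity model may induce different bounds). We also assume the pessimistic (lower) robust value is wanted. For this set the worst-case weights put mass b on the lowest τ-fraction of the Bellman targets Y and a elsewhere, with τ = (1 − a)/(b − a), which gives the closed form a E[Y] + (b − a)(τ q_τ − E[(q_τ − Y)_+]), where q_τ is the τ-quantile of Y. Its derivative in q is τ − P(Y < q), which vanishes at q_τ when Y is continuous, so a small quantile error perturbs the value only at second order; this is a standard property and is similar in spirit to, though not necessarily the same as, the orthogonalization mentioned above. The function names `robust_lower_mean` and `robust_fqe` are hypothetical.

```python
import numpy as np


def robust_lower_mean(y, a, b, q=None):
    """Lowest mean of y over sample weights w with a <= w <= b and mean(w) == 1.

    Assumes 0 < a <= 1 < b (a == 1 forces w == 1 and returns the plain mean).
    The minimizing weights equal b on the lowest tau-fraction of y and a
    elsewhere, with tau = (1 - a) / (b - a). The Rockafellar-Uryasev form
    below evaluates that minimum exactly when q is a tau-quantile of y.
    """
    y = np.asarray(y, dtype=float)
    tau = (1.0 - a) / (b - a)
    if q is None:
        ys = np.sort(y)
        k = max(int(np.ceil(tau * len(ys))), 1)
        q = ys[k - 1]  # empirical tau-quantile (inverse of the empirical CDF)
    lower_tail = tau * q - np.maximum(q - y, 0.0).mean()
    return a * y.mean() + (b - a) * lower_tail


def robust_fqe(transitions, policy, n_states, n_actions, gamma, a, b, n_iters=200):
    """Tabular robust fitted-Q evaluation of a deterministic target policy.

    transitions: list of (state, action, reward, next_state) tuples logged
    under the behavior policy. policy: integer array giving the target
    policy's action in each state. In the tabular case the least-squares fit
    of each (state, action) cell is its sample mean; here that mean is
    replaced by the robust lower mean of the cell's Bellman targets.
    Cells with no data stay at 0.
    """
    s, act, r, s_next = (np.asarray(col) for col in zip(*transitions))
    r = r.astype(float)
    cells = [
        (si, ai, (s == si) & (act == ai))
        for si in range(n_states)
        for ai in range(n_actions)
    ]
    cells = [(si, ai, mask) for si, ai, mask in cells if mask.any()]
    q_table = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Bellman targets r + gamma * Q(s', pi(s')) from the current iterate.
        targets = r + gamma * q_table[s_next, policy[s_next]]
        new_q = q_table.copy()
        for si, ai, mask in cells:
            new_q[si, ai] = robust_lower_mean(targets[mask], a, b)
        q_table = new_q
    return q_table


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    transitions = []
    for _ in range(500):
        s, act = rng.integers(2, size=2)
        reward = float(s == act) + rng.normal(scale=0.5)
        transitions.append((int(s), int(act), reward, int(rng.integers(2))))
    policy = np.array([0, 1])
    q_robust = robust_fqe(transitions, policy, 2, 2, gamma=0.9, a=0.5, b=2.0)
    q_plain = robust_fqe(transitions, policy, 2, 2, gamma=0.9, a=1.0, b=2.0)
    print("robust Q:\n", q_robust)
    print("plain Q:\n", q_plain)
```

Setting a = 1 forces every weight to 1 and recovers ordinary fitted-Q evaluation, which gives a convenient non-robust baseline. Because the robust lower mean is monotone and never exceeds the plain mean, the robust Q table in this sketch is never above that baseline. The robust operator is also shift-equivariant (adding a constant to every target adds the same constant to the result), so it remains a γ-contraction and the iteration converges.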

Related research

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding (03/12/2020)
When observed decisions depend only on observed features, off-policy pol...

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning (02/11/2020)
Off-policy evaluation of sequential decision policies from observational...

Offline Recommender System Evaluation under Unobserved Confounding (09/08/2023)
Off-Policy Estimation (OPE) methods allow us to learn and evaluate decis...

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation (08/25/2020)
This paper studies the robustness aspect of reinforcement learning algor...

Model-Free and Model-Based Policy Evaluation when Causality is Uncertain (04/02/2022)
When decision-makers can directly intervene, policy evaluation algorithm...

Confounding-Robust Policy Improvement (05/22/2018)
We study the problem of learning personalized decision policies from obs...

Stateful Offline Contextual Policy Evaluation and Learning (10/19/2021)
We study off-policy evaluation and learning from sequential data in a st...