Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

06/01/2023
by   Alizée Pace, et al.
0

A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved confounders, and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/23/2022

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Offline Reinforcement Learning (RL) aims to learn policies from previous...
research
12/01/2017

Causal inference taking into account unobserved confounding

Causal inference with observational data can be performed under an assum...
research
12/23/2022

Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

Motivated by the human-machine interaction such as training chatbots for...
research
03/20/2023

A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations

We study the offline contextual bandit problem, where we aim to acquire ...
research
03/05/2017

Controlling for Unobserved Confounds in Classification Using Correlational Constraints

As statistical classifiers become integrated into real-world application...
research
09/08/2023

Offline Recommender System Evaluation under Unobserved Confounding

Off-Policy Estimation (OPE) methods allow us to learn and evaluate decis...
research
04/13/2023

CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments

Robots operating in real-world environments must reason about possible o...

Please sign up or login with your details

Forgot password? Click here to reset