Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models

05/14/2019
by Michael Oberst, et al.

We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement-learned (RL) policy is likely to have produced a substantially different outcome than the observed policy. In particular, we introduce a class of structural causal models (SCMs) for generating counterfactual trajectories in finite partially observable Markov decision processes (POMDPs). We see this as a useful procedure for off-policy "debugging" in high-risk settings (e.g., healthcare): by decomposing the expected difference in reward between the RL and observed policies into specific episodes, we can identify the episodes where the counterfactual difference in reward is most dramatic. This in turn can facilitate review of specific episodes by domain experts. We demonstrate the utility of this procedure with a synthetic environment of sepsis management.
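
To make the idea of a Gumbel-Max counterfactual concrete, here is a minimal Python sketch for a single categorical transition. It rests on assumptions that are not stated in the abstract: the function names are hypothetical, all probabilities are strictly positive, and the posterior over the Gumbel noise is obtained by plain rejection sampling, which is a simple stand-in and not necessarily the procedure used in the paper.

```python
import numpy as np


def sample_posterior_gumbels(log_p_obs, observed, rng, max_tries=100_000):
    """Draw Gumbel noise consistent with the observed outcome.

    Hypothetical helper: rejection sampling keeps drawing i.i.d. standard
    Gumbel noise until the Gumbel-Max rule reproduces the observed category.
    """
    for _ in range(max_tries):
        g = rng.gumbel(size=log_p_obs.shape)
        if int(np.argmax(log_p_obs + g)) == observed:
            return g
    raise RuntimeError("no consistent noise found; is the outcome too unlikely?")


def counterfactual_outcome(p_obs, p_cf, observed, rng):
    """Outcome under p_cf, reusing noise that explains `observed` under p_obs.

    Assumes both probability vectors are strictly positive.
    """
    log_p_obs = np.log(np.asarray(p_obs, dtype=float))
    log_p_cf = np.log(np.asarray(p_cf, dtype=float))
    g = sample_posterior_gumbels(log_p_obs, observed, rng)
    # Same noise, different (interventional) probabilities.
    return int(np.argmax(log_p_cf + g))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_obs = [0.2, 0.5, 0.3]   # transition probabilities under the observed action
    p_cf = [0.6, 0.3, 0.1]    # transition probabilities under the alternative action
    draws = [counterfactual_outcome(p_obs, p_cf, observed=1, rng=rng)
             for _ in range(1000)]
    print(np.bincount(draws, minlength=3) / len(draws))
```

When the interventional probabilities equal the observed ones, the sketch always returns the observed outcome, since the same noise is reused with the same logits. Repeating such draws along a trajectory, and comparing counterfactual with observed reward per episode, is one way the episode-level differences described above could be ranked for expert review.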

research
09/09/2019

Off-Policy Evaluation in Partially Observable Environments

This work studies the problem of batch off-policy evaluation for Reinfor...
research
12/16/2022

Towards Causal Temporal Reasoning for Markov Decision Processes

We introduce a new probabilistic temporal logic for the verification of ...
research
01/01/2013

Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes

In this paper we extend temporal difference policy evaluation algorithms...
research
06/20/2020

Counterfactually Guided Policy Transfer in Clinical Settings

Reliably transferring treatment policies learned in one clinical environ...
research
02/07/2019

Cost-Effective Incentive Allocation via Structured Counterfactual Inference

We address a practical problem ubiquitous in modern industry, in which a...
research
01/20/2022

Generalizing Off-Policy Evaluation From a Causal Perspective For Sequential Decision-Making

Assessing the effects of a policy based on observational data from a dif...
research
04/01/2022

Model-agnostic Counterfactual Synthesis Policy for Interactive Recommendation

Interactive recommendation is able to learn from the interactive process...
