Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

09/18/2022
by Zuyue Fu, et al.

We study offline reinforcement learning (RL) in the presence of unmeasured confounders. Because the agent cannot interact with the environment online, offline RL faces two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data, collected a priori, do not provide sufficient coverage of the environment. To tackle these challenges, we study policy learning in confounded Markov decision processes (MDPs) with the aid of instrumental variables. Specifically, we first establish value function (VF)-based and marginalized importance sampling (MIS)-based identification results for the expected total reward in confounded MDPs. Then, by leveraging pessimism and our identification results, we propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy under minimal data coverage and modeling assumptions. Lastly, our extensive theoretical investigations and a numerical study motivated by kidney transplantation demonstrate the promising performance of the proposed methods.


