Deconfounding Reinforcement Learning in Observational Settings

12/26/2018
by Chaochao Lu, et al.

We propose a general formulation for addressing reinforcement learning (RL) problems in settings with observational data; that is, we consider the problem of learning good policies solely from historical data in which unobserved factors (confounders) affect both the observed actions and the rewards. Our formulation allows us to extend a representative RL algorithm, the Actor-Critic method, to a deconfounding variant, and the same methodology can be readily applied to other RL algorithms. In addition, we develop a new benchmark for evaluating deconfounding RL algorithms by modifying the OpenAI Gym environments and the MNIST dataset. Using this benchmark, we demonstrate that the proposed algorithms outperform traditional RL methods in confounded environments with observational data. To the best of our knowledge, this is the first work to take confounders into account when addressing full RL problems with observational data. Code is available at https://github.com/CausalRL/DRL.
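The core difficulty the abstract points to can be illustrated with a small, hypothetical example (this is not the paper's method or benchmark): when an unobserved confounder drives both the logged action and the reward, the naive value estimate computed from observational data is biased, whereas adjusting for the confounder recovers the interventional effect. The single-step setting, the variable names, and the data-generating process below are assumptions made purely for illustration.

import numpy as np

# Hypothetical confounded data-generating process (illustration only,
# not the paper's benchmark): an unobserved confounder u influences both
# the logged action a and the reward r, so naive value estimates from
# the logged data are biased relative to the true interventional values.

rng = np.random.default_rng(0)
n = 100_000

u = rng.binomial(1, 0.5, size=n)           # unobserved confounder
# Behavior policy depends on u (e.g., an operator reacting to an unrecorded state).
p_a1 = np.where(u == 1, 0.9, 0.1)
a = rng.binomial(1, p_a1)                  # logged action in {0, 1}
# Reward depends on both the action and the confounder.
r = 0.2 * a + 0.6 * u + rng.normal(0.0, 0.1, size=n)

# Naive estimate: average reward conditioned on the logged action.
naive = [r[a == k].mean() for k in (0, 1)]

# Adjusted estimate: back-door adjustment over u. This is possible here only
# because u is simulated; in the observational RL setting it would have to be
# inferred, e.g., as a latent variable.
adjusted = [
    sum(r[(a == k) & (u == v)].mean() * (u == v).mean() for v in (0, 1))
    for k in (0, 1)
]

print("naive E[r | a]:       ", np.round(naive, 3))     # overstates the effect of a=1
print("adjusted E[r | do(a)]:", np.round(adjusted, 3))  # gap close to the true 0.2

In the full RL setting considered in the paper, the confounder is not observed even at training time, so any such adjustment first requires inferring it from the logged trajectories; the simulation above sidesteps that step by construction.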
