Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation

12/16/2020
by Chaochao Lu, et al.

Reinforcement learning (RL) algorithms usually require a substantial amount of interaction data and perform well only for specific tasks in a fixed environment. In some scenarios such as healthcare, however, often only a few records are available for each patient, and patients may respond differently to the same treatment, which impedes the application of current RL algorithms to learning optimal policies. To address mechanism heterogeneity and the related data scarcity, we propose a data-efficient RL algorithm that exploits structural causal models (SCMs) to model the state dynamics, which are estimated by leveraging both commonalities and differences across subjects. The learned SCM enables us to reason counterfactually about what would have happened had another treatment been taken. This helps avoid real (possibly risky) exploration and mitigates the problem that limited experience leads to biased policies. We propose counterfactual RL algorithms to learn both population-level and individual-level policies. We show that counterfactual outcomes are identifiable under mild conditions and that Q-learning on the counterfactual-based augmented data set converges to the optimal value function. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed approach.
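The idea of counterfactual-based augmentation followed by Q-learning can be illustrated with a toy sketch. It is not the authors' implementation: the paper estimates the SCM from data, whereas here the structural function `f` is assumed known, the noise is assumed additive, the states live on a small ring, and the reward is assumed to depend only on the next state. All names (`f`, `abduct_noise`, `augment`, `q_learning`, `GOAL`) are hypothetical and chosen for illustration only.

```python
import random

# Toy setting (all assumptions): 5 states on a ring, two actions, one goal state.
N_STATES = 5
ACTIONS = (-1, 1)
GOAL = 3


def f(s, a):
    """Assumed structural function of the SCM; the paper learns this from data."""
    return s + a


def step(s, a, u):
    """Next state under additive noise u, wrapped onto the ring."""
    return (f(s, a) + u) % N_STATES


def abduct_noise(s, a, s_next):
    """Abduction: recover the noise consistent with an observed transition."""
    return (s_next - f(s, a)) % N_STATES


def counterfactual_transitions(s, a, s_next):
    """Action + prediction: replay the same noise under the actions not taken."""
    u = abduct_noise(s, a, s_next)
    return [(s, a_cf, step(s, a_cf, u)) for a_cf in ACTIONS if a_cf != a]


def reward(s_next):
    """Assumed reward: 1 for reaching the goal state, 0 otherwise."""
    return 1.0 if s_next == GOAL else 0.0


def augment(dataset):
    """Observed transitions plus their counterfactual counterparts."""
    out = list(dataset)
    for s, a, s_next in dataset:
        out.extend(counterfactual_transitions(s, a, s_next))
    return out


def q_learning(transitions, gamma=0.9, lr=0.1, epochs=200, seed=0):
    """Tabular Q-learning run repeatedly over a fixed batch of transitions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    data = list(transitions)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, a, s_next in data:
            best_next = max(q[(s_next, b)] for b in ACTIONS)
            target = reward(s_next) + gamma * best_next
            q[(s, a)] += lr * (target - q[(s, a)])
    return q


# A handful of observed (state, action, next_state) records.
observed = [(0, 1, 1), (1, 1, 2), (2, -1, 2), (4, -1, 3)]
augmented = augment(observed)
q = q_learning(augmented)
```

In this sketch each observed record yields one extra transition for the action that was not taken, so the batch doubles in size without any further interaction with the environment. Whether such counterfactuals are trustworthy depends on the identifiability conditions the abstract refers to, which this toy example simply assumes.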


research
10/20/2022

MoCoDA: Model-based Counterfactual Data Augmentation

The number of states in a dynamic process is exponential in the number o...
research
07/06/2020

Counterfactual Data Augmentation using Locally Factored Dynamics

Many dynamic processes, including common scenarios in robotic control an...
research
04/27/2022

Counterfactual harm

To act safely and ethically in the real world, agents must be able to re...
research
07/25/2023

Counterfactual Explanation Policies in RL

As Reinforcement Learning (RL) agents are increasingly employed in diver...
research
11/15/2018

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Learning policies on data synthesized by models can in principle quench ...
research
05/19/2022

Deconfounding Actor-Critic Network with Policy Adaptation for Dynamic Treatment Regimes

Despite intense efforts in basic and clinical research, an individualize...
research
09/19/2021

Dual Behavior Regularized Reinforcement Learning

Reinforcement learning has been shown to perform a range of complex task...
