Causal Markov Decision Processes: Learning Good Interventions Efficiently

02/15/2021
by   Yangyi Lu, et al.
0

We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas such as digital healthcare and digital marketing can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relationship between interventions and states/rewards. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure in C-MDPs and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. We prove that C-UCBVI satisfies an Õ(HS√(ZT)) regret bound, where T is the the total time steps, H is the episodic horizon, and S is the cardinality of the state space. Notably, our regret bound does not scale with the size of actions/interventions (A), but only scales with a causal graph dependent quantity Z which can be exponentially smaller than A. By extending C-UCBVI to the factored MDP setting, we propose the causal factored UCBVI (CF-UCBVI) algorithm, which further reduces the regret exponentially in terms of S. Furthermore, we show that RL algorithms for linear MDP problems can also be incorporated in C-MDPs. We empirically show the benefit of our causal approaches in various settings to validate our algorithms and theoretical results.

READ FULL TEXT

page 1

page 2

page 3

page 4

11/01/2021

Intervention Efficient Algorithm for Two-Stage Causal MDPs

We study Markov Decision Processes (MDP) wherein states correspond to ca...
12/11/2018

Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes

We introduce and analyse two algorithms for exploration-exploitation in ...
05/25/2019

Large Scale Markov Decision Processes with Changing Rewards

We consider Markov Decision Processes (MDPs) where the rewards are unkno...
06/29/2014

Thompson Sampling for Learning Parameterized Markov Decision Processes

We consider reinforcement learning in parameterized Markov Decision Proc...
06/20/2018

RUDDER: Return Decomposition for Delayed Rewards

We propose a novel reinforcement learning approach for finite Markov dec...
06/15/2021

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Online reinforcement learning (RL) has been widely applied in informatio...
10/09/2019

Model-Based Reinforcement Learning Exploiting State-Action Equivalence

Leveraging an equivalence property in the state-space of a Markov Decisi...