Reinforcement Learning with Exogenous States and Rewards

03/22/2023
by George Trimponias, et al.

Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes additively into endogenous and exogenous components, the MDP can be decomposed into an exogenous Markov Reward Process (based on the exogenous reward) and an endogenous Markov Decision Process (optimizing the endogenous reward). Any optimal policy for the endogenous MDP is also an optimal policy for the original MDP, but because the endogenous reward typically has reduced variance, the endogenous MDP is easier to solve. We study settings where the decomposition of the state space into exogenous and endogenous state spaces is not given but must be discovered. The paper introduces and proves correctness of algorithms for discovering the exogenous and endogenous subspaces of the state space when they are mixed through linear combination. These algorithms can be applied during reinforcement learning to discover the exogenous space, remove the exogenous reward, and focus reinforcement learning on the endogenous MDP. Experiments on a variety of challenging synthetic MDPs show that these methods, applied online, discover large exogenous state spaces and produce substantial speedups in reinforcement learning.
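
To make the variance argument concrete, here is a minimal sketch of the additive reward decomposition on a toy problem. It is not the paper's discovery algorithm: it assumes the exogenous state is already known, and it uses ordinary least squares as a stand-in for estimating and removing the exogenous reward. The dynamics, coefficients, and function names (`simulate`, `remove_exogenous_reward`) are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate(n_steps=5000, seed=0):
    """Toy process (illustrative assumption, not from the paper).

    x: exogenous state, an AR(1) process that actions cannot influence.
    e: endogenous state, driven by the agent's actions.
    The reward is additive: r = r_end(e) + r_exo(x).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    e = np.zeros(n_steps)
    a = rng.integers(0, 2, size=n_steps)  # random binary actions
    for t in range(1, n_steps):
        x[t] = 0.9 * x[t - 1] + rng.normal(scale=1.0)
        e[t] = 0.5 * e[t - 1] + a[t - 1] + rng.normal(scale=0.1)
    r_exo = 3.0 * x            # exogenous reward: large, uncontrolled variation
    r_end = -(e - 1.0) ** 2    # endogenous reward: depends on controllable state
    return x, r_end + r_exo, r_end


def remove_exogenous_reward(x, r):
    """Regress the reward on the exogenous state and return the residual.

    Least squares is an assumed stand-in for the exogenous reward estimator.
    """
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    return r - design @ coef


x, r_full, r_end = simulate()
r_resid = remove_exogenous_reward(x, r_full)

print("variance of full reward:    ", np.var(r_full))
print("variance of residual reward:", np.var(r_resid))
print("variance of true endogenous:", np.var(r_end))
```

In this toy setting the residual reward has far lower variance than the full reward, which is the property the abstract credits for making the endogenous MDP easier to solve. When the exogenous and endogenous variables are mixed through linear combination, as the paper considers, the exogenous subspace would first have to be discovered before a step like this could be applied.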


research
06/05/2018

Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning

Exogenous state variables and rewards can slow down reinforcement learni...
research
07/09/2021

Learning Probabilistic Reward Machines from Non-Markovian Stochastic Reward Processes

The success of reinforcement learning in typical settings is, in part, p...
research
10/21/2017

Insulin Regimen ML-based control for T2DM patients

We model individual T2DM patient blood glucose level (BGL) by stochasti...
research
03/24/2023

Sequential Knockoffs for Variable Selection in Reinforcement Learning

In real-world applications of reinforcement learning, it is often challe...
research
02/02/2022

Optimizing Sequential Experimental Design with Deep Reinforcement Learning

Bayesian approaches developed to solve the optimal design of sequential ...
research
06/23/2020

Environment Shaping in Reinforcement Learning using State Abstraction

One of the central challenges faced by a reinforcement learning (RL) age...
research
04/02/2020

Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications

Although in recent years reinforcement learning has become very popular ...
