Markov Decision Processes with Continuous Side Information

11/15/2017
by Aditya Modi, et al.

We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode, the agent observes side information, or a context, that determines the dynamics of the MDP for that episode. The setting is motivated by healthcare applications, where baseline measurements of a patient taken at the start of a treatment episode form a context that may indicate how the patient is likely to respond to treatment decisions. We propose algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under the assumption that the unobserved MDP parameters vary smoothly with the observed context. Under this smoothness assumption, we also give lower and upper PAC bounds. Because our lower bound depends exponentially on the context dimension, we turn to a tractable linear setting in which the context is used to form linear combinations of a finite set of MDPs. For the linear setting, we give a PAC learning algorithm based on KWIK learning techniques.
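
The linear setting can be illustrated with a small sketch of how a context turns a finite set of base MDPs into the MDP for one episode. It is not the paper's KWIK-based learning algorithm; it only shows the model. As an illustrative assumption, the context c is taken to be d nonnegative weights summing to 1, so that mixing the base transition tables row by row yields valid probability distributions. The function name contextual_mdp and its argument names are hypothetical.

```python
import numpy as np


def contextual_mdp(context, base_transitions, base_rewards):
    """Form one episode's MDP as a context-weighted mix of d base MDPs.

    context:          shape (d,); assumed nonnegative and summing to 1.
    base_transitions: shape (d, S, A, S); entry [i, s, a] is the next-state
                      distribution of base MDP i for state s and action a.
    base_rewards:     shape (d, S, A); mean rewards of each base MDP.
    """
    c = np.asarray(context, dtype=float)
    # Assumption (for illustration): restricting c to the probability simplex
    # keeps every mixed row a valid probability distribution.
    if c.ndim != 1 or np.any(c < 0) or not np.isclose(c.sum(), 1.0):
        raise ValueError("context must be a point in the probability simplex")
    # Sum over the base-MDP axis: P_c = sum_i c[i] * P_i, R_c = sum_i c[i] * R_i.
    P = np.tensordot(c, base_transitions, axes=1)  # shape (S, A, S)
    R = np.tensordot(c, base_rewards, axes=1)      # shape (S, A)
    return P, R


# Usage: two base MDPs over S=2 states and A=1 action.
P_stay = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])      # always stay in place
P_swap = np.array([[[0.0, 1.0]], [[1.0, 0.0]]])      # always switch state
base_P = np.stack([P_stay, P_swap])                  # shape (2, 2, 1, 2)
base_R = np.array([[[1.0], [0.0]], [[0.0], [1.0]]])  # shape (2, 2, 1)

P, R = contextual_mdp([0.25, 0.75], base_P, base_R)
print(P[0, 0])  # [0.25 0.75]
print(R[:, 0])  # [0.25 0.75]
```

Because each base row is a probability distribution and the weights form a convex combination, each mixed row is again a distribution; a context outside the simplex would need some other constraint to keep the mixed MDP well defined.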
