Lecture Notes on Partially Known MDPs

12/06/2021
by Guillermo A. Perez, et al.

In these notes we will tackle the problem of finding optimal policies for Markov decision processes (MDPs) which are not fully known to us. Our intention is to slowly transition from an offline setting to an online (learning) setting. Namely, we are moving towards reinforcement learning.
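As a concrete point of departure (an illustration, not taken from the notes), when the MDP *is* fully known, an optimal policy can be computed offline by value iteration. The tiny MDP below — its states, transition probabilities, and rewards are invented for this sketch — shows the idea; the notes then address the harder setting where `P` and `R` are only partially known.

```python
# Hypothetical example: value iteration on a tiny, fully known MDP.
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
gamma = 0.9  # discount factor

# Repeatedly apply the Bellman optimality operator until (numerical) convergence.
V = {0: 0.0, 1: 0.0}
for _ in range(1000):
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Extract a greedy (hence optimal) policy from the converged value function.
policy = {
    s: max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a]))
    for s in P
}
print(policy)  # → {0: 1, 1: 1}
```

In the online setting the transition probabilities in `P` are unavailable and must instead be estimated from interaction with the environment, which is exactly the gap between this offline computation and reinforcement learning.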


