Detecting Spiky Corruption in Markov Decision Processes

06/30/2019
by Jason Mancuso, et al.

Current reinforcement learning methods fail if the reward function is imperfect, i.e., if the agent observes a reward different from the one it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP and introduce an algorithm that detects its corrupt states. We show that this algorithm can be used to learn the optimal policy with any common reinforcement learning algorithm. Finally, we investigate our algorithm in a pair of simple gridworld environments, finding that it can detect the corrupt states and learn the optimal policy despite the corruption.

research
10/21/2022

On the connection between Bregman divergence and value in regularized Markov decision processes

In this short note we derive a relationship between the Bregman divergen...
research
06/29/2021

Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes

In this work we present a novel approach to hierarchical reinforcement l...
research
04/17/2002

Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures

The problem of making sequential decisions in unknown probabilistic envi...
research
04/03/2018

Renewal Monte Carlo: Renewal theory based reinforcement learning

In this paper, we present an online reinforcement learning algorithm, ca...
research
06/03/2021

Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework

This paper proposes a new reinforcement learning with hyperbolic discoun...
research
09/29/2018

Reinforcement Learning in R

Reinforcement learning refers to a group of methods from artificial inte...
research
06/27/2019

Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

The honeynet is a promising active cyber defense mechanism. It reveals t...
