Detecting Spiky Corruption in Markov Decision Processes
Current reinforcement learning methods fail if the reward function is imperfect, i.e. if the reward the agent observes differs from the reward it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP and introduce an algorithm that detects its corrupt states. We show that this algorithm can be combined with any common reinforcement learning algorithm to learn the optimal policy. Finally, we evaluate the algorithm in a pair of simple gridworld environments, finding that it detects the corrupt states and learns the optimal policy despite the corruption.
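The abstract does not spell out the detection algorithm, but the core intuition is that "spiky" corruption makes a corrupt state's observed reward an extreme outlier relative to the rewards around it. As a purely illustrative sketch (not the paper's actual method), one could flag such states with a robust outlier test, e.g. the median-absolute-deviation (MAD) based modified z-score; the function name, the 1-D gridworld, and the threshold value are all hypothetical choices for this example:

```python
def detect_spiky_states(rewards, threshold=3.5):
    """Return indices of states whose observed reward is a spike outlier.

    Hypothetical illustration of spike detection, not the paper's algorithm.
    rewards: list of observed per-state rewards.
    threshold: modified z-score cutoff (3.5 is a common heuristic).
    """
    n = len(rewards)
    median = sorted(rewards)[n // 2]
    # Median absolute deviation: robust to the very spikes we want to find.
    mad = sorted(abs(r - median) for r in rewards)[n // 2]
    if mad == 0:
        return []  # all rewards (essentially) identical: nothing to flag
    # Modified z-score (Iglewicz-Hoaglin): 0.6745 * (x - median) / MAD
    return [i for i, r in enumerate(rewards)
            if abs(0.6745 * (r - median) / mad) > threshold]

# Example: a 1-D "gridworld" with smooth true rewards and one corrupt spike.
observed = [0.0, 0.1, 0.2, 0.3, 50.0, 0.5, 0.6, 0.7]  # state 4 is corrupt
print(detect_spiky_states(observed))  # -> [4]
```

Once the flagged states are known, a standard RL algorithm (e.g. Q-learning) could simply ignore or mask rewards observed in those states, which matches the abstract's claim that detection can be combined with any common reinforcement learning algorithm.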