Related research:
- A Geometric Traversal Algorithm for Reward-Uncertain MDPs
- CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq
- Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures
- Renewal Monte Carlo: Renewal theory based reinforcement learning
- Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes
- Reinforcement Learning in R
- On the Correctness and Sample Complexity of Inverse Reinforcement Learning
Detecting Spiky Corruption in Markov Decision Processes
Current reinforcement learning methods fail if the reward function is imperfect, i.e., if the reward the agent observes differs from the reward it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP and introduce an algorithm that detects its corrupt states. We show that this algorithm can be combined with any common reinforcement learning algorithm to learn the optimal policy. Finally, we evaluate the algorithm in a pair of simple gridworld environments, finding that it detects the corrupt states and learns the optimal policy despite the corruption.
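The abstract describes a two-stage recipe: first detect the corrupt states, then hand the cleaned reward signal to an off-the-shelf RL learner. The sketch below illustrates that pipeline in a toy gridworld. It is a minimal illustration only: the neighbour-deviation test, the threshold, the gridworld layout, and the choice of tabular Q-learning are all assumptions made here, since the paper's exact detection criterion is not reproduced in this abstract.

```python
import numpy as np

# Hypothetical 5x5 gridworld: the true reward is 0 everywhere except +1
# at the goal. A few "spiky" corrupt states report a wildly different
# observed reward. The detection rule below (flag a state whose observed
# reward jumps away from ALL of its neighbours by more than THRESHOLD)
# is an illustrative stand-in for the paper's criterion, and it assumes
# corrupt states are isolated, i.e. never adjacent to one another.
N = 5
GOAL = (4, 4)
THRESHOLD = 2.0  # larger than the legitimate goal reward of 1.0

true_reward = np.zeros((N, N))
true_reward[GOAL] = 1.0

observed_reward = true_reward.copy()
corrupt_states = {(1, 3), (3, 1)}   # hypothetical corrupt states
for s in corrupt_states:
    observed_reward[s] = 10.0       # spiky corruption

def neighbours(i, j):
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < N and 0 <= nj < N:
            yield ni, nj

def detect_corrupt(reward):
    """Flag states whose observed reward is an isolated local spike."""
    flagged = set()
    for i in range(N):
        for j in range(N):
            diffs = [abs(reward[i, j] - reward[n]) for n in neighbours(i, j)]
            if min(diffs) > THRESHOLD:  # jumps away from every neighbour
                flagged.add((i, j))
    return flagged

def q_learning(reward, flagged, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning that treats flagged states' rewards as 0."""
    rng = np.random.default_rng(0)
    Q = np.zeros((N, N, 4))
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(episodes):
        i, j = 0, 0
        for _ in range(100):  # step cap per episode
            a = rng.integers(4) if rng.random() < eps else int(np.argmax(Q[i, j]))
            di, dj = moves[a]
            ni = min(max(i + di, 0), N - 1)
            nj = min(max(j + dj, 0), N - 1)
            # Ignore the (untrustworthy) observed reward in flagged states.
            r = 0.0 if (ni, nj) in flagged else reward[ni, nj]
            Q[i, j, a] += alpha * (r + gamma * Q[ni, nj].max() - Q[i, j, a])
            i, j = ni, nj
            if (i, j) == GOAL:
                break
    return Q

flagged = detect_corrupt(observed_reward)
print("flagged corrupt states:", sorted(flagged))  # -> [(1, 3), (3, 1)]
Q = q_learning(observed_reward, flagged)
print("greedy action at start:", int(np.argmax(Q[0, 0])))
```

Zeroing out the reward in flagged states is just one possible remedy; the key point from the abstract is that once the corrupt states are identified, any common reinforcement learning algorithm can be used on the cleaned signal to recover the optimal policy.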