
Constrained Upper Confidence Reinforcement Learning
Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well motivated by a number of applications, including exploration of unknown, potentially unsafe, environments. We present an algorithm, C-UCRL, and show that, with probability 1−δ, it achieves sublinear regret O(T^{3/4} √(log(T/δ))) with respect to the reward while satisfying the constraints even during learning. Illustrative examples are provided.
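The core ingredient of upper-confidence methods in this setting is a per-state-action confidence interval around the empirical reward and cost estimates: rewards are inflated to encourage exploration, while costs are also inflated so that any policy certified feasible under the estimates satisfies the true constraint with high probability. The sketch below is illustrative only and not the paper's C-UCRL algorithm; the Hoeffding-style radius, the [0, 1] range assumption on rewards and costs, and all function names are our own assumptions.

```python
import numpy as np

def confidence_radius(counts, t, delta):
    """Hoeffding-style radius sqrt(log(2*S*A*t / delta) / n) per (s, a).

    counts: array of visit counts, one entry per (state, action) pair.
    Unvisited pairs (count 0) are treated as having one sample so the
    radius stays finite; their estimates are maximally uncertain anyway.
    """
    n = np.maximum(counts, 1)
    return np.sqrt(np.log(2 * counts.size * max(t, 1) / delta) / n)

def optimistic_estimates(reward_sums, cost_sums, counts, t, delta):
    """Upper-confidence reward and conservative (upper-confidence) cost.

    Assumes rewards and costs lie in [0, 1], so estimates are clipped
    to that range after adding the confidence bonus.
    """
    n = np.maximum(counts, 1)
    beta = confidence_radius(counts, t, delta)
    r_hat = reward_sums / n            # empirical mean reward
    c_hat = cost_sums / n              # empirical mean cost
    r_ucb = np.clip(r_hat + beta, 0.0, 1.0)  # optimistic: drives exploration
    c_ucb = np.clip(c_hat + beta, 0.0, 1.0)  # conservative: protects the constraint
    return r_ucb, c_ucb
```

With the transition kernel known, these inflated estimates would be fed to a constrained planner (e.g. a linear program over occupancy measures) that maximizes the optimistic reward subject to the conservative cost staying below the budget.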