Constrained Upper Confidence Reinforcement Learning

by Liyuan Zheng et al.

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications, including exploration of unknown, potentially unsafe, environments. We present an algorithm, C-UCRL, and show that with probability 1-δ it achieves sub-linear regret of O(T^{3/4}√(log(T/δ))) with respect to the reward while satisfying the constraints even during learning. Illustrative examples are provided.
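To make the upper-confidence idea concrete, here is a minimal sketch of the kind of confidence-bound bookkeeping such an algorithm relies on. This is an illustration only, not the paper's actual C-UCRL procedure: the `ConfidenceEstimator` class, the Hoeffding-style radius, and the [0, 1] boundedness assumption are all my own choices for the example. The key idea it shows is that the same upper confidence bound can be used optimistically for the unknown reward (to drive exploration) and conservatively for the unknown cost (to keep constraint satisfaction high-probability).

```python
import math


def confidence_radius(n, t, delta):
    """Hoeffding-style confidence radius after n samples at time t.

    Assumes rewards and costs are bounded in [0, 1]; an unvisited
    pair gets the trivial radius 1.0.
    """
    if n == 0:
        return 1.0
    return math.sqrt(math.log(2 * t / delta) / (2 * n))


class ConfidenceEstimator:
    """Tracks empirical means and upper confidence bounds for an
    unknown reward or cost function over state-action pairs.

    For rewards, the upper bound is used optimistically (explore);
    for costs, the same upper bound is used pessimistically, i.e.
    a policy is only considered feasible if even the upper bound
    on its cost satisfies the constraint.
    """

    def __init__(self, delta=0.05):
        self.delta = delta
        self.counts = {}   # (state, action) -> number of observations
        self.sums = {}     # (state, action) -> running sum of samples
        self.t = 0         # total number of observations so far

    def update(self, sa, value):
        self.t += 1
        self.counts[sa] = self.counts.get(sa, 0) + 1
        self.sums[sa] = self.sums.get(sa, 0.0) + value

    def mean(self, sa):
        n = self.counts.get(sa, 0)
        return self.sums.get(sa, 0.0) / n if n else 0.0

    def upper(self, sa):
        # Empirical mean plus confidence radius, clipped to the
        # assumed range [0, 1].
        n = self.counts.get(sa, 0)
        radius = confidence_radius(n, max(self.t, 1), self.delta)
        return min(1.0, self.mean(sa) + radius)
```

In a full algorithm these estimators would feed a planning step over the known transition kernel: plan with the optimistic reward bound subject to the pessimistic cost bound, execute the resulting policy, and refine the estimates as samples arrive.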

