Constrained Upper Confidence Reinforcement Learning

01/26/2020
by Liyuan Zheng, et al.

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well motivated by a number of applications, including exploration of unknown, potentially unsafe, environments. We present an algorithm, C-UCRL, and show that with probability 1-δ it achieves sub-linear regret, O(T^{3/4} √(log(T/δ))), with respect to the reward while satisfying the constraints even during learning. Illustrative examples are provided.
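As a rough illustration of the upper-confidence idea behind such an approach (a minimal sketch under assumptions, not the paper's exact C-UCRL construction), the Python snippet below builds Hoeffding-style confidence intervals around empirical reward and cost estimates for each state-action pair. An agent with a known transition kernel could then plan optimistically with respect to the reward bound while treating the cost bound conservatively so the planned policy respects the constraint. The function name, bonus form, and example numbers are all illustrative.

```python
import numpy as np

def confidence_estimates(counts, reward_sums, cost_sums, t, delta):
    """Hoeffding-style upper confidence estimates of unknown rewards and costs.

    counts, reward_sums, cost_sums: arrays indexed by (state, action) holding
    visit counts and accumulated observed rewards/costs (assumed to lie in [0, 1]).
    Illustrative sketch only; not the paper's exact confidence bonus.
    """
    n = np.maximum(counts, 1)                              # avoid divide-by-zero
    radius = np.sqrt(np.log(2.0 * t / delta) / (2.0 * n))  # confidence radius
    r_hat = reward_sums / n                                 # empirical mean reward
    c_hat = cost_sums / n                                   # empirical mean cost
    r_ucb = np.minimum(r_hat + radius, 1.0)  # optimistic reward drives exploration
    c_ucb = np.minimum(c_hat + radius, 1.0)  # upper (pessimistic) cost keeps plans conservative
    return r_ucb, c_ucb

# Example: 4 states, 2 actions, after t = 100 steps, confidence level delta = 0.05
rng = np.random.default_rng(0)
counts = rng.integers(1, 30, size=(4, 2)).astype(float)
r_ucb, c_ucb = confidence_estimates(counts,
                                    reward_sums=0.6 * counts,
                                    cost_sums=0.2 * counts,
                                    t=100, delta=0.05)
```

With estimates like these, planning would amount to solving the constrained MDP under the known transition kernel (for instance, a linear program over occupancy measures) using the optimistic reward bound as the objective and the upper cost bound in the constraint.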
