Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

06/04/2021 ∙ by Tao Liu, et al.
We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process. Existing results have shown that it is possible to achieve a reward regret of Õ(√K) while allowing an Õ(√K) constraint violation over K episodes. A critical question is whether the constraint violation can be made even smaller. We show that when a strictly safe policy is known, one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order Õ(√K). The algorithm that does so employs the principle of optimistic pessimism in the face of uncertainty to achieve safe exploration. When no strictly safe policy is known, though one is known to exist, it is possible to restrict the system to bounded constraint violation with arbitrarily high probability. This is realized by a primal-dual algorithm with an optimistic primal estimate and a pessimistic dual update.
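To make the closing idea concrete, below is a minimal, self-contained sketch of an optimistic-primal / pessimistic-dual update on a toy single-state CMDP (a two-action constrained bandit). Everything here is an illustrative assumption rather than the paper's algorithm: the bonus form, step size, cost budget tau, and all variable names are made up for the example, and the paper's method operates on full episodic MDPs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-state CMDP (a constrained bandit): two actions with unknown
# reward/cost means; the constraint asks for expected cost <= tau.
# All constants below are illustrative, not taken from the paper.
true_r = np.array([0.9, 0.5])   # unknown reward means
true_c = np.array([0.8, 0.2])   # unknown cost means
tau = 0.5                       # cost budget
K = 5000                        # number of episodes
eta = 0.1                       # dual step size
lam = 0.0                       # Lagrange multiplier

n = np.zeros(2)                 # pull counts
r_hat = np.zeros(2)             # empirical reward means
c_hat = np.zeros(2)             # empirical cost means

for k in range(K):
    # Hoeffding-style bonus; counts are floored at 1 before any pulls.
    bonus = np.sqrt(2 * np.log(k + 2) / np.maximum(n, 1))
    # Optimistic primal step: overestimate reward and underestimate cost,
    # so each action's Lagrangian value is an upper confidence bound.
    lagrangian_ucb = (r_hat + bonus) - lam * np.clip(c_hat - bonus, 0, 1)
    a = int(np.argmax(lagrangian_ucb))
    # Pessimistic dual step: update the multiplier against an upper
    # confidence bound on cost, so lam grows before violation is certain.
    lam = max(0.0, lam + eta * (min(c_hat[a] + bonus[a], 1.0) - tau))
    # Observe Bernoulli reward and cost, update empirical means.
    r = float(rng.random() < true_r[a])
    c = float(rng.random() < true_c[a])
    n[a] += 1
    r_hat[a] += (r - r_hat[a]) / n[a]
    c_hat[a] += (c - c_hat[a]) / n[a]

print(f"final multiplier {lam:.3f}, action frequencies {n / K}")
```

The key asymmetry is that the primal step is optimistic while the dual step raises the multiplier against an upper confidence bound on cost, penalizing potential violations before the cost estimates have converged.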

Related research:

∙ 01/27/2023: Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation
  Constrained Markov decision processes (CMDPs) model scenarios of sequent...

∙ 11/20/2019: Safe Policies for Reinforcement Learning via Primal-Dual Methods
  In this paper, we study the learning of safe policies in the setting of ...

∙ 09/13/2021: Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach
  Reinforcement learning is widely used in applications where one needs to...

∙ 01/20/2022: Recursive Constraints to Prevent Instability in Constrained Reinforcement Learning
  We consider the challenge of finding a deterministic policy for a Markov...

∙ 01/28/2022: Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints
  We consider primal-dual-based reinforcement learning (RL) in episodic co...

∙ 03/01/2020: Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
  We study the Safe Reinforcement Learning (SRL) problem using the Constra...

∙ 06/28/2022: Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL
  While the primary goal of the exploration phase in reward-free reinforce...
