Safe Policies for Reinforcement Learning via Primal-Dual Methods

11/20/2019
by Santiago Paternain, et al.

In this paper, we study the learning of safe policies in the setting of reinforcement learning problems. That is, we aim to control a Markov Decision Process (MDP) whose transition probabilities are unknown, but from which we can sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. We therefore consider a constrained MDP where the constraints are probabilistic. Since there is no straightforward way to optimize the policy with respect to the probabilistic constraint in a reinforcement learning framework, we propose an ergodic relaxation of the problem. The advantages of the proposed relaxation are threefold. (i) The safety guarantees are maintained in the case of episodic tasks, and they hold up to a given time horizon for continuing tasks. (ii) Despite its non-convexity, the constrained optimization problem has an arbitrarily small duality gap if the parametrization of the policy is rich enough. (iii) The gradients of the Lagrangian associated with the safe-learning problem can be easily computed using standard policy gradient results and stochastic approximation tools. Leveraging these advantages, we establish that primal-dual algorithms are able to find policies that are safe and optimal. We test the proposed approach on a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
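Point (iii), that the gradient of the Lagrangian reduces to a standard policy gradient, can be illustrated with a minimal primal-dual sketch. Everything below is hypothetical and is not the authors' implementation: the chain environment, the constants, the function names, and the specific form of the relaxed constraint, which we assume to be a normalized discounted fraction of time spent in the safe set that must be at least 1 - δ. The Lagrangian is taken as L(θ, λ) = V_r(θ) + λ(V_safe(θ) - (1 - δ)); the policy parameters θ ascend a REINFORCE estimate of its gradient, and the multiplier λ ≥ 0 descends by projected gradient.

```python
import math
import random

# Hypothetical toy problem (not from the paper): a 1-D chain of states 0..5.
N_STATES = 6
UNSAFE = {5}     # the safe set is every state except 5
GAMMA = 0.9      # discount factor
HORIZON = 30     # rollout length
SLIP = 0.1       # probability that the chosen move is reversed


def reward(s):
    # Reward grows toward the unsafe edge, so reward and safety conflict.
    return s / (N_STATES - 1)


def env_step(s, a, rng):
    move = 1 if a == 1 else -1
    if rng.random() < SLIP:
        move = -move
    return min(max(s + move, 0), N_STATES - 1)


def policy_probs(theta, s):
    # Tabular softmax policy over actions 0 (left) and 1 (right).
    m = max(theta[s])
    exps = [math.exp(x - m) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def rollout(theta, rng, start=1):
    s, traj = start, []
    for _ in range(HORIZON):
        p = policy_probs(theta, s)
        a = 0 if rng.random() < p[0] else 1
        traj.append((s, a))
        s = env_step(s, a, rng)
    return traj


def dual_update(lam, safety_value, threshold, step_size):
    # Projected gradient descent on lambda: it grows while the relaxed
    # constraint safety_value >= threshold is violated and is kept >= 0.
    return max(0.0, lam - step_size * (safety_value - threshold))


def primal_dual(iterations=2000, delta=0.1, lr_theta=0.05, lr_lam=0.5, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    lam, threshold, history = 0.0, 1.0 - delta, []
    for _ in range(iterations):
        traj = rollout(theta, rng)
        safe = [0.0 if s in UNSAFE else 1.0 for s, _ in traj]
        # Lagrangian per-step reward: r(s_t) + lam * 1{s_t in safe set}.
        lag = [reward(s) + lam * c for (s, _), c in zip(traj, safe)]
        # Discounted returns-to-go of the Lagrangian reward.
        returns, g = [0.0] * HORIZON, 0.0
        for t in reversed(range(HORIZON)):
            g = lag[t] + GAMMA * g
            returns[t] = g
        # REINFORCE estimate of grad_theta L; d log softmax = onehot(a) - p.
        grad = [[0.0, 0.0] for _ in range(N_STATES)]
        for t, (s, a) in enumerate(traj):
            p = policy_probs(theta, s)
            for b in range(2):
                grad[s][b] += GAMMA ** t * returns[t] * (float(b == a) - p[b])
        # Primal ascent step on the policy parameters.
        for s in range(N_STATES):
            for b in range(2):
                theta[s][b] += lr_theta * grad[s][b]
        # Assumed relaxed constraint: normalized discounted time in the safe set.
        norm = (1 - GAMMA) / (1 - GAMMA ** HORIZON)
        safety_value = norm * sum(GAMMA ** t * c for t, c in enumerate(safe))
        lam = dual_update(lam, safety_value, threshold, lr_lam)
        history.append((safety_value, lam))
    return theta, lam, history
```

Calling `theta, lam, history = primal_dual()` runs the loop; `history` records the per-iteration safety estimate and multiplier. The design choice worth noting is that the primal step needs no special machinery: adding λ times the safe-set indicator to the reward turns the Lagrangian gradient into an ordinary policy gradient, while the dual step raises λ whenever the estimated safety falls below 1 - δ. A baseline would reduce the variance of the REINFORCE estimate; it is omitted here for brevity.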

Related research

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs (06/04/2021)
We address the issue of safety in reinforcement learning. We pose the pr...

Safety-Constrained Policy Transfer with Successor Features (11/10/2022)
In this work, we focus on the problem of safe policy transfer in reinfor...

On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process (02/25/2023)
We study optimality for the safety-constrained Markov decision process w...

Lyapunov-based Safe Policy Optimization for Continuous Control (01/28/2019)
We study continuous action reinforcement learning problems in which it i...

Resilient Constrained Learning (06/04/2023)
When deploying machine learning solutions, they must satisfy multiple re...

Probabilistic Constraint for Safety-Critical Reinforcement Learning (06/29/2023)
In this paper, we consider the problem of learning safe policies for pro...

Constrained Variational Policy Optimization for Safe Reinforcement Learning (01/28/2022)
Safe reinforcement learning (RL) aims to learn policies that satisfy cer...
