Safe Reinforcement Learning via Probabilistic Logic Shields

03/06/2023
by Wen-Chi Yang, et al.

Safe Reinforcement Learning (Safe RL) aims to learn optimal policies while staying safe. A popular solution to Safe RL is shielding, which uses a logical safety specification to prevent an RL agent from taking unsafe actions. However, traditional shielding techniques are difficult to integrate with continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions. Therefore, PLPG can be seamlessly applied to any policy gradient algorithm while still providing the same convergence guarantees. In our experiments, we show that PLPG learns safer and more rewarding policies than other state-of-the-art shielding techniques.
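
To make the idea of a differentiable shield concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a per-action safety probability is already available for the current state (in PLPG such values would presumably come from the probabilistic logic program), and it reweights a base policy by those probabilities. The function name `shield_policy` and all numbers are hypothetical, chosen for illustration only.

```python
def shield_policy(policy_probs, safe_probs):
    """Reweight a base policy by per-action safety probabilities.

    policy_probs: base policy pi(a|s), one probability per action.
    safe_probs:   assumed P(safe | s, a), one value in [0, 1] per action.
    Returns the reweighted policy and the overall probability that the
    base policy acts safely in this state.
    """
    # Probability that an action drawn from the base policy is safe.
    p_safe = sum(p * s for p, s in zip(policy_probs, safe_probs))
    if p_safe <= 0.0:
        raise ValueError("every action has zero safety probability")
    # Scale each action by its safety probability, then renormalise.
    shielded = [p * s / p_safe for p, s in zip(policy_probs, safe_probs)]
    return shielded, p_safe


# Hypothetical three-action example: the third action is certainly unsafe.
base = [0.5, 0.3, 0.2]
safety = [1.0, 0.5, 0.0]
shielded, p_safe = shield_policy(base, safety)
```

Because the reweighting uses only products, sums, and a division, it stays differentiable with respect to the base policy's probabilities wherever `p_safe` is positive. That property is what would let a shield of this kind sit inside a policy gradient update, rather than acting as a hard filter outside the learner.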


research
12/27/2021

Safe Reinforcement Learning with Chance-constrained Model Predictive Control

Real-world reinforcement learning (RL) problems often demand that agents...
research
07/04/2022

Safe Reinforcement Learning via Confidence-Based Filters

Ensuring safety is a crucial challenge when deploying reinforcement lear...
research
08/29/2017

Safe Reinforcement Learning via Shielding

Reinforcement learning algorithms discover policies that maximize reward...
research
06/24/2020

DISK: Learning local features with policy gradient

Local feature frameworks are difficult to learn in an end-to-end fashion...
research
12/12/2022

Verifiably Safe Reinforcement Learning with Probabilistic Guarantees via Temporal Logic

Reinforcement Learning (RL) can solve complex tasks but does not intrins...
research
10/02/2022

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the contex...
research
04/02/2020

Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?

For all its successes, Reinforcement Learning (RL) still struggles to de...
