Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

04/13/2023
by Wenli Xiao et al.

Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but provides no safety guarantees during either learning or deployment. Although shielding with Linear Temporal Logic (LTL) is a promising formal method for ensuring safety in single-agent Reinforcement Learning (RL), it produces overly conservative behaviors when scaled to multi-agent scenarios, and synthesizing shields for complex multi-agent environments is computationally challenging. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributed shields, which are reactive systems running in parallel with each MARL agent to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and be recomputed based on the agents' states. This design enables efficient synthesis of shields that monitor agents in complex environments without coordination overhead. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model: by interacting with the environment during the early stage of exploration, it obtains an approximate world model, so MBDS enjoys formal safety guarantees with high probability. We demonstrate in simulations that our framework surpasses existing baselines in both safety guarantees and learning performance.
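To make the shielding loop concrete, the sketch below shows one way a shield could intercept a group of agents' proposed actions, roll them forward through an approximate world model, and substitute a safe joint action when the proposal would violate the safety specification within a short horizon. This is a minimal illustration, not the paper's algorithm: the grid-world dynamics, the GridWorldModel and DynamicShield classes, and the bounded-horizon recoverability check are hypothetical stand-ins for the learned world model and the LTL-based shield synthesis described in the abstract.

```python
# Minimal, illustrative sketch of dynamic shielding with a learned world model.
# NOT the authors' implementation: all names and the grid dynamics are
# hypothetical stand-ins chosen for a self-contained example.
import itertools
import random


class GridWorldModel:
    """Approximate deterministic dynamics, e.g. learned from early exploration."""

    MOVES = {"stay": (0, 0), "up": (0, 1), "down": (0, -1),
             "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size, obstacles):
        self.size = size
        self.obstacles = set(obstacles)

    def step(self, pos, action):
        # Move within grid bounds; out-of-bounds moves are clipped.
        dx, dy = self.MOVES[action]
        x = min(max(pos[0] + dx, 0), self.size - 1)
        y = min(max(pos[1] + dy, 0), self.size - 1)
        return (x, y)

    def unsafe(self, positions):
        # Safety specification: no obstacle hits and no agent-agent collisions.
        return (any(p in self.obstacles for p in positions)
                or len(set(positions)) < len(positions))


class DynamicShield:
    """Shields one group of nearby agents; groups may split or merge over time."""

    def __init__(self, model, horizon=2):
        self.model = model
        self.horizon = horizon

    def _recoverable(self, positions, depth):
        # True if some joint action keeps the group safe for `depth` more steps.
        if depth == 0:
            return True
        for joint in itertools.product(self.model.MOVES, repeat=len(positions)):
            nxt = tuple(self.model.step(p, a) for p, a in zip(positions, joint))
            if not self.model.unsafe(nxt) and self._recoverable(nxt, depth - 1):
                return True
        return False

    def rectify(self, positions, proposed, n_samples=50):
        """Return `proposed` if it stays safe for `horizon` steps, else a safe substitute."""
        candidates = [proposed] + [
            tuple(random.choice(list(self.model.MOVES)) for _ in positions)
            for _ in range(n_samples)
        ]
        for joint in candidates:
            nxt = tuple(self.model.step(p, a) for p, a in zip(positions, joint))
            if not self.model.unsafe(nxt) and self._recoverable(nxt, self.horizon - 1):
                return joint
        return tuple("stay" for _ in positions)  # conservative fallback


if __name__ == "__main__":
    model = GridWorldModel(size=5, obstacles=[(2, 2)])
    shield = DynamicShield(model, horizon=2)
    positions = ((1, 2), (3, 2))
    proposed = ("right", "left")  # both agents would step onto the obstacle
    print(shield.rectify(positions, proposed))  # rectified, collision-free joint action
```

In this toy setup the proposed joint action would drive both agents onto the obstacle cell, so the shield rejects it and returns a collision-free substitute; in the paper's setting, shields covering nearby agents would additionally split, merge, and be recomputed as the agents move apart or together.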

Related research:

- Safe Multi-Agent Reinforcement Learning via Shielding (01/27/2021)
- Dynamic Shielding for Reinforcement Learning in Black-Box Environments (07/27/2022)
- An Abstraction-based Method to Check Multi-Agent Deep Reinforcement-Learning Behaviors (02/02/2021)
- Safe Reinforcement Learning via Shielding (08/29/2017)
- Towards Learning Multi-agent Negotiations via Self-Play (01/28/2020)
- Strategic Maneuver and Disruption with Reinforcement Learning Approaches for Multi-Agent Coordination (03/17/2022)
- Safety-Aware Multi-Agent Apprenticeship Learning (01/20/2022)
