Safe POMDP Online Planning via Shielding

09/19/2023
by Shili Sheng, et al.

Partially observable Markov decision processes (POMDPs) are widely used in robotics for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. However, the resulting policies cannot provide the safety guarantees that are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability of reaching a set of goal states is one and the probability of reaching a set of unsafe states is zero). We compute shields that restrict unsafe actions, i.e., actions that would violate the almost-sure reach-avoid specification. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We propose four distinct shielding methods, which differ in how the shields are computed and integrated, including factored variants designed to improve scalability. Experimental results on a set of benchmark domains demonstrate that the proposed shielding methods successfully guarantee safety (unlike the baseline POMCP without shielding) on large POMDPs, with negligible impact on the runtime for online planning.
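To make the two ideas in the abstract concrete, here is a minimal sketch. The first idea is a shield that blocks actions violating an almost-sure reach-avoid specification. The second is plugging that shield into POMCP's action selection.

The sketch rests on several simplifying assumptions:

- It works on a fully observable MDP rather than a POMDP. In the paper's setting, a shield must reason about what the agent knows rather than the true state, so this is not the authors' method.
- The MDP is given in a hypothetical encoding: `transitions[s][a]` is the set of successors reachable with positive probability.
- The winning region is computed with the textbook nested fixpoint for almost-sure reach-avoid.
- POMCP's tree policy is approximated by UCB1 restricted to allowed actions.

The names `almost_sure_reach_avoid`, `shield`, and `shielded_ucb` are illustrative only.

```python
import math
from typing import Dict, Hashable, Iterable, Set

# Hypothetical encoding: transitions[s][a] is the set of successor states
# reachable from state s under action a with positive probability.
Transitions = Dict[Hashable, Dict[Hashable, Set[Hashable]]]


def almost_sure_reach_avoid(states: Iterable[Hashable], transitions: Transitions,
                            goal: Set[Hashable], avoid: Set[Hashable]) -> Set[Hashable]:
    """States from which some policy reaches `goal` with probability 1 while
    visiting `avoid` with probability 0 (fully observable MDP simplification)."""
    winning = set(states) - set(avoid)
    while True:
        # Inner least fixpoint: states that reach the goal with positive
        # probability using only actions whose successors all stay in `winning`.
        reach = set(goal) & winning
        changed = True
        while changed:
            changed = False
            for s in winning - reach:  # iterates over a snapshot of the difference
                for succ in transitions.get(s, {}).values():
                    if succ <= winning and succ & reach:
                        reach.add(s)
                        changed = True
                        break
        # Outer greatest fixpoint: discard states that cannot make progress, repeat.
        if reach == winning:
            return winning
        winning = reach


def shield(state: Hashable, transitions: Transitions,
           winning: Set[Hashable]) -> Set[Hashable]:
    """Allowed actions: every possible successor stays inside the winning region."""
    return {a for a, succ in transitions.get(state, {}).items() if succ <= winning}


def shielded_ucb(allowed: Set[str], visits: Dict[str, int],
                 values: Dict[str, float], total_visits: int, c: float = 1.0) -> str:
    """UCB1 action selection restricted to shield-allowed actions (POMCP-style)."""
    if not allowed:
        raise ValueError("the shield allows no action at this node")
    best, best_score = None, -math.inf
    for a in sorted(allowed):  # sorted for deterministic tie-breaking
        n = visits.get(a, 0)
        if n == 0:
            return a  # expand unvisited allowed actions first
        score = values[a] + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best, best_score = a, score
    return best


if __name__ == "__main__":
    T = {
        "s0": {"safe": {"s1"}, "risky": {"g", "bad"}, "wander": {"trap"}},
        "s1": {"go": {"g"}, "stay": {"s1"}},
        "trap": {"loop": {"trap"}},
        "g": {"stay": {"g"}},
        "bad": {"stay": {"bad"}},
    }
    W = almost_sure_reach_avoid(T.keys(), T, goal={"g"}, avoid={"bad"})
    print(sorted(W))           # ['g', 's0', 's1']
    print(shield("s0", T, W))  # {'safe'}
```

In this toy example, `risky` is blocked because it may reach `bad`. `wander` is blocked because `trap` can never reach the goal.

In this sketch, staying inside the winning region keeps the unsafe set unreachable and the goal almost-surely reachable. However, the shield by itself does not force progress toward the goal. Here, the planner's own objective is assumed to drive the agent there.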
