Rule-based Shielding for Partially Observable Monte-Carlo Planning

04/28/2021
by   Giulio Mazzi, et al.

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm that generates approximate policies for large Partially Observable Markov Decision Processes (POMDPs). Its online nature supports scalability by avoiding a complete policy representation; the lack of an explicit representation, however, hinders policy interpretability and makes policy verification very complex. In this work, we propose two contributions. The first is a method, based on Satisfiability Modulo Theories (SMT), for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task: it inspects traces (i.e., sequences of belief-action-observation triplets) generated by POMCP to compute the parameters of logical formulas about policy properties defined by the expert. The second is a shielding module that uses these logical formulas online to identify anomalous actions selected by POMCP and substitutes them with actions that satisfy the formulas, thereby fulfilling the expert knowledge. We evaluate our approach on Tiger, a standard POMDP benchmark, and on a real-world problem of velocity regulation in mobile robot navigation. Results show that the shielded POMCP outperforms standard POMCP in a case study in which a wrongly set POMCP parameter causes it to occasionally select wrong actions. Moreover, the approach maintains good performance even when the parameters of the logical formulas are optimized from trajectories containing some wrong actions.
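To make the shielding idea concrete, the following is a minimal sketch of the online step: a rule (a logical formula whose threshold would be computed offline, e.g. by an SMT solver, from POMCP traces) is checked against the action POMCP proposes, and a violating action is replaced with a rule-compliant one. All names here, and the velocity-regulation rule itself, are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of rule-based shielding for a velocity-regulation task.
# Beliefs are dicts mapping states ("hard"/"easy" segment) to probabilities.

def belief_prob_hard(belief):
    """Probability mass the belief assigns to 'hard' segments (assumed encoding)."""
    return sum(p for state, p in belief.items() if state == "hard")

def make_speed_rule(theta):
    """Expert rule template: 'never drive fast when P(hard) exceeds theta'.
    The threshold theta is the parameter that would be learned offline via SMT."""
    def satisfied(belief, action):
        return not (action == "fast" and belief_prob_hard(belief) > theta)
    return satisfied

def shield(belief, proposed_action, actions, rule, fallback="slow"):
    """Return proposed_action if it satisfies the rule; otherwise substitute
    the first rule-compliant action (or a fallback if none exists)."""
    if rule(belief, proposed_action):
        return proposed_action
    compliant = [a for a in actions if rule(belief, a)]
    return compliant[0] if compliant else fallback

rule = make_speed_rule(theta=0.3)
risky_belief = {"hard": 0.5, "easy": 0.5}
print(shield(risky_belief, "fast", ["slow", "medium", "fast"], rule))  # "slow"
```

With a belief that puts only 0.1 mass on "hard", the same call would pass "fast" through unchanged; the shield only intervenes when the proposed action violates the formula.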


Related research

- 12/23/2020: Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach
- 07/30/2021: Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits
- 03/16/2023: Learning Logic Specifications for Soft Policy Guidance in POMCP
- 09/19/2023: Safe POMDP Online Planning via Shielding
- 01/16/2014: Efficient Planning under Uncertainty with Macro-actions
- 06/27/2012: Monte Carlo Bayesian Reinforcement Learning
- 02/05/2018: Interactive Robot Transition Repair With SMT
