Strengthening Deterministic Policies for POMDPs

07/16/2020
by Leonore Winterer, et al.

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and to randomize over the potential choices. Yet, this restricted problem is still NP-hard and often computationally intractable in practice. A further restriction is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints and can handle an arbitrary number of such specifications. Since randomization and memory are often mandatory to achieve satisfactory policies, we first extend our encoding to deliver a restricted class of randomized policies, and second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter point allows trade-offs between the performance and safety aspects of typical POMDP examples to be taken into account. We show the effectiveness of our method on a broad range of benchmarks.
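To illustrate what a stationary deterministic policy for a POMDP is, the sketch below brute-forces the policy space for a hypothetical four-state toy POMDP (this example and all names in it are our own, not from the paper; the paper encodes this search as a MILP rather than enumerating it). Two states emit the same observation, so the policy must commit to one action for both, which is exactly the coupling the MILP constraints capture.

```python
import itertools

# Hypothetical toy POMDP: s0 and s1 both emit observation "o", so a
# stationary deterministic policy (observation -> action) must pick one
# action for both states. "goal" and "trap" are absorbing.
STATES = ["s0", "s1", "goal", "trap"]
OBS = {"s0": "o", "s1": "o", "goal": "g", "trap": "t"}
ACTIONS = ["a", "b"]
# T[state][action] = list of (successor, probability)
T = {
    "s0":   {"a": [("goal", 0.9), ("trap", 0.1)], "b": [("trap", 1.0)]},
    "s1":   {"a": [("goal", 0.4), ("trap", 0.6)], "b": [("goal", 1.0)]},
    "goal": {"a": [("goal", 1.0)], "b": [("goal", 1.0)]},
    "trap": {"a": [("trap", 1.0)], "b": [("trap", 1.0)]},
}
INIT = {"s0": 0.5, "s1": 0.5}  # initial distribution

def reach_prob(policy, iters=100):
    """Probability of reaching 'goal' in the Markov chain induced by an
    observation-based deterministic policy (fixed-point iteration)."""
    v = {"s0": 0.0, "s1": 0.0, "goal": 1.0, "trap": 0.0}
    for _ in range(iters):
        for s in ("s0", "s1"):
            v[s] = sum(p * v[s2] for s2, p in T[s][policy[OBS[s]]])
    return sum(p * v[s] for s, p in INIT.items())

# Enumerate every deterministic choice per controllable observation.
controllable = sorted({OBS[s] for s in ("s0", "s1")})  # just ["o"] here
policies = [dict(zip(controllable, choice))
            for choice in itertools.product(ACTIONS, repeat=len(controllable))]
best = max(policies, key=reach_prob)
print(best, round(reach_prob(best), 4))  # best policy: {'o': 'a'}, value 0.65
```

Enumeration is exponential in the number of observations, which is why the paper resorts to a MILP: binary variables select one action per observation, while continuous variables and linear constraints encode the induced reachability values, letting an off-the-shelf solver prune the search.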


