Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints

02/13/2020
by Steven Carr, et al.

Recurrent neural networks (RNNs) have emerged as an effective representation of control policies in sequential decision-making problems. However, a major drawback of RNN-based policies is the difficulty of providing formal guarantees that they satisfy behavioral specifications, e.g., safety and/or reachability. By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller (FSC) from an RNN which, when composed with a finite-state system model, is amenable to existing formal verification tools. Specifically, we introduce an iterative modification to the so-called quantized bottleneck insertion technique to create an FSC as a randomized policy with memory. When the resulting FSC fails to satisfy the specification, verification generates diagnostic information. We use this information to either adjust the amount of memory in the extracted FSC or perform focused retraining of the RNN. While generally applicable, we detail the resulting iterative procedure in the context of policy synthesis for partially observable Markov decision processes (POMDPs), a problem known to be notoriously hard. The numerical experiments show that the proposed approach outperforms traditional POMDP synthesis methods by 3 orders of magnitude within 2
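The pipeline described above (extract a randomized FSC from an RNN, compose it with a finite-state model, then verify) can be illustrated with a minimal sketch. It assumes the quantized hidden-state codes are already available as traces of `(node, obs, action, next_node)` tuples; no bottleneck network is trained here. It also assumes a deterministic observation function and swaps in plain value iteration on the product Markov chain for a real verification tool. The function names `extract_fsc` and `reach_probability`, the fallback for unseen `(node, obs)` pairs, and the toy POMDP are all hypothetical. This is not the authors' implementation, and it omits the iterative memory adjustment and retraining.

```python
from collections import Counter, defaultdict


def extract_fsc(traces):
    """Build a randomized FSC from hypothetical quantized RNN traces.

    Each entry is (node, obs, action, next_node), where `node` is an
    assumed quantized hidden-state code.
    """
    action_counts = defaultdict(Counter)
    update_counts = defaultdict(Counter)
    for node, obs, action, next_node in traces:
        action_counts[(node, obs)][action] += 1
        update_counts[(node, obs)][next_node] += 1
    # Randomized action choice: empirical frequency of each action.
    gamma = {
        key: {a: c / sum(cnt.values()) for a, c in cnt.items()}
        for key, cnt in action_counts.items()
    }
    # Deterministic memory update: the most frequent successor node.
    delta = {key: cnt.most_common(1)[0][0] for key, cnt in update_counts.items()}
    return gamma, delta


def reach_probability(pomdp, fsc, s0, n0, targets, actions, tol=1e-10, max_iter=10_000):
    """Probability of reaching `targets` in the POMDP x FSC product chain."""
    transitions, observe = pomdp  # transitions[(s, a)] -> {s': p}, observe[s] -> obs
    gamma, delta = fsc

    def step(s, n):
        o = observe[s]
        # Assumed fallback for (node, obs) pairs never seen in the traces:
        # uniform action choice and no change of memory node.
        act_dist = gamma.get((n, o), {a: 1 / len(actions) for a in actions})
        n_next = delta.get((n, o), n)
        succ = defaultdict(float)
        for a, pa in act_dist.items():
            for s_next, ps in transitions[(s, a)].items():
                succ[(s_next, n_next)] += pa * ps
        return succ

    # Explore the reachable part of the product Markov chain.
    chain, frontier = {}, [(s0, n0)]
    while frontier:
        state = frontier.pop()
        if state not in chain:
            chain[state] = step(*state)
            frontier.extend(chain[state])

    # Value iteration from 0 on non-target states converges from below
    # to the least fixed point, i.e. the reachability probability.
    x = {st: 1.0 if st[0] in targets else 0.0 for st in chain}
    for _ in range(max_iter):
        new_x = {
            st: 1.0 if st[0] in targets else sum(p * x[nxt] for nxt, p in succ.items())
            for st, succ in chain.items()
        }
        diff = max(abs(new_x[st] - x[st]) for st in chain)
        x = new_x
        if diff < tol:
            break
    return x[(s0, n0)]


# Hypothetical toy POMDP: state 1 is the goal, state 2 is a trap.
transitions = {
    (0, "a"): {1: 0.9, 2: 0.1},
    (0, "b"): {1: 0.5, 2: 0.5},
    (1, "a"): {1: 1.0}, (1, "b"): {1: 1.0},
    (2, "a"): {2: 1.0}, (2, "b"): {2: 1.0},
}
observe = {0: "o", 1: "goal", 2: "trap"}
traces = [(0, "o", "a", 0)] * 3 + [(0, "o", "b", 0)]
fsc = extract_fsc(traces)
p = reach_probability((transitions, observe), fsc, s0=0, n0=0, targets={1}, actions=["a", "b"])
print(round(p, 6))  # 0.8
```

The extracted FSC picks action `a` with probability 0.75 and `b` with probability 0.25, so the goal is reached with probability 0.75 · 0.9 + 0.25 · 0.5 = 0.8. If that value fell below a required threshold, a verifier would report the violation. In the approach summarized above, that diagnostic would then drive a change in FSC memory or focused RNN retraining.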


research
03/20/2019

Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks

We study strategy synthesis for partially observable Markov decision pro...
research
11/29/2018

Learning Finite State Representations of Recurrent Policy Networks

Recurrent neural networks (RNNs) are an effective representation of cont...
research
06/12/2020

A Formal Language Approach to Explaining RNNs

This paper presents LEXR, a framework for explaining the decision making...
research
07/16/2020

Strengthening Deterministic Policies for POMDPs

The synthesis problem for partially observable Markov decision processes...
research
02/27/2018

Human-in-the-Loop Synthesis for Partially Observable Markov Decision Processes

We study planning problems where autonomous agents operate inside enviro...
research
03/16/2019

Secure Control under Partial Observability with Temporal Logic Constraints

This paper studies the synthesis of control policies for an agent that h...
research
07/16/2018

Shielded Decision-Making in MDPs

A prominent problem in artificial intelligence and machine learning is t...
