Search and Explore: Symbiotic Policy Synthesis in POMDPs

05/23/2023
by Roman Andriushchenko, et al.

This paper marries two state-of-the-art controller synthesis methods for partially observable Markov decision processes (POMDPs), a prominent model in sequential decision making under uncertainty. A central problem is to find a POMDP controller, one that decides solely on the basis of the observations seen so far, that achieves a total expected reward objective. As finding optimal controllers is undecidable, we concentrate on synthesising good finite-state controllers (FSCs). We do so by tightly integrating two modern, orthogonal methods for POMDP controller synthesis: a belief-based approach and an inductive approach. The former obtains an FSC from a finite fragment of the so-called belief MDP, an MDP that keeps track of the probabilities of equally observable POMDP states. The latter is an inductive search technique over a set of FSCs, e.g., controllers with a fixed memory size. The key result of this paper is a symbiotic anytime algorithm that combines both approaches such that each profits from the controllers constructed by the other. Experimental results indicate a substantial improvement in the value of the controllers while significantly reducing the synthesis time and memory footprint.
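To make the notion of a belief MDP more concrete, the following is a minimal sketch of a single belief update, i.e., how a probability distribution over equally observable POMDP states is obtained after taking an action and seeing an observation. It is not the paper's implementation: the function `belief_update`, the dictionaries `transitions` and `obs_of`, and the toy model are hypothetical names chosen for illustration, and the sketch assumes that each state emits exactly one observation.

```python
from typing import Dict, Tuple

State = str
Action = str
Obs = str
Belief = Dict[State, float]

def belief_update(
    belief: Belief,
    action: Action,
    observation: Obs,
    transitions: Dict[Tuple[State, Action], Dict[State, float]],
    obs_of: Dict[State, Obs],
) -> Belief:
    """Bayes update of a belief (assumes deterministic state-based observations)."""
    unnormalised: Dict[State, float] = {}
    for state, prob in belief.items():
        # Push the probability mass of `state` through the chosen action.
        for succ, trans_prob in transitions[(state, action)].items():
            # Keep only successors that are consistent with what was observed.
            if obs_of[succ] == observation:
                unnormalised[succ] = unnormalised.get(succ, 0.0) + prob * trans_prob
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has probability zero under this belief and action")
    # Normalise so that the result is again a probability distribution.
    return {state: weight / total for state, weight in unnormalised.items()}

# Hypothetical toy model: from s0, action "a" leads to s1, s2 or s3;
# s1 and s2 share the observation "dark" and are thus indistinguishable.
toy_transitions = {("s0", "a"): {"s1": 0.5, "s2": 0.3, "s3": 0.2}}
toy_obs_of = {"s0": "init", "s1": "dark", "s2": "dark", "s3": "light"}

new_belief = belief_update({"s0": 1.0}, "a", "dark", toy_transitions, toy_obs_of)
```

In this toy model, the resulting belief is supported only on the states `s1` and `s2`, which share the observation `dark`. Each reachable belief of this kind is a state of the belief MDP; a belief-based method explores only a finite fragment of these states.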


