Active Learning of Markov Decision Processes using Baum-Welch algorithm (Extended)

10/06/2021
by Giovanni Bacci, et al.

Cyber-physical systems (CPSs) are naturally modelled as reactive systems with nondeterministic and probabilistic dynamics. Model-based verification techniques have proved effective in the deployment of safety-critical CPSs. Central to the successful application of such techniques is the construction of an accurate formal model of the system. Manual construction can be a resource-demanding and error-prone process, which motivates the design of automata learning algorithms that synthesise a system model from observed system behaviours. This paper revisits and adapts the classic Baum-Welch algorithm for learning Markov decision processes (MDPs) and Markov chains. For the case of MDPs, which typically demand more observations, we present a model-based active learning sampling strategy that chooses the examples that are most informative w.r.t. the current model hypothesis. We empirically compare our approach with state-of-the-art tools and demonstrate that the proposed active learning procedure can significantly reduce the number of observations required to obtain accurate models.

