Bayesian Verification under Model Uncertainty

by Lenz Belzner, et al.
Universität München

Machine learning enables systems to build and update domain models based on runtime observations. In this paper, we study statistical model checking and runtime verification for systems with this ability. Two challenges arise: (1) Models built from limited runtime data are uncertain, and this uncertainty must be dealt with. (2) There is no definition of satisfaction w.r.t. uncertain hypotheses. We propose such a definition of subjective satisfaction based on recently introduced satisfaction functions. We also propose the BV algorithm as a Bayesian solution to runtime verification of subjective satisfaction under model uncertainty. BV provides user-definable stochastic bounds for type I and II errors. We discuss empirical results from an example application to illustrate our ideas.




I Introduction

Statistical approaches to model checking and runtime verification exploit a domain model in order to evaluate system properties at design time and at runtime [1]. The system simulates potential traces based on the domain model in order to establish statistical guarantees about properties of interest.

Statistical verification is often based on a single domain model [2, 3, 4]. Machine learning enables systems to build and adapt models of their application domains based on runtime observations (see e.g. [5, 6]). In particular, Bayesian statistics allows one to infer and reason about infinitely many models [7]. Bayesian approaches make it possible to quantify the plausibility of a particular model, given prior beliefs and observed data. A system that verifies itself at runtime has to cope with model uncertainty to establish reliable verification results: which hypothesis should it assume when assessing system properties?

Model uncertainty induced by learning from limited runtime information also raises another question: what exactly does it mean for a system to satisfy a particular property, given many model hypotheses and their respective plausibilities?

In this paper, we study statistical verification for systems that are able to build and update their models based on runtime information. The paper’s contributions are threefold:

  • We propose a definition of subjective satisfaction for systems that perform runtime verification based on limited information and possibly infinite hypothesis spaces.

  • We also propose a Bayesian verification algorithm, BV, that enables learning systems to decide on satisfaction or violation of required subjective satisfaction. BV provides user-definable stochastic bounds for type I and II errors (false negatives/positives). By construction, these error bounds are independent of the number of observations the system has made about its environment.

  • We empirically establish the validity of BV’s error bounds on a toy example.

The paper is structured as follows. Section II recaps Bayesian model checking and satisfaction functions for systems with parametrized models. In Section III we introduce our definition of subjective satisfaction. In Section IV we discuss the Bayesian treatment of verification under model uncertainty with the BV algorithm. In Section V, we describe the setup and results of an empirical evaluation of BV. We discuss related work in Section VI. Section VII concludes and discusses avenues for further research.

II Preliminaries

This section recalls Bayesian model checking and satisfaction functions for parametrized models.

II-A Bayesian Model Checking

Bayesian model checking (BMC) is based on Bayesian sequential hypothesis testing and aims to infer the posterior distribution of the probability that a system satisfies its requirements [3, 4]. In contrast to point estimation (e.g. maximum likelihood), the Bayesian posterior captures the uncertainty about the true probability that arises from performing only a finite number of system assessments.

Requirements may be formally specified in a suitable probabilistic temporal logic [8, 9].

BMC treats a bounded simulation run of a system with a particular configuration as a Bernoulli experiment: the run either satisfies or violates the requirements. As the simulation captures probabilistic domain dynamics, the result of a simulation run is Bernoulli distributed with some probability $p$ of satisfying the requirements.

BMC infers a posterior distribution over $p$ based on the observed simulation results $X$ and a prior assumption about the distribution of $p$. In general, the posterior is proportional to the likelihood of the observed data (i.e. the simulation results), multiplied by the prior distribution over the parameters of interest, which in the case of BMC is $p$ (Equation 1).

$$P(p \mid X) \propto P(X \mid p)\, P(p) \tag{1}$$


BMC models the uncertainty about $p$ by a Beta distribution, the conjugate prior of the Bernoulli distribution. This ensures that the posterior has the same form as the prior, and thus enables efficient sequential updating of the distribution. The Beta distribution is parametrized by two parameters $\alpha, \beta > 0$. In the case of BMC, $\alpha$ and $\beta$ are determined by the numbers of successful and failed simulation runs. Given $k$ successes and $m$ failures, and assuming a uniform prior $\mathrm{Beta}(1, 1)$ over $p$, the posterior for $p \in [0, 1]$ is $\mathrm{Beta}(k+1, m+1)$ (Equation 2).

$$P(p \mid k, m) = \frac{p^{k}\,(1-p)^{m}}{B(k+1,\, m+1)} \tag{2}$$

Here, $B$ denotes the Beta function.


Termination can be determined by checking whether the posterior probability mass above or below the required value $\rho$ of $p$ meets a given confidence requirement $c$. For alternative termination criteria, we refer to [4].
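A minimal Python sketch of the BMC loop described above, assuming a uniform Beta(1, 1) prior and a hypothetical, user-supplied `simulate` function that performs one bounded run and returns whether it satisfied the requirements. The posterior CDF uses the identity $I_x(a, b) = P(\mathrm{Binomial}(a+b-1, x) \ge a)$, which holds for integer Beta parameters, so only the standard library is needed. This is an illustration of the termination rule, not the implementation of [3, 4].

```python
import math
import random
from typing import Callable, Optional, Tuple


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a), evaluated in log space
    so that large run counts do not overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        total += math.exp(log_comb + j * log_x + (n - j) * log_1mx)
    return min(total, 1.0)


def bmc(simulate: Callable[[], bool], rho: float, c: float,
        max_runs: int = 100_000) -> Tuple[Optional[bool], int]:
    """Sequential Bayesian model checking for a single, fixed model.

    Returns (True, runs) if P(p > rho) >= c, (False, runs) if P(p < rho) >= c,
    and (None, max_runs) if neither holds within max_runs simulations.
    """
    successes, failures = 0, 0  # uniform Beta(1, 1) prior
    for runs in range(1, max_runs + 1):
        if simulate():
            successes += 1
        else:
            failures += 1
        # Posterior is Beta(successes + 1, failures + 1), cf. Eq. 2.
        mass_below = beta_cdf_int(rho, successes + 1, failures + 1)
        if 1.0 - mass_below >= c:
            return True, runs
        if mass_below >= c:
            return False, runs
    return None, max_runs


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical system whose runs satisfy the requirement with probability 0.95.
    print(bmc(lambda: random.random() < 0.95, rho=0.8, c=0.95))
```

Because each iteration recomputes the Beta CDF from scratch, this sketch is quadratic in the number of runs; a library routine such as SciPy's `scipy.stats.beta.cdf` would be the usual choice in practice.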

II-B Satisfaction Functions

Many modern systems operate with stochastic, parametrized models of their environment, e.g. models built by machine learning. Classical statistical model checking algorithms, including classical BMC, assess requirement satisfaction for a single parametrization of the model. The satisfaction function was recently introduced as a concept that allows efficient, regression-based assessment of requirement satisfaction for parametrized models with potentially infinitely many parametrizations [10]. At its core, the satisfaction function is defined as follows.

$$f_\varphi(\theta) = P(\varphi = \mathit{true} \mid \theta) \tag{3}$$

Here, $\varphi$ denotes a boolean variable indicating requirement satisfaction or violation, and $\theta$ are the model parameters. The satisfaction probability thus depends on the particular parametrization of the model. Note, however, that the definition of the satisfaction function makes no assumptions about the distribution of the parameters themselves. We now combine estimates about the parameters with the satisfaction function to define what we call subjective satisfaction.
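To make Eq. 3 concrete, the following Python sketch uses a hypothetical toy model (not the grid world of Section V): a bounded run consists of 10 moves, each failing independently with probability $\theta$, and the requirement $\varphi$ holds if at most two moves fail. For this toy model $f_\varphi(\theta)$ has a closed form (a binomial CDF), which makes it possible to check the Monte Carlo estimate obtained by repeated simulation. All function names are illustrative assumptions.

```python
import math
import random


def simulate_run(theta: float, n_moves: int = 10, max_failures: int = 2, rng=random) -> bool:
    """One bounded run of the toy model; True iff the requirement holds."""
    failures = sum(rng.random() < theta for _ in range(n_moves))
    return failures <= max_failures


def satisfaction_mc(theta: float, runs: int = 20_000, rng=random) -> float:
    """Monte Carlo estimate of f_phi(theta) = P(phi = true | theta)."""
    return sum(simulate_run(theta, rng=rng) for _ in range(runs)) / runs


def satisfaction_exact(theta: float, n_moves: int = 10, max_failures: int = 2) -> float:
    """Closed form for the toy model: P(Binomial(n_moves, theta) <= max_failures)."""
    return sum(math.comb(n_moves, j) * theta**j * (1 - theta) ** (n_moves - j)
               for j in range(max_failures + 1))


if __name__ == "__main__":
    random.seed(0)
    for theta in (0.1, 0.2, 0.3):
        print(theta, round(satisfaction_exact(theta), 4), round(satisfaction_mc(theta), 4))
```

The satisfaction function only maps each parametrization to a probability; it says nothing about which $\theta$ is plausible.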

III Subjective Satisfaction

Consider a system that has made a limited number of observations about the dynamics of its environment. For example, consider a mobile agent whose moves fail to have an effect with Bernoulli probability $\theta$. The agent can observe whether its moves are effective. Suppose the agent has observed its moves 10 times, and two of them had no effect. The following questions naturally arise:

  • What is $\theta$?

  • How confident can the agent be in its estimate of $\theta$?

With these two questions in mind, now suppose that the agent is in a grid world with obstacles at particular positions. The agent has a sequence of movements to execute in order to fulfill a given task, e.g. computed by a planning component. Suppose further that a requirement states that the agent may hit only a limited number of obstacles (e.g. 2), and that this must hold with at least a specified probability $\rho$. Another question arises:

  • What is the probability that the sequence of movements will satisfy the requirement, given the limited observations about $\theta$?

In this setup, an agent has to cope with various uncertainties:

  1. Domain uncertainty is inherent to the environment, in our example given by $\theta$. It is aleatoric, and therefore irreducible, and originates from the physical setup of the domain (e.g. sensory abstraction, laws of physics, etc.). Note that domain uncertainty in combination with requirements uniquely defines a satisfaction function (cf. Section II-B).

  2. Model uncertainty is the epistemic uncertainty about the aleatoric domain uncertainty. It arises from the limited number of observations that the agent is able to collect from its environment. Note that model uncertainty does not only arise from models learned at runtime. All empirically assessed models convey this kind of uncertainty, in particular all models built with machine learning approaches, regardless of their position in a system’s development lifecycle.

  3. Subjective satisfaction, the uncertainty about a plan satisfying (or violating) a requirement in a particular situation, is also epistemic. It results from domain uncertainty, model uncertainty, and the given system requirements.

The relation between domain and model uncertainty can be modeled in a Bayesian way. This is a widely adopted view, and a vast body of literature and techniques exists for estimating model uncertainty from available domain observations [5, 7]. For readability, we write $P(\theta)$ for the posterior $P(\theta \mid D)$ over the model parameters given the domain observations $D$ in the remainder of the paper.
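The first two questions of the mobile-agent example can be answered in this Bayesian way. A minimal Python sketch, assuming a uniform Beta(1, 1) prior on $\theta$: two failures in 10 observed moves give the posterior Beta(3, 9) by the same conjugate update as in Eq. 2. Its mean and standard deviation serve as estimate and spread, and its CDF gives the posterior probability that $\theta$ lies below a threshold. The threshold 0.4 and all function names are illustrative assumptions.

```python
import math


def beta_posterior(failures: int, observations: int, prior=(1, 1)):
    """Conjugate Beta update for a Bernoulli failure probability theta."""
    a0, b0 = prior
    return a0 + failures, b0 + (observations - failures)


def beta_mean_sd(a: float, b: float):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) for small integer a, b via P(Binomial(a+b-1, x) >= a)."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


if __name__ == "__main__":
    a, b = beta_posterior(failures=2, observations=10)  # Beta(3, 9)
    mean, sd = beta_mean_sd(a, b)
    print(f"posterior Beta({a}, {b}): mean={mean:.3f}, sd={sd:.3f}")
    print(f"maximum likelihood estimate: {2 / 10:.3f}")
    print(f"P(theta < 0.4) = {beta_cdf_int(0.4, a, b):.3f}")
```

The posterior mean 0.25 differs from the maximum likelihood estimate 0.2 because the uniform prior acts like one additional success and one additional failure.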

We now combine model uncertainty with the satisfaction function to define subjective satisfaction $\sigma_\varphi$.

$$\sigma_\varphi = \int f_\varphi(\theta)\, P(\theta)\, \mathrm{d}\theta = \int P(\varphi = \mathit{true} \mid \theta)\, P(\theta)\, \mathrm{d}\theta \tag{4}$$

Subjective satisfaction can be interpreted as the parameter of a Bernoulli distribution that models uncertainty about satisfaction of the requirements. Intuitively, Eq. 4 weights the satisfaction probability for given parameters by the probability that these parameters represent the ground truth. Subjective satisfaction thus considers all hypothetical domain parametrizations at once, and weights their respective satisfaction probabilities according to their plausibility (which is based on domain observations).
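A Python sketch of Eq. 4, assuming a hypothetical toy requirement (at most two of 10 moves fail, each move failing with probability $\theta$) and the posterior Beta(3, 9) that results from 2 failures in 10 observations under a uniform prior. The Monte Carlo estimate averages $f_\varphi(\theta)$ over posterior samples of $\theta$. For this toy model Eq. 4 also has a closed form (a beta-binomial CDF), and the plug-in value $f_\varphi(\hat\theta)$ at the maximum likelihood estimate $\hat\theta = 0.2$ is shown for comparison. Function names are illustrative.

```python
import math
import random


def satisfaction(theta: float, n_moves: int = 10, max_failures: int = 2) -> float:
    """Toy satisfaction function f_phi(theta) = P(Binomial(n_moves, theta) <= max_failures)."""
    return sum(math.comb(n_moves, j) * theta**j * (1 - theta) ** (n_moves - j)
               for j in range(max_failures + 1))


def subjective_satisfaction_mc(a: float, b: float, samples: int = 20_000, rng=random) -> float:
    """Monte Carlo estimate of Eq. 4 with theta ~ Beta(a, b)."""
    return sum(satisfaction(rng.betavariate(a, b)) for _ in range(samples)) / samples


def subjective_satisfaction_exact(a: float, b: float, n_moves: int = 10,
                                  max_failures: int = 2) -> float:
    """Closed form of Eq. 4 for the toy model: a beta-binomial CDF."""
    log_beta_ab = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    total = 0.0
    for k in range(max_failures + 1):
        log_beta_k = (math.lgamma(k + a) + math.lgamma(n_moves - k + b)
                      - math.lgamma(n_moves + a + b))
        total += math.comb(n_moves, k) * math.exp(log_beta_k - log_beta_ab)
    return total


if __name__ == "__main__":
    random.seed(0)
    a, b = 3, 9  # posterior over theta after 2 failures in 10 observations
    print("subjective (MC):   ", round(subjective_satisfaction_mc(a, b), 3))
    print("subjective (exact):", round(subjective_satisfaction_exact(a, b), 3))
    print("plug-in at MLE:    ", round(satisfaction(2 / 10), 3))
```

Under these assumptions, subjective satisfaction is about 0.55, noticeably lower than the plug-in value of about 0.68, because posterior mass on larger failure probabilities is taken into account.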

IV Bayesian Verification under Model Uncertainty

We now define Bayesian Verification (BV), an algorithm for estimating subjective satisfaction by Monte Carlo simulation. Taking a Bayesian stance also yields a confidence measure for this estimate. Because satisfaction is assessed with a limited number of simulations, an additional source of uncertainty arises: the uncertainty about the estimate of $\sigma_\varphi$. BV establishes and updates a probability distribution $P(\sigma_\varphi)$ to quantify this uncertainty, and uses it to decide on termination. BV takes the following inputs.

  • The current system state $s$.

  • $P(\theta)$, the system’s model uncertainty.

  • A probabilistic simulation model $\mathit{sim}$ of the domain dynamics, parametrized by $\theta$. $\mathit{sim}$ takes a state, a plan, a requirement and a parametrization, and yields a boolean variable indicating requirement satisfaction. That is, this model implicitly provides the satisfaction function $f_\varphi(\theta)$.

  • The system’s plan $\pi$ to be assessed.

  • A system requirement $\varphi$, e.g. a temporal logic formula.

  • A required probability $\rho$ of satisfying $\varphi$.

  • A required confidence $c$ in the estimate of $\sigma_\varphi$.

BV is shown in Algorithm 1. BV first initializes its estimate of $\sigma_\varphi$. As satisfaction of a requirement in a stochastic domain can be interpreted as a Bernoulli random variable with parameter $\sigma_\varphi$, we use a uniform prior over $\sigma_\varphi$, which is a $\mathrm{Beta}(1, 1)$ distribution (line 2).

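We define the confidence that $\sigma_\varphi$ lies above the required satisfaction probability $\rho$ as the probability mass of $P(\sigma_\varphi)$ above $\rho$.

$$\Pr(\sigma_\varphi > \rho) = \int_{\rho}^{1} P(\sigma_\varphi)\, \mathrm{d}\sigma_\varphi \tag{5}$$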


BV iteratively updates $P(\sigma_\varphi)$ and uses it to decide whether satisfaction (or violation) can be asserted with at least the required confidence $c$ (cf. Equation 5). To this end, it repeats the following steps.

  1. A parametrization $\theta \sim P(\theta)$ is sampled from the model uncertainty (line 4).

  2. A simulation run $x = \mathit{sim}(s, \pi, \varphi, \theta)$ is performed w.r.t. state, plan, requirement and parameters (line 5). Since the parametrization has been sampled from the model uncertainty, the distribution of the simulation result accounts for both the model uncertainty and the satisfaction function. That is, $x \sim \mathrm{Bernoulli}(\sigma_\varphi)$ at this point.

  3. The simulation result is used to update the belief distribution $P(\sigma_\varphi)$ (line 6).

  4. The probability mass of the belief distribution is used to determine whether satisfaction or violation has been established with at least the required confidence (Eq. 5). If so, the algorithm terminates accordingly (lines 7 and 8).

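1: procedure BV($s$, $P(\theta)$, $\mathit{sim}$, $\pi$, $\varphi$, $\rho$, $c$)
2:     $P(\sigma_\varphi) \leftarrow \mathrm{Beta}(1, 1)$
3:     loop
4:         $\theta \sim P(\theta)$
5:         $x \leftarrow \mathit{sim}(s, \pi, \varphi, \theta)$
6:         update $P(\sigma_\varphi)$ with $x$ according to Eq. 2
7:         if $\int_{\rho}^{1} P(\sigma_\varphi)\,\mathrm{d}\sigma_\varphi \ge c$ then return true
8:         if $\int_{0}^{\rho} P(\sigma_\varphi)\,\mathrm{d}\sigma_\varphi \ge c$ then return false
Algorithm 1: The BV algorithm for Bayesian verification under model uncertainty.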
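A minimal Python sketch of Algorithm 1, under the following assumptions: `sample_theta` draws one parametrization from the model uncertainty $P(\theta)$, and `sim` is a user-supplied simulator that returns whether the run satisfied $\varphi$; both names are hypothetical. The toy simulator (a plan of 10 moves, each failing with probability $\theta$, with $\varphi$ read as a budget of at most two failures) and the posterior Beta(3, 9) are illustrative stand-ins, not the grid world of Section V.

```python
import math
import random
from typing import Callable, Optional


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for integer a, b, via P(Binomial(a+b-1, x) >= a)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        total += math.exp(log_comb + j * log_x + (n - j) * log_1mx)
    return min(total, 1.0)


def bv(state, sample_theta: Callable, sim: Callable, plan, phi, rho: float, c: float,
       max_iter: int = 100_000, rng=random) -> Optional[bool]:
    successes, failures = 0, 0                 # line 2: P(sigma) = Beta(1, 1)
    for _ in range(max_iter):                  # line 3 (capped here for safety)
        theta = sample_theta(rng)              # line 4: theta ~ P(theta)
        x = sim(state, plan, phi, theta, rng)  # line 5: one simulation run
        if x:                                  # line 6: Beta update (Eq. 2)
            successes += 1
        else:
            failures += 1
        mass_below = beta_cdf_int(rho, successes + 1, failures + 1)
        if 1.0 - mass_below >= c:              # line 7: confident that sigma > rho
            return True
        if mass_below >= c:                    # line 8: confident that sigma < rho
            return False
    return None                                # undecided within max_iter


def toy_sim(state, plan, phi, theta, rng) -> bool:
    """Hypothetical simulator: `plan` is a number of moves, `phi` a failure budget."""
    failures = sum(rng.random() < theta for _ in range(plan))
    return failures <= phi


if __name__ == "__main__":
    random.seed(0)
    for rho in (0.2, 0.85):
        print(rho, bv(None, lambda rng: rng.betavariate(3, 9), toy_sim, 10, 2, rho=rho, c=0.95))
```

Two design choices go beyond Algorithm 1: the loop is capped at `max_iter` and returns `None` if no decision is reached, and only the success and failure counts are stored, since they fully determine the Beta posterior.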

V Empirical Results

We empirically assessed BV on a toy example. Although the example is very simple, the Bayesian treatment of model uncertainty is not restricted to small models: varied and powerful tools exist for sampling from or approximating complex, high-dimensional posteriors, such as Markov chain Monte Carlo (see e.g. [11] for a very interesting read) or variational inference (e.g. [12]).

V-A Setup

The state consists of a 10 × 10 grid world with the agent at position (0, 0). Obstacles are positioned randomly, with an obstacle-to-free-position ratio of 0.2. The agent is presented with a plan (an action sequence) of 10 movements (up, down, left, right, with the obvious semantics). Each action fails with a Bernoulli probability $\theta$, which is itself sampled uniformly. Action failure results in the inverse movement (e.g. a failed up yields down). Before running BV, the agent is presented with a number of observations of whether its actions failed. We build model uncertainty about $\theta$ with a Beta distribution (cf. Eq. 2 and Section IV).
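A Python sketch of the grid-world simulator described above. Several details are not specified in the text and are assumptions here: each cell other than the start is an obstacle independently with probability 0.2 (one possible reading of the obstacle ratio), moves that would leave the grid keep the agent in place, and a move into an obstacle counts as a hit while the agent stays where it is. All function names and the illustrative failure probability 0.2 are hypothetical.

```python
import random
from typing import List, Set, Tuple

Cell = Tuple[int, int]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
INVERSE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def random_world(size: int = 10, obstacle_prob: float = 0.2, rng=random) -> Set[Cell]:
    """Obstacle cells; the start cell (0, 0) is kept free (assumption)."""
    return {(x, y) for x in range(size) for y in range(size)
            if (x, y) != (0, 0) and rng.random() < obstacle_prob}


def random_plan(length: int = 10, rng=random) -> List[str]:
    """A random action sequence over up, down, left, right."""
    return [rng.choice(list(MOVES)) for _ in range(length)]


def count_hits(obstacles: Set[Cell], plan: List[str], theta: float,
               size: int = 10, rng=random) -> int:
    """Execute the plan once; each action fails with probability theta."""
    x, y, hits = 0, 0, 0
    for action in plan:
        if rng.random() < theta:              # failure yields the inverse movement
            action = INVERSE[action]
        dx, dy = MOVES[action]
        nx = min(max(x + dx, 0), size - 1)    # assumption: walls block movement
        ny = min(max(y + dy, 0), size - 1)
        if (nx, ny) in obstacles:
            hits += 1                         # assumption: agent stays in place
        else:
            x, y = nx, ny
    return hits


def satisfies(obstacles: Set[Cell], plan: List[str], theta: float, rng=random) -> bool:
    """Requirement phi: hit fewer than three obstacles."""
    return count_hits(obstacles, plan, theta, rng=rng) < 3


if __name__ == "__main__":
    random.seed(0)
    world, plan = random_world(), random_plan()
    theta = 0.2  # illustrative failure probability
    runs = 1_000
    print(plan, sum(satisfies(world, plan, theta) for _ in range(runs)) / runs)
```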

In our setting, $\varphi$ is the requirement to hit fewer than three obstacles while executing the plan. We set $\rho = 0.9$. This means that the agent may classify a plan as satisfying the requirement if it hits fewer than three obstacles in at least ninety percent of executions. We use a confidence requirement of $c = 0.95$.


We approximate the ground truth satisfaction probability of a plan by taking the maximum likelihood estimate of satisfaction probability based on 10000 simulation runs. We assessed two error types.

  • A type I error is an incorrect rejection. This occurs if a plan satisfies $\varphi$ with probability at least $\rho$ but is rejected.

  • A type II error is a false accept. This occurs if a plan violates the requirement (i.e. $\varphi$ is satisfied with probability less than $\rho$) but is accepted.
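The evaluation protocol can be sketched in Python as follows, assuming a hypothetical `simulate` callable that performs one run under the true failure probability. The ground truth is the fraction of satisfying runs (the maximum likelihood estimate), and each BV decision is then classified as correct, a type I error, or a type II error relative to $\rho$. The names are illustrative, not taken from the authors' implementation.

```python
import random
from typing import Callable, Optional


def ground_truth(simulate: Callable[[], bool], runs: int = 10_000) -> float:
    """Maximum likelihood estimate of the satisfaction probability."""
    return sum(simulate() for _ in range(runs)) / runs


def classify_error(true_p: float, rho: float, accepted: bool) -> Optional[str]:
    """Return 'type I', 'type II', or None for a correct decision."""
    satisfied = true_p >= rho
    if satisfied and not accepted:
        return "type I"   # incorrect rejection
    if not satisfied and accepted:
        return "type II"  # false accept
    return None


if __name__ == "__main__":
    random.seed(0)
    p_hat = ground_truth(lambda: random.random() < 0.93)  # toy stand-in simulator
    print(p_hat, classify_error(p_hat, rho=0.9, accepted=False))
```

Accumulating the non-`None` results over many randomly generated situations gives error counts of the kind plotted in Figure 1.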

We also assessed a variant of BV that does not explicitly build model uncertainty from observations, but instead builds the corresponding maximum likelihood estimate $\hat\theta$ (number of observed failures divided by the number of observations). Accordingly, line 4 of Algorithm 1 is changed to $\theta \leftarrow \hat\theta$.
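BV and the MLE variant differ only in line 4 of Algorithm 1. In Python, assuming a uniform Beta(1, 1) prior for the Bayesian case and a hypothetical sampler interface `sampler(rng) -> theta`, the two choices could look like this: the Bayesian sampler draws $\theta$ from the Beta posterior, whereas the MLE sampler always returns the point estimate and therefore ignores model uncertainty.

```python
import random
import statistics


def make_bayes_sampler(failures: int, observations: int):
    """theta ~ Beta(failures + 1, successes + 1): posterior under a uniform prior."""
    a, b = failures + 1, observations - failures + 1
    return lambda rng: rng.betavariate(a, b)


def make_mle_sampler(failures: int, observations: int):
    """MLE variant: theta is fixed to observed failures / number of observations."""
    theta_hat = failures / observations
    return lambda rng: theta_hat


if __name__ == "__main__":
    random.seed(0)
    bayes, mle = make_bayes_sampler(2, 10), make_mle_sampler(2, 10)
    draws = [bayes(random) for _ in range(10_000)]
    print("Bayes: mean", round(statistics.mean(draws), 3), "sd", round(statistics.stdev(draws), 3))
    print("MLE:   always", mle(random))
```

Passing the MLE sampler instead of the Bayesian one to a BV loop yields the MLE variant; all spread in $\theta$ then disappears from the simulations.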

Our implementation of the setup and BV is available at

V-B Results

Results are reported for agents that were presented with 10 randomly sampled observations of action failure. An exemplary result of our experiments is shown in Figure 1. Its left panel shows accumulated type I errors over the course of different setups (i.e. randomly generated environments paired with random plans), and its right panel shows the accumulated type II errors. The dashed line shows the required statistical error bound (0.05 for $c = 0.95$).

BV establishes the required statistical error bounds for both error types, whereas the MLE variant, which does not explicitly use model uncertainty for inference, fails to do so for type II errors. We observed this behavior for various numbers of observations presented to the system.

Fig. 1: Type I (left) and II (right) errors. X-axis shows number of tested situations (, ). Vertical axis shows accumulated number of type I and II errors.

VI Related Work

BV is an instance of statistical model checking in general [2], and of Bayesian statistical model checking in particular [3, 4]. Typically, these approaches assume that a perfect model is available, and they do not deal with explicitly quantified epistemic model uncertainty. One of the starting points of the current article is the work on smoothed model checking (SMC) [10]. SMC approximates a satisfaction function w.r.t. uncertain model parameters by Gaussian process regression. However, SMC does not incorporate distributions over model parameters for system assessment. Our definition of subjective satisfaction is a direct consequence of combining quantified model uncertainty with SMC’s satisfaction function. Parametrized Bayesian model checking for dynamic Bayesian networks (DBNs) [13] does deal with quantified model uncertainty. However, the author does not exploit the posterior for bounding or estimating errors: the algorithm terminates when the posterior variance “is less than some user-specified threshold”. This approach does not yield statistical error estimates or bounds. We argue that in the context of software engineering, quantifiable error guarantees or estimates play a key role in system assessment.

A quite different approach to quantitative system assessment under model uncertainty is formal verification with confidence intervals (FACT) [14]. It is based on (exhaustive) probabilistic model checking and therefore allows a more thorough analysis than BV, which is approximate and (temporally) bounded. For the same reason, however, FACT suffers from the state space explosion problem. FACT models uncertainty in terms of frequentist confidence intervals, in contrast to BV’s Bayesian modeling approach.

VII Conclusion

We have presented a Bayesian approach to statistical model checking under model uncertainty. We introduced the notion of subjective satisfaction, which results from combining recently introduced satisfaction functions with model uncertainty. We also presented Bayesian Verification (BV), an approximate, Monte Carlo style algorithm for assessing subjective system satisfaction based on simulation. BV allows for user-specified confidence bounds and thus makes it possible to bound verification errors statistically. We empirically evaluated BV on a toy example with positive results.

There are some limitations to the BV algorithm. When $\sigma_\varphi$ is close to $\rho$, BV may take many iterations to establish the required confidence. Note that this property is independent of the absolute value of $\rho$. Similar to Bayesian model checking based on a fixed model, BV scales well to required satisfaction probabilities close to one (see e.g. [4]). The error bounds obtained by BV are statistical: they do not provide a hard upper bound. That is, the observed error rate may temporarily exceed the bound when operating BV (e.g. an error may occur even when running BV only once, yielding an error rate of one). Also, while we empirically observed that the error bound was not severely violated for our toy problem, there may be an intimate connection to the choice of prior. Studying the connection between prior and error bound is an interesting direction for further research. Another limitation of BV is that its search depth is bounded. To address this, it would be interesting to improve the quality of satisfaction estimates, for example by adding global, previously trained satisfaction estimators to BV.


The authors would like to thank Martin Wirsing and Matthias Hölzl for many inspiring discussions that led us into the direction of research presented in this paper.


  • [1] M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of probabilistic real-time systems,” in Proc. 23rd International Conference on Computer Aided Verification (CAV’11), ser. LNCS, G. Gopalakrishnan and S. Qadeer, Eds., vol. 6806. Springer, 2011, pp. 585–591.
  • [2] A. Legay, B. Delahaye, and S. Bensalem, “Statistical model checking: An overview,” in International Conference on Runtime Verification. Springer, 2010, pp. 122–135.
  • [3] S. K. Jha, E. M. Clarke, C. J. Langmead, A. Legay, A. Platzer, and P. Zuliani, “A Bayesian approach to model checking biological systems,” in International Conference on Computational Methods in Systems Biology. Springer, 2009, pp. 218–234.
  • [4] P. Zuliani, A. Platzer, and E. M. Clarke, “Bayesian statistical model checking with application to Simulink/Stateflow verification,” in Proceedings of the 13th ACM International Conference on Hybrid Systems: Computation and Control. ACM, 2010, pp. 243–252.
  • [5] D. J. MacKay, Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
  • [6] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
  • [7] E. T. Jaynes, Probability Theory: The Logic of Science. Cambridge University Press, 2003.
  • [8] A. Pnueli, “The temporal logic of programs,” in 18th Annual Symposium on Foundations of Computer Science. IEEE, 1977, pp. 46–57.
  • [9] C. Baier and J.-P. Katoen, Principles of Model Checking. MIT Press, 2008.
  • [10] L. Bortolussi, D. Milios, and G. Sanguinetti, “Smoothed model checking for uncertain continuous-time Markov chains,” Information and Computation, vol. 247, pp. 235–253, 2016.
  • [11] P. Diaconis, “The Markov chain Monte Carlo revolution,” Bulletin of the American Mathematical Society, vol. 46, no. 2, pp. 179–205, 2009.
  • [12] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends® in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.
  • [13] C. J. Langmead, “Generalized queries and Bayesian statistical model checking in dynamic Bayesian networks: Application to personalized medicine,” 2009.
  • [14] R. Calinescu, C. Ghezzi, K. Johnson, M. Pezzé, Y. Rafiq, and G. Tamburrelli, “Formal verification with confidence intervals to establish quality of service properties of software systems,” IEEE Transactions on Reliability, vol. 65, no. 1, pp. 107–125, 2016.