An evaluation of estimation techniques for probabilistic reachability

04/09/2018 ∙ by Mariia Vasileva, et al. ∙ Newcastle University 0

We evaluate numerically-precise Monte Carlo (MC), Quasi-Monte Carlo (QMC) and Randomised Quasi-Monte Carlo (RQMC) methods for computing probabilistic reachability in hybrid systems with random parameters. Computing reachability probability amounts to computing (multidimensional) integrals. In particular, we pay attention to QMC methods due to their theoretical benefits in convergence speed with respect to the MC method. The Koksma-Hlawka inequality is a standard result that bounds the approximation of an integral by QMC techniques. However, it is not useful in practice because it depends on the variation of the integrand function, which is in general difficult to compute. The question arises whether it is possible to apply statistical or empirical methods for estimating the approximation error. In this paper we compare a number of interval estimation techniques based on the Central Limit Theorem (CLT), and we also introduce a new approach based on the CLT for computing confidence intervals for probability near the borders of the [0,1] interval. Based on our analysis, we provide justification for the use of the developed approach and suggest usage guidelines for probability estimation techniques.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Reachability is one of the fundamental problems in verification and model checking. Given a system model and a set of goal states (indicating (un)wanted behaviour), does the system eventually reach these states? The generalisation of this problem for stochastic systems is called probabilistic reachability, and it amounts to compute the probability that the system reaches a goal state.

Checking reachability in hybrid systems is an undecidable problem for all but the simplest systems (timed automata) [2]. Formal verification of hybrid systems can include verifying satisfiability of formulas involving real variables, which is known to be an undecidable problem when, e.g., trigonometric functions are involved [22]. In order to combat the undecidability of general sentences over the reals, Gao, Avigad and Clarke defined the notion of -complete decision procedure [12]. This approach has been extended to a bounded probabilistic reachability method with statistically valid enclosures for the probability that a hybrid system can reach a goal state within a given time bound and number of steps [20]. In particular, we consider the -step reachability probability for parametric hybrid systems with random parameters. The ProbReach tool [19] implements such a method and computes under- and over- approximation of the reachability probability, which amounts to computing multi-dimensional integrals. There are three possible ways to compute such integrals - formal, Monte-Carlo (MC) and Quasi-Monte Carlo (QMC). The number of system evolutions to explore in order to accurately compute integrals grows exponentially with respect to the number of dimensions [22]. This motivates a combination of MC and QMC methods and numerical decision procedures in order to define efficient, numerically accurate estimation techniques.

It is well-known that MC methods are based on the Law of Large Numbers and random sampling. Instead, QMC methods are based on

deterministic sampling from so-called quasi-random sequences. The error estimation of the QMC method can be computed theoretically by the Koksma-Hlawka inequality. Unfortunately, in practice its use is connected with a number of calculation difficulties [15]. The terms of quasi-random sequences are statistically dependent, so the Central Limit Theorem (CLT) can not be used for estimating the integration error. At the same time we can successfully use the CLT for estimating the error of Randomised Quasi-Monte Carlo (RQMC) methods.

The aim of this paper is to compare different interval estimation techniques, in particular in the extreme cases of probability close to 0 or 1. A problem of many confidence interval (CI) techniques is that the actual coverage probability of the interval near the boundaries (0 and 1) can be poor [18, 6].

The paper is structured as follows: In Section 2, we briefly introduce probabilistic reachability for stochastic parametric hybrid systems. In Section 3, we present integral estimation methods including MC, QMC and RQMC. Additionally, we consider a recent approach to QMC variance calculation for statistical error estimation. In Section 4, we examine CI error estimation with approaches based on the standard CLT interval and on the

function. In Section 5, we empirically compare those CI estimation techniques on five benchmarks, and derive usage guidelines. In Section 6, we provide conclusions and suggest future work in the area.

2 Probabilistic Reachability

Hybrid systems provide a framework for modelling real-world systems that combine continuous and discrete dynamics [2]. In particular parametric hybrid systems (PHSs) [20] represent continuous and discrete dynamic behavior dependent on initial parameters, which remain unchanged during the system evolution. Such systems can both flow, described by a differential equation and jump, described by difference equations or control graphs.

In order to combat the undecidability of reasoning over real sentences Gao et al. [12] defined -complete decision procedures, which correctly decide whether a slightly relaxed sentence is satisfiable or not. Let be a constant and a bounded -sentence in the standard form: , where the are compositions of Type 2-computable functions (these are essentially numerically computable real functions, including transcendental functions and solutions of differential equations). The -weakening of is the formula: . The bounded -SMT problem asks for the following: given a sentence of the above form and , correctly decide one of the following:

  • unsat: is false,

  • -true: is true.

If the two cases overlap either decision can be returned. Standard bounded reachability questions over PHSs can be coded as sentences and -decided by -complete decision procedures [14].

In this paper we are concerned with stochastic PHSs, which introduce random parameters to an otherwise deterministic PHS. Bounded -step reachability in PHSs aims at finding the probability that for the given initial conditions, the system reaches a goal state in discrete transitions within a given finite time. It can be shown that this probability can be computed as an integral of the form , where denotes the set of all random parameter values for which the system reaches a goal state in steps, and is the probability measure associated with the random parameters [20].

3 Integral Estimation Methods

3.1 Monte Carlo Method

Consider the integral

, and a random variable

on . The expectation of is:

where is the density of . If

is uniformly distributed on

, then the integral becomes:

If we take points, uniformly distributed on , and compute the sample mean , we obtain the MC integral estimation:

(1)

According to the Strong Law of Large Numbers, this approximation is convergent (for ) to with probability one. The variance of the MC estimator (1) is:

(2)

The MC integration error mean is , where is the integrand variance, which is assumed to exist. In practice, the integrand variance is often unknown. That is why the next estimation is instead used:

This estimator possesses the unbiasedness property: .

3.2 Quasi-Monte Carlo Method

QMC methods can be regarded as a deterministic counterpart to classical MC methods. Unlike MC integration, which uses estimates (1) with randomly selected points, QMC methods select the points deterministically. Specifically, QMC techniques produce deterministic sequences of points that provide the best-possible spread over the integration domain. These deterministic sequences are often referred to as low-discrepancy sequences. The Sobol sequence [21] is a well-known example of low-discrepancy sequence. In Figure 1, we present a simple example of the comparison between Sobol and pseudorandom points distribution. An effective way to use the QMC method is by performing a change of variables to reduce integration to the domain. When we need to integrate over a large domain , that avoids multiplying the QMC estimate by a large factor as required by (1).

A QMC advantage with respect to MC is that its error is , while the MC error is , where is the sample size. The Koksma-Hlawka inequality bounds the error of QMC estimates, but in practical applications it is very hard to estimate [15], thereby hampering the use of QMC methods. As such, other methods for estimating the QMC error need to be developed.

Figure 1: Sobol sequence, uniform pseudorandom and randomised Sobol sequence points (obtained by transformation , where is a random sample from MC sequence and is low-discrepancy sample from Sobol sequence) distribution in the 2-dimensional unit space. The comparison is based on the first 300 points of sequences.

Qint Cubature Method.

Ermakov and Antonov [5] have recently introduced a new method for QMC variance estimation. To construct an estimate of the integral they use the set of random quadrature formulas, which were introduced by the Ermakov-Granovsky theorem [11]. This theorem allows us to construct -point formulas with two important properties: the unbiasedness property for integral and the accuracy property for the considered Haar system. The nodes of the formula are random variables with distribution density:

where is a Latin set that relates to the permutation and can be defined by the next condition:

where is a set of permuted orthonormal Haar functions [5].

The variance of the constructed cubature formula can be calculated as:

where is the variance of MC method (2) and for .

In other words, we can redefine the integral estimation variance as:

(3)

3.3 Randomised Quasi-Monte Carlo

As discussed earlier, the practical application of QMC is limited by the difficulty of computing an estimate of the integration error. However, allowing randomization into the deterministic QMC procedure enables constructing CIs. A Randomised QMC (RQMC) procedure can be described as follows. Suppose that is a deterministic low-discrepancy set. By means of a transformation a finite set is generated by the random variable and has the same quasi-random properties as set (see Figure 1). For a randomised set we construct a RQMC estimate similar to (1):

(4)

for , where is the total number of different pseudo-random sequences. Then, we take their average for overall RQMC estimation (4):

(5)

If we choose the transformation in such a way that each of the estimates has the unbiasedness property, i.e., , (e.g. mod ), then the estimator (5) will also be unbiased, i.e., . By independence of the samples used in (4) and (5), we have that for all :

Thus, we have the following variance estimation:

3.4 Validated MC and QMC

Let us consider a hybrid system with random parameters only. For any parameter value from the initial parameters distribution we introduce the Bernoulli random variable , which takes 1 if system reaches the goal in steps for and 0 otherwise. Since in general we can not sample because of undecidability, we instead consider two Bernoulli random variables: , which takes 1 if we can correctly decide that system reaches the goal in steps for and 0 otherwise; , which takes 0 if we can correctly decide that system does not reach the goal and 1 otherwise [20]. Therefore:

and thus:

By the definition of expectation, and denoting as the domain of the random parameters of , we get:

(6)

We take the sample approximation of (6) and obtain

where the ’s can be sampled by using low-discrepancy sequences for QMC methods or pseudo-random sequences for MC methods.

4 Confidence Interval Estimation and Error Analysis

In the following we shall use the notation below:
- the sample mean.

- the inverse cumulative distribution function of a normal random variable with

l

mean 0 and standard deviation 1; parameter

defines the confidence level at .

- the binomially-distributed proportion, where:

- the number of successes and - the number of failures in Bernoulli trial process; - the total number of Bernoulli trials.
.

4.1 Intervals Based on the Standard CLT Interval

Modified Central Limit Theorem interval.

First, we consider the case when the sample

is extracted from the normal distribution

with unknown parameter and known , where is the mean or expectation of the distribution and is the variance. Here, can be approximated by the sample mean: . To clarify this approximation, we construct a CI covering the parameter with a given confidence probability:

(7)

If the variance is unknown, we can use the same CI by replacing with the sample standard deviation . This method is widely used for estimating the distribution of the error regarding the binomially-distributed proportions. Many related works [6, 7, 8] note that the approximation can be poor when applied to Bernoulli trials with close to 0 or 1.

In order to resolve this problem, we introduce a new method for variance estimation, which uses a sequential estimation of the sample standard deviation and calculates (7) at every new sample. Our solution simply approximates the sample standard deviation with at the initial stages of the computation if is equal to 0 (or 1) and propagates it through the computation until the necessary number of samples to construct the interval are obtained. We show the advantages of this approach in Section 5.

Wilson interval.

It was introduced by Wilson in 1927 in his fundamental work [9]

and uses the inversion of the CLT interval. Additionally, it involves a modified center by quantile formula mean value. The interval has the following form:

(8)

This interval has some obvious advantages - it can not exceed the probability boundaries, and it can be easily calculated even if is 0 or 1. At the same time, has downward spikes when is close to 0 and 1, because it is formed by an inverted CLT approximation.

Agresti-Coull interval.

This method was introduced by Agresti and Coull in 1998 [1]. One of the most interesting features of this CI is that it makes a crucial assumption about and . This interval formally adds two successes and two failures to the obtained values in case of 95% confidence level and then uses the CLT method. The interval can be constructed as follows:

(9)

Additionally, this interval can be modified by using the center of the Wilson interval (8) in place of :

(10)

Logit interval.

The Logit interval is based on a transformation of the standard interval

[8]. It uses the empirical logit transformation: . The variance of is: and the Logit interval can be estimated as:

(11)

where the lower bound transformation is and the upper bound transformation is .

Anscombe interval.

This interval was proposed by Anscombe in 1956 [4] and is based on the Logit interval (11). The key difference is in and estimation, where is defined as and the variance is On this basis, the Anscombe interval is estimated in the same way as the Logit interval (11).

Arcsine interval.

It uses a variance-stabilising transformation of . In 1948, Anscombe introduced an improvement [3] for achieving better variance stabilisation by replacing to , obtaining

(12)

4.2 Alternative Intervals Based on the Beta-Function

Bayesian interval.

This method is based on the assumption that the (unknown) probability to estimate is a random quantity [24]

. The Bayesian interval is also called credible, because it computes the posterior distribution of the unknown quantity by using its prior distribution and Bayes theorem. The prior distribution can be constructed by means of the

distribution, which is widely used for computing inferences on . If has a prior distribution then has posterior distribution . We can construct a Bayesian equal-tailed interval by the formula:

(13)

where, is the inverse of the cumulative distribution function of .

Jeffreys interval.

The Jeffreys interval is a Bayesian interval and uses the Jeffreys prior [16], which involves a non-informative prior given by the distribution with parameters

. The probability density function of the

distribution is where , and is beta function. We can form the Jeffreys equal-tailed interval by (13) with parameters .

Clopper-Pearson interval.

This method was introduced by Clopper and Pearson in 1934 [7] and is based on the inversion of binomial test, rather than on approximations. The Clopper-Pearson interval is:

(14)

The interval states that the computed coverage probability is always above or equal to the confidence level. In practice, it can be achieved in cases when is large enough, while in general, the actual coverage can exceed . We can conclude from equation (14) that due to the absence of the and parameters, the appropriate result can be achieved only by increasing number of trials.

5 Results

We apply CI estimation methods, based on the standard interval with the RQMC technique and Bayesian CI estimation method with the MC technique. All our results are the average of 10 runs: in the RQMC case 10 sequences were obtained by changing the pseudo-random points of the equation , while the Sobol sequence points remain the same; in the MC case we used the same 10 pseudo-random points sequences, which were used for RQMC calculation.

5.1 Border Probability Cases

The true probability values, which are shown in Section 5.1 and Section 5.2

were obtained via pseudo-random number generation that produces boolean values according to a Bernoulli distribution.

Intervals based on CLT and Bayesian interval.

The comparison of the different CIs estimation techniques for extreme probability cases (near 0 bound) with accuracy , which is presented in Figure 2, shows that all intervals except the Arcsin interval (12) (see plot of Figure 2 for probability=0.001) contain the true probability value within their bounds. The Bayesian method tends to overestimate the true probability values according to their increase while tends to underestimate them. Also, it is interesting to note that the most accurate center value is returned by the Agresti-Coull interval. The reason why tends to include the true probability value near the upper bound of the interval is directly related to the number of samples. As it is shown in Figure 2 for true probability values 0.007 - 0.01, the center is moving up evenly to the true probability value with the increase of the confidence value. It echoes the number of samples growth for obtaining the necessary confidence level. For the other true probability values (0.001-0.006), although this trend retained, it can not be seen from the Figure, because of the small difference in the number of samples for all confidence levels, which causes the CI center to move wave-like.

The results in Figure 2 also demonstrate that the CIs based on the standard interval can have interval size smaller than its nominal value even for large sample sizes. It can be seen that every confidence level from 0.99 to 0.99999 displays further instances of the inadequacy of the CIs size. Also, Figure 2 shows that in spite of the large sample size (according to the confidence level), the size of the , and intervals decreases significantly as

moves toward 0. Also, CIs based on the standard interval have interval size changes because of two reasons: absence of a posteriori estimate and skewness of the underlying binomial distribution. The erratic and unsatisfactory coverage properties of the standard interval have often been remarked on, but still do not seem to be widely appreciated among statisticians

[6]. The sequence sampling version still has the same disadvantages.

In Figure 3 we plot the number of samples that different CI estimation techniques used to return intervals with accuracy for different confidence levels. It can be clearly seen from the plots that with the increasing of the confidence all CIs based on the standard interval outperform the Bayesian CI. The plot with in Figure 3 displays that the best techniques in the number of samples from the best to the worst are: , , , , , , and . The and techniques always show almost the same results near the bounds, because of the modification of the . Initially, is not able to deal with probability values near the bounds according to its formula (see Section 4.1). It has been modified to use the Anscombe estimation formula in cases when or . It is also important to note that the difference in sample number between , and for extreme probability cases is significant. For example in the plot with of Figure 3 the number of samples used to obtain interval for equals to 1,078 for , 2,662 for the and 4,440 for .

This trend is not preserved with the increase of the probability value from 0 to 0.5 and with the decrease from 1 to 0.5, respectively. Figure 4 shows that the difference in sample number between all CIs (except ) is almost undetectable. At the same time, shows very bad results in comparison with the others, as opposed to its results for probability values at the extremes.

Summarizing, for probability values near the bounds (0 or 1) the modified CLT method achieves better results in number of samples in comparison with the others (see Figure 3). For probability values away from the bounds, CLT, Wilson, Agresti–Coull, Logit and Anscombe methods are all very similar (see Figure 4), and so for such probabilities we come to the conclusion that the CLT interval should be recommended, due to its simplest form. Meanwhile for smaller sample sizes, the is strongly preferable to the others and so might be the choice where sampling cost is paramount.

Qint method results.

In Figure 2 and Figure 3 we also plotted the results of the recently developed Qint algorithm [5]. In our research we used Qint with , where . These parameters were used to form points of the Sobol sequence with numbers . These parameters were chosen on the basis of the original study of the Qint method as the most universal and reliable. As it was described earlier, Qint uses a cubature randomization method and provides the integral estimation variance (3). This formula is used to obtain a CI by calculating the standard interval (7) with our modification.

In Figure 2 we display the Qint intervals distribution for border probability values. We can see from the plots that the Qint CI always contains the true probability value. At the same time for all confidence levels from 0.99 to 0.99999 and for true probability values 0.006-0.01, Qint shows better centration than and . The greatest differences between the Qint CLT center result and the true probability values are: 0.00245 for (p=0.004), 0.00191 for (p=0.004), 0.00168 for (p=0.003), 0.00141 for (p=0.004), while for example this difference for reaches 0.00518 for (p=0.007), 0.00235 for (p=0.008), 0.00181 for (p=0.008), 0.00143 for (p=0.008).

Figure 2: Comparison of confidence interval distribution for probability values near 0, interval size equal to and c - confidence level.
Figure 3: Comparison of sample size for probability values near 0, interval size equal to and c - confidence level.
Figure 4: Comparison of sample size for probability values from 0 to 1, interval size equal to and confidence level equal to 0.99999.

We can see in Figure 3 that, as it was expected, Qint uses fewer samples than other CIs but . Our modification allows the Qint algorithm to return intervals even if , which significantly decreases the final sample size for 10 runs. With the increase in , which leads to further sampling and better reflects the behavior of the underlying random process, the effectiveness of the method decreases, and the benefit no longer seem so significant.

The fact that with the chosen parameters Qint can not outperform our modified leads us to the conclusion that our use of the standard deviation formula with lower bound is a rather effective and simple solution. However, the deep range of the possible parameters variation as well as novelty of the Qint algorithm lead us to believe that further research towards their comparison is needed.

5.2 MC and QMC Error Comparison

Another key difference between the Bayesian CI and the CIs based on CLT is the use of MC and QMC techniques for interval calculation. As it was described in Section 3.2, the QMC advantage in the error size holds for all of the tested models (see Figures 5-16). In the cases where the true error rate could not be detected due to the probability value extremely close to 0 (Bad model type min and Collision (Basic) model type min), we have that the MC absolute error line equals the true probability value, because was obtained. The chaotic coverage properties of the MC method are far more persistent than they are appreciated. The chaotic behavior does not disappear even when is quite large and the true probability is not near the boundaries. For instance, in Figure 6 it is visible that even when is quite large (i.e., tends to 10,000 samples) the actual absolute error value of the MC method reaches . Hence we can conclude that CIs estimation techniques based on MC are misleading and defective in several respects and should not be trusted [6].

A notable phenomenon, which was noticed for the MC and QMC probability calculation is that the actual coverage probability contains non-negligible oscillations as both and vary. There exist some “unlucky” pairs such that the corresponding absolute error is much greater than the results for smaller . The phenomenon of oscillation is both in , for fixed , and in , for fixed . Furthermore, drastic changes in coverage occur in nearby for fixed and in nearby for fixed [6]. We can see it on the simple example in Figure 6.

-1.3in-.3in


Confidence level c=0.99
Model Type P Qint
Good max 0.1 0.09671, 0.10671 0.09564, 0.10564 0.09632, 0.10632 0.09574, 0.10574 0.09575, 0.10575 0.09577, 0.10577 0.09559, 0.10559 0.09147, 0.10147
min 0.1 0.09529, 0.10529 0.0956, 0.1056 0.09666, 0.10666 0.09679, 0.10679 0.09678, 0.10678 0.0968, 0.1068 0.09639, 0.10639 0.09164, 0.10164
Bad max 0.95001 0.94416, 0.95416 0.94495, 0.95493 0.94422, 0.95422 0.94397, 0.95397 0.94396, 0.95396 0.94392, 0.95392 0.94735, 0.95735 0.94459, 0.95459
max2 0.88747 0.8825, 0.8925 0.88028, 0.89028 0.88031, 0.88031 0.88019, 0.89019 0.8803, 0.8902 0.88019, 0.89019 0.88325, 0.89325 0.88136, 0.89136
min 0, 0.00525 0, 0.005 0, 0.00483 0, 0.00955 0.00005, 0.00959 0.00005, 0.00959 0.00131, 0.00959 [0,0.005]
Deceleration max [0.08404, 0.08881] 0.08471, 0.09471 0.08802, 0.09802 0.08817, 0.09817 0.08685, 0.09685 0.08614, 0.09614 0.0863, 0.0963 0.08963, 0.09932 0.08852, 0.09852
min [0.04085, 0.04275] 0.03835, 0.04835 0.03861, 0.04861 0.03854, 0.04854 0.03884, 0.04884 0.03886, 0.04886 0.0389, 0.0489 0.03873, 0.04873 0.03337, 0.04337
Collision (Basic) max [0.96567, 0.97254] 0.96371, 0.97381 0.96873, 0.97873 0.9684, 0.9784 0.96851, 0.97851 0.96875, 0.97875 0.96853, 0.97853 0.96851, 0.97851 0.96301, 0.97301
min [0, 0.00201] 0, 0.00525 0, 0.005 0, 0.00483 0, 0.00955 0.00005, 0.00959 0.00005, 0.00959 0.00131, 0.00959 [0,0.005]
Collision (Extended) max [0.35751, 0.49961] 0.42267, 0.43675 0.42418, 0.4342 0.42187, 0.43187 0.42345, 0.43345 0.42463, 0.43463 0.42457, 0.43457 0.42385, 0.43385 0.42342, 0.43342
min [0.04296, 0.06311] 0.0482, 0.0582 0.04772, 0.05772 0.04785, 0.05785 0.04823, 0.05823 0.04812, 0.05812 0.0481, 0.0581 0.04757, 0.05772 0.04618, 0.05618
Collision (Advanced) max [0.14807, 0.31121] 0.2072, 0.2172 0.20873, 0.21872 0.21872, 0.2185 0.20854, 0.21854 0.20854, 0.21854 0.20855, 0.21855 0.20111, 0.21111 0.20167, 0.21166
min [0.02471, 0.05191] 0.02631, 0.03631 0.03045, 0.04045 0.03016, 0.04016 0.03001, 0.04 0.03001, 0.04 0.03016, 0.04016 0.03164, 0.04164 0.0304, 0.0404
Anesthesia n/a [0.00916, 0.04222] 0.01361, 0.02361 0.01339, 0.02332 0.01374, 0.02374 0.01373, 0.02373 0.01318, 0.02318 0.01311, 0.02311 0.01592, 0.02592 0.01815, 0.02815

Confidence level c=0.99999
Model Type P Qint

Good
max 0.1 0.09499, 0.10499 0.09378, 0.10378 0.09386, 0.10386 0.09389, 0.10389 0.09391, 0.10391 0.09392, 0.10392 0.09405, 0.10405 0.09512, 0.10512
min 0.1 0.09419, 0.10419 0.09667, 0.10667 0.09668, 0.10668 0.09677, 0.10677 0.09671, 0.10671 0.09679, 0.10679 0.09675, 0.10675 0.09525, 0.10525
Bad max 0.95001 0.94525, 0.95525 0.94579, 0.95579 0.94564, 0.95564 0.94548, 0.95548 0.94545, 0.95545 0.94543, 0.95543 0.94735, 0.95735 0.94543, 0.95543
max2 0.88747 0.88215, 0.89215 0.88055, 0.89055 0.88057, 0.89057 0.88046, 0.89046 0.88046, 0.89046 0.88046, 0.89046 0.88325, 0.89325 0.88052, 0.89052
min 0, 0.00517 0, 0.00319 0, 0.00494 0, 0.00984 0, 0.00992 0, 0.00992 0.00445, 0.0139 [0,0.005]
Deceleration max [0.08404, 0.08881] 0.08613, 0.09613 0.08624, 0.09624 0.08312, 0.09312 0.08725, 0.09725 0.08725, 0.09725 0.08726, 0.09726 0.08746, 0.09746 0.08737, 0.09735
min [0.04085, 0.04275] 0.03514, 0.04514 0.03919, 0.04919 0.03918, 0.04918 0.03942, 0.04942 0.03943, 0.04943 0.03944, 0.04944 0.039, 0.049 0.03377, 0.04377
Collision (Basic) max [0.96567, 0.97254] 0.96359, 0.97359 0.96241, 0.97241 0.96767, 0.9767 0.96892, 0.96892 0.96689, 0.97589 0.96683, 0.97583 0.96863, 0.97863 0.96462, 0.97462
min [0 , 0.00201] 0, 0.00517 0, 0.00319 0, 0.00494 0, 0.00984 0, 0.00992 0, 0.00992 0.00445, 0.0139 [0,0.005]
Collision (Extended) max [0.35751, 0.49961] 0.42651, 0.43652 0.42719, 0.43724 0.42757, 0.43757 0.42656, 0.43656 0.41774, 0.42774 0.41779, 0.42779 0.42745, 0.43745 0.42875, 0.43875
min [0.04296, 0.06311] 0.04979, 0.05979 0.04766, 0.05766 0.04764, 0.05764 0.04748, 0.05748 0.04745, 0.05745 0.04776, 0.05776 0.05776, 0.05673 0.04576, 0.05576
Collision (Advanced) max [0.14807, 0.31121] 0.20515, 0.21519 0.20558, 0.21563 0.20533, 0.21533 0.20531, 0.21531 0.20547, 0.21547 0.20547 0.21547 0.20385, 0.21385 0.20453, 0.21453
min [0.02471, 0.05191] 0.03011, 0.04015 0.02902, 0.03902 0.02954, 0.03945 0.03956, 0.04956 0.03861, 0.04861 0.03887, 0.04887 0.0363, 0.04363 0.03031, 0.04031
Anesthesia n/a [0.00916, 0.04222] 0.01284, 0.02284 0.01513, 0.02511 0.01623, 0.02623 0.01545, 0.02545 0.01557, 0.02557 0.01562, 0.02562 0.01385, 0.02385 0.01852, 0.02852

Table 1: Results for CI computation obtained via ProbReach, with solver precision = and interval size equal to , Type - extremum type and P - true probability value, where single point values were analytically computed and interval values are numerically guaranteed.

-.2in-.1in

Model Type c Qint
Good max 0.99 24252 24025 24038 24027 24035 24034 26681 23136
min 0,99 23451 23248 24256 24250 24253 24252 26894 23245
Bad max 0.99 13118 12670 12841 12817 12833 12832 23006 11726
max2 0.99 27498 26954 26960 26955 26958 26958 40442 25734
min 0.99 2590 96 961 688 680 680 347 94
Deceleration max 0.99 22842 22393 22673 22517 22628 22623 24365 20318
min 0.99 11224 11073 11114 11086 11104 11104 11570 9798
Collision (Basic) max 0.99 9581 9318 9653 9463 9386 9381 10643 8222
min 0.99 2590 96 961 688 680 680 347 94
Collision (Extended) max 0.99 65109 64804 64854 64841 64932 64930 104637 62485
min 0.99 13624 13257 13486 13375 13326 13320 14737 12869
Collision (Advanced) max 0.99 44370 43602 43645 43640 43644 43643 51734 43524
min 0.99 9500 9081 9094 9085 9090 9089 9282 9080
Anesthesia n/a 0.99 5801 4847 5024 4952 4928 4919 5522 4804

Good
max 0.99999 70422 69484 69582 69496 69530 69529 77262 68456
min 0,99999 71898 71286 71339 71293 71321 71321 79369 68994
Bad max 0.99999 37388 36518 36771 36629 36687 36868 60006 36164
max2 0.99999 79306 79097 79125 79101 79118 79118 96442 77892
min 0.99999 5797 124 2766 1963 4136 4136 572 94
Deceleration max 0.99999 65248 65233 65330 65299 65320 65319 72114 59882
min 0.99999 33147 32969 33133 33018 33060 33060 34231 29096
Collision (Basic) max 0.99999 25279 24711 24834 24789 24934 24933 26045 23016
min 0.99999 5797 124 2766 1963 4136 4136 572 94
Collision (Extended) max 0.99999 191466 190776 191253 190894 191485 191472 376294 185456
min 0.99999 41153 38942 39745 39473 39537 39541 47923 37608
Collision (Advanced) max 0.99999 131517 129746 131185 129845 129934 129933 183405 127486
min 0.99999 27305 25657 25835 25736 25792 25791 29362 24569
Anesthesia n/a 0.99999 16197 15453 15834 15634 15734 15733 17845 15314

Table 2: Samples size comparison for confidence interval computation obtained via ProbReach, with solver precision equal to and interval size equal to , Type - extremum type and c - confidence level.

-.6in-.6in

Figure 5: MC (grey line) and QMC (black line) absolute error with respect to the number of samples. Model: Collision advanced, type - max.
Figure 6: MC (grey line) and QMC (black line) absolute error with respect to the number of samples. Model: Bad,                                                                                                                       type - max2.

5.3 Tested Models Results

Intervals based on CLT and Bayesian interval.

Based on our model set, we provide in Table 1 a comparison of the CIs described in the Section 4, obtained via ProbReach with precision , interval size and true probability value Pis either analytically computed single probability values or formally computed absolute (non-statistical) intervals. These parameters were chosen according to previous work [19] as a fine trade-off between the precision of the results and the CPU time. Each model was verified separately with different confidence level from 0.99 to 0.99999 (see also Table 3). The lowest confidence level is often used in similar work, while the highest can provide reasonable results for real-world complex models.

As it can be seen in Table 1, all the intervals for the various techniques overlap. The modified approach shows very similar results to the , which can be regarded as a successful implementation. The key difference in the interval sizes can be found in the results of the Bad model Type min and the Collision (Basic) model Type min. From the results we can conclude that the true probability value is very close to 0. This allows the Bayesian, CLT and Agresti-Coull methods to form intervals, which in reality are half of the proposed interval size , while the other techniques return fully sized intervals. That happens because is using posterior distribution to form the interval. At the same time, the and calculations of the mean value return the result, which is quite close to zero. Thus, the next step of the interval bounds computation cuts the negative part of the interval. This trend is holding for all probability values within .

Table 1 also shows that with the increase of the confidence level the interval’s precision is growing, which in turn is directly related to the usage of the inverse cumulative distribution function for normal random variable with given confidence level in formulas for (7), (8), (10) and (12). It also results in the increase of the sample size for and .

The comparison of the obtained intervals (see Table 1) with the true probability value or interval P shows that all CIs contain the single probability values, but (see Bad type min model of Table 1), and all CIs overlap with the true probability intervals. We can also note that the true probability intervals of the Collision Extended, Collision Advanced, and Anesthesia models contain all confidence intervals for all confidence levels (see also Table 3). The reasons why Collision Basic and Deceleration models’ true probability intervals do not contain CIs are their size, which is very small () and their closeness to 0.

Table 2 provides very interesting results with respect to the number of samples, which were used to find CIs obtained via ProbReach with solver precision and interval size . The number of samples varies for different models and types (see also Table 4). As it was noted earlier for Figure 4, the number of samples needed for the computation grows from the bounds to the center of the [0,1] interval. The presented models show different behaviour and probability results. The most important outcome is that all CIs (except ) show better result in number of points with respect to . The best result was shown by . It shows that the proposed CLT modification can provide reasonable results for RQMC calculation in comparison with the well-established Bayesian MC integral calculation.

Qint method results.

A comparison of the Qint method’s confidence intervals is presented in Table 1. All of the Qint intervals also contain single probability values and overlap with true probability intervals. The original Qint algorithm is not able to provide results for Bad type min and Collision Basic type min models, because for very small probability values like and [0, 0.00201] it could not detect for the chosen confidence levels and interval size. Due to this reason the original Qint algorithm was changed by modifying the CLT method described in Section 4.1. From the results we see that that the Qint algorithm shows great potential, which is connected with the very fast convergence rate of the QMC method and with finding an appropriate partition (in terms of the parameters ).

Table 2 allows us to compare Qint’s sample sizes with those of other CIs. However, had an advantage in the number of samples for small probability values near the border (see Figure