1 Introduction
Reachability is one of the fundamental problems in verification and model checking. Given a system model and a set of goal states (indicating (un)wanted behaviour), does the system eventually reach these states? The generalisation of this problem to stochastic systems is called probabilistic reachability, and it amounts to computing the probability that the system reaches a goal state.
Checking reachability in hybrid systems is an undecidable problem for all but the simplest systems (timed automata) [2]. Formal verification of hybrid systems can include verifying satisfiability of formulas involving real variables, which is known to be an undecidable problem when, e.g., trigonometric functions are involved [22]. In order to combat the undecidability of general sentences over the reals, Gao, Avigad and Clarke defined the notion of $\delta$-complete decision procedure [12]. This approach has been extended to a bounded probabilistic reachability method with statistically valid enclosures for the probability that a hybrid system can reach a goal state within a given time bound and number of steps [20]. In particular, we consider the step reachability probability for parametric hybrid systems with random parameters. The ProbReach tool [19] implements such a method and computes under- and over-approximations of the reachability probability, which amounts to computing multidimensional integrals. There are three possible ways to compute such integrals: formal, Monte Carlo (MC) and Quasi-Monte Carlo (QMC). The number of system evolutions to explore in order to accurately compute such integrals grows exponentially with respect to the number of dimensions [22]. This motivates combining MC and QMC methods with numerical decision procedures in order to define efficient, numerically accurate estimation techniques.
It is well-known that MC methods are based on the Law of Large Numbers and random sampling. Instead, QMC methods are based on deterministic sampling from so-called quasi-random sequences. The error of the QMC method can in principle be estimated theoretically by the Koksma–Hlawka inequality; unfortunately, in practice its use entails a number of computational difficulties [15]. The terms of quasi-random sequences are statistically dependent, so the Central Limit Theorem (CLT) cannot be used for estimating the integration error. At the same time, we can successfully use the CLT for estimating the error of Randomised Quasi-Monte Carlo (RQMC) methods.

The aim of this paper is to compare different interval estimation techniques, in particular in the extreme cases of probability close to 0 or 1. A problem of many confidence interval (CI) techniques is that the actual coverage probability of the interval near the boundaries (0 and 1) can be poor [18, 6].
The paper is structured as follows: In Section 2, we briefly introduce probabilistic reachability for stochastic parametric hybrid systems. In Section 3, we present integral estimation methods, including MC, QMC and RQMC. Additionally, we consider a recent approach to QMC variance calculation for statistical error estimation. In Section 4, we examine CI error estimation with approaches based on the standard CLT interval and on the Beta function. In Section 5, we empirically compare those CI estimation techniques on five benchmarks and derive usage guidelines. In Section 6, we provide conclusions and suggest future work in the area.

2 Probabilistic Reachability
Hybrid systems provide a framework for modelling real-world systems that combine continuous and discrete dynamics [2]. In particular, parametric hybrid systems (PHSs) [20] exhibit continuous and discrete dynamic behaviour dependent on initial parameters, which remain unchanged during the system evolution. Such systems can both flow, as described by differential equations, and jump, as described by difference equations or control graphs.
In order to combat the undecidability of reasoning over real sentences, Gao et al. [12] defined $\delta$-complete decision procedures, which correctly decide whether a slightly relaxed sentence is satisfiable or not. Let $\delta \in \mathbb{Q}^+ \cup \{0\}$ be a constant and $\phi$ a bounded sentence in the standard form: $\phi = \exists^{I_1} x_1 \cdots \exists^{I_n} x_n \bigwedge_{i=1}^{m} \bigl( \bigvee_{j=1}^{k_i} f_{ij}(x_1, \dots, x_n) = 0 \bigr)$, where the $f_{ij}$ are compositions of Type 2-computable functions (these are essentially numerically computable real functions, including transcendental functions and solutions of differential equations). The $\delta$-weakening of $\phi$ is the formula: $\phi^\delta = \exists^{I_1} x_1 \cdots \exists^{I_n} x_n \bigwedge_{i=1}^{m} \bigl( \bigvee_{j=1}^{k_i} |f_{ij}(x_1, \dots, x_n)| \le \delta \bigr)$. The bounded $\delta$-SMT problem asks for the following: given a sentence $\phi$ of the above form and $\delta \in \mathbb{Q}^+$, correctly decide one of the following:

unsat: $\phi$ is false,

$\delta$-true: $\phi^\delta$ is true.

If the two cases overlap, either decision can be returned. Standard bounded reachability questions over PHSs can be encoded as sentences of this form and decided by $\delta$-complete decision procedures [14].
In this paper we are concerned with stochastic PHSs, which introduce random parameters to an otherwise deterministic PHS. Bounded step reachability in PHSs aims at finding the probability that, for the given initial conditions, the system reaches a goal state in $k$ discrete transitions within a given finite time. It can be shown that this probability can be computed as an integral of the form $\int_{G} dP$, where $G$ denotes the set of all random parameter values for which the system reaches a goal state in $k$ steps, and $P$ is the probability measure associated with the random parameters [20].
3 Integral Estimation Methods
3.1 Monte Carlo Method
Consider the integral $I = \int_{[0,1]^d} f(x)\,dx$ and a random variable $\xi$ on $[0,1]^d$. The expectation of $f(\xi)$ is $E[f(\xi)] = \int_{[0,1]^d} f(x)\,p(x)\,dx$, where $p$ is the density of $\xi$. If $\xi$ is uniformly distributed on $[0,1]^d$, then the integral becomes $I = E[f(\xi)]$. If we take $N$ points $\xi_1, \dots, \xi_N$, uniformly distributed on $[0,1]^d$, and compute the sample mean, we obtain the MC integral estimation:

$\hat{I}_N = \frac{1}{N} \sum_{i=1}^{N} f(\xi_i)$ (1)

According to the Strong Law of Large Numbers, this approximation converges (for $N \to \infty$) to $I$ with probability one. The variance of the MC estimator (1) is:

$\operatorname{Var}(\hat{I}_N) = \frac{\sigma^2}{N}, \quad \text{where } \sigma^2 = \int_{[0,1]^d} (f(x) - I)^2\,dx$ (2)

The MC integration error mean is $O(\sigma/\sqrt{N})$, where $\sigma^2$ is the integrand variance, which is assumed to exist. In practice, the integrand variance is often unknown, so the following estimate is used instead:

$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \bigl(f(\xi_i) - \hat{I}_N\bigr)^2$

This estimator possesses the unbiasedness property: $E[s^2] = \sigma^2$.
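As a concrete illustration, the estimator (1) and the unbiased sample variance can be sketched in a few lines of Python; the integrand $f(x) = x^2$, the sample size and the seed are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, n):
    """Plain Monte Carlo estimate of the integral of f over [0, 1],
    together with the unbiased sample-variance estimator."""
    x = rng.uniform(0.0, 1.0, n)  # n uniform pseudo-random points
    y = f(x)
    mean = y.mean()               # the MC estimate (1)
    var = y.var(ddof=1)           # unbiased estimator of the integrand variance
    return mean, var

# Example: the integral of x^2 over [0, 1] is 1/3,
# and the integrand variance is 1/5 - 1/9 = 4/45.
est, var = mc_estimate(lambda x: x**2, 100_000)
```

The standard error of the estimate can then be read off as $\sqrt{s^2/N}$, matching the $O(1/\sqrt{N})$ rate mentioned above.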
3.2 Quasi-Monte Carlo Method
QMC methods can be regarded as a deterministic counterpart to classical MC methods. Unlike MC integration, which uses the estimate (1) with randomly selected points, QMC methods select the points deterministically. Specifically, QMC techniques produce deterministic sequences of points that provide the best possible spread over the integration domain. These deterministic sequences are often referred to as low-discrepancy sequences. The Sobol sequence [21] is a well-known example of a low-discrepancy sequence. In Figure 1, we present a simple comparison between the distributions of Sobol and pseudo-random points. An effective way to use the QMC method is to perform a change of variables reducing the integration to the unit cube $[0,1]^d$; when we need to integrate over a large domain, that avoids multiplying the QMC estimate by a large factor as required by (1).
A QMC advantage with respect to MC is that its error is $O((\log N)^d / N)$, while the MC error is $O(1/\sqrt{N})$, where $N$ is the sample size. The Koksma–Hlawka inequality bounds the error of QMC estimates, but in practical applications it is very hard to estimate [15], thereby hampering the use of QMC methods. As such, other methods for estimating the QMC error need to be developed.
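A one-dimensional Sobol point set of this kind can, for instance, be generated with SciPy's quasi-Monte Carlo module; the following sketch (integrand and sample size chosen only for illustration) estimates the same test integral by the plain sample mean over low-discrepancy points:

```python
import numpy as np
from scipy.stats import qmc

# 2^13 points of a 1-dimensional (unscrambled, hence deterministic) Sobol sequence.
sobol = qmc.Sobol(d=1, scramble=False)
x = sobol.random_base2(m=13).ravel()

# QMC estimate of the integral of x^2 over [0, 1] (true value 1/3):
# the same sample mean as (1), but over low-discrepancy points.
qmc_est = np.mean(x ** 2)
```

With $N = 2^{13}$ points the deterministic estimate is already far closer to the true value than a typical pseudo-random sample of the same size, reflecting the faster QMC convergence rate.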
Qint Cubature Method.
Ermakov and Antonov [5] have recently introduced a new method for QMC variance estimation. To construct an estimate of the integral they use a set of random quadrature formulas, which were introduced by the Ermakov–Granovsky theorem [11]. This theorem allows us to construct point formulas with two important properties: the unbiasedness property for the integral and the accuracy property for the considered Haar system. The nodes of the formula are random variables whose distribution density is defined over a Latin set associated with a permutation of orthonormal Haar functions [5]. The variance of the constructed cubature formula can be calculated from the variance of the MC method (2).
In other words, we can redefine the integral estimation variance as:
(3) 
3.3 Randomised Quasi-Monte Carlo
As discussed earlier, the practical application of QMC is limited by the difficulty of computing an estimate of the integration error. However, introducing randomisation into the deterministic QMC procedure enables constructing CIs. A Randomised QMC (RQMC) procedure can be described as follows. Suppose that $Q = \{q_1, \dots, q_N\}$ is a deterministic low-discrepancy set. By means of a transformation driven by a random variable $u$, a finite set $Q^{(j)} = \{x_1^{(j)}, \dots, x_N^{(j)}\}$ is generated which has the same quasi-random properties as the set $Q$ (see Figure 1). For a randomised set we construct an RQMC estimate similar to (1):

$\hat{I}^{(j)} = \frac{1}{N} \sum_{i=1}^{N} f(x_i^{(j)})$ (4)

for $j = 1, \dots, K$, where $K$ is the total number of different pseudo-random sequences. Then, we take the average of the estimates (4) for the overall RQMC estimation:

$\bar{I} = \frac{1}{K} \sum_{j=1}^{K} \hat{I}^{(j)}$ (5)
If we choose the transformation in such a way that each of the estimates (4) has the unbiasedness property, i.e., $E[\hat{I}^{(j)}] = I$ (e.g., the random shift $x \mapsto (x + u) \bmod 1$), then the estimator (5) will also be unbiased, i.e., $E[\bar{I}] = I$. By independence of the samples used in (4) and (5), we have that for all $j$: $\operatorname{Var}(\bar{I}) = \operatorname{Var}(\hat{I}^{(j)})/K$. Thus, we have the following variance estimation:

$\widehat{\operatorname{Var}}(\bar{I}) = \frac{1}{K(K-1)} \sum_{j=1}^{K} \bigl(\hat{I}^{(j)} - \bar{I}\bigr)^2$
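The scheme above can be sketched in Python using the Cranley–Patterson rotation (addition of a random shift mod 1) as the randomising transformation; the integrand and the sizes of the point set are illustrative choices:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(1)

def rqmc_estimate(f, d=1, m=10, k=10):
    """K = k independent random shifts (mod 1) of one Sobol point set.
    Returns the overall estimate (5) and its CLT-based standard error."""
    base = qmc.Sobol(d=d, scramble=False).random_base2(m=m)  # 2^m Sobol points
    estimates = []
    for _ in range(k):
        shift = rng.uniform(0.0, 1.0, d)
        shifted = (base + shift) % 1.0       # Cranley-Patterson rotation
        estimates.append(f(shifted).mean())  # one RQMC estimate (4)
    estimates = np.asarray(estimates)
    # The k estimates are i.i.d. and unbiased, so the CLT applies to their mean.
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(k)

est, se = rqmc_estimate(lambda u: u[:, 0] ** 2)
```

Because each randomised replicate is unbiased and the replicates are independent, the reported standard error supports a CLT-based confidence interval, which is exactly the property plain QMC lacks.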
3.4 Validated MC and QMC
Let us consider a hybrid system with random parameters only. For any parameter value $p$ from the initial parameter distribution we introduce the Bernoulli random variable $B(p)$, which takes value 1 if the system reaches the goal in $k$ steps for $p$, and 0 otherwise. Since in general we cannot sample $B$ because of undecidability, we instead consider two Bernoulli random variables: $X(p)$, which takes value 1 if we can correctly decide that the system reaches the goal in $k$ steps for $p$, and 0 otherwise; and $Y(p)$, which takes value 0 if we can correctly decide that the system does not reach the goal, and 1 otherwise [20]. Therefore:

$X(p) \le B(p) \le Y(p)$

and thus:

$E[X] \le E[B] \le E[Y]$

By the definition of expectation, and denoting by $D$ the domain of the random parameters, we get:

$\int_{D} X(p)\,dP \;\le\; \int_{D} B(p)\,dP \;\le\; \int_{D} Y(p)\,dP$ (6)

We take the sample approximation of (6) and obtain

$\frac{1}{N} \sum_{i=1}^{N} X(p_i) \;\le\; \hat{P} \;\le\; \frac{1}{N} \sum_{i=1}^{N} Y(p_i)$

where the $p_i$'s can be sampled by using low-discrepancy sequences for QMC methods or pseudo-random sequences for MC methods.
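The construction can be sketched as follows; the "decision procedure" here is a hypothetical stand-in (an invented threshold with an undecided band), since a real $\delta$-complete solver is outside the scope of a short example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for a delta-complete decision procedure: for a random
# parameter value p it can confirm reachability, confirm non-reachability, or
# stay undecided in a small band around the (here invented) threshold 0.3.
def delta_decide(p, threshold=0.3, delta=0.01):
    if p > threshold + delta:
        return "sat"       # provably reaches the goal
    if p < threshold - delta:
        return "unsat"     # provably does not reach the goal
    return "undecided"

n = 10_000
params = rng.uniform(0.0, 1.0, n)  # sampled random parameter values

# X = 1 iff reachability is confirmed; Y = 0 iff non-reachability is confirmed.
X = np.array([1.0 if delta_decide(p) == "sat" else 0.0 for p in params])
Y = np.array([0.0 if delta_decide(p) == "unsat" else 1.0 for p in params])

lower, upper = X.mean(), Y.mean()  # under- and over-approximation of E[B]
```

By construction $X(p) \le Y(p)$ pointwise, so the two sample means bracket the unknown reachability probability; the gap between them reflects the undecided region.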
4 Confidence Interval Estimation and Error Analysis
In the following we shall use the notation below:

$\hat{p}$ — the sample mean;

$Z_{1-\alpha/2}$ — the inverse cumulative distribution function of a normal random variable with mean 0 and standard deviation 1; the parameter $\alpha$ defines the confidence level at $1 - \alpha$;

$\hat{p} = x/n$ — the binomially-distributed proportion, where $x$ is the number of successes and $n - x$ the number of failures in a Bernoulli trial process, and $n$ is the total number of Bernoulli trials.
4.1 Intervals Based on the Standard CLT Interval
Modified Central Limit Theorem interval.
First, we consider the case when the sample $x_1, \dots, x_n$ is extracted from the normal distribution $N(\mu, \sigma^2)$ with unknown parameter $\mu$ and known $\sigma$, where $\mu$ is the mean (expectation) of the distribution and $\sigma^2$ is the variance. Here, $\mu$ can be approximated by the sample mean $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$. To qualify this approximation, we construct a CI covering the parameter $\mu$ with a given confidence probability:

$\Bigl(\hat{\mu} - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\; \hat{\mu} + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\Bigr)$ (7)

If the variance is unknown, we can use the same CI by replacing $\sigma$ with the sample standard deviation $s$. This method is widely used for estimating the distribution of the error of binomially-distributed proportions. Many related works [6, 7, 8] note that the approximation can be poor when applied to Bernoulli trials with $\hat{p}$ close to 0 or 1.
In order to resolve this problem, we introduce a new method for variance estimation, which sequentially estimates the sample standard deviation and recalculates (7) at every new sample. Our solution simply approximates the sample standard deviation by a fixed lower bound at the initial stages of the computation, when $\hat{p}$ is equal to 0 (or 1), and propagates it through the computation until the number of samples necessary to construct the interval has been obtained. We show the advantages of this approach in Section 5.
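A minimal sketch of a floored CLT interval of this kind; the floor value `s_floor = 0.05` is a hypothetical choice for illustration, not necessarily the exact bound used in the modification described above:

```python
import numpy as np
from scipy.stats import norm

def clt_interval(successes, n, conf=0.99, s_floor=0.05):
    """CLT interval (7) for a binomial proportion, with the sample standard
    deviation bounded from below. The floor s_floor = 0.05 is a hypothetical
    choice; it prevents a degenerate interval when p_hat is 0 or 1."""
    p_hat = successes / n
    s = np.sqrt(p_hat * (1.0 - p_hat))  # sample standard deviation
    s = max(s, s_floor)                 # lower-bound it near the boundaries
    z = norm.ppf(1.0 - (1.0 - conf) / 2.0)
    half = z * s / np.sqrt(n)
    return p_hat - half, p_hat + half
```

With `successes = 0` the plain CLT interval collapses to a single point, whereas the floored version still returns an interval of positive width.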
Wilson interval.
It was introduced by Wilson in 1927 in his fundamental work [9] and uses the inversion of the CLT interval. Additionally, its center is shifted towards 1/2 by a quantile-dependent weighting of the sample mean. The interval has the following form:

$\frac{\hat{p} + \frac{Z^2}{2n}}{1 + \frac{Z^2}{n}} \;\pm\; \frac{Z}{1 + \frac{Z^2}{n}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n} + \frac{Z^2}{4n^2}}$ (8)

where $Z = Z_{1-\alpha/2}$.
This interval has some obvious advantages: it cannot exceed the probability boundaries, and it can easily be calculated even if $\hat{p}$ is 0 or 1. At the same time, its coverage has downward spikes when $p$ is close to 0 and 1, because it is formed by an inverted CLT approximation.
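The Wilson interval (8) can be computed directly; a straightforward sketch:

```python
from math import sqrt
from scipy.stats import norm

def wilson_interval(x, n, conf=0.99):
    """Wilson score interval (8): inversion of the CLT approximation,
    so the bounds never leave [0, 1]."""
    z = norm.ppf(1.0 - (1.0 - conf) / 2.0)
    p = x / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = (z / denom) * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return center - half, center + half
```

Note that for `x = 0` the lower bound evaluates to exactly 0 (up to floating point), illustrating why the interval cannot leave the probability range.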
AgrestiCoull interval.
This method was introduced by Agresti and Coull in 1998 [1]. One of its most interesting features is a simple adjustment of $x$ and $n$: the interval formally adds two successes and two failures to the observed values (in the case of a 95% confidence level) and then uses the CLT method. With $\tilde{n} = n + Z^2$ and $\tilde{p} = (x + Z^2/2)/\tilde{n}$, the interval can be constructed as follows:

$\tilde{p} \;\pm\; Z \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$ (9)
Additionally, this interval can be modified by using the center of the Wilson interval (8) in place of $\tilde{p}$:

$\frac{\hat{p} + \frac{Z^2}{2n}}{1 + \frac{Z^2}{n}} \;\pm\; Z \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$ (10)
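A sketch of the Agresti–Coull interval (9): at the 95% level $Z^2/2 \approx 1.92$, which is the origin of the "add two successes and two failures" rule.

```python
from math import sqrt
from scipy.stats import norm

def agresti_coull_interval(x, n, conf=0.95):
    """Agresti-Coull interval (9): add z^2/2 pseudo-successes and z^2/2
    pseudo-failures (about two of each at the 95% level), then apply the
    plain CLT formula to the adjusted proportion."""
    z = norm.ppf(1.0 - (1.0 - conf) / 2.0)
    n_adj = n + z * z
    p_adj = (x + z * z / 2.0) / n_adj
    half = z * sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return p_adj - half, p_adj + half
```

Unlike the plain CLT interval, the adjusted proportion is never exactly 0 or 1, so the interval has positive width even for extreme counts.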
Logit interval.
This interval is based on the variance-stabilising logit transformation $\ln\frac{x}{n - x}$ of the binomial proportion: a CLT interval is constructed on the logit scale and then mapped back, so its bounds cannot leave the probability range.
Anscombe interval.
Anscombe proposed replacing $x$ and $n - x$ in the logit transformation with $x + 1/2$ and $n - x + 1/2$, which improves its behaviour for extreme counts.
Arcsine interval.
It uses a variance-stabilising arcsine transformation of $\hat{p}$. In 1948, Anscombe introduced an improvement [3] for achieving better variance stabilisation by replacing $\hat{p}$ with $(x + 3/8)/(n + 3/4)$, obtaining:

$\sin^2\Bigl(\arcsin\sqrt{\tilde{p}} \;\mp\; \frac{Z}{2\sqrt{n}}\Bigr), \quad \tilde{p} = \frac{x + 3/8}{n + 3/4}$ (12)
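A sketch of the arcsine interval (12) with Anscombe's correction; clamping the transformed bounds to $[0, \pi/2]$ is a common implementation detail rather than part of the formula itself:

```python
from math import asin, pi, sin, sqrt
from scipy.stats import norm

def arcsine_interval(x, n, conf=0.99):
    """Arcsine interval (12) with Anscombe's correction: apply the
    variance-stabilising transform to (x + 3/8) / (n + 3/4)."""
    z = norm.ppf(1.0 - (1.0 - conf) / 2.0)
    p = (x + 3.0 / 8.0) / (n + 3.0 / 4.0)
    t = asin(sqrt(p))
    # Clamp the transformed endpoints so the back-transform stays in [0, 1].
    lo = sin(max(t - z / (2.0 * sqrt(n)), 0.0)) ** 2
    hi = sin(min(t + z / (2.0 * sqrt(n)), pi / 2.0)) ** 2
    return lo, hi
```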
4.2 Alternative Intervals Based on the Beta Function
Bayesian interval.
This method is based on the assumption that the (unknown) probability $p$ to estimate is a random quantity [24]. The Bayesian interval is also called credible, because it computes the posterior distribution of the unknown quantity by using its prior distribution and Bayes' theorem. The prior distribution can be constructed by means of the Beta distribution, which is widely used for computing inferences on $p$. If $p$ has a prior $\mathrm{Beta}(a, b)$ distribution, then after observing $x$ successes in $n$ trials $p$ has posterior distribution $\mathrm{Beta}(a + x, b + n - x)$. We can construct a Bayesian equal-tailed interval by the formula:

$\bigl[\,B(\alpha/2;\; a + x,\; b + n - x),\;\; B(1 - \alpha/2;\; a + x,\; b + n - x)\,\bigr]$ (13)

where $B(\cdot\,;\, a, b)$ is the inverse of the cumulative distribution function of $\mathrm{Beta}(a, b)$.
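The equal-tailed interval (13) reduces to two beta-quantile evaluations; a sketch:

```python
from scipy.stats import beta

def bayes_interval(x, n, conf=0.99, a=1.0, b=1.0):
    """Equal-tailed credible interval (13): with a Beta(a, b) prior, the
    posterior after x successes in n trials is Beta(a + x, b + n - x).
    Setting a = b = 0.5 yields the Jeffreys interval."""
    alpha = 1.0 - conf
    lo = beta.ppf(alpha / 2.0, a + x, b + n - x)
    hi = beta.ppf(1.0 - alpha / 2.0, a + x, b + n - x)
    return lo, hi
```

Because the posterior has support on $[0, 1]$, the credible interval can never leave the probability range, even for $x = 0$ or $x = n$.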
Jeffreys interval.
The Jeffreys interval is a Bayesian interval that uses the Jeffreys prior [16], a noninformative prior given by the Beta distribution with parameters $a = b = 1/2$. The probability density function of the $\mathrm{Beta}(a, b)$ distribution is $f(x;\, a, b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)}$, where $x \in [0, 1]$ and $B(a, b)$ is the beta function. We can form the Jeffreys equal-tailed interval by (13) with parameters $a = b = 1/2$.

Clopper–Pearson interval.
This method was introduced by Clopper and Pearson in 1934 [7] and is based on the inversion of the binomial test, rather than on approximations. The Clopper–Pearson interval is:

$\bigl[\,B(\alpha/2;\; x,\; n - x + 1),\;\; B(1 - \alpha/2;\; x + 1,\; n - x)\,\bigr]$ (14)

The interval guarantees that the actual coverage probability is always at or above the nominal confidence level. In practice, tight coverage is achieved only when $n$ is large enough, while in general the actual coverage can exceed $1 - \alpha$, making the interval conservative. We can conclude from equation (14) that, due to the absence of the $a$ and $b$ parameters, higher precision can be achieved only by increasing the number of trials.
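The Clopper–Pearson interval (14) can likewise be computed from beta quantiles; a sketch, with the closed-form endpoint cases $x = 0$ and $x = n$ handled explicitly:

```python
from scipy.stats import beta

def clopper_pearson_interval(x, n, conf=0.99):
    """Clopper-Pearson interval (14): exact inversion of the binomial
    test, expressed through beta-distribution quantiles."""
    alpha = 1.0 - conf
    lo = 0.0 if x == 0 else beta.ppf(alpha / 2.0, x, n - x + 1)
    hi = 1.0 if x == n else beta.ppf(1.0 - alpha / 2.0, x + 1, n - x)
    return lo, hi
```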
5 Results
We apply the CI estimation methods based on the standard interval with the RQMC technique, and the Bayesian CI estimation method with the MC technique. All our results are the average of 10 runs: in the RQMC case, 10 sequences were obtained by changing the pseudo-random shift in the (mod 1) transformation, while the Sobol sequence points remain the same; in the MC case we used the same 10 pseudo-random point sequences that were used for the RQMC calculation.
5.1 Border Probability Cases
The true probability values shown in Section 5.1 and Section 5.2 were obtained via pseudo-random number generation producing Boolean values according to a Bernoulli distribution.
Intervals based on CLT and Bayesian interval.
The comparison of the different CI estimation techniques for extreme probability cases (near the 0 bound), presented in Figure 2, shows that all intervals except the Arcsine interval (12) (see the plot of Figure 2 for probability 0.001) contain the true probability value within their bounds. The Bayesian method tends to overestimate the true probability values as they increase, while the CLT-based intervals tend to underestimate them. It is also interesting to note that the most accurate center value is returned by the Agresti–Coull interval. The reason why the interval tends to include the true probability value near its upper bound is directly related to the number of samples. As shown in Figure 2 for true probability values 0.007–0.01, the center moves up evenly towards the true probability value as the confidence value increases. This mirrors the growth in the number of samples needed to obtain the necessary confidence level. For the other true probability values (0.001–0.006), although this trend is retained, it cannot be seen in the figure because of the small difference in the number of samples across confidence levels, which causes the CI center to move in a wave-like fashion.
The results in Figure 2 also demonstrate that the CIs based on the standard interval can have an interval size smaller than its nominal value even for large sample sizes. It can be seen that every confidence level from 0.99 to 0.99999 displays further instances of this inadequacy of the CI size. Also, Figure 2 shows that in spite of the large sample size (according to the confidence level), the size of several of the intervals decreases significantly as $\hat{p}$ moves toward 0. The interval sizes of the CIs based on the standard interval change for two reasons: the absence of an a posteriori estimate, and the skewness of the underlying binomial distribution. The erratic and unsatisfactory coverage properties of the standard interval have often been remarked on, but still do not seem to be widely appreciated among statisticians [6]. The sequence-sampling version still has the same disadvantages.

In Figure 3 we plot the number of samples that the different CI estimation techniques used to return intervals for different confidence levels. It can be clearly seen from the plots that, as the confidence increases, all CIs based on the standard interval outperform the Bayesian CI. The plots in Figure 3 also rank the techniques by number of samples, from best to worst. The Anscombe and Logit techniques always show almost the same results near the bounds, because of the modification of the Logit interval: initially the Logit interval is not able to deal with probability values near the bounds, according to its formula (see Section 4.1), so it has been modified to use the Anscombe estimation formula in the cases when $x = 0$ or $x = n$. It is also important to note that the difference in sample number between the best and worst CLT-based techniques for extreme probability cases is significant: for example, in one of the plots of Figure 3 the number of samples used to obtain the interval equals 1,078 for the best technique, 2,662 for the second and 4,440 for the third.
This trend is not preserved as the probability value increases from 0 to 0.5, or decreases from 1 to 0.5, respectively. Figure 4 shows that the difference in sample number between almost all CIs is nearly undetectable, while one technique performs considerably worse than the others, as opposed to its results for probability values at the extremes.
Summarising, for probability values near the bounds (0 or 1), the modified CLT method achieves better results in number of samples in comparison with the others (see Figure 3). For probability values away from the bounds, the CLT, Wilson, Agresti–Coull, Logit and Anscombe methods are all very similar (see Figure 4), and so for such probabilities we come to the conclusion that the CLT interval should be recommended, due to its simplest form. Meanwhile, for smaller sample sizes the Wilson interval is strongly preferable to the others and so might be the choice where sampling cost is paramount.
Qint method results.
In Figure 2 and Figure 3 we also plotted the results of the recently developed Qint algorithm [5]. Its parameters were used to form the points of the Sobol sequence and were chosen, on the basis of the original study of the Qint method, as the most universal and reliable. As described earlier, Qint uses a cubature randomisation method and provides the integral estimation variance (3). This formula is used to obtain a CI by calculating the standard interval (7) with our modification.
In Figure 2 we display the Qint interval distribution for border probability values. We can see from the plots that the Qint CI always contains the true probability value. At the same time, for all confidence levels from 0.99 to 0.99999 and for true probability values 0.006–0.01, Qint shows better centering than the other CLT-based intervals. The greatest differences between the Qint CLT center result and the true probability values are 0.00245 (p=0.004), 0.00191 (p=0.004), 0.00168 (p=0.003) and 0.00141 (p=0.004) across the confidence levels, while for example this difference for the standard CLT interval reaches 0.00518 (p=0.007), 0.00235 (p=0.008), 0.00181 (p=0.008) and 0.00143 (p=0.008).
We can see in Figure 3 that, as expected, Qint uses fewer samples than the other CIs, with one exception. Our modification allows the Qint algorithm to return intervals even in the degenerate case, which significantly decreases the final sample size over 10 runs. With the increase in the confidence level, which leads to further sampling and better reflects the behaviour of the underlying random process, the effectiveness of the method decreases, and the benefit no longer seems so significant.
The fact that, with the chosen parameters, Qint cannot outperform our modified CLT interval leads us to the conclusion that our use of the standard deviation formula with a lower bound is a rather effective and simple solution. However, the wide range of possible parameter variations, as well as the novelty of the Qint algorithm, lead us to believe that further research towards their comparison is needed.
5.2 MC and QMC Error Comparison
Another key difference between the Bayesian CI and the CIs based on the CLT is the use of MC and QMC techniques for interval calculation. As described in Section 3.2, the QMC advantage in the error size holds for all of the tested models (see Figures 5–16). In the cases where the true error rate could not be detected, due to the probability value being extremely close to 0 (Bad model, type min, and Collision (Basic) model, type min), the MC absolute error line equals the true probability value, because a sample mean of 0 was obtained. The chaotic coverage properties of the MC method are far more persistent than is generally appreciated: the chaotic behaviour does not disappear even when $n$ is quite large and the true probability is not near the boundaries. For instance, in Figure 6 it is visible that even when $n$ is quite large (i.e., tends to 10,000 samples) the actual absolute error value of the MC method remains substantial. Hence we can conclude that CI estimation techniques based on MC can be misleading and defective in several respects and should not be blindly trusted [6].
A notable phenomenon, which we noticed for the MC and QMC probability calculation, is that the actual coverage probability exhibits non-negligible oscillations as both $n$ and $p$ vary. There exist some “unlucky” pairs $(n, p)$ such that the corresponding absolute error is much greater than the results for smaller $n$. The phenomenon of oscillation occurs both in $n$, for fixed $p$, and in $p$, for fixed $n$. Furthermore, drastic changes in coverage occur in nearby $p$ for fixed $n$, and in nearby $n$ for fixed $p$ [6]. We can see this in the simple example in Figure 6.
Confidence level c=0.99 

Model  Type  P  Qint  
Good  max  0.1  0.09671, 0.10671  0.09564, 0.10564  0.09632, 0.10632  0.09574, 0.10574  0.09575, 0.10575  0.09577, 0.10577  0.09559, 0.10559  0.09147, 0.10147 
min  0.1  0.09529, 0.10529  0.0956, 0.1056  0.09666, 0.10666  0.09679, 0.10679  0.09678, 0.10678  0.0968, 0.1068  0.09639, 0.10639  0.09164, 0.10164  
Bad  max  0.95001  0.94416, 0.95416  0.94495, 0.95493  0.94422, 0.95422  0.94397, 0.95397  0.94396, 0.95396  0.94392, 0.95392  0.94735, 0.95735  0.94459, 0.95459 
max2  0.88747  0.8825, 0.8925  0.88028, 0.89028  0.88031, 0.88031  0.88019, 0.89019  0.8803, 0.8902  0.88019, 0.89019  0.88325, 0.89325  0.88136, 0.89136  
min  0, 0.00525  0, 0.005  0, 0.00483  0, 0.00955  0.00005, 0.00959  0.00005, 0.00959  0.00131, 0.00959  [0,0.005]  
Deceleration  max  [0.08404, 0.08881]  0.08471, 0.09471  0.08802, 0.09802  0.08817, 0.09817  0.08685, 0.09685  0.08614, 0.09614  0.0863, 0.0963  0.08963, 0.09932  0.08852, 0.09852 
min  [0.04085, 0.04275]  0.03835, 0.04835  0.03861, 0.04861  0.03854, 0.04854  0.03884, 0.04884  0.03886, 0.04886  0.0389, 0.0489  0.03873, 0.04873  0.03337, 0.04337  
Collision (Basic)  max  [0.96567, 0.97254]  0.96371, 0.97381  0.96873, 0.97873  0.9684, 0.9784  0.96851, 0.97851  0.96875, 0.97875  0.96853, 0.97853  0.96851, 0.97851  0.96301, 0.97301 
min  [0, 0.00201]  0, 0.00525  0, 0.005  0, 0.00483  0, 0.00955  0.00005, 0.00959  0.00005, 0.00959  0.00131, 0.00959  [0,0.005]  
Collision (Extended)  max  [0.35751, 0.49961]  0.42267, 0.43675  0.42418, 0.4342  0.42187, 0.43187  0.42345, 0.43345  0.42463, 0.43463  0.42457, 0.43457  0.42385, 0.43385  0.42342, 0.43342 
min  [0.04296, 0.06311]  0.0482, 0.0582  0.04772, 0.05772  0.04785, 0.05785  0.04823, 0.05823  0.04812, 0.05812  0.0481, 0.0581  0.04757, 0.05772  0.04618, 0.05618  
Collision (Advanced)  max  [0.14807, 0.31121]  0.2072, 0.2172  0.20873, 0.21872  0.21872, 0.2185  0.20854, 0.21854  0.20854, 0.21854  0.20855, 0.21855  0.20111, 0.21111  0.20167, 0.21166 
min  [0.02471, 0.05191]  0.02631, 0.03631  0.03045, 0.04045  0.03016, 0.04016  0.03001, 0.04  0.03001, 0.04  0.03016, 0.04016  0.03164, 0.04164  0.0304, 0.0404  
Anesthesia  n/a  [0.00916, 0.04222]  0.01361, 0.02361  0.01339, 0.02332  0.01374, 0.02374  0.01373, 0.02373  0.01318, 0.02318  0.01311, 0.02311  0.01592, 0.02592  0.01815, 0.02815 
Confidence level c=0.99999 

Model  Type  P  Qint  
Good 
max  0.1  0.09499, 0.10499  0.09378, 0.10378  0.09386, 0.10386  0.09389, 0.10389  0.09391, 0.10391  0.09392, 0.10392  0.09405, 0.10405  0.09512, 0.10512 
min  0.1  0.09419, 0.10419  0.09667, 0.10667  0.09668, 0.10668  0.09677, 0.10677  0.09671, 0.10671  0.09679, 0.10679  0.09675, 0.10675  0.09525, 0.10525  
Bad  max  0.95001  0.94525, 0.95525  0.94579, 0.95579  0.94564, 0.95564  0.94548, 0.95548  0.94545, 0.95545  0.94543, 0.95543  0.94735, 0.95735  0.94543, 0.95543 
max2  0.88747  0.88215, 0.89215  0.88055, 0.89055  0.88057, 0.89057  0.88046, 0.89046  0.88046, 0.89046  0.88046, 0.89046  0.88325, 0.89325  0.88052, 0.89052  
min  0, 0.00517  0, 0.00319  0, 0.00494  0, 0.00984  0, 0.00992  0, 0.00992  0.00445, 0.0139  [0,0.005]  
Deceleration  max  [0.08404, 0.08881]  0.08613, 0.09613  0.08624, 0.09624  0.08312, 0.09312  0.08725, 0.09725  0.08725, 0.09725  0.08726, 0.09726  0.08746, 0.09746  0.08737, 0.09735 
min  [0.04085, 0.04275]  0.03514, 0.04514  0.03919, 0.04919  0.03918, 0.04918  0.03942, 0.04942  0.03943, 0.04943  0.03944, 0.04944  0.039, 0.049  0.03377, 0.04377  
Collision (Basic)  max  [0.96567, 0.97254]  0.96359, 0.97359  0.96241, 0.97241  0.96767, 0.9767  0.96892, 0.96892  0.96689, 0.97589  0.96683, 0.97583  0.96863, 0.97863  0.96462, 0.97462 
min  [0 , 0.00201]  0, 0.00517  0, 0.00319  0, 0.00494  0, 0.00984  0, 0.00992  0, 0.00992  0.00445, 0.0139  [0,0.005]  
Collision (Extended)  max  [0.35751, 0.49961]  0.42651, 0.43652  0.42719, 0.43724  0.42757, 0.43757  0.42656, 0.43656  0.41774, 0.42774  0.41779, 0.42779  0.42745, 0.43745  0.42875, 0.43875 
min  [0.04296, 0.06311]  0.04979, 0.05979  0.04766, 0.05766  0.04764, 0.05764  0.04748, 0.05748  0.04745, 0.05745  0.04776, 0.05776  0.05776, 0.05673  0.04576, 0.05576  
Collision (Advanced)  max  [0.14807, 0.31121]  0.20515, 0.21519  0.20558, 0.21563  0.20533, 0.21533  0.20531, 0.21531  0.20547, 0.21547  0.20547 0.21547  0.20385, 0.21385  0.20453, 0.21453 
min  [0.02471, 0.05191]  0.03011, 0.04015  0.02902, 0.03902  0.02954, 0.03945  0.03956, 0.04956  0.03861, 0.04861  0.03887, 0.04887  0.0363, 0.04363  0.03031, 0.04031  
Anesthesia  n/a  [0.00916, 0.04222]  0.01284, 0.02284  0.01513, 0.02511  0.01623, 0.02623  0.01545, 0.02545  0.01557, 0.02557  0.01562, 0.02562  0.01385, 0.02385  0.01852, 0.02852 

Model  Type  c  Qint  
Good  max  0.99  24252  24025  24038  24027  24035  24034  26681  23136 
min  0.99  23451  23248  24256  24250  24253  24252  26894  23245  
Bad  max  0.99  13118  12670  12841  12817  12833  12832  23006  11726 
max2  0.99  27498  26954  26960  26955  26958  26958  40442  25734  
min  0.99  2590  96  961  688  680  680  347  94  
Deceleration  max  0.99  22842  22393  22673  22517  22628  22623  24365  20318 
min  0.99  11224  11073  11114  11086  11104  11104  11570  9798  
Collision (Basic)  max  0.99  9581  9318  9653  9463  9386  9381  10643  8222 
min  0.99  2590  96  961  688  680  680  347  94  
Collision (Extended)  max  0.99  65109  64804  64854  64841  64932  64930  104637  62485 
min  0.99  13624  13257  13486  13375  13326  13320  14737  12869  
Collision (Advanced)  max  0.99  44370  43602  43645  43640  43644  43643  51734  43524 
min  0.99  9500  9081  9094  9085  9090  9089  9282  9080  
Anesthesia  n/a  0.99  5801  4847  5024  4952  4928  4919  5522  4804 
Good 
max  0.99999  70422  69484  69582  69496  69530  69529  77262  68456 
min  0.99999  71898  71286  71339  71293  71321  71321  79369  68994  
Bad  max  0.99999  37388  36518  36771  36629  36687  36868  60006  36164 
max2  0.99999  79306  79097  79125  79101  79118  79118  96442  77892  
min  0.99999  5797  124  2766  1963  4136  4136  572  94  
Deceleration  max  0.99999  65248  65233  65330  65299  65320  65319  72114  59882 
min  0.99999  33147  32969  33133  33018  33060  33060  34231  29096  
Collision (Basic)  max  0.99999  25279  24711  24834  24789  24934  24933  26045  23016 
min  0.99999  5797  124  2766  1963  4136  4136  572  94  
Collision (Extended)  max  0.99999  191466  190776  191253  190894  191485  191472  376294  185456 
min  0.99999  41153  38942  39745  39473  39537  39541  47923  37608  
Collision (Advanced)  max  0.99999  131517  129746  131185  129845  129934  129933  183405  127486 
min  0.99999  27305  25657  25835  25736  25792  25791  29362  24569  
Anesthesia  n/a  0.99999  16197  15453  15834  15634  15734  15733  17845  15314 

5.3 Tested Models Results
Intervals based on CLT and Bayesian interval.
Based on our model set, we provide in Table 1 a comparison of the CIs described in Section 4, obtained via ProbReach; the true probability value P is either an analytically computed single probability value or a formally computed absolute (non-statistical) interval. The solver precision and interval size were chosen according to previous work [19] as a fine trade-off between the precision of the results and the CPU time. Each model was verified separately with different confidence levels from 0.99 to 0.99999 (see also Table 3). The lowest confidence level is often used in similar work, while the highest can provide reasonable results for real-world complex models.
As can be seen in Table 1, all the intervals for the various techniques overlap. The modified approach shows results very similar to the original, which can be regarded as a successful implementation. The key difference in the interval sizes can be found in the results of the Bad model, type min, and the Collision (Basic) model, type min. From the results we can conclude that the true probability value is very close to 0. This allows the Bayesian, CLT and Agresti–Coull methods to form intervals which in reality are half of the proposed interval size, while the other techniques return fully sized intervals. That happens because the Bayesian method uses the posterior distribution to form the interval, while the CLT and Agresti–Coull calculations of the mean value return a result quite close to zero, so that the subsequent computation of the interval bounds cuts off the negative part of the interval. This trend holds for all probability values close to the bounds.
Table 1 also shows that the intervals' precision grows with the confidence level, which in turn is directly related to the usage of the inverse cumulative distribution function of a normal random variable with the given confidence level in formulas (7), (8), (10) and (12). It also results in an increase of the sample size.
The comparison of the obtained intervals (see Table 1) with the true probability value or interval P shows that almost all CIs contain the single probability values (see the Bad type min model of Table 1 for the exception), and all CIs overlap with the true probability intervals. We can also note that the true probability intervals of the Collision Extended, Collision Advanced and Anesthesia models contain all confidence intervals for all confidence levels (see also Table 3). The reasons why the Collision Basic and Deceleration models' true probability intervals do not contain the CIs are their very small size and their closeness to 0.
Table 2 provides very interesting results with respect to the number of samples used to find the CIs obtained via ProbReach. The number of samples varies for the different models and types (see also Table 4). As noted earlier for Figure 4, the number of samples needed for the computation grows from the bounds towards the center of the [0,1] interval. The presented models show different behaviour and probability results. The most important outcome is that almost all CIs show better results in number of samples with respect to the Bayesian interval, and the best result was shown by the modified CLT interval. This shows that the proposed CLT modification can provide reasonable results for RQMC calculation in comparison with the well-established Bayesian MC integral calculation.
Qint method results.
A comparison of the Qint method's confidence intervals is presented in Table 1. All of the Qint intervals also contain the single probability values and overlap with the true probability intervals. The original Qint algorithm is not able to provide results for the Bad type min and Collision Basic type min models, because for very small probability values, like [0, 0.00201], it could not detect any successes for the chosen confidence levels and interval size. For this reason the original Qint algorithm was changed by applying the modified CLT method described in Section 4.1. From the results we see that the Qint algorithm shows great potential, which is connected with the very fast convergence rate of the QMC method and with finding an appropriate partition (in terms of its parameters).
Table 2 allows us to compare Qint's sample sizes with those of the other CIs. However, the modified CLT interval had an advantage in the number of samples for small probability values near the border (see Figure 3).