Software is the fuel of the modern world. The economy, technology, transport, communication and medical treatment, all essential components of our daily lives, depend critically on the successful execution of software. Most modern devices may not function properly if the software that runs them carries bugs. Thus, it is not surprising that the estimation of software reliability remains a cornerstone of software development and testing (Pham, 2000; Yamada, 2014).
Determining the optimum time for software release remains an active field of research (Chakraborty et al., 2019). It has also been proposed that software testing data be collected in a more efficient manner than is traditional (Nayak, 1988). Time-between-failures data have become difficult to collect as the complexity of software and its testing increases. In most cases, the information logged during software testing is test-case specific and consequently discrete in nature. Estimation of the optimum duration of software testing under a discrete set-up has also received considerable attention (Chakraborty and Arthanari, 1994; Chakraborty, 1996; Dewanji et al., 2011; Das et al., 2019). Most of these studies develop an optimum testing strategy based on the number of bugs remaining in the software (Chakraborty et al., 2019; Eom et al., 2013). However, if the remaining bugs lie on paths or locations of the software that will rarely be traversed by the users' inputs, software failures will also be rare, so the number of remaining bugs alone does not establish that the software is unreliable. Although this phenomenon seems plausible and close to reality, it has not yet been studied systematically in the literature.
We introduce here a latent variable, commonly known as the ‘eventual size of a bug’, to account for the probability that a remaining bug causes a failure of the software (Chakraborty, 1996). The eventual size of a bug is defined as the number of inputs that may eventually pass through the bug during the entire lifetime of the software, irrespective of whether the bug is detected during the testing phase. The eventual size of a bug is occasionally referred to simply as ‘the size of the bug’. A software can be considered as a collection of several paths, and each input to the software is expected to follow a particular path. In particular, if the same input is used several times, it can only check whether that particular path has any bugs; it cannot check for bugs in other paths, as the given input will not traverse them. Different inputs are therefore required to check for the existence of bugs in different paths. We assume that an input can identify at most one bug, which lies on the path that the input traverses. This size-biased approach was first introduced in software reliability by Chakraborty (1996), although the concept had already been applied in a few other fields of investigation (Patil and Rao, 1978).
1.1 Size-biased approach in software testing
It is quite natural for a path in the software to branch into several sub-paths at a later stage. All these sub-paths share a common initial segment. Imagine that one bug sits on the common path and another sits on one of the sub-paths. The size of the bug on the common path is much larger than that of the bug on the sub-path, since every input that goes through any of the sub-paths must first traverse the common path. The size of a bug may thus also indicate how quickly the bug could be identified. If a large bug goes undetected, it poses a potential threat to the functioning of the software, even if it is the only bug. The probability of detecting a bug (and hence of fixing it) therefore depends on the size of the bug.
The larger the size of a bug, the greater the chance of detecting it early in the testing phase, as indicated by Chakraborty and Arthanari (1994). They also showed that similar concepts apply to the discovery of hydrocarbon-rich fields in oil and natural gas production. It is also clear that a bug on a path that will hardly ever be traversed by any input remains harmless as far as the running of the software is concerned. We conclude that the reliability of the software does not depend on just the number of bugs remaining in it; rather, it depends on the positions of the bugs, in particular the paths on which they lie and whether those paths are frequently traversed by the users' inputs, which are random in nature (Littlewood, 1979). Hence, to obtain a better model for software reliability, we focus on the total size of the bugs that remain and not just on their number.
In a discrete software testing framework, testing an input results in either a success (a bug is found) or a failure (no bug is found). Testing of software is carried out in many phases; in each phase a series of inputs is tested, and the result of each test is recorded as either a success or a failure. The bugs identified within a phase are debugged at the end of that phase. This process is known as periodic debugging or interval debugging (Das et al., 2016).
In testing software, certain questions are critical, for example, when to stop testing and what criteria to use for stopping. If bugs remain in the software after the testing and debugging phases, they may cause the software to malfunction even after its market release. Therefore, a decision on the optimum software testing and debugging time is an important part of the software development process. Even if the number of remaining bugs is small, the software may still fail frequently if the total size of the remaining bugs is large.
The input space, consisting of all possible inputs to the software, can be broadly divided into two subsets: the success set, consisting of all inputs that result in a failure of the software (that is, inputs that find a bug), and the failure set, consisting of all inputs that give the expected output. Testing and debugging are carried out in several phases in most situations (Dewanji et al., 2011). Unlike the assumption made in most software reliability models, in real-life situations it is quite difficult to debug every time a bug is found. It may happen that two inputs share a common path at the beginning, owing to the presence of some common factors, and then branch off to complete their jobs.
Following Dewanji et al. (2011), the testing of an input stops as soon as a bug is found, and the incident is logged as a success. The next bug on the path can therefore be detected only after the bug detected earlier has been debugged. We can thus assume that the size of a bug at the beginning of a path is much larger than the size of a bug at the end of a sub-path, or of a bug on a path that is hardly traversed by any input.
Hence, detecting a bug during software testing can be thought of as probabilistic sampling, where the chance of a bug being detected is an increasing function of the size of the bug. This is very similar to the size-biased modelling of Patil and Rao (1978) for the identification of species.
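The effect of size-biased sampling can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions that are not part of the model: the bug sizes are hypothetical, and bugs are drawn with replacement with probability exactly proportional to size. The function names are ours.

```python
import random

def size_biased_mean(sizes):
    """Expected size of a bug drawn with probability proportional to its size."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("at least one bug must have positive size")
    return sum(s * s for s in sizes) / total

def size_biased_draws(sizes, n_draws, rng):
    """Draw bugs (with replacement) with probability proportional to size."""
    return rng.choices(sizes, weights=sizes, k=n_draws)

sizes = [1, 2, 5, 10, 50]          # hypothetical eventual bug sizes
rng = random.Random(1)             # fixed seed for reproducibility
draws = size_biased_draws(sizes, 10_000, rng)

plain_mean = sum(sizes) / len(sizes)        # average size in the population
sampled_mean = sum(draws) / len(draws)      # average size among sampled bugs
print(plain_mean, size_biased_mean(sizes), sampled_mean)
```

The average size among the sampled bugs is far above the population average, which is why the bugs found early in testing tend to be the large ones.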
The article is organized as follows. In Section 2, we develop the Bayesian size-biased model, and in Section 3 we describe the model fitting criteria and model performance measures used in the study. In Section 4, we present a simulation study to assess the performance of our model. We assess the robustness of the model using relative bias, coefficient of variation, and coverage probability of the 95% credible intervals of the population size of the bugs. The model is then applied to two empirical data sets: Section 5 illustrates the application to a commercial software data set, and Section 6 shows the application to a critical data set from space mission software testing. The article ends with a discussion and conclusion in Section 7.
2.1 General approach
We used a hierarchical modelling approach to formulate a model for the detection of bugs in the software testing procedure. The model can be used to estimate the total number of bugs present in the software as well as the remaining eventual bug size. We also provide a new procedure to predict the stopping phase such that the estimated remaining bug size at that phase stays below a preassigned threshold. Later, we extend the model to accommodate groups of bugs that share the same bug size.
2.2 Model description
The model is composed of two hierarchical components: a state process that explains the latent dynamics of the bugs within the software, and an observation model that explains the probabilistic structure of the observed software testing data.
2.2.1 State process
Consider that $N$ distinct and independent bugs are present in a particular software, and that the size of the $i$-th bug is denoted by $S_i$, $i = 1, \dots, N$. The eventual size of a bug (or, in short, the size of a bug) is treated as a latent variable in the model and needs to be estimated. Let $\mathbf{S} = (S_1, \dots, S_N)$ denote the vector of these latent variables defining the sizes of the (unknown) bugs under study. For ease of computation and other technical advantages (described later), we define $N \sim \text{Binomial}(M, \psi)$, where $M$ represents the maximum possible number of bugs present in the software and $\psi$ denotes the inclusion probability, that is, the proportion of $M$ that represents the real population of bugs.
2.2.2 Observation process
We suppose that $K_j$ inputs are used in the $j$-th testing phase, and that a bug present in the software can be detected by any of the inputs at the $j$-th phase, $j = 1, \dots, J$.
Let $y_{ij}$ represent the binomial detection outcome for bug $i$ over the $K_j$ inputs at phase $j$. If $y_{ij} > 0$, this implies that $y_{ik} = 0$ for all $k > j$, since a bug detected at the $j$-th phase is eliminated from the pool of bugs during the debugging at the end of phase $j$. For example, if bug 1 gets detected at phase $j$, we would have $y_{1j} > 0$ and $y_{1k} = 0$ for all $k \neq j$.
We used the data augmentation approach to model the number of bugs in the software by choosing a large integer $M$ to bound $N$ and introducing a vector $\mathbf{z} = (z_1, \dots, z_M)$ of latent binary variables such that $z_i = 1$ if individual $i$ is a member of the population and $z_i = 0$ otherwise. We assume that each $z_i$ is a realisation of a Bernoulli trial with parameter $\psi$, the inclusion probability.
A binomial model, conditional on $z_i$, is assumed for each observation $y_{ij}$:
$$y_{ij} \mid z_i \sim \text{Binomial}(K_j, \; p_i z_i),$$
where $p_i$ denotes the detection probability of the $i$-th bug and $K_j$ the number of inputs in phase $j$. The detection probability $p_i$ is modelled as an increasing function of the bug size $S_i$, since detectability depends directly on the size of a bug: the larger the bug, the higher its detectability.
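To make the structure of the detection data concrete, the following Python sketch simulates detection histories phase by phase. It is a sketch under stated assumptions rather than the fitted model: it assumes the size-dependent detection probability $1 - (1 - r)^{S}$ of Section 2.2.3, draws each binomial count as a sum of Bernoulli trials, and uses hypothetical sizes and a function name of our own.

```python
import random

def simulate_detections(sizes, r, inputs_per_phase, rng):
    """Return y[i][j], the number of inputs that detect bug i in phase j.

    A detected bug is debugged at the end of its phase, so every later
    entry of its row stays zero.
    """
    y = []
    for size in sizes:
        p = 1.0 - (1.0 - r) ** size      # assumed per-input detection probability
        row, detected = [], False
        for k in inputs_per_phase:
            if detected:
                row.append(0)            # bug already removed from the pool
                continue
            hits = sum(rng.random() < p for _ in range(k))  # Binomial(k, p) draw
            row.append(hits)
            detected = hits > 0
        y.append(row)
    return y

# Hypothetical example: three bugs of sizes 0, 3 and 40, five phases of 200 inputs.
y = simulate_detections([0, 3, 40], r=0.001, inputs_per_phase=[200] * 5,
                        rng=random.Random(7))
for row in y:
    print(row)
```

Each row contains at most one positive entry, and a bug of size zero is never detected, which matches the constraint on the detection outcomes described above.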
2.2.3 Model for detection probability
From the definition of bug size, $S_i$ is higher if the $i$-th bug is placed on a common path near the origin that is followed by a number of sub-paths. If $r$ denotes the probability of bug detection in any one of the inputs that will pass through the $i$-th bug, then the probability of detecting the $i$-th bug with one input is
$$p_i = 1 - (1 - r)^{S_i}.$$
The parameter $r$ plays the role of a shared parameter across all the bugs and is critical for the dependence structure of the nodes in our joint probability model. In addition, the above formulation of $p_i$ follows naturally from our definition of bug size and accounts for individual-level heterogeneity in the detection probability of the bugs (see also Patil and Rao, 1978). Note that $p_i$ is modelled as a monotonically increasing function of $S_i$, and when $S_i = 0$, we have $p_i = 0$.
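The behaviour of this link can be checked numerically. The short Python sketch below assumes the form $p = 1 - (1 - r)^{S}$ for the single-input detection probability; the per-phase probability $1 - (1 - p)^{K}$ is a direct consequence for $K$ independent inputs. The function names and numerical values are ours, for illustration only.

```python
def detection_probability(size, r):
    """Probability that a single input detects a bug of the given size."""
    if size < 0 or not 0.0 <= r <= 1.0:
        raise ValueError("size must be non-negative and r must lie in [0, 1]")
    return 1.0 - (1.0 - r) ** size

def phase_detection_probability(size, r, n_inputs):
    """Probability that at least one of n_inputs independent inputs detects the bug."""
    p = detection_probability(size, r)
    return 1.0 - (1.0 - p) ** n_inputs

# Hypothetical values: a small shared parameter r and increasing bug sizes.
for size in (0, 1, 10, 100):
    print(size,
          detection_probability(size, r=0.001),
          phase_detection_probability(size, r=0.001, n_inputs=1000))
```

A bug of size zero has detection probability zero, and the probability increases monotonically with size, so bugs on heavily traversed paths are found first.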
2.2.4 Model for N
We used the data augmentation approach to model the number of bugs in the software by choosing a large integer $M$ to bound $N$ and introducing a vector $\mathbf{z} = (z_1, \dots, z_M)$ of latent binary variables such that $z_i = 1$ if individual $i$ is a member of the population and $z_i = 0$ otherwise. We assume that each $z_i$ is a realisation of a Bernoulli trial with parameter $\psi$, the inclusion probability.
We assume that $n$ bugs get detected over the $J$ testing phases, a number that is expected to be less than the total number of bugs $N$ because of imperfect detection during testing. Consequently, as part of the data augmentation approach, the detection data set is supplemented with a large number of “all-zero” encounter histories, that is, an array of “all-zero” detection histories with dimensions $(M - n) \times J$. We label the zero-augmented complete detection data set as $\mathbf{Y}$.
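The zero-augmentation step can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, with a hypothetical detection matrix and helper names of our own; in the actual analysis the inclusion indicators are sampled by MCMC rather than supplied by hand.

```python
def augment(detections, M):
    """Pad an n x J detection matrix with all-zero rows up to M rows."""
    n = len(detections)
    if M < n:
        raise ValueError("M must be at least the number of detected bugs")
    J = len(detections[0]) if n else 0
    return [list(row) for row in detections] + [[0] * J for _ in range(M - n)]

def population_size(z):
    """Number of bugs implied by one draw of the binary inclusion indicators."""
    return sum(z)

observed = [[0, 2, 0], [1, 0, 0]]      # hypothetical histories of n = 2 detected bugs
Y = augment(observed, M=5)             # 3 all-zero histories are appended
z = [1, 1, 1, 0, 0]                    # one hypothetical draw of the indicators
print(Y)
print(population_size(z))
```

Detected bugs necessarily have indicator 1; for the all-zero rows the indicator distinguishes a real but undetected bug from a structural zero.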
2.2.5 Estimating the remaining eventual bug size and the stopping phase
The above model is well suited to estimating the number of bugs $N$, the detection probabilities $p_i$, and the bug sizes $S_i$. To estimate the remaining eventual bug size at a later, untested phase, we proceed as follows.
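We denote by $f(y_{ij} \mid \boldsymbol{\theta})$ the model for the detection observations of a bug with $K_j$ inputs, $j = 1, \dots, J^*$, where $J^* > J$, and by $y^{\text{rep}}_{ij}$ a future observation or alternative detection outcome that could have been obtained during the testing phase. Since the stopping phase (such that the remaining eventual total size of the bugs falls below a threshold, say, $\epsilon$) is unknown to the software tester, we assign a sufficiently large value to $J^*$, taking into account the available RAM of the computing device and the computing time. The posterior predictive model for new detection data $y^{\text{rep}}_{ij}$ for the $i$-th bug is then
$$f(y^{\text{rep}}_{ij} \mid \mathbf{Y}) = \int f(y^{\text{rep}}_{ij} \mid \boldsymbol{\theta}) \, \pi(\boldsymbol{\theta} \mid \mathbf{Y}) \, d\boldsymbol{\theta},$$
where $\boldsymbol{\theta}$ denotes the vector of all the parameters and $f(y^{\text{rep}}_{ij} \mid \mathbf{Y})$ is the predictive density for $y^{\text{rep}}_{ij}$ induced by the posterior distribution $\pi(\boldsymbol{\theta} \mid \mathbf{Y})$.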
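In practice, we obtain a single posterior replicate $y^{\text{rep}(t)}_{ij}$ by drawing from the model $f(y_{ij} \mid \boldsymbol{\theta}^{(t)})$, where $\{\boldsymbol{\theta}^{(t)} : t = 1, \dots, T\}$ represents a set of MCMC draws from the posterior distribution of the parameter $\boldsymbol{\theta}$.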
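We define a set of deterministic binary variables $d_{ij}$, where $d_{ij}$ takes the value 1 if the $i$-th bug is detected on or before the $j$-th phase and 0 otherwise. The total size of the bugs detected up to the $j$-th phase is then computed as $D_j = \sum_{i=1}^{M} d_{ij} z_i S_i$, $j = 1, \dots, J^*$ (observed and future phases). Consequently, we also compute the total eventual remaining size of the bugs not detected up to the $j$-th phase, $R_j = \sum_{i=1}^{M} (1 - d_{ij}) z_i S_i$, $j = 1, \dots, J^*$. We obtain the stopping phase, denoted by $j^*$, as the first phase such that $R_{j^*} \le \epsilon$ (where $\epsilon$ is a preassigned threshold). We compute $R_j$ and $j^*$ for each replicated data set, $t = 1, \dots, T$, thus enabling us to obtain an MCMC sample for both $R_j$ and $j^*$.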
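For a single posterior replicate, this computation reduces to simple bookkeeping, as the following Python sketch shows. The inputs are hypothetical: the sizes of the real bugs in the replicate and the phase at which each is first detected (`None` if never detected within the simulated phases). The sketch takes the stopping phase to be the first phase at which the remaining size is at most the threshold, which is an assumption on the exact inequality.

```python
def remaining_size_by_phase(sizes, first_detection_phase, n_phases):
    """Remaining eventual size after each phase j = 1, ..., n_phases."""
    total = sum(sizes)
    remaining = []
    for j in range(1, n_phases + 1):
        detected = sum(s for s, d in zip(sizes, first_detection_phase)
                       if d is not None and d <= j)
        remaining.append(total - detected)
    return remaining

def stopping_phase(remaining, threshold):
    """First phase (1-based) with remaining size <= threshold, else None."""
    for j, size_left in enumerate(remaining, start=1):
        if size_left <= threshold:
            return j
    return None

sizes = [50, 20, 5]                    # hypothetical bug sizes in one replicate
first_detection = [1, 3, None]         # the third bug is never detected
remaining = remaining_size_by_phase(sizes, first_detection, n_phases=4)
print(remaining)
print(stopping_phase(remaining, threshold=10))
```

Repeating this for every MCMC replicate yields a posterior sample of the remaining size at each phase and of the stopping phase.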
2.2.6 Estimation of software reliability
Software reliability at phase $j$ is defined as the posterior probability that the remaining size $R_j$ is less than or equal to the prefixed small quantity $\epsilon$, given the observed data $\mathbf{Y}$,
$$\text{Reliability}_j = P(R_j \le \epsilon \mid \mathbf{Y}).$$
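Given MCMC draws of the remaining size at each phase, this posterior probability can be approximated by a simple proportion. The Python sketch below uses made-up draws and function names of our own; it illustrates the Monte Carlo estimate only, not the full model fit.

```python
def reliability(remaining_draws, threshold):
    """Proportion of posterior draws with remaining size <= threshold."""
    return sum(r <= threshold for r in remaining_draws) / len(remaining_draws)

def first_phase_reaching(reliability_by_phase, target=0.95):
    """First phase (1-based) whose reliability reaches the target, else None."""
    for j, rel in enumerate(reliability_by_phase, start=1):
        if rel >= target:
            return j
    return None

# Hypothetical posterior draws of the remaining size at three successive phases.
draws_by_phase = [
    [150, 120, 90, 200],
    [80, 95, 60, 130],
    [40, 70, 20, 90],
]
rel = [reliability(draws, threshold=100) for draws in draws_by_phase]
print(rel)
print(first_phase_reaching(rel, target=0.95))
```

In practice the number of draws would be in the thousands, so the proportion is a much finer estimate than in this toy example.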
2.3 Modelling for grouped bugs
We often come across situations where a few bugs are collocated on the same path or in the same part of the software, so that each of them can be assumed to have the same bug size. For computational and notational simplicity, we transform the data set so that the observed data $y_g$ represent the number of bugs from the $g$-th group that are detected. Here $p_g$ denotes the probability of detecting a bug belonging to the $g$-th bug-group with a single test case, and $j_g$ denotes the phase corresponding to the $g$-th group.
Here, we consider $G$ distinct groups of bugs present in a software, where each bug in the $g$-th group has size $S_g$. Each group of bugs comprises at least one bug. Following Section 2.2.1, we define $G \sim \text{Binomial}(M, \psi)$, where $M$ is a large positive integer that gives an upper bound to $G$. The link between $p_g$ and the size $S_g$ remains the same as in Section 2.2.3, $p_g = 1 - (1 - r)^{S_g}$. We used the data augmentation approach to model the number of bug-groups (discussed in Section 2.2.4). The total number of bugs has the following expression:
$$N = n + \sum_{g} u_g,$$
where $n$ denotes the number of bugs detected during the testing period and $u_g$ denotes the number of bugs in the $g$-th group that went undetected. We utilized the posterior predictive distribution of new detection data to estimate $u_g$.
To compute the remaining eventual size, we introduce binary variables $d_{gj}$, where $d_{gj}$ takes the value 1 if the $g$-th bug-group is detected on or before the $j$-th phase and 0 otherwise. The remaining eventual size is calculated as $R_j = \sum_{g} (1 - d_{gj}) \, n_g S_g$, where $n_g$ denotes the number of bugs in the $g$-th bug-group.
2.4 Prior assignment
Bug sizes ($S_i$’s) are latent and unobservable. We assign a Poisson-Gamma mixture prior to $S_i$ to capture the required level of variability in the latent variable. Consequently, each $S_i$ is assumed to follow a Poisson distribution with mean $\lambda_i$, where $\lambda_i$ is a random draw from a Gamma distribution with given shape and rate parameters. We assign bounded Uniform priors to the detection probability $r$ and the inclusion probability $\psi$.
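The overdispersion induced by a Poisson-Gamma mixture can be seen in a short simulation. The Python sketch below is illustrative only: the shape and rate values are hypothetical (not those used in the analysis), and the Poisson draw uses Knuth's multiplication method, which is adequate only for moderate means.

```python
import math
import random

def draw_poisson(lam, rng):
    """Poisson draw by Knuth's method; exp(-lam) must not underflow."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw_bug_size(shape, rate, rng):
    """One draw from the Poisson-Gamma mixture prior on a bug size."""
    lam = rng.gammavariate(shape, 1.0 / rate)   # gammavariate expects a scale
    return draw_poisson(lam, rng)

rng = random.Random(42)
prior_sizes = [draw_bug_size(shape=2.0, rate=0.5, rng=rng) for _ in range(20_000)]
mean = sum(prior_sizes) / len(prior_sizes)
var = sum((s - mean) ** 2 for s in prior_sizes) / (len(prior_sizes) - 1)
print(mean, var)
```

With shape 2 and rate 0.5 the prior mean is 4 while the variance is about 12, so the mixture allows far more variability in bug sizes than a plain Poisson prior with the same mean.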
3 Model fitting criteria
We fitted the models using Markov chain Monte Carlo (MCMC) simulations with NIMBLE (de Valpine et al., 2017) in R (R Core Team, 2019). We ran three chains of 10000 iterations each, including an initial burn-in phase of 5000 iterations. MCMC convergence and mixing of each model parameter were monitored using the Gelman-Rubin convergence diagnostic (Gelman et al., 2014, with upper threshold 1.1) and MCMC traceplots.
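For readers unfamiliar with the convergence diagnostic, the Python sketch below computes the basic potential scale reduction factor for several chains of one scalar parameter. It is a simplified textbook version (no chain splitting), not the routine used in our analysis, and the toy chains are made up.

```python
import math
from statistics import fmean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = fmean([variance(c) for c in chains])         # W: mean within-chain variance
    between = n * variance([fmean(c) for c in chains])    # B: between-chain variance
    pooled = (n - 1) / n * within + between / n           # pooled variance estimate
    return math.sqrt(pooled / within)

mixed = [[0, 1, 0, 1], [0, 1, 0, 1]]          # chains exploring the same region
stuck = [[0, 1, 0, 1], [10, 11, 10, 11]]      # chains stuck in different regions
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values close to 1 (below the 1.1 threshold) indicate that the chains agree, whereas chains stuck in different regions give values far above the threshold.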
3.1 Model performance measures
We used relative bias, coefficient of variation and coverage probability to evaluate the performance of the estimators of the model parameters. Suppose $\{\theta^{(t)} : t = 1, \dots, T\}$ denotes a set of MCMC draws from the posterior distribution of a scalar parameter $\theta$.
Relative bias. Relative bias (RB) is calculated as
$$\text{RB}(\hat{\theta}) = \frac{\hat{\theta} - \theta_0}{\theta_0},$$
where $\hat{\theta}$ denotes the posterior mean and $\theta_0$ gives the true value.
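Coefficient of variation. Precision was measured by the coefficient of variation (CV):
$$\text{CV}(\hat{\theta}) = \frac{\text{SD}(\theta)}{\hat{\theta}},$$
where $\hat{\theta}$ denotes the posterior mean and $\text{SD}(\theta)$ is the posterior standard deviation of the parameter $\theta$.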
Coverage probability. Coverage probability was computed as the proportion of model fits for which the estimated 95% credible interval (CI) contained the true value of $\theta$.
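The three measures are straightforward to compute from MCMC output. The Python sketch below is a minimal illustration with made-up draws; it assumes an equal-tailed 95% interval obtained from empirical quantiles, and the function names are ours.

```python
from statistics import fmean, quantiles, stdev

def relative_bias(draws, true_value):
    """(posterior mean - true value) / true value."""
    return (fmean(draws) - true_value) / true_value

def coefficient_of_variation(draws):
    """Posterior standard deviation divided by the posterior mean."""
    return stdev(draws) / fmean(draws)

def credible_interval_95(draws):
    """Equal-tailed 95% interval from the 2.5% and 97.5% empirical quantiles."""
    cuts = quantiles(draws, n=40, method="inclusive")   # cut points every 2.5%
    return cuts[0], cuts[-1]

def coverage_probability(fits, true_value):
    """Proportion of fits whose 95% interval contains the true value."""
    hits = 0
    for draws in fits:
        low, high = credible_interval_95(draws)
        hits += low <= true_value <= high
    return hits / len(fits)

draws = [90, 100, 110]                                  # toy posterior draws
print(relative_bias(draws, 100), coefficient_of_variation(draws))
fits = [list(range(1, 101)), list(range(201, 301))]     # two toy model fits
print(coverage_probability(fits, true_value=50))
```

In the simulation study, each element of `fits` would hold the posterior draws from one of the 50 replicated data sets of a scenario.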
4 Simulation study
4.1 Description of simulated data and simulation scenarios
For a complex high-dimensional model such as the one described in Section 2.2, it is useful to assess model performance over different ranges of the model parameters. We simulated software testing detection data sets for two values of the detection parameter $r$ and two values of the number of inputs in each phase ($K$), viz., 1000 and 2000. This gives four simulation scenarios (Sets 1-4), and we simulated a total of 200 data sets (i.e., 50 data sets under each scenario). In each scenario, we assumed a fixed number of bugs $N$ for simulating the detection data, and the software testing was carried out over 5 phases. The key details of the simulated data sets are given in Table 1. With the detection parameter held at the same value, the number of detected bugs (and also the total number of detections) is higher on average in Set 2 (mean 132), with 2000 inputs per phase, than in Set 1 (mean 106), with 1000 inputs per phase. The same phenomenon can be observed for Sets 3 (1000 inputs) and 4 (2000 inputs), which share the other value of $r$ (see Figure 1a,c). For estimating the remaining eventual bug size and the stopping phase, the posterior predictive simulations are carried out for 25 additional phases, implying 30 phases in total (see Section 2.2.5).
4.2 Results from Simulation study
We fitted our Bayesian size-biased model to each of the 200 simulated data sets using MCMC, with $M$ set to 400 for each model fit. All MCMC samples of the parameters of interest (e.g., population size $N$, detection parameter $r$) were obtained after ensuring proper mixing and convergence, with $\hat{R}$ values below 1.1. The posterior estimates of the different parameters were obtained from the MCMC chains. The posterior summaries of the total number of bugs $N$ and the detection parameter $r$ for the simulation study are provided in Tables 2 and 3, respectively, and are also portrayed in Figure 1.
The relative bias and coefficient of variation of $N$ and $r$ are estimated for each of the 50 replicates in each set. The relative bias estimates of $N$ varied between (-16%, 19%) in set 1, (-9%, 19%) in set 2, (-12%, 15%) in set 3 and (-9%, 9%) in set 4, and the coefficient of variation of $N$ varied between (8%, 12%) in set 1, (5%, 7%) in set 2, (6%, 7%) in set 3 and (4%, 5%) in set 4. The relative bias estimates of $r$ varied between (-36%, 37%) in set 1, (-32%, 32%) in set 2, (-33%, 27%) in set 3 and (-18%, 22%) in set 4, and the coefficient of variation of $r$ varied between (16%, 24%) in set 1, (13%, 17%) in set 2, (13%, 17%) in set 3 and (11%, 15%) in set 4. Coverage probabilities of both $N$ and $r$ were higher than 90% in each of the scenarios (Figure 1).
We estimated the reliability at the end of each phase and also at different possible future phases (assuming a pre-specified number of test cases in each phase). It is important to mention that the estimation of reliability depends heavily on the pre-specified threshold $\epsilon$ and on the number of test cases used during the future phases (those that would be conducted after the 5 phases already completed). Here we have assumed the number of test cases in each future phase to be the same as the number of inputs per phase in the respective scenario.
The reliability (i.e., the posterior probability of the remaining size lying below a threshold) is a non-decreasing function of the testing phase index, since the remaining bug size is reduced as more bugs are detected in subsequent testing phases. The phase at which the reliability estimate attained the targeted 95% level (with threshold 100) varied across the simulation scenarios (Figure 1). For instance, the reliability estimate attained the 95% level (with threshold 100) at phase 30 in set 1, implying that the developer would need to continue software testing for 25 more phases (after the 5 testing phases already conducted) to attain the targeted software reliability level. Hence, the stopping phase was estimated as 30. For the other sets, the estimated stopping phases were phase 24 (set 2), phase 14 (set 3) and phase 10 (set 4).
5 Application to Software testing empirical data
5.1 Data description
The data set consists of a total of 8757 test inputs, detailed with build number, case id, severity, cycle, result of test, defect id, etc. In this data set, the severity of a path is broadly divided into three categories, namely simple, medium and complex, depending on the effect of the bug if it is not debugged before the software is marketed. The data have four cycles, namely Cycle 1, Cycle 2, Cycle 3 and Cycle 4, which are equivalent to the different phases of testing referred to in Section 2. After each cycle, the bugs identified during the cycle are debugged, as mentioned in Section 2.
5.2 Results from Software bug data analysis
The posterior estimates of the main parameters $N$, $\psi$ and $r$ are provided in Table 4 and visually portrayed in Figure 2. The posterior mean estimate of the total number of bugs $N$ was 348 with a 95% credible interval of (317, 382). The posterior mean of the inclusion probability $\psi$ was estimated at 0.696 with a 95% credible interval of (0.618, 0.774). The estimate of $\psi$ also confirmed that the upper bound $M$ we had set was large enough not to influence the estimation of $N$. The posterior mean estimate of the size-biased detection model parameter $r$ was of very small magnitude; we had therefore coded the parameter with a logistic transformation to retain accuracy in estimation and good MCMC mixing. The remaining eventual bug size after the 4 testing phases was estimated as 703 with a 95% credible interval of (457, 1006). Here we have assumed the number of test cases in each future phase to be 3000, in order to resemble the observed data set.
We found the reliability to attain the target 95% level at phase 16 if we would have continued with 3000 test cases in each phase, implying the developer would need to continue software testing for 12 more future phases (after the 4 testing phases already conducted) to attain the targeted software reliability level. Hence, the stopping phase was estimated as 16. The reliability took much longer (40 phases) to reach the targeted 95% level with 1000 test cases in each phase, and took only 12 phases with 5000 test cases in each phase (these results are provided in the appendix). This also revealed that it takes approximately 36000 future test cases to attain the targeted reliability of 95%.
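The arithmetic behind these figures can be reproduced with a few lines of Python. The sketch assumes that the reported phase numbers (16, 40 and 12) are stopping-phase indices that include the 4 phases already conducted; the function name is ours.

```python
def future_test_cases(stopping_phase, completed_phases, cases_per_phase):
    """Number of additional test cases needed to reach the stopping phase."""
    return (stopping_phase - completed_phases) * cases_per_phase

completed = 4   # testing phases already conducted
for cases_per_phase, stop in [(3000, 16), (1000, 40), (5000, 12)]:
    needed = future_test_cases(stop, completed, cases_per_phase)
    print(cases_per_phase, stop, needed)
```

Under this reading, the three settings require 36000, 36000 and 40000 additional test cases, respectively, so the total testing effort needed is roughly the same however it is split into phases.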
6 Application to ISRO mission empirical data
6.1 Data description
The ISRO data set consists of the outcomes of software testing conducted on each of 5 software systems during 35 missions. Each software system had been updated before the different missions were executed. There were 3 primary stages of software testing: (i) ‘Code inspection’ (CI), where a group of experts manually inspects each software system in search of potential bugs, (ii) ‘Module testing’ (MT), where different parts or modules of the software are tested, and (iii) ‘Simulation testing’ (ST), where numerous inputs are run through the software in seven different phases, viz., SIP, SFIT, IPT, Stress OILS, HLS, ALS and Performance OILS. Different numbers of bugs were detected during these three primary stages; within ST, the phase-specific counts were 9, 7, 7, 8, 1, 2 and 0. The number of test cases also differed across missions, software systems and phases. For our analysis, we consider the testing data from MT and the seven phases of ST (i.e., 8 testing phases in total) as the observed data set. We treat the detections during CI as a deterministic constant because this testing stage lacks a probabilistic structure.
6.2 Results from ISRO mission data analysis
We applied the grouped version of our size-biased model (Section 2.3) to the ISRO mission data set, which is well suited to this model. The different missions, the different software systems used in those missions and the different phases all contributed to the variation in the groups and in the number of bugs in a group. In the observed data set, any change in the mission, software or phase was treated as forming a different group. Here it is not possible to extend the number of phases; hence, instead of finding a stopping phase, we obtain the number of future test cases required to bring the remaining bug size below a pre-specified threshold. These future test cases can be run before a future mission or after a software update.
The posterior mean of the number of groups of bugs was estimated at 84 with a 95% credible interval of (80, 89) (see Table 5). The posterior mean estimate of $\psi$ is 0.257 with a 95% credible interval of (0.195, 0.323), which confirms that our specified upper bound for the number of groups is appropriate. The size-biased detection model parameter is estimated as with a 95% credible interval (, ). The total number of bugs present was estimated as 94 with a 95% credible interval of (94, 95), which is highly precise.
The reliability of the software is estimated as 0.995 after the 8 testing phases (including module testing and the seven phases of simulation testing) for the chosen threshold. Such high reliability arises because the testing phases managed to detect almost all the bugs present in the software. We also show that reliability increases with the number of future test cases (Figure 3).
7 Discussion and conclusion
We described a Bayesian size-biased model that can be applied to software testing data sets to explicitly model and estimate the population size, the detection parameter and the latent sizes of the bugs. The model also allows estimation of reliability at any given phase, for any given threshold on the remaining bug size, using the posterior predictive distribution of the bug detection data. Consequently, we could obtain an estimate of the stopping phase, that is, the number of additional testing phases required to achieve a target reliability level (say, 0.95).
We showed via a simulation study that the parameters of interest (e.g., $N$, $r$, reliability) can be accurately estimated by our model. The number of inputs plays a key role in software testing in general, as a higher number of inputs boosts the probability of detecting bugs (Table 1). This also led to more accurate estimation of the model parameters, as seen in the lower CV estimates of $N$ and $r$ with a higher number of inputs (Tables 2 and 3). Further, we noticed that in such scenarios the threshold reliability level was attained more quickly than in the scenarios with a lower number of inputs (Figure 1e).
The size-biased model fitted to the empirical software testing data yielded satisfactory estimates of the key parameters. However, we noticed that the software testing conducted was rather inefficient, since the estimated software reliability was close to zero after the first four phases of testing (Figure 2). We anticipate that some major bugs (of moderately large size) were still present. We recommend continuing testing for at least 36000-40000 more test cases (which could be broken down into multiple phases) to attain the desired software reliability level of 95%.
In contrast, the software reliability estimates for the ISRO mission software were found to be extremely high (i.e., 0.998) after the first 8 testing phases, demonstrating the advantage of efficient software testing. Our finding that the number of bugs detected was almost equal to the true number of bugs available to be detected also supports this.
The developed model can also be used for similar problems in the other fields. For instance, in hydrocarbon exploration, digging a field can be considered analogous with testing a software with different inputs, outcome of which can be considered either as a success (implying sufficient hydrocarbon has been found after digging) or as a failure (implying that the digging did not yield sufficient hydrocarbon which may be viable).
Given the enormous interest in software testing in the technology sector, our size-biased model could be very useful for providing accurate estimates of the number of bugs present as well as of software reliability. Our model uses the Bayesian paradigm, which adds the flexibility required to estimate a large number of model parameters. Although we found the parameter estimates to be moderately robust, we recommend conducting a prior sensitivity study before applying the size-biased model.
Conflicts of interest
It is hereby declared that the authors do not have any conflict of interest.
R code for generating the simulated data and for the data analysis is provided in the online supplementary material and can also be found on GitHub at https://github.com/soumenstat89/size_biased.
- Chakraborty, A. K. (1996). Software quality testing and remedies. PhD thesis, Indian Institute of Sciences.
- Chakraborty, A. K. and Arthanari, T. S. (1994). Optimum testing time for software under an exploration model. Opsearch, 31:202–214.
- Chakraborty, A. K., Basak, G. K., and Das, S. (2019). Bayesian optimum stopping rule for software release. Opsearch, 56(1):242–260.
- Das, S., Dewanji, A., and Chakraborty, A. K. (2016). Software reliability modeling with periodic debugging schedule. IEEE Transactions on Reliability, 65(3):1449–1456.
- Das, S., Sengupta, D., and Dewanji, A. (2019). Optimum release time of a software under periodic debugging schedule. Communications in Statistics - Simulation and Computation, 48(5):1516–1534.
- de Valpine, P., Turek, D., Paciorek, C. J., Anderson-Bergman, C., Lang, D. T., and Bodik, R. (2017). Programming with models: writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics, 26(2):403–413.
- Dewanji, A., Sengupta, D., and Chakraborty, A. K. (2011). A discrete time model for software reliability with application to a flight control software. Applied Stochastic Models in Business and Industry, 27(6):723–731.
- Eom, H. S., Park, G. Y., Jang, S. C., Son, H. S., and Kang, H. G. (2013). V&V-based remaining fault estimation model for safety-critical software of a nuclear power plant. Annals of Nuclear Energy, 51:38–49.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis. CRC Press, Taylor & Francis Group, Boca Raton, FL, third edition.
- Littlewood, B. (1979). Software reliability model for modular program structure. IEEE Transactions on Reliability, 28(3):241–246.
- Nayak, T. K. (1988). Estimating population size by recapture sampling. Biometrika, 75(1):113–120.
- Patil, G. P. and Rao, C. R. (1978). Weighted distributions and size-biased sampling with applications to wildlife populations and human families. Biometrics, pages 179–189.
- Pham, H. (2000). Software Reliability. Springer Science & Business Media.
- R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- Yamada, S. (2014). Software Reliability Modeling: Fundamentals and Applications, volume 5. Springer.