 # Uncertainty and Sensitivity Analyses Methods for Agent-Based Mathematical Models: An Introductory Review

Multiscale, agent-based mathematical models of biological systems are often associated with model uncertainty and sensitivity to parameter perturbations. Here, three uncertainty and sensitivity analyses methods, that are suitable to use when working with agent-based models, are discussed. These methods are namely Consistency Analysis, Robustness Analysis and Latin Hypercube Analysis. This introductory review discusses origins, conventions, implementation and result interpretation of the aforementioned methods. Information on how to implement the discussed methods in MATLAB is included.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Mathematical models of biological systems are abstractions of highly complex reality. It follows that parameters used in such models often are associated with some degree of uncertainty, where the uncertainty can be derived from various origins. Epistemic uncertainty refers to uncertainty resulting from limited knowledge about the biological system at hand, whilst aleatory uncertainty stems from naturally occurring stochasticity, intrinsic to biological systems [1, 2, 3]. Model parameters may thus be naturally stochastic, theoretically unknown, and unfeasible or impossible to measure precisely (or at all). Further magnifying the contributions of uncertainty in mathematical models of biological systems, in particular, is the fact that one parameter in the mathematical model may correspond to a multitude of underlying biological mechanisms and features in the real, biological system. This is especially true for minimal parameter models, i.e. mathematical models that aspire to be as non-complex as possible whilst still capturing all biological features of interest . There exist multiple method papers that describe how to perform uncertainty and sensitivity analyses methods, authors Alden et al. even provide a free R-based software package (Spartan ) that enables the user to perform different such methods, including the three methods discussed in this review. However, as these methods have been developed across multiple research fields, both inside and outside of the natural sciences, it is difficult to find one comprehensive review that discusses not only how to perform these methods, but also where these methods come from, and why certain conventions are proposed and/or used. To this end, we have in this paper gathered such information for three uncertainty and sensitivity analyses techniques, namely Consistency Analysis, Robustness Analysis and Latin Hypercube Analysis. Our aim is that this will allow the reader to better evaluate uncertainty and sensitivity analyses presented by other authors, and encourage the reader to consider performing these methods when suitable.

In order to understand the impact that parameter uncertainty and parameter perturbations have on results produced by a mathematical model, uncertainty and sensitivity analyses can be used. A mathematical model that comprises a set of uncertain model parameters (or inputs), is able to produce a range of possible responses (or outputs). Uncertainty analysis assesses the range of these outputs overall, and provides information regarding how certain (or uncertain) we should be with our model results, and the conclusions that we draw from them . Sensitivity analysis describes the relationship between uncertainty in inputs and uncertainty in outputs. It can be used to identify which sources of input uncertainty (i.e. which model parameters) significantly influence the uncertainty in the output and, equally importantly, which do not . Assessing how sensitive the output is to small input perturbations is a healthy way to scrutinise our mathematical model . Moreover, for a well-formulated model, knowledge regarding how input uncertainty influences output uncertainty can yield insight into the biological system that has not yet been empirically observed . Furthermore, if the uncertainty in some input parameter is shown to not affect output uncertainty, the modeller may consider fixing that parameter, and thus reducing model complexity in accordance with a minimal-parameter modelling approach. In local sensitivity analysis techniques, model parameters (inputs) are perturbed one at a time whilst other parameters remain fixed at their calibrated value. In global sensitivity analysis techniques, all model parameters are simultaneously perturbed .

There exist several sensitivity and uncertainty analyses techniques, but here we will focus on three such techniques that are suitable to use in conjunction with agent-based mathematical models. These techniques are namely Consistency Analysis, Robustness Analysis and Latin Hypercube Analysis, which all answer important, and complementary, questions about mathematical models and their corresponding in silico responses [1, 2].

Note that Consistency Analysis is only meaningful when analysing models with stochastic variables. Note that there exist other uncertainty and sensitivity analyses techniques, suitable for agent-based models, that are outside the scope of this review [2, 8].

The statistical techniques described in this work have been developed and applied across multiple academic disciplines, both inside and outside of the natural sciences. Consequently, terminology and notations vary in the literature. The aim of this work is to combine pertinent literature from various academic fields whilst keeping terminology and mathematical notations consistent, unambiguous and tailored towards a mathematical and scientific audience. Therefore, when needed, certain algorithms from the literature are here reformulated into expressions that a mathematician would consider to be conventional. This review is intended to provide gentle, yet comprehensive, instructions to the modeller wanting to perform uncertainty and sensitivity analyses on agent-based models. Thorough directions on how to perform Consistency Analysis (Section 3), Robustness Analysis (Section 4) and Latin Hypercube Sampling and Analysis (Section 5) are provided. Consistency Analysis utilises the measure of stochastic superiority, which is therefore discussed in Section 2. Throughout this work, we have included some historical information that elucidates why certain statistical conventions are used. Each section also contains pictorial, step-by-step instructions on how to perform the aforementioned techniques. As a case study example, all methods discussed in this review are implemented in one of our recent agent-based, multiscale, mathematical oncology studies .

## 2 The measure of stochastic superiority

### 2.1 The Common Language Statistics

In 1992, McGraw and Wong introduced the common language statistics () as an intuitive way to compare two distributions of data . The

was initially introduced as a tool to compare data from normal distributions, but was later on approximated for use on any continuous distributions. The

describes the probability that a random data sample from one of the distributions is greater than a random data sample from the other distribution. For example, if we have two continuous data distributions

and , and we are comparing the distributions with respect to some variable , then the is simply given by

 CLBC(X)=P(XB>XC), (1)

where standard probability notations have been used so that denotes the probability that a random data sample from distribution is greater than a random data sample from distribution . Thus the subscript of here signifies the distribution from which the data sample was taken.

### 2.2 The A measure of stochastic superiority

The was developed to compare continuous data distributions, but Vargha and Delaney  introduced the A measure of stochastic superiority (or A-measure for short) as a generalisation of the that can directly be applied to compare both continuous and discrete distributions of variables that are at least ordinally scaled. When comparing two distributions and , with respect to the variable , the A-measure is given by

 ABC(X)=P(XB>XC)+0.5P(XB=XC), (2)

where denotes the probability that a random data sample from distribution is equal to a random data sample from distribution . By comparing Equations 1 and 2, it is clear that in the continuous case, where , the A-measure reduces to the .

If two distributions that are identical with respect to the variable are compared, then and we say that the distributions and are stochastically equal with respect to the variable . On the other hand, if , then we say that the distribution is stochastically greater than distribution , and accordingly, that distribution is stochastically smaller than distribution . If distribution is stochastically greater than distribution with respect to the variable , it simply occurs more often that the sample is greater than the sample when two random samples and are compared. Likewise, if distribution is stochastically smaller than distribution with respect to the variable , it occurs more often that the sample is smaller than the sample when comparing two random samples and . These definitions of stochastic relationships (stochastically equal to, stochastically greater than, stochastically smaller than), used by Vargha and Delayney , amongst others, are weaker than definitions used by some other authors, but sufficient and appropriate for the our current purposes: comparing distributions of discrete data samples produced by in silico simulations based on stochastic, individual-based mathematical models.

When comparing two samples and , the possible outcomes are (i) that is greater than , (ii) that is equal to and (iii) that is smaller than . These three possible outcomes must sum up to one so that,

 P(XB>XC)+P(XB=XC)+P(XC>XB)=1. (3)

In the continuous case, as previously stated, and thus it follows that

 P(XC>XB)=1−P(XB>XC),for continuous distributions, (4)

and thus it suffices to know only one of the values or in order to determine the stochastic relationship between the distributions and with respect to .

• For example: if , then it is clear that and thus that , or equivalently, that distribution is stochastically smaller than distribution .

However in the discrete case, is not generally equal to zero and therefore,

 P(XC>XB)=1−P(XB>XC)−P(XB=XC)for discrete % distributions. (5)

Consequently, one single value or alone can generally not be used to determine the stochastic relationship between the distributions and .

• For example: if, again, , it follows that . This does not give us enough information to determine the stochastic relationship between the two distributions and .

In order to proceed to compare the distributions and in this case, the stochastic difference is introduced where is given by

 δ=P(XB>XC)−P(XC>XB),δ∈[−1,1]. (6)

Via a linear transformation, the

transformed stochastic difference, , can be obtained using Equation 5 so that

 (7)

from which we can see that the A-measure, (Equation 2), measures the stochastic difference between and under a linear transformation .

In order to estimate the A-measure using samples from two distributions, the point estimate of the A-measure, here denoted the

-measure (with a hat), is used. (In the Spartan package , this is referred to as the A test score). For example, if we want to compare two discrete distributions and , where comprises data samples (of some variable ) so that and comprises data samples (of some variable ) so that then

 ^ABC(X)=#(bi>cj)mn+0.5#(bi=cj)mn, (8)

where and and is the ‘counting function’ that simply denotes the number of time that a certain event occurs when comparing all possible pairs of data samples . For clarity, Figure 1 provides an example of how the -measure can be computed by simply counting events. Figure 1: Using Equation 8 to compute the point estimate of the A-measure, i.e the ^A-measure or ^AB,C, of the two distributions of data samples B and C of sizes m and n respectively.

Using more conventional mathematical notation, the -measure is given by

 ^ABC(X)=1mn∑mi=1∑nj=1H(bi−cj), (9)

where is the Heaviside step function such that

 H(x)=⎧⎪ ⎪⎨⎪ ⎪⎩1 for x>0,12 for x=0,0 for x<0. (10)

If , then the distributions and are stochastically equal with respect to the variable . The -measure can thus be used to measure ‘how equal’ two discrete distributions and are, by assessing how much the -measure deviates from equality, i.e. the value . The closer the -measure is to , the ‘more equal’ the two compared distributions are . In many applications, we are only interested in ‘how equal’ two distributions and are, and it is not important which distribution is the stochastically greater one. In such cases we are only interested in how much the -measure deviates from stochastic equality (i.e. the value ) but the direction is not important. Or in mathematical terms: the magnitude of the difference between the -measure and stochastic equality is important but the sign is not. The magnitudal -measure, here denoted with an underscore, ignores the sign of deviation from equality and is given by

 ^A––={^ABC(X) if ^ABC(X)≥0.5,1−^ABC(X) if ^ABC(X)<0.5. (11)

The statistical significance is used to describe the effect of the stochastic difference between two distributions and . If two distributions and are ‘fairly equal’ (i.e. if they yield an

-measure close to 0.5) then the statistical significance is classified as

small. The statistical significance is classified using the magnitudal -measure and, using guidelines from Vargha and Delaney , the statistical significance is classified to be small, medium or large with respect to according to the following threshold values for ,

 Statistical Significance=⎧⎪ ⎪⎨⎪ ⎪⎩smallif –^ABC(X)∈[0.5,0.56],mediumif ^A––BC(X)∈(0.56,0.64],largeif ^A––BC(X)∈(0.64,0.71]. (12)

These threshold values (that might appear somewhat arbitrary) were first introduced by psychologist and statistician Cohen [12, 13] in the 1960s when comparing normal distributions, but then in terms of another statistical measurement: the effect size (Cohen’s) d where

 d=∣∣(mean of population B) - (mean of population C)∣∣σ, (13)

and

is the standard deviation of either

or (as B and C here are assumed to have the same standard deviation) [13, 14]. Omitting details from statistics, a small d-value essentially corresponds to a big overlap between distributions and , whilst a large d-value corresponds to a small overlap between distributions and , as is illustrated in Figure 2. Cohen decided to use the threshold d-values for describing ‘small’, ‘medium’ and ‘large’ effect sizes to be 0.2, 0.5 and 0.8 respectively . If we hold on to the assumption that and are two normal distributions with the same variability, and furthermore say that they contain the same number of data samples, we can use measures of overlap to get a further ‘feel’ for the previously discussed effect sizes, as illustrated in Figure 2. Cohen’s d value can also be converted into ‘the probability that a random data sample from (normal) distribution is larger than a random data sample from (normal) distribution , but that is exactly what the -measure measures! So this is where the threshold values for the descriptors ‘small’, ‘medium’ and ‘large’ statistical differences listed in Equation 12 come from. Figure 2: The small (left), medium (centre) and large (right) threshold values for the scaled A measure of stochastic superiority (^A––BC) are based on Cohen’s d-values comparing two normal distributions B and Cwith the same variance. The higher the overlap between B and C, the smaller the d-value, and the smaller the ^A––BC-measure (^A––BC∈[0.5,1]).

Now, Cohen motivated his choice of the d-value thresholds using a blend of intuitive ‘everyday’ examples and mathematical reasoning , but he did issue a warning regarding the fact that the threshold values should be determined based on the research methodology at hand. Thus the (modeller) should not blindly use Cohen’s suggested thresholds, but instead reason what constitutes a small enough statistical significance in the study at hand. The (modeller) must also decide how fine the data samples in the data distributions should be before performing consistency analysis. In many applications, it is likely the amount of data samples required in order to achieve a small statistical significance increases with the fineness of the data. Nonetheless, scientific conventions are useful (no need for citations) and thus in the remainder of this chapter we will use the threshold values suggested by Cohen, as is done in other mathematical biology studies .

## 3 Consistency Analysis

In silico simulations based on mathematical models with built-in stochasticity will not produce the same output data every simulation run. Consistency Analysis (also called aleatory analysis) is a stochastic technique that answers the question: how many data samples do we need to produce in order to mitigate uncertainty originating from intrinsic model stochasticity? In our case, one data sample is the product of one in silico simulation, so an equivalent question is: how many in silico simulations should we run before describing our results in terms of, for example, average values, standard deviations or similar?

Let us say that one in silico simulation produces one data sample of some output variable . This data sample can for example correspond to ‘the population size at time point ’, or something similar. It is up to the modeller to identify and decide what the meaningful output variable(s) should be, and consistency analyses can be performed on multiple output variables at multiple time steps, for comprehensiveness. Before we begin, note that when performing Consistency Analysis, we always use the calibrated model parameters.

The first step involved in performing Consistency Analysis it to produce multiple distributions of data of various sizes. We say that a distribution with data samples has a distribution-size , and the goal of Consistency Analysis is to find the smallest -value (here denoted ) that yields a small stochastic significance. To do this, we create various distribution groups that all contain 20 distributions each of some distribution-size , as is shown in Step 1 in Section 3.1. Following the methodology described in previous work by Alden et al., and the Spartan package that they developed, , we create one distribution group that contains 20 distributions of size , one distribution group that contains 20 distributions of size and so on. Here, the -values 1, 5, 50, 100 and 300 are evaluated  and thus we must produce a total of in silico runs. (Note that if the desired accuracy is not achieved for the highest investigated -value, here , higher values of can be explored).

We here let a distribution denote the -th distribution of distribution size so that

 Dn,k={d1n,k,d2n,k,..,dnn,k} (14)

where is the the -th data sample in distribution and . The -measure resulting from comparing two distributions and with respect to the variable is denoted by

Now, within every distribution-group, we compare the first distribution () to all other distributions () using the -measure. This yields 19 -measures per distribution-group (as is shown in Step 2 in Section 3.1. The maximum scaled -measure with respect to , occurring in a distribution-group that contains distributions of size , is denoted . The smallest value for which is denoted . In other words: corresponds to the smallest distribution-size for which all of the 19 computed -measures yield a small stochastic significance, as is shown in Step 3 in Section 3.1. This answers the question that we set out to answer via Consistency Analysis: data samples (or in silico runs) are needed in order to mitigate uncertainty originating from intrinsic model stochasticity. The procedure on how to perform Consistency Analysis is outlined Section 3.1.

### 3.1 Quick Guide: Consistency Analysis

Here follows a quick guide for how to perform Consistency Analysis.

## 4 Robustness Analysis

Robustness Analysis answers the question: how robust are model responses to local parameter perturbations? Robustness Analysis investigates if, and how, perturbing the value of one input parameter significantly changes an output . Using the -measure, data distributions containing output data produced by perturbed input parameters, are compared to a data distribution containing output data produced by the calibrated input parameters. All perturbed data distributions are here of size , where is decided in the Consistency Analysis process, previously described in Section 3, when analysing stochastic models.

Before commencing the Robustness Analysis, we must identify the uncertain model parameters that we want to investigate the robustness of. We denote these parameters , where , and thus we have a total of parameters whose robustness we will investigate. Now, as illustrated in Step 1 in Section 4.1, we let each such parameter be investigated at different parameter values (including the calibrated value), and thus we need to generate a total of distributions of sample size where

 P=q∑i=1r(pi). (15)

Note that the number of investigated parameter values, , need not be the same for every input parameter . Investigated distributions of sample size are here denoted , where denotes which parameter is being perturbed and denotes the specific perturbation of parameter . For some perturbation , the parameter value equals the calibrated value for input parameter . For each parameter that we are investigating, the -measure is used to compare the calibrated distribution to all distributions . Note that when , the calibrated distribution is compared to itself and thus the -measure equals 0.5. These -measures provide information regarding the statistical significance, specifically if it can be described to be small, medium or large under parameter perturbations. Plotting the corresponding -measure over the parameter value for each parameter , paints an informative picture of local parameter robustness, as shown in Step 2, in Section 4.1. Another descriptive way to demonstrate the influence that parameter values have on some output variable is to use boxplots. As is illustrated in Step 3 in Section 4.1

, boxplots can be used to clearly show the median, different percentiles, and outliers of some data distribution

as a function of the parameter value . The methodology to perform Robustness Analysis is outlined in Section 4.1. Note that Robustness Analysis does not pick up on any non-linear effects, between an input parameter and an output , that occur when more than one model parameter is simultaneously perturbed . Such effects can however be identified using a global sensitivity analysis technique, such as Latin Hypercube Analysis, as described in Section 5.

## 5 Latin Hypercube Sampling and Analysis

Latin Hypercube Analysis answers the question: how robust are model responses to global parameter perturbations? Latin Hypercube Analysis is a type of global sensitivity analysis that investigates the relationship between input parameters and output responses when all input parameters are simultaneously perturbed. The parameters that we want to perturb are (as in Section 4) denoted , where . Thus the parameters together span a parameter space of dimension . It is impossible to test every possible combination of input parameter values if they are picked from continuous ranges. In fact, even if we select a finite number of parameter values to test for each parameter , or if we pick discrete parameter values, comparing every possible combination of parameter values may require us to produce an impractically large number of simulation runs. Thus performing in silico simulations for all possible combinations of input parameters will in many cases be at worst impossible, and at best impractical. In order to circumvent this issue, Latin Hypercube Sampling can be used . It is a sampling technique that ensures comprehensive testing coverage over the parameter space whilst keeping the number of tested parameter combinations low enough to be applicable in practice [15, 16]. After Latin Hypercube Sampling (Section 5.1), Latin Hypercube Analysis (Section 5.2) is used in order to assess global sensitivity.

### 5.1 Latin Hypercube Sampling

In the two-dimensional case, a Latin Square is an square grid containing (traditionally Latin, hence the name) different symbols such that each symbol occurs exactly once in every row and exactly once in every column , as illustrated in Figure 3. Analogously, in the Latin Hypercube Sampling framework, consider two parameters and , spanning a parameter space of dimension , where both and are sectioned into intervals. We then pick combinations of input parameter values (or sampling-points) , where , such that every -interval is sampled from exactly once and every -interval is sampled from exactly once. Within the parameter range an interval, the sampled parameter value is randomly selected (unless of course the interval contains only one possible value ). Note that the index denotes the coordinate combination that belongs to, not the interval from which the parameter value was taken. Thus there is no condition demanding that the values are ordered in a way such that .

The analogy between a Latin Square and Latin Hypercube Sampling from a two-dimensional parameter space is illustrated in Figure 3. The Latin Square can be extended to higher dimensions to form a Latin Cube (dimension = 3) or a Latin Hypercube (dimension > 3) and, analogously, the two-dimensional sampling space illustrated in Figure 3 can be extended to dimensions, spanned by the input parameters . Figure 3: Left: An ℓ×ℓ Latin Square in which each Latin symbol occurs ℓ times, exactly once in each row and exactly once in each column. Right (analogously): A two-dimensional parameter space spanned by the input parameters p1 and p2 that are both sectioned into ℓ intervals. Using Latin Hypercube sampling, ℓ parameter combinations (p1j,p2j) are sampled where j=1,2,..,ℓ and each p1-interval is sampled from exactly once and each p2-interval is sampled from exactly once.

For each parameter , the total investigated parameter range is , where and respectively denote the minimum and maximum values of to be investigated. Now each parameter range is sectioned into intervals, and we denote these intervals by . Note that all input parameters

must be sectioned into the same number of intervals. If the intervals are uniformly distributed, then the size of an interval,

, is

 w(pi)=max(pi)−min(pi)N (16)

and the -th interval has a parameter range such that

 urpi=[min(pi)+w⋅(r−1),min(pi)+w⋅r] (17)

where .

Note that there are more than one way to populate Latin symbols in a Latin Square, this can be realised by regarding Figure 3 and noticing that the A-symbols and the B-symbols cover the Latin Square in different ways. Analogously, and by extension, there are multiples ways to populate sampling coordinates in a Latin Hypercube Sampling framework. Some of these ways provide better coverage of the parameter space than do others , but details regarding such sampling-optimisation are outside the scope of this study. In this study, we use the built-in MATLAB function lhsdesign  to select which parameter combinations to use according to a Latin Hypercube Sampling approach, details about the implementation are available in the Appendix. Note that, in our case, all intervals for a parameter are uniformly spaced, but the choice of spacing can be adjusted to the specific application at hand .

Now let us address the choice of intervals , as this is not straightforward. Using the Latin Hypercube Sampling framework, every parameter , where , is partitioned into intervals and, consequently, combinations comprising parameter values are sampled and tested. Compared to a small -value, a large value of will provide more data to use, and draw conclusions from, in the Latin Hypercube Analysis stage, however, it will also increase the computational cost in the Latin Hypercube Sampling stage. There is no strict rule for how to choose , but suggested values for in the literature are for large values of (i.e. high-dimensional parameter spaces) or which has been described to be ‘usually satisfactory’ [19, 20]. Authors of the Spartan package use a lot larger numbers in their provided examples . In this example study, we decide to use uniform intervals. At the end of the day, the choice of is up to the modeller, who must outweigh the (computational) cost of producing a large number of data samples, with the advantage of having a vast amount of data, and thus plentiful information, in the analysis stage. Details regarding quantitative choices of are outside the scope of this review.

### 5.2 Latin Hypercube Analysis

During the Latin Hypercube Sampling process, different points in the -dimensional parameter space spanned by the input parameters are selected as sampling-points, as shown in Step 1 in Section 5.3. One such sampling-point, , can be described by its coordinates in the parameter space so that . Each sampling-point is used to generate output values , where is determined using Consistency Analysis. Subsequently, the median output value, here denoted , is computed for every . Now, our overall aim is to investigate the relationship between an input parameter and an output response . We investigate this input-output relationship in two steps, one of which is qualitative and one of which is quantitative. In the first, and qualitative, step we produce two-dimensional scatterplots in which median output data,

 \lx@converttounder~X(C1),\lx@converttounder~X(C2),...,\lx@converttounder~X(CN)=\lx@converttounder~X(p11,p21,...,pq1),\lx@converttounder~X(p12,p22,...,pq2),...,\lx@converttounder~X(p1N,p2N,...,pqN),

are plotted over parameter values

 pi1,pi2,...,piN,

for one of the input parameters . We do this for every input parameter and thus scatterplots are created. By simply visually analysing the data in the scatterplots, we are able to make qualitative observations regarding the relationship between the input and the parameter. Examples of such observations are provided in Step 2 in Section 5.3

As a second step, we use a quantitative measure, such as the Pearson Product Moment Correlation Coefficient (or the correlation coefficient for short), to quantitatively describe the correlation between input parameters and output responses, as done in

Step 3 in Section 5.3. The correlation coefficient is denoted , where , describes the linear association between the input parameter and the output response in terms of both magnitude and direction. A positive (linear) correlation between and means that if either the input value or the output value increases, so does the other one, and thus is positive. Conversely, a negative correlation means that if either or increases, the other one decreases, and thus is negative. The magnitude of describes the strength of the correlation, where a magnitude of corresponds to a strong linear association, and a small magnitude corresponds to a weak correlation. An -value of approximately zero indicates that there is no linear correlation between the two investigated variables. Note that the Pearson Product Moment Correlation Coefficient picks up linear associations only, thus there may exist other, non-linear correlations that are not captured by the correlation coefficient .Therefore it is important to, not only quantitatively compute input-output correlations, but to also qualitatively assess the relationships between inputs and outputs, via data visualisation in scatterplots as previously described.

The correlation coefficient, , describing the correlation between an input parameter , and an output response (in median form) is given by ,

 (18)

where a bar denotes the mean value.

When it comes to interpreting quantitative input-output relationships based on the correlation coefficient , there are no all-encompassing threshold values to use for descriptors such as ‘weak’, ‘moderate’, ‘strong’ [21, 22, 23]. Relationships quantified by correlation coefficient values close to the extrema 0 or 1 may be easy to describe as ‘negligible’ or ‘strong’, respectively but correlation coefficient values in the middle of the [0,1] range are more difficult to label. Various ‘rule of thumbs’ have been suggested in the literature but, at the end of the day, it is up to the researcher to appropriately judge what constitutes a ‘weak’, ‘moderate’ or ‘strong’ input-output relationship in the specific (modelling) application at hand, taking into account the research area, the number of data samples, and the range of investigated input values . However, even without rigid descriptor threshold values, we can compare the correlation coefficient values for all input-output pairs and see which input values are the most influential within the ranges of regarded input values. As a guide, suggested correlation coefficient descriptor threshold values presented in the literature are listed in Table 1. The methodology to perform Latin Hypercube Sampling and Analysis is outlined in Section 5.3.

## 6 Conclusion

This review is intended as a gentle, introductory review to three sensitivity analyses methods, namely, consistency analysis, robustness analysis and latin hypercube analysis. Information on how to implement these methods in MATLAB are available in the Appendix. Alternatively, all methods discussed in this review can be implemented using the R-based software package Spartan, developed by Alden et al. . In fact, many of the proceedings and conventions used in this review follow those suggested by Alden et al. in order to allow the reader to, as easily as possible, use Spartan, if desired. Scrutinising mathematical models using uncertainty and sensitivity analyses methods is an important part in model development and, in many applications, knowledge about a model’s robustness is crucial . In the context of quantitative pharmacology, for example, a mathematical model may be used to guide preclinical or, ultimately, clinical proceedings. In such cases, understanding how confident we can be with model results, and how sensitive a model is to parameter perturbations, is of the utmost importance. With this in mind, the methods discussed in this review are implemented and interpreted in one of our recent studies, in which an agent-based, cancer cell model is used to predict in vivo treatment responses (post in vitro calibration) to an anti-cancer drug that may inhibit DNA damage repair .

## Acknowledgements

SH was supported by the Medical Research Council [grant code MR/R017506/1] and Swansea University PhD Research Studentship. SS is supported by an STFC studentship under the DTP grant ST/N504464/1.

## Appendix – Matlab code snippets

### Computing measure of stochastic superiority

We here list two different MATLAB functions that can be used in order to compute the A measure of stochastic superiority in the original form, , and in the scaled form, . The function getA_measure_naive, listed below, uses direct implementations of Equations 9 and 11 to compute and return values for and

, given two input vectors

and . The function getA_measure, uses the built-in MATLAB function ranksum, to do the same.

### Creating boxplots

The MATLAB function boxplot can be used to create boxplots. The input data in one column is represented by one box in the boxplot. For details regarding labeling and style alternatives, please see the MATLAB documentation .

### Choosing Latin Hypercube Sampling points

A Latin Hypercube Sampling matrix can be created using the MATLAB function lhsdesign, which returns a matrix of size , where denotes the number of samples to be tested, and denotes the number of input parameters to investigate (and thus perturb).

Each row , in the created matrix (here denoted LHC_Matrix), corresponds to the th sampling point. Each element () corresponds to the parameter value of the th input parameter in sampling point , where each parameter ranges between 0 and 1. For different criteria on how to chose the specific parameter values within each sampled interval, please refer to the MATLAB documentation . Sampling points can, for example, be chosen in a way that maximises the distance between sampling points in the -dimensional sampling space.

### Qualitative and Quantitative Latin Hypercube Sampling Analysis

In order to qualitatively asses the correlation between an input parameter , and an output response , one can use the MATLAB function scatter. In the below listings, p and X are two data vectors.

Further, to quantify the linear correlation between and , the MATLAB function corrcoef can be used to compute correlation coefficients.

## References

•  M. Read, P. S. Andrews, J. Timmis, and V. Kumar, “Techniques for grounding agent-based simulations in the real domain: a case study in experimental autoimmune encephalomyelitis,” Mathematical and Computer Modelling of Dynamical Systems 18 no. 1, (2012) 67–86.
•  K. Alden, M. Read, J. Timmis, P. S. Andrews, H. Veiga-Fernandes, and M. Coles, “Spartan: a comprehensive tool for understanding uncertainty in simulations of biological systems,” PLoS Comput. Biol. 9 no. 2, (2013) e1002916.
•  S. Lin, W. Li, X. Qian, P. Ma, and M. Yang, “A Simulation Model Validation and Calibration Platform,” pp. 687–693. 12, 2018.
•  J. L. Gevertz and J. R. Wares, “Developing a Minimally Structured Mathematical Model of Cancer Treatment with Oncolytic Viruses and Dendritic Cell Injections,” Comput Math Methods Med 2018 (2018) 8760371.
•  S. Blower and H. Dowlatabadi, “Sensitivity and uncertainty analysis of complex models of disease transmission: An hiv model, as an example,” International Statistical Review 62 (08, 1994) .
•  A. Ligmann-Zielinska, D. B. Kramer, K. Spence Cheruvelil, and P. A. Soranno, “Using uncertainty and sensitivity analyses in socioecological agent-based models to improve their analytical performance and policy relevance,” PLoS ONE 9 no. 10, (2014) e109779.
•  A. Charzyńska, A. Nałęcz, M. Rybiński, and A. Gambin, “Sensitivity analysis of mathematical models of signaling pathways,” BioTechnologia. 93 (3) (2012) 291–308.
• 

A. Niida, T. Hasegawa, and S. Miyano, “Sensitivity analysis of agent-based simulation utilizing massively parallel computation and interactive data visualization,”

PLoS ONE 14 no. 3, (2019) e0210678.
•  S. Hamis, J. Yates, M. Chaplain, and G. Powathil, “Targeting cellular dna damage responses: Predicting in vivo treatment responses using an in vitro-calibrated agent-based mathematical model,” Preprint: bioRxiv, doi: 10.1101/841270 .
•  K. O. McGraw and S. P. Wong, “ A common language effect size statistic.,” Psychological Bulletin 111 no. 2, (1992) 361–365.
•  A. Vargha and H. D. Delaney, “A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong.,” Journal of Educational and Behavioral Statistics 25(2) (2000) 101–132.
•  J. Cohen, “The statistical power of abnormal-social psychological research: a review,” J Abnorm Soc Psychol 65 (Sep, 1962) 145–153.
•  O. Cohen, “Statistical Power Analysis for the Behavioral Sciences (Second Edition) ,” Lawrence Erlbaum Associates (1988) .
• 

J. Ruscio and T. Mullen, “Confidence Intervals for the Probability of Superiority Effect Size Measure and the Area Under a Receiver Operating Characteristic Curve,”

Multivariate Behavioral Research 47 no. 2, (2012) 201–223.
•  M. D. McKay, R. J. Beckman, and W. J. Conover, “Comparison of three methods for selecting values of input variables in the analysis of output from a computer code,” Technometrics 21 no. 2, (1979) 239–245.
•  M. D. McKay, “Latin hypercube sampling as a tool in uncertainty analysis of computer models,” in Proceedings of the 24th Conference on Winter Simulation, WSC ’92, pp. 557–564. ACM, New York, NY, USA, 1992.
•  R. Sheikholeslami and S. Razavi, “Progressive latin hypercube sampling: An efficient approach for robust sampling-based analysis of environmental models,” Environmental Modelling and Software 93 (07, 2017) 109–126.
•  MATLAB, version 1.8.0_202 (R2019n). The MathWorks Inc., Natick, Massachusetts, 2019.
•  G. Manache and C. Melching, “Sensitivity of Latin Hypercube Sampling to sample size and distributional assumptions,” 07, 2007.
•  R. Iman and J. Helton, “Comparison of uncertainty and sensitivity analysis techniques for computer models,” Report NUREGICR-3904, SAND 84-1461, Sandia National Laboratories, Albuquerque, New Mexico (3, 1985) .
•  M. M. Mukaka, “Statistics corner: A guide to appropriate use of correlation coefficient in medical research,” Malawi Med J 24 no. 3, (Sep, 2012) 69–71.
•  P. Schober, C. Boer, and L. A. Schwarte, “Correlation Coefficients: Appropriate Use and Interpretation,” Anesth. Analg. 126 no. 5, (05, 2018) 1763–1768.
•  T. Krehbiel, “Correlation coefficient rule of thumb,” Decision Sciences Journal of Innovative Education 2 (01, 2004) 97–100.
•  S. A. Visser, D. P. de Alwis, T. Kerbusch, J. A. Stone, and S. R. Allerheiligen, “Implementation of quantitative and systems pharmacology in large pharma,” CPT Pharmacometrics Syst Pharmacol 3 (Oct, 2014) e142.