Use and Understanding of Scientific Models in Systems Biology
One of the major challenges in biology is to understand how functions in cells emerge from molecular components. Computational and mathematical modelling is a key element in systems biology which enables the analysis of biological functions resulting from non-linear interactions of molecular components. The kinetics of each biological reaction can be systematically represented using a set of differential equations [1, 2, 3, 4, 5]. Due to the multitude of cell components and the complexity of molecular interactions, the kinetic models often involve large numbers of reaction rate parameters, that is parameters representing the rates at which reactions encapsulated by the model are occurring [6, 7, 8]. Quantitative experimental measurements can be used to formulate the kinetic equations and learn about the associated rate parameters [1, 2, 6, 9, 10]. This in turn provides insight into the functions of the actual biological system.
An important question is therefore how much information about the kinetic equations and parameters can be obtained from an experimental measurement. Since a key aspect of experimental measurements in modern biological science is the study of the functions of specific genes, the answer to the above question is also important for understanding the role of each gene within the components of a biological system.
In plant developmental biology, a major challenge is to understand how plant growth is coordinated by interacting hormones and genes. Previously, a hormonal crosstalk network model - which describes how three hormones (auxin, ethylene and cytokinin) and the associated genes coordinate to regulate Arabidopsis root development - was constructed by iteratively combining regulatory relationships derived from experimental data and mathematical modelling [2, 10, 7, 8, 11, 12]. However, for the mathematical model to best link with Arabidopsis root development, it is necessary to understand the parameter space of the model and identify all acceptable parameter combinations. Little is known about how the identifiability of acceptable parameter space of a model is related to specific experimental data. Therefore, this work explores the dependence of acceptable parameter space of a complex model of hormonal crosstalk [10, 7, 8, 11, 12] on experimental measurements by employing Bayesian history matching techniques [13, 14, 15, 16, 17]. Additionally, we utilise these techniques to analyse how learning about the functions of a gene through particular relevant experiments can inform us about acceptable model parameter space.
Efficient Analysis of Scientific Models
Complex systems biology models are frequently high dimensional and can take a substantial amount of time to evaluate , thus comprehensive analysis of the entire input space, requiring vast numbers of model evaluations, may be unfeasible . We are frequently interested, as is the case in this paper, in comparing the scientific model to observed data (usually measured with uncertainty), necessitating a possibly high dimensional parameter search. Our history matching approach aims to find the set of all possible combinations of input rate parameters which could have plausibly led to the observed data, given all sources of uncertainty involved with the model and the experimental data [13, 14, 17]. This biologically relevant aim requires comprehensive exploration of the model’s behaviour over the whole input space, and therefore efficient techniques, such as emulation [13, 14, 15, 16, 17, 18], are required. An emulator mimics the biological model, but is substantially faster to evaluate, hence facilitating the large numbers of evaluations that are needed.
We are often keen to understand the contribution of particular sets of observations towards being able to answer critical scientific questions. Sequential incorporation of datasets into a history matching procedure, as presented in this article, is very natural and can allow us to attain such understanding. Comprehensive understanding and parameter searching of the hormonal crosstalk model for Arabidopsis root development , by sequentially history matching specific groups of experimental observations, is the focus of this paper.
Bayes Linear Emulation
In this section we outline the process of constructing an emulator for a complex systems biology model. For more detail see 
. We represent the set of input rate parameters of the model as a vectorof length , and the outputs of the model as vector of length .
A Bayes linear emulator is a fast statistical approximation of the systems biology model built using a set of model runs, providing an expected value for the model output at a particular point
, along with a corresponding uncertainty estimate reflecting our beliefs about the uncertainty in the approximation[19, 20]. The main advantage of emulation is its computational efficiency relative to other techniques for analysing computer models: often an emulator is several orders of magnitude faster to evaluate than the model it is mimicking. Emulation has been successfully applied across a variety of scientific disciplines such as climate science [16, 21, 22], cosmology [23, 24, 25], epidemiology [26, 27], oil reservoir modelling [13, 15] as well as systems biology .
We represent each output in emulator form as :
where represents the subset of active variables, that is the inputs which are most influential for output , are known functions of and are the corresponding coefficients to the regression functions . captures residual variation in with assumed covariance structure:
where is the set of indices of the active inputs for output .
is a zero-mean “nugget” term with constant varianceover and for . The nugget represents the effects of the remaining inactive input variables .
Emulator design is the process of selecting the points in the input space at which the simulator will be run in order to construct an emulator . A popular design choice in the computer model literature is the Maximin Latin Hypercube design [29, 30].
Performing emulator diagnostics , for example calculating standardised prediction errors:
for a set of validation data , is essential for validating an emulator. Large errors indicate conflict between simulator and emulator.
1 Dimensional Example
In this section we demonstrate emulation techniques on a simple one-dimensional example. We will suppose that we wish to emulate the simple function in the range , where we treat as a rate parameter that we wish to learn about, and as a chemical concentration that we could measure. We assume an emulator of the form given by equation (1) with covariance structure given by equation (2). We assume a zero mean function and that , and . We also specify a prior expectation . Having specified our prior beliefs, we then use the update rules given by equations (Bayes Linear Emulation) and (Bayes Linear Emulation) to obtain the adjusted expectation and variance for . The results of this emulation process are shown in the left panel of Figure 1. The blue lines represent the emulator expectation of the simulator output for the test points. The red lines represent the emulator mean
emulator standard deviations, given as. By comparison with the right panel of Figure 1, we can see that the emulator estimates the simulator output well, with some uncertainty. We note that we would not expect such large emulator uncertainty on such a smooth function as this, but have deliberately ensured that there is a large uncertainty for illustrative purposes, and in particular to highlight the effects of additional runs on reducing emulator uncertainty in the continuation of this example in the section History Matching.
History matching concerns the problem of finding the set of inputs to a model for which the corresponding model outputs give acceptable matches to observed historical data, given our state of uncertainty about the model itself and the measurements. History matching has been successfully applied across many scientific disciplines including oil reservoir modelling [32, 13, 33, 15], cosmology [14, 34, 23, 25, 35], epidemiology [27, 36, 37, 38], climate modelling , environmental science [39, 40] and systems biology . Here we provide a brief summary of the history matching procedure - see  and  for more details. We need a general structure to describe the link between a complex model and the corresponding physical system. We use the direct simulator approach, otherwise known as the best input approach , where we assume that there exists a value such that best represents the real biological system, which we denote as [41, 42]. We then formally link the th output of the model to the th real system value via
and link the experimental observation corresponding to output to the real system via
where we assume . Here, is the model run at best input ,
is a random variable which reflects our uncertainty due to discrepancy between the model run at the best possible input combination setting and the real world[18, 43, 39, 44, 34], and is a random variable which incorporates our beliefs about the error between each desired real world quantity and its corresponding observed measurement. We assume , and . The connection between system, observation and model given by (6) and (7) is simple but well-used [13, 41, 14, 27, 17], and judged sufficient for our purposes. For discussion of more advanced approaches see .
We then aim to find the set of all input combinations that are consistent with equations (6) and (7), that is those that will provide acceptable matches between model output and data. To analyse whether a point it is practical to use implausibility measures for each output , as given in [32, 13, 14],
If is large this suggests that we would be unlikely to obtain an acceptable match between model output and observed data were we to run the model at . This is after taking into account all the uncertainties associated with the model and the measurements. We develop a combined implausibility measure over multiple outputs such as , and . We class as implausible if the values of these measures lie above suitable cutoff thresholds [13, 14, 17].
History matching using emulators proceeds as a series of iterations, called waves, discarding regions of the input parameter space at each wave. At the th wave emulators are constructed for a selection of well-behaved outputs over the non-implausible space remaining after wave . These emulators are used to assess implausibility over this space where points with sufficiently large values are discarded to leave a smaller set remaining [14, 17].
The history matching algorithm is as follows:
Generate a design for a set of runs over the non-implausible space , for example using a maximin Latin hypercube with rejection .
Check to see if there are new, informative outputs that can now be emulated accurately and add them to the previous set to define .
Use the design of runs to construct new, more accurate emulators defined only over for each output in .
Calculate implausibility measures over for each of the outputs in .
Discard points in with to define a smaller non-implausible region .
If the current non-implausible space is sufficiently small, go on to step 7. Otherwise repeat the algorithm from step 1 for wave . The non-implausible space is sufficiently small if it is empty or if the emulator variances are small in comparison to the other sources of uncertainty ( and ), since in this case more accurate emulators would do little to reduce the non-implausible space further.
Generate a large number of acceptable runs from , sampled according to scientific goal.
It should be the case that for all , where for some threshold , where each is calculated using expression (8) with and , that is were we to know the simulator output everywhere. The choice of cutoff is usually motivated by Pukelsheim’s 3-sigma rule . This iterative procedure is powerful as it quickly discards large regions of the input space as implausible based on a small number of well behaved (and hence easy to emulate) outputs. In later waves, outputs that were initially hard to emulate, possibly due to their erratic behaviour in uninteresting parts of the input space, become easier to emulate over the much reduced space . Careful consideration of the initial non-implausible space is important. It should be large enough such that no potentially scientifically interesting input combinations are excluded. A more in-depth discussion of the benefits of this history matching approach, especially in problems requiring the use of emulators, may be found in .
1 Dimensional Example Continued
along with observed error is included as solid and dashed lines respectively. In this example we let model discrepancy be 0, and set the measurement standard error. Along the bottom of the figure is the implausibility for each value represented by colour: red for large implausibility values, orange and yellow for borderline implausibility, and green for low implausibility () .
is the full initial range of , that is . is as shown by the green regions in Figure 2 (left panel). Wave 2, shown in Figure 2 (right panel), involves designing a set of three more runs over , constructing another emulator over this region and calculating implausibility measures for each . This second emulator is considerably more accurate than the observed error, thus , so the analysis can be stopped at this point as extra runs would do little to further reduce the non-implausible space.
Sequential History Matching of Observations
Much scientific insight can be gained from performing a history match. Breaking the data down into subsets and sequentially adding them to the history match allows further scientific insight by revealing how much each subset of experiments informs scientists about the input parameter space of their model, and hence about particular scientific questions. Note that this is different from bringing outputs in sequentially due to emulator capability (step 2 of the algorithm) . We will explore this in detail for the Arabidopsis model.
History Matching or Bayesian MCMC?
In this section we briefly discuss and compare the history matching approach to the standard form of a full Bayesian analysis.
History matching is a highly computationally efficient and practical approach to identifying if a model is consistent with observed data, and, if so, utilising the key uncertainties within the problem to identify where in the input space acceptable matches lie . In doing this, history matching attempts to answer efficiently some of the main questions that a modeller may have. In comparison, a full Bayesian framework requires full probabilistic specification of all uncertainty quantities, but provides a theoretically coherent method to obtain probabilistic answers to most scientific questions. For example in the context of a direct simulator, as given by 6 and 7, a posterior distribution for the location of the true best input is obtained. Problems arise if we do not believe that such a best input
actually exists, since then a posterior over this input has little meaning. Even if we do believe that such a best input exists, making full joint distributional specifications is challenging and frequently leads to approximations being made for mathematical convenience. This in turn also calls into question the meaning of the resulting posterior.
Regardless of how prior distributions have been specified, performing the necessary calculations for a full Bayesian analysis is hard, thus requiring time consuming numerical schemes such as MCMC . A major issue of numerical methods such as MCMC is that of convergence . Many model evaluations are required to thoroughly explore the multi-modal likelihoods over the entire input space. Emulators can facilitate these large numbers of model evaluations at the cost of uncertainty [18, 48]. However, since the likelihood function is constructed from all outputs of interest, we need to be able to emulate with sufficient accuracy all such outputs, including their possibly complex joint behaviour. There may be erratically behaved outputs which are difficult to emulate, leading to emulators with large uncertainty or emulators which fail diagnostics. The likelihood, and hence the posterior, may be highly sensitive to these emulators, and hence be extremely non-robust.
Aside from the above concerns, the posterior of a Bayesian analysis (especially in high-dimensional models) often occupies only a tiny volume of the input space. Obtaining a sufficiently accurate emulator over the whole input space will still require far too many model evaluations, and hence an iterative strategy such as history matching is required. History matching is designed to efficiently cut out the uninteresting parts of the input space, and to then provide highly accurate emulators over the region of interest , where the vast majority of the Bayesian posterior should lie. A detailed Bayesian analysis can then be accurately performed within this much smaller volume of input space, using the accurate final wave emulators, if required. History matching can therefore be thought of as a useful precursor to a full Bayesian analysis, or as a useful analysis to perform in its own right, for modellers who do not wish to make the detailed specifications (or the corresponding robust analysis [49, 50, 51]) that make the full Bayesian approach meaningful. We refer the interested reader to the extended discussion in , comparing history matching, MCMC and also the recently popular ABC.
Application to Arabidopsis Model
We now describe the relevant features of the hormonal crosstalk model as constructed by Liu et al. .
Description and Network
The model represents the hormonal crosstalk of auxin, ethylene and cytokinin of Arabidopsis root development as a set of 18 differential equations, given in Table 1, which must be solved numerically. The model takes an input vector of 45 rate parameters and produces an output vector of 18 chemical concentrations . Note that, for simplicity, we refer to all components of the model including hormones, proteins and mRNA as “chemicals”. Experiments accumulated over many years have established certain relationships between some of the 18 concentrations. For example, either manipulation of the PLS gene or exogenous application of IAA (a form of auxin), cytokinin or ACC (ethylene precursor) affects model outputs , , and . We use initial conditions for the model, given in Table 2, that are consistent with  and .
The hormonal crosstalk network of auxin, cytokinin and ethylene for Arabidopsis root development is shown in Figure 3. The auxin, cytokinin and ethylene signalling modules correspond to the model of . The PIN functioning module is the additional interaction of the PIN proteins introduced in . Solid arrows represent conversions whereas dotted arrows represent regulations. The represent reactions in the biological system and link to the rate parameters on the right hand side of the equations in Table 1. For full details of the model see .
|0.1||0 or 1|
|0.1||0 or 1|
|0||0 or 1|
Mutants and Feeding
We will be interested in comparing the differences in output concentrations under different experimental conditions (wild type (wt), pls mutant, PLS overexpressed transgenic (PLSox), ethylene insensitive etr1, double mutant plsetr1, no feeding, feeding auxin , feeding cytokinin , feeding ethylene , feeding any combination of these hormones , , and ) . Note that for simplicity of terminology, exogenous application of IAA, cytokinin or ACC is referred to as “feeding auxin, cytokinin or ethylene” respectively.
In the model, mutant type is controlled by altering the parameters representing the expression of the two genes PLS and ETR1. Input rate parameter controls the amount to which PLS is suppressed, hence pls is represented by setting and PLSox is represented by increasing the size of to a value greater than that of the wild type plant. Input rate parameter represents the rate of conversion of the active form of the ethylene receptor to its inactive form. The ethylene insensitive etr1 mutant is represented by decreasing the size of to a much smaller value than that of wild type. plsetr1 is represented by both setting and to its much decreased value. Feeding regime is represented by the initial conditions of certain outputs. , and take initial condition values 0 or 1, as indicated in Table 2, depending on whether or not the respective chemical auxin, cytokinin or ethylene has been fed.
Model Structure and The Inputs
Model structure sometimes restricts what we are able to learn about certain parameter relationships. Our history matching techniques would discover such restrictions in the model anyway, but the ability to identify these before we even start will make the process more efficient.
For example, following , we are only interested in comparing the model output to data at equilibrium, thus allowing a substantial dimensional reduction of the input space. At equilibrium, the derivatives on the left hand side of the model equations given in Table 1 will equal zero, and hence the right hand side can be rearranged in terms of one less parameter . For this reason, measurements of outputs of this system will only allow us to learn about certain ratios of the input rate parameters to one another. For example, the equation for becomes
which only depends on the ratio .
Another restriction arises from the fact that the initial conditions for the feeding chemicals , and can only take the values 0 or 1 and then remain constant. This is because, although the expressions , and respectively in the equations for , and take the specific form following the biological mechanism, they can only be learnt about as a whole, essentially comparing the case of a constant reservoir of chemical being available for uptake into the plant with the case of no feeding at all. Feeding of IAA, cytokinin and ACC with any concentration can be rescaled to , , and by adjusting the parameters , , and in each equation respectively. Note that specific equations for the rate of change of the feeding chemicals may allow more insight into the effects of feeding if deemed biologically relevant. Finally, since we consider ranges of rate parameters and rate parameter ratios which are always positive, many spanning many orders of magnitude, we choose to convert to a log scale. We therefore define the reduced 31 dimensional vector of input parameters for the model to be:
as given in Table 3.
In order to perform a full analysis on the model, we introduce a parameter to represent the ratio of the cytosolic volume to the volume of the cell wall . Full details of why we introduce this parameter are included in the supplementary material. For a typical cell, we fixed and considered that a reasonable range of possible values for was for a plant root cell.
Eliciting the Necessary Information for History Matching
To perform a history match, we need to understand how real-world observations relate to model outputs, thus aiding the specification of observed values , model discrepancy terms and measurement error terms . History matching is a versatile technique which can deal with observations of varying quality, such as we have for the Arabidopsis model.
Relating Observations To Model Outputs
Each Arabidopsis model output relating to a biological experiment can be represented by:
Here, the subscript indexes the measurable chemical, indexes the plant type and indexes the feeding action, where indicates no feeding and , and indicate the feeding of auxin, cytokinin and/or ethylene respectively, for a particular set-up of the general model (the Arabidopsis model equations given in Table 1). The vector represents the vector of rate parameter ratios and represents time. There are 200 possible experiments given by the possible combinations of , and .
The average PIN concentration in both the cytosol and the cell wall is calculated as follows:
We collected the results of a subset of 32 of the possible experiments from a variety of experiments in the literature (see [2, 10] and references therein for details). 30 of these observations are measures of the trend of the concentration of a chemical for one experimental condition relative to another experimental condition (usually chosen to be wildtype). We therefore need our outputs of interest to be ratios of the outputs of our model with different experimental subscript settings. We choose to work with log model outputs since these will be more robust and allow multiplicative error statements. Since we only consider model outputs to be meaningful at equilibrium, that is as , we therefore, following , define the main outputs of interest to be:
where the subscript indexes the combinations of that were actually measured. This function will be directly compared to the observed trends. All but one of the trends were relative to wild type with no feeding, with the exception being the ratio of auxin concentration in the pls mutant fed ethylene to the pls mutant without feeding. The remaining two observations are non-ratio wild type measurements of the chemicals and . The outputs of interest for these observations are given as and respectively. Including these experiments within the history match ensures that acceptable matches will not have unrealistic concentrations of auxin and cytokinin.
The full list of 32 outputs is given in the left hand column of Table 4. These are notated in the form
and are assumed to be ratios relative to wild type with no feeding unless otherwise specified. NR indicates that an output is not a ratio. For example, indicates the cytokinin concentration ratio of wild type fed ethylene relative to wild type no feeding, and represents the ethylene concentration ratio of the POLARIS overexpressed mutant relative to wild type.
We sequentially history match the Arabidopsis model to these experimental observations in 3 phases , and , with the group to which each experiment belongs presented in Table 4. We will history match the Dataset observations to obtain a non-implausible set . Additional insight will be gained by further history matching to Dataset to obtain , and then finally history matching to Dataset . Dataset contains the outputs involving the feeding of ethylene. History matching this group separately provides insight into how the inputs of the model are constrained based on physical observations of a plant having been fed ethylene relative to its wild type counterpart. Dataset contains the outputs involving the measurement of , thus demonstrating how useful observing the effects of the POLARIS gene function were for gaining increased understanding about the model and its rate parameters.
|Experiment||Dataset||Minimum Log||Maximum Log||Minimum||Maximum|
|Ratio Value||Ratio Value||Ratio Value||Ratio Value|
Observed Value, Model Discrepancy and Measurement Error
Although some of our collected measurement values were estimated values of a trend or ratio, many of the measurements were only general trend directions or estimated ranges for the ratio value, given with various degrees of accuracy . We therefore use a level of modelling appropriate to the nature of the data to propose order of magnitude estimators for , and that are consistent with the observed trends and expert judgement concerning the accuracy of the model and the relevant experiments. Doing this demonstrates that we can apply our history matching approach to vague, qualitative data, whilst demonstrating the increased power of this analysis were we to have more accurate quantitative data for all the experiments.
A general trend of “Up”, “Down” or “No Change” was collected for 17 of the experiments, these being indicated by an asterix in Table 4. Following the conservative procedure given in , we specify and for the “Up”, “Down” and “No Change” trends respectively, where represents the combined model discrepancy and measurement error, that is . These combined specifications have been chosen such that represents a 20% to ten fold increase for the “Up” trends, a 20% to ten fold decrease for the “Down” trends, and a 40% decrease to 40% increase for the “No Change” trends. To avoid confusion, we here define a 20% decrease to imply that a 20% increase on the decreased value returns the original value. This specification conservatively captures the main features of the trend data, although more in-depth specification could be made if quantitative measurements were available across these outputs. We specify to be in the middle of the logged ratio range. In this work we considered that the deficiencies in the model would be of a similar order of magnitude to the observed errors on the data. We therefore specify both model discrepancy and measurement error to be of equal size and satisfy the ratio intervals above.
For the remaining cases, the observed values , model discrepancies and measurement errors were chosen using a more in-depth expert assessment of the accuracy of the relevant trend measurements and their links to the model (see [2, 10] and references therein for details). Since we will use a maximum implausibility threshold of by appealing to Pukelsheim’s 3 sigma rule  when working with the simulator runs, it is most appropriate to specify the logged ranges of , as these are the ranges which if a simulator run falls outside it will be classed as implausible. These ranges are specified in Table 4 in both logged and not logged form.
Input Ratio Ranges
We choose to work directly with appropriate rate parameter ratios to reduce our parameter space from the 45 in the equations in Table 1 to 30. We then impose a further constraint that , as imposed in  and suggested by the results of , which ensures that the term in the equation is non-negative, and this effectively removes a further input.
We let and represent the values that and respectively should take for wild type. We let the two additional parameters and represent the values these parameters should be multiplied by in order to obtain the corresponding model run for the PLSox and etr1 mutants respectively, that is with and . Doing this allows exploration of a reasonable class of representations of these mutants using independent parameters. We therefore have a reduced parameter space of 31 dimensions as listed in the left hand column of Table 3.
The initial ranges of values for the input parameters were chosen based on those in the literature  and further analysis of the model , and are shown in Table 3. Many of the input ranges were chosen to cover an order of magnitude either side of the single satisfactory input parameter setting found in . Some parameters of particular interest were subsequently increased to allow a wider exploration of the input parameter space. This gave us a large initial input space which was thought to be suitable for our purposes. The logged ratio ranges were all converted to the range prior to analysis.
We now apply the technique of sequential history matching using Bayes linear emulation to the Arabidopsis model . Analysis of the results, after history matching to each of Datasets , and , will involve consideration of the following:
The volume reduction of the non-implausible input space .
Input plots of the non-implausible space .
The variance resolution of individual inputs and groups of inputs.
Output plots of the non-implausible space .
The degree to which each output was informative for learning about each input.
Insights From Initial Simulator Runs
A wave 1 set of 2000 training runs were designed using a maximin Latin hypercube design over the initial input space . Figure 4 shows the wave 1 output runs for all 32 outputs considered. The targets for the history match, as given by the intervals and the ranges in Table 4, are shown as vertical error bars. Black error bars represent Dataset outputs, blue error bars represent Dataset outputs and red error bars represent Dataset outputs.
Figure 4 gives substantial insight into the general behaviour of the model over the initial input space , for example informing us about model outputs that can take extreme values, for example, , and . More importantly, the runs also inform us as to the class of possible observed data sets that the model could have matched, and hence gives insight into the model’s flexibility. There exist outputs with constrained ranges. In particular, many outputs seem to be constrained to being either positive or negative, for example, the logged trend for must be positive and that of must be negative. If such constrained outputs, which are consequences of the biological structure of the model, are found to be consistent with observations, this provides (partial) evidence for the model’s validity. Conversely, we may be concerned about an overly flexible model that was capable of reproducing any combination of positive or negative observed data values for outputs in this dataset. Specifically, we should doubt claims that such a model has been validated by comparison to this data, as it would have inevitably matched any possible data values and hence arguably may not contain much inherent biological structure at all.
There are some outputs for which the majority of the wave 1 runs already go through the corresponding error bars, for example and . This is an indication that these outputs did not help much to constrain the input space. Despite this, none of the wave 1 runs pass through all of the target intervals of the outputs in Dataset simultaneously, thus already suggesting that the volume of the final non-implausible space would be small or indeed zero.
History Matching The Model
We outline the general decisions required to perform the history match. Several packages are available that perform standard Gaussian Process emulation, for example the BACCO  and GPfit  packages in R  or GPy  for Python, which may be used as an alternative to the more sophisticated emulators we describe here. Emulators accurate enough to reduce the size of the non-implausible space at each wave to some degree are sufficient. When constructing emulators, we decided to put more detail into the mean function, but incorporate more complicated structures for the residual process at each wave, thus sequentially increasing the complexity of the emulators at each wave. We provide a summary of the choices made in the history match at each wave in Table 5, including the dataset history matched to (column 2), the number of design runs (column 3), the implausibility cut-off thresholds (columns 4-6) and the emulation strategy (column 7), each of which is discussed in more detail below.
The amount of space that was cut out after each wave is shown in Table 6. We let represent the volume of the non-implausible space after wave , as judged by the emulators, and represent the volume of the space with acceptable matches to the observed data in Dataset , as judged using actual model runs (hence without emulator error). Then columns 2 and 3 give the proportion of the previous wave and initial non-implausible spaces respectively still classed as non-implausible, and columns 5 and 6 give the proportion of the wave and initial non-implausible spaces giving rise to actual acceptable matches to the data in Dataset . The proportion of space cut out at each wave is influential for deciding the number of waves and emulator technique at each wave. In addition, Table 6 presents the radical space reduction obtained by performing the history match. A proportion of of the original space was still considered non-implausible after history matching to Dataset . A proportion of only of the original space was still considered non-implausible after history matching to Datasets and , thus the 5 trends in Dataset , for exogenous application of ACC, facilitated an additional reduction of 3 orders of magnitude. After all experimental observations had been matched to, the non-implausible space had been reduced to a proportion of of the original space, thus the 5 trends in Dataset , for measurement of POLARIS gene expression, refocused the set by another 2 orders of magnitude. Such small proportions of the original space being classed as non-implausible means that acceptable runs within these spaces would likely be missed by more ad-hoc parameter searching methods of analysis.
|Wave ()||Dataset ()||Runs||Emulation Strategy|
|3, 4||2000||3||2.8||-||Linear models|
|5||2000||3||2.8||-||Single fixed correlation length|
|6, 7||2000||3||2.8||-||Several correlation lengths per output|
|8, 9||2000||3||2.9||-||Linear models for Dataset B outputs only|
|10||2000||3||2.9||-||Single fixed correlation length|
|11||3500||3||2.9||-||Several correlation lengths per output|
|12||2000||3||2.9||-||Single fixed correlation length|
|13||3500||3||2.9||-||Several correlation lengths per output|
|Wave ()||Dataset ()|
Linear model emulators with uncorrelated residual processes were used in the initial waves since they are very cheap to evaluate, substantially more so even than emulators involving a correlated residual process, which may only be slightly more accurate . As the amount of space being classed as implausible at each wave started to drop, we introduced emulators with a Gaussian correlation residual process. Methods in the literature for picking the correlation lengths tend to be computationally intensive and their result highly sensitive to the sample of simulator runs [56, 57]. The choice was therefore made to use a single correlation length parameter value of for all input-output combinations, this choice being checked using emulator diagnostics.
At wave 6 the complexity of the residual process was increased still further by splitting the active inputs for each output emulator into five groups based on similar strength of effect, and using maximum likelihood to fit the same correlation length to all inputs in each group. This extension to the literature of fitting several different correlation length parameter values strikes a balance between the stability of the maximum likelihood process (which can become very challenging were we to include 32 separate correlation lengths for each of the 32 inputs) and the overall complexity of the residual process. At wave 8 we introduced the Dataset outputs by first using linear model emulators for the new outputs only, and then using emulators with residual correlation processes for all outputs. In waves 12 and 13 we incorporated emulators with residual processes for the Dataset outputs.
The number of design points per wave was largely kept constant at 2000. 2000 was deemed a suitable number of runs per wave as it meant that the matrix calculations involved in the emulator were reasonable, whilst allowing adequate coverage of the non-implausible input space with simulator runs. At waves 11 and 13, 3500 design runs were used to build more accurate emulators.
In terms of design, a maximin Latin hypercube [29, 30] was deemed sufficient for our needs as we required a simple and efficient space-filling design. The speed of the simulator meant that more structured and tactical designs were unnecessary for our requirements. At wave 1 we constructed a Latin hypercube of size 2000 to build emulators for each of the outputs. At waves 2-7 we first built a large maximin Latin hypercube design containing a large number of points over the smallest hyper-rectangle enclosing the non-implausible set. We then used all previous wave emulators and implausibility measures to evaluate the implausibility of all the proposed points in the design 
. Any points that did not satisfy the implausibility cut-offs were discarded from further analysis. If a single Latin hypercube was not sufficient to generate enough design points, multiple Latin hypercubes were taken in turn and the remaining points in each were taken to be the design. From wave 8 onwards an alternative sampling scheme was necessary to generate approximately uniform points from the non-implausible sets, since generating points using Latin hypercubes became infeasible due to the size of the non-implausible space. There are several alternative ways presented in the literature to approximately sample uniformly distributed points over the non-implausible space[27, 36, 58]. We used a simple Metropolis-Hastings MCMC algorithm , which provided adequate coverage of the non-implausible space.
At each wave we performed diagnostics on the mean function linear model, the emulator and the implausibility criteria using 200 points in the non-implausible space. The diagnostic test for implausibility which we used was the one described in . It compares the data implausibility cut-off criteria , that is the implausibility evaluated at a known diagnostic run, against the chosen implausibility cut-off criteria for the emulator outputs.
Many waves were necessary to complete this history matching procedure due to the complex structure of the Arabidopsis model. We see from the first few waves that linear model emulators are sufficient for learning a great deal about the input parameter space, but that including full emulators with correlated residuals at later waves can be useful. At the end of the procedure we obtained many runs satisfying each of the datasets and . We now go on to describe the results of the parameter search using various graphical representation techniques and discuss their biological implications.
Visual Representation of History Matching Results
A major aim of this work is to evaluate the dependence of the parameter space of the model on experimental measurements.
Figure 5 shows, below the diagonal, a “pairs” plot for a subset of the inputs . A “pairs” plot shows the location of various points in the 31-dimensional input space projected down into 2-dimensional spaces corresponding to two of the inputs. For example, the bottom left panel shows the points projected onto the vs plane. Inputs to wave 1 runs are given by grey points. Inputs to runs of the simulator with acceptable matches to the observed data in Datasets , and are given as yellow, pink and green points respectively. Above the diagonal are shown 2-dimensional optical density plots of inputs to runs with acceptable matches to all of the observed data for the same subset of the inputs. Optical density plots show the depth or thickness of the non-implausible space in the remaining 29 dimensions not shown in the 2d projection (see [14, 17] for details). The orientation of these plots has been flipped to be consistent with the plots below the diagonal. Along the diagonal are shown 1-dimensional optical density plots.
Figure 5 provides much insight into the structure of the model and the constraints placed upon the input rate parameters by the data. Some of the inputs, such as , , and are constrained even in terms of 1 -dimensional range. Some inputs only appear constrained when considered in combination with other inputs, for example and exhibit a positive correlation. This is reasonable, since an increase in , the rate constant for converting the activated form of ethylene receptor into its inactivated form, can be compensated by an increase in , the rate constant for removing ethylene, since ethylene promotes the conversion of the activated form of ethylene receptor into its inactivated form. More complex constraints involving three or more inputs are more difficult to visualise. Below the diagonal, the pairs plot gives insight into which input parameters were learnt about by which set of outputs. For example, the parameter is largely learnt about by Dataset , as is clear from the difference between the area of the yellow points and pink points in plots involving this input. This is not surprising, since this term corresponds to the feeding and biosynthesis () of ethylene, which we would expect to be learnt from the feeding ethylene experiments. We can see that input combinations with large values of are classed as implausible, thus constraining this input to be relatively low.
Figure 6 shows the output runs corresponding to the input combinations shown in Figure 5 for all 32 outputs considered. The colour scheme is directly consistent with Figure 5, with wave 1 runs given as grey lines, and simulator non-implausible runs after history matching Datasets , and given as yellow, pink and green lines respectively. Runs which pass within the error bar of a particular output satisfy the constraint of being within , thus being in alignment with the results of the corresponding experimental observation, given our beliefs about model discrepancy and measurement error. Black error bars represent Dataset outputs, blue error bars represent Dataset outputs and red error bars represent Dataset outputs.
Figure 6 gives much insight into joint constraints on possible model output values that are in alignment with all of the observed data (and so would pass through all of the error bars). Some model outputs have been constrained much more than the range of their error bars, for example, is constrained to the upper half of its error bar while is constrained to take smaller values. It is interesting that many of the yellow runs already go through the error bars of some of the outputs in Datasets and , for example