1 Introduction
A patient’s health and safety is an important concern, and healthcare is a heavily regulated industry. In the USA and the EU, private and public regulatory bodies exist on federal/union and state levels (Field, 2008; Hervey, 2010). Healthcare companies applying to market a new drug or medical device must submit extensive technical information to the regulatory bodies. In the USA and the EU, the Food & Drug Administration (FDA) and European Medicines Agency (EMA) handle these applications, respectively. The FDA require that the technical information contains e.g. the chemical composition of the drug, how the drug affects the human body (pharmacodynamics), how the body affects the drug (pharmacokinetics), and methods for drug manufacturing, packaging and quality control (U.S. Food & Drug Administration, 2018). Likewise, applying to market a new medical device requires submitting extensive technical information. A medical device can be any appliance, software or material used for nonpharmacological diagnosis, monitoring or treatment of a disease or condition (European Union, 2017). An important part of proving the efficacy and safety of a new drug or medical device is showing that its effects on the patient can be predicted and interpreted, e.g. via mathematical and computational models.
These transparency and interpretability requirements, combined with limited amounts of available experimental data, make models learnt solely from observed data unsuitable for proving the efficacy and safety of a new drug or medical device to the regulatory bodies. Hence researchers use explicit parametric models. For drugs, the pharmacokinetics (how the human body affects the drug) is often modelled using systems of ordinary differential equations. These models elucidate how the drug is absorbed and distributed through the body (e.g. brain, kidneys and liver) under different dosing profiles and methods of drug administration (e.g. orally or intravenously) (see Figure
1). The EMA pharmacokinetic model guidelines state that regulatory submissions should include detailed descriptions of the models, i.e. justification of model assumptions, parameters and their biochemical plausibility, as well as parameter sensitivity analyses (European Medicines Agency, 2011, p. 16; 2016, p. 3).A key problem is to identify a mathematical model that usefully explains and predicts the behaviour of the pharmacokinetic process (Heller et al., 2018). Researchers will often have several models (i.e. hypotheses) about an underlying mechanism of the process, but insufficient experimental data to discriminate between the models (Heller et al., 2018). Additional experiments are generally expensive and timeconsuming. Pharmacokinetic experiments typically take 8–24 hours for mice and rats, and 12–100 hours for simians and humans (Ogungbenro & Aarons, 2008; Tuntland et al., 2014). It is key to design experiments yielding maximally informative outcomes with respect to model discrimination.
Contribution
We tackle limitations in existing analytical and datadriven approaches to design of experiments for model discrimination by bridging the gap between the classical analytical methods and the Monte Carlobased approaches. These limitations include the classical analytical methods’ difficulty to manage nonanalytical model functions, and the computational cost for the datadriven approaches. To this end, we develop a novel method of replacing the original parametric models with probabilistic, nonparametric Gaussian process (GP) surrogates learnt from model evaluations. The GP surrogates are flexible regression tools that allow us to extend classical analytical results to nonanalytical models in a computationally efficient manner, while providing us with confidence bounds on the model predictions.
The method described in this paper has been implemented in a python software package, GPdoemd (2018), publicly available via GitHub under an MIT License.
2 Model Discrimination
Model discrimination aims to discard inaccurate models, i.e., hypotheses that are not supported by the data. Assume models are given, with corresponding model functions . The model is a hypothesis and collection of assumptions about a biological system, e.g. the pharmacokinetic model in the top left of Figure 1, and the function is the model’s mathematical formulation that can be used for predictions, e.g. the drug concentrations on the right in Figure 1. Each model function takes as inputs the design variables and model parameters . The design variables , e.g. drug administration and dosing profile, specify a physical experiment that can be run on the system described by the models, with defining the operating system boundaries. The classical setting allows tuning of the model parameters to fit the model to data. The system has target dimensions, i.e. . We denote the function for each target dimension as . This notation distinguishes between the target dimension of a model function , and model ’s function . Table A in the supplementary material summarises the notation used.
We start with an initial set of experimental measurements , . A common assumption in pharmacokinetics is that the measurement noise term
is i.i.d. zeromean Gaussian distributed with covariance
(Steimer et al., 1984; Ette & Williams, 2004; Tsimring, 2014). Skew in the data distribution can be handled, e.g. through
transformation. The experimental data set is denoted . The initial experimental data points are insufficient to discriminate between the models, i.e. . We are agnostic to the available experimental data, and wish to incorporate all of it in the model discrimination.2.1 Classical Analytical Approach
Methods for tackling the design of experiments for discriminating simple, analytical models have been around for over 50 years. Box & Hill (1967) study the expected change in Shannon entropy
from making an additional experimental observation, where the posterior model probability
and . Box & Hill (1967) develop a design criterion by maximising the upper bound on . The expression for can be found in the supplementary material. MacKay (1992a, b) derives a similar expression. Box & Hill (1967) choose the next experiment by finding . Design of experiments continues until there exists a model probability .BuzziFerraris et al. (1990) propose a new design criterion (see the supplementary material) from a frequentist pointofview. They suggest using a test with degrees of freedom for the model discrimination. The null hypothesis for each model is that the model has generated the experimental data. Design of experiments continues until only one model passes the test. Models are not ranked against each other since BuzziFerraris et al. (1990) argue this simply leads to the least inaccurate model being selected. The procedure is more robust against—but not immune to—favouring the least inaccurate model.
Michalik et al. (2010) proceed from the Akaike information criterion (AIC) as the model discrimination criterion to derive a design criterion from the Akaike weights . The expression for can be found in the supplementary material. Design of experiments continues until there exists an Akaike weight . Box & Hill (1967) and Michalik et al. (2010) implicitly favour the least inaccurate model.
In order to account for the uncertainty in the model parameters , the classical analytical approach (Prasad & Someswara Rao, 1977; BuzziFerraris et al., 1984) is to approximate the model parameters as being Gaussian distributed around the bestfit parameter values . The covariance is given by a firstorder approximation
(1) 
where , , and the gradient . The predictive distribution with approximately marginalised out becomes , where the covariance is .
Limitations
Most model functions for healthcare applications such as pharmacokinetics or medical devices are not analytical. They are often, from a practical pointofview, complex black boxes of legacy code, e.g. large systems of ordinary or partial differential equations. We can evaluate the model function
, but the gradients are not readily available. Specialist software can retrieve the gradients from systems of ordinary or partial differential equations, but this requires implementing and maintaining the models in said specialist software packages. Other types of mathematical models may require solving an optimisation problem (Boukouvala et al., 2016). For model discrimination, we wish to be agnostic with regards to the software implementation or model type, since this flexibly (i) allows faster model prototyping and development, and (ii) satisfies the personal preferences of researchers and engineers.2.2 Bayesian Design of Experiments
Methods for accommodating nonanalytical model functions have developed in parallel with increasing computer speed. These methods are typically closer to fully Bayesian than the classical analytical methods. Vanlier et al. (2014) approximate the marginal predictive distributions of
models and their JensenShannon divergence using Markov chain Monte Carlo (MCMC) sampling and a
nearest neighbours density estimate. On the downside, the density estimates become less accurate as the number of experimental observations increases
(Vanlier et al., 2014) and the method is computationally intensive (Ryan et al., 2016).Instead of studying design of experiments for model discrimination, statisticians have focused on design of experiments and model discrimination separately (Chaloner & Verdinelli, 1995). Design of experiments can also be used to aid model parameter estimation. They solve problems of the type
(2) 
where the utility function serves to either discriminate models or estimate parameters. Criteria for model discrimination are handled separately, usually under the name of model selection or hypothesis testing.
Ryan et al. (2015) use a Laplace approximation of the posterior distribution combined with importance sampling. Drovandi et al. (2014) develop a method based on sequential Monte Carlo (SMC) that is faster than using MCMC. Woods et al. (2017) use a Monte Carlo approximation with on which they place a Gaussian process prior.
Limitations
These methods agnostically accommodate nonanalytical models using a Monte Carloapproach but require exhaustive model sampling in the model parameter space. On a case study with four models, each with ten model parameters, and two design variables, the Vanlier et al. (2014) MCMC method requires six days of wallclock time to compute two sets of four marginal predictive distributions. This cost is impractical for pharmacokinetic applications, where a physical experiment takes 1–4 days. SMC methods can converge faster than MCMC methods (Woods et al., 2017). But SMC methods can suffer from sample degeneracy, where only a few particles receive the vast majority of the probability weight (Li et al., 2014). Also, convergence analysis for MCMC and SMC methods is difficult.
Furthermore, these methods are only for design of experiments. Once an experiment is executed, the model discrimination issue remains. In this case the marginal predictive distributions would enable calculating the model likelihoods.
3 Method
We wish to exploit results from the classical analytical approach to design of experiments for model discrimination, while considering that we may deal with nonanalytical model functions. The key idea is replacing the original (nonanalytical) model functions with analytical and differentiable surrogate models. The surrogate models can be learnt from model evaluations. We choose to use GPs to construct the surrogate models because they are flexible regression tools that allow us to extract gradient information.
A GP is a collection of random variables, any finite subset of which is jointly Gaussian distributed
(Rasmussen & Williams, 2006, p. 13) and fully specified by a mean function and a kernel function . Prior knowledge about the model functions can be incorporated into the choice of and . To construct the GP surrogates, target data is needed. The purpose of the surrogate model is to predict the outcome of a model evaluation , rather than the outcome of a physical experiment . Hence, for each model we create a data set of simulated observations, i.e. model evaluations , where and are sampled from and , respectively. Note the difference between the experimental data set and the simulated data sets : whereas the data set has few (noisy, biological) observations, has many (noisefree, artificial) observations.We place independent GP priors on each target dimension of the model function , . For notational simplicity, let , , and . Given with model evaluations at locations the predictive distribution at a test point is given by
(3)  
(4) 
where the covariance vector is
, the kernel matrix is , and the target mean is, with a small artificial noise variance
added for numerical stability. Note the simplified notation and .Pharmacokinetic models are often formulated as systems of ordinary differential equations, e.g. Han et al. (2006)
, where the solutions are smooth functions. If we wish to model these functions with a GP, it makes sense to use a covariance function that encodes the smoothness of the underlying function, e.g. the radial basis function (RBF) covariance function
, since it corresponds to Bayesian linear regression with an infinite number of basis functions
(Rasmussen & Williams, 2006). Our method extends to other choices of covariance function.Each function sample drawn from a zeromean GP is a linear combination of covariance function terms (Rasmussen & Williams, 2006, p. 17). Different model characteristics, e.g. smoothness or periodicity, make alternate covariance functions appropriate. The RBF covariance function is common, but its strong smoothness prior may be too unrealistic for some systems. Other choices include: Matérn, polynomial, periodic, and nonstationary covariance functions. Sums or products of valid covariance functions are still valid covariance functions. Covariance functions can also model different behaviour over different input regions (Ambrogioni & Maris, 2016). Choosing the covariance function requires some prior knowledge about the modelled function characteristics, e.g. a periodic covariance function may be appropriate to model periodic drug dosing.
In , and other similar covariance functions, is the signal variance and is a diagonal matrix of squared length scales, with . Together, , and the artificial noise variance (in
) are the GP hyperparameters. Because the data set
is arbitrarily large, we use pointestimate hyperparameters learnt through standard evidence maximisation.We study a single model function and its GP surrogates. Using the GP surrogates we can predict the outcome of a model evaluation , where and . We approximate the (marginal) predictive distribution to account for the model parameter uncertainty due to parameter estimation from the noisy experimental data .
3.1 Model Parameter Marginalisation
We assume that the experimental observations have i.i.d. zeromean Gaussian distributed measurement noise with known covariance . Let denote the gradient of a function with respect to the model parameter , evaluated at , and denote the corresponding Hessian.
We approximate the model parameter posterior distribution as a Gaussian distribution around the bestfit model parameter values . Using the same firstorder approximation as the classical analytical method in (1), with , yields
where , with . We approximate the marginal predictive distribution using a first or secondorder Taylor expansion around , where
(5)  
(6) 
which we will detail in the following.
Firstorder Taylor approximation
assumes that the model function is (approximately) linear in the model parameters in some neighbourhood around
(7)  
(8) 
Inserting (7) into (5) yields the approximate mean of the marginal predictive distribution
(9) 
Similarly, inserting (8) into (6), the approximate expectation of the variance becomes
(10) 
and the variance of the mean becomes
(11) 
(9)–(11) yield a tractable firstorder Taylor approximation to .
Secondorder Taylor expansion
assumes that the model function is (approximately) quadratic in the model parameters in some neighbourhood around
(12)  
(13) 
Inserting (12) into (5) yields the approximate mean of the marginal predictive distribution
(14) 
Similarly, inserting (13) into (6) yields the diagonal approximate expectation of the variance with elements
(15) 
The variance of the mean is a dense matrix with elements
(16) 
(14)–(16) yield a tractable secondorder Taylor approximation to .
Figure 2 illustrates the process of constructing the GP surrogate models and the approximations of the marginal predictive distributions. Firstly, the model parameters are estimated from . Secondly, is created from model evaluations and used (i) to learn the GP surrogate’s hyperparameters, and (ii) as target data for GP predictions. Thirdly, the approximation of the marginal predictive distribution is computed. Fourthly and lastly, the marginal predictive distribution is used to design the next experiment and approximate the model likelihood.
4 Results
The following demonstrates that our approach for model discrimination with GP surrogate models exhibits a similar performance to the classical analytical method, but can be used to design experiments to discriminate between nonanalytical model functions. The GP surrogate method performance will be studied using two different case studies.
Case study 1 has analytical model functions, so we can compare how well our new method performs compared to the classical analytical methods in Section 2.1. Case study 2 has pharmacokineticlike models consisting of systems of ordinary differential equations.
In both case studies we compute the following metrics:
 (A)

the average number of additional experiments required for all incorrect models to be discarded.
 (SE)

the standard error of the average (A).
 (S)

success rate; the proportion of tests in which all incorrect models were discarded.
 (F)

failure rate; the proportion of tests in which the correct (datagenerating) model was discarded.
 (I)

the proportion of inconclusive tests (all models were deemed inaccurate or the maximum number of additional experiments was reached).
We compare the design criteria (DC) , and described in Section 2.1 as well as random uniform sampling (denoted Uni.). We also compare the three different criteria for model discrimination (MD) described in Section 2.1: updated posterior likelihood , the adequacy test, and the Akaike information criterion (AIC).
4.1 Case Study 1: Comparison to Analytical Method
Case study 1 is from the seminal paper by BuzziFerraris et al. (1984). With two measured states , and two design variables , we discriminate between four chemical kinetic models , each with four model parameters that we fix to :
where and . BuzziFerraris et al. (1984) generate the experimental data from model using and and Gaussian noise covariance . We start each test with randomly sampled experimental observations, and set a maximum budget of additional experiments.
Tables 3–3 compare the performance in 500 completed test instances of the classical analytical approach to the GP surrogate method using first or secondorder Taylor approximations. For this case study, the GP surrogate method performs similarly to the classical analytical approach. We see that the average number of required additional experiments (A) increases using for model discrimination, but that the model discrimination success rate (S) also increases. The model discrimination failure rate drops for almost all MD/DC combinations using the GP surrogate method. We also see that even for this simple case study, the difference in performance between the design criteria and random sampling is significant. The low average (A) for /Uni. in Table 3 is likely an artefact due to the large failure rate (F).
MD  AIC  

DC  
A  2.60  2.87  2.08  3.43  10.39 
SE  0.04  0.12  0.04  0.11  0.45 
S [%]  86.4  64.2  62.4  59.0  85.8 
F [%]  13.6  5.0  37.6  40.6  4.4 
I [%]  0.0  30.8  0.0  0.4  9.8 
MD  AIC  

DC  
A  4.31  2.23  2.72  10.99  10.02 
SE  0.09  0.06  0.08  0.34  0.45 
S [%]  95.6  47.4  88.6  69.6  83.6 
F [%]  4.4  4.8  11.4  30.4  4.8 
I [%]  0.0  47.8  0.0  0.0  11.6 
MD  AIC  

DC  
A  4.14  2.29  2.64  9.53  10.11 
SE  0.09  0.07  0.06  0.30  0.47 
S [%]  96.9  46.6  90.1  64.3  80.2 
F [%]  3.1  9.9  9.9  35.7  7.0 
I [%]  0.0  43.5  0.0  0.0  12.8 
4.2 Case Study 2: NonAnalytical Models
Case study 2 is from Vanlier et al. (2014). There are four different models, described by ordinary differential equations. The supplementary material contains the full model descriptions. The models describe a biochemical network, and are similar to typical pharmacokinetic models in (i) the use of reversible reactions, and (ii) the number of parameters and equations (see e.g. Han et al. (2006)).
We assume two measured states (), with model parameters , initial concentrations , measurement time point , and design variables . We follow Vanlier et al. (2014) and let generate the observed experimental data, with random uniformly sampled true model parameters and experimental measurement noise covariance . Each test is initiated with observations at random design locations . We set a maximum budget of additional experiments.
Case study 1 shows no obvious benefit in using the secondorder over a firstorder Taylor approximation. Hence, in case study 2 we only performance test the GP surrogate method with the firstorder Taylor approximation.
The results in Table 4 for 4 models show that the rates of inconclusive results are higher for case study 2 than for case study 1.
4 models  3 models  
MD  AIC  AIC  
DC  
A  20.10  39.83  29.62  15.80  21.91  9.74 
SE  3.72  12.09  7.72  2.05  2.52  1.70 
S [%]  15.9  9.5  33.3  89.5  77.2  95.6 
F [%]  7.9  0.0  7.9  6.1  0.9  1.8 
I [%]  76.2  90.5  58.7  4.4  21.9  2.6 
Furthermore, the ratio of successful model discriminations to failed model discriminations are lower for / and AIC/ than they are in case study 1. This is because for case study 2 the experimental noise is often larger than the difference between the model predictions. Figure 3 shows the evolutions of the Akaike weights (Michalik et al., 2010) for all 63 completed tests. The Akaike weight evolutions show that especially models and are difficult to discriminate between, given that we only consider two measured states. Corresponding graphs for and are similarlooking. One can also see from the model formulations (see supplementary material) that all models are similar.
To verify that the Table 4 results for 4 models are indeed due to the indiscriminability of model and , we carry out another set of tests where we only consider models , and . The results in Table 4 for 3 models show that removing model enables the GP surrogate method to correctly discriminate between the remaining models more successfully. In practice, researchers faced with apparent model indiscriminability have to rethink their experimental setup to lower measurement noise or add measured states.
5 Discussion
The healthcare industry already uses machine learning to help discover new drugs (Pfizer, 2016). But passing the preclinical stage is expensive. DiMasi et al. (2016) estimate the average preapproval R&D cost to $2.56B (in 2013 U.S. dollars) with a yearly increase of 8.5 %. For the preclinical stage (before human testing starts), DiMasi et al. (2016) estimate the R&D cost to $1.1B.
Model discrimination, which takes place in the preclinical stage, may lower drug development cost by identifying a model structure quickly. Heller et al. (2018) explain that a major hurdle for in silico pharmacokinetics is the difficulty of model discrimination. Scannell & Bosley (2016) and Plenge (2016) also attribute a significant part of the total R&D cost for new drugs to inaccurate models passing the preclinical stage only to be discarded in later clinical stages. Inaccurate models need to be discarded sooner rather than later to avoid expensive failures.
Design of experiments for model discrimination has typically focussed either on analytical approaches, which cannot manage all functions, or on datadriven approaches, which may have computational difficulties or lack interpretable marginal predictive distributions. We leverage advantages of both approaches by hybridising the analytical and datadriven approaches, i.e. we replace the analytical functions from the classical methods with GP surrogates.
The novel GP surrogate method is highly effective. Case study 1 directly compares it with the analytical approach on a classical test instance. The similar performance indicates successful hybridisation between analytical and datadriven approaches. Case study 2 considers an industrially relevant problem suffering from model indiscriminability. The GP surrogate method successfully avoids introducing overconfidence into the model discrimination despite the assumptions used to derive the marginal predictive distribution.
The parameter estimation step is limited to few and noisy data points, e.g. from timeconsuming pharmacokinetic experiments. If the approximation is not accurate, the Section 3.1 Taylor approximations of and can be extended to a mixture of Gaussian distributions , with , if are given. The analytical approximations sacrifice some accuracy for computational tractability: Our computational time for design of experiments in case study 2 was on the order of minutes, whereas the method of Vanlier et al. (2014) requires days. Our results indicate that the approximations work well in practice. GP surrogate methods could sharply reduce the cost of drug development’s preclinical stage, e.g. by shrinking the time needed to decide on a new experiment or by finding a good model structure in a reduced number of pharmacokinetic experiments.
Our new method directly applies to neighbouring fields. For example, dual control theory deals with control of unknown systems, where the goal is to balance controlling the system optimally vs. learning more about the system. When rival models predict the unknown system behaviour, learning about the system involves designing new experiments, i.e. control inputs to the system (Cheong & Manchester, 2014). This design of experiments problem, e.g. in Larsson et al. (2013), would be likely to have very strict time requirements and benefit from confidence bounds on the model predictions. Our method also applies to neighbouring application areas, e.g. distinguishing between microalgae metabolism models (Ulmasov et al., 2016) or models of bone neotissue growth (Mehrian et al., 2018).
A very hot topic in machine learning research is learning in implicit generative models (Mohamed & Lakshminarayanan, 2017), e.g. generative adversarial networks (GANs) (Goodfellow et al., 2014), which employ a form of model discrimination. Here the likelihood function is not given explicitly, so the goal is to learn an approximation of the data distribution . The model discrimination tries to identify from which distribution or a random sample has been drawn. For the case of design of experiments for model discrimination, we have a prescribed probabilistic model, since we explicitly specify an approximation of the likelihood . But we have (i) the physical experiments, the true but noisy data generator that is expensive to sample, and (ii) the rival models, deterministic and relatively cheap (but probably inaccurate!) data generators. Design of experiments for model discrimination also ties in with several other wellknown problems in machine learning, e.g. density estimation, distribution divergence estimation, and model likelihood estimation.
6 Conclusions
We have presented a novel method of performing design of experiments for model discrimination. The method hybridises analytical and datadriven methods in order to exploit classical results while remaining agnostic to the model contents. Our method of approximating the marginal predictive distribution performs similarly to classical methods in a case study with simple analytical models. For an industriallyrelevant system of ordinary differential equations, we show that our method both successfully discriminates models and avoids overconfidence in the face of model indiscriminability. The tradeoff between computational complexity and accuracy that we introduce is useful for design of experiments for discrimination e.g. between pharmacokinetic models, where the duration of the experiments is shorter than or one the same timescale as the computational time for Monte Carlobased methods.
Acknowledgements
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie SkłodowskaCurie grant agreement no.675251, and an EPSRC Research Fellowship to R.M. (EP/P016871/1).
References
 Ambrogioni & Maris (2016) Ambrogioni, L. and Maris, E. Analysis of nonstationary time series using locally coupled gaussian processes. ArXiv eprints, 2016.
 Boukouvala et al. (2016) Boukouvala, F., Misener, R., and Floudas, C. A. Global optimization advances in mixedinteger nonlinear programming, MINLP, and constrained derivativefree optimization, CDFO. Eur J Oper Res, 252(3):701 – 727, 2016.
 Box & Hill (1967) Box, G. E. P. and Hill, W. J. Discrimination among mechanistic models. Technometrics, 9(1):57–71, 1967.
 BuzziFerraris et al. (1984) BuzziFerraris, G., Forzatti, P., Emig, G., and Hofmann, H. Sequential experimental design for model discrimination in the case of multiple responses. Chem Eng Sci, 39(1):81–85, 1984.
 BuzziFerraris et al. (1990) BuzziFerraris, G., Forzatti, P., and Canu, P. An improved version of a sequential design criterion for discriminating among rival multiresponse models. Chem Eng Sci, 45(2):477–481, 1990.
 Chaloner & Verdinelli (1995) Chaloner, K. and Verdinelli, I. Bayesian experimental design: A review. Statist Sci, 10(3):273–304, 1995.
 Cheong & Manchester (2014) Cheong, S. and Manchester, I. R. Model predictive control combined with model discrimination and fault detection. In IFAC 19, 2014.
 DiMasi et al. (2016) DiMasi, J. A., Grabowski, H. G., and Hansen, R. W. Innovation in the pharmaceutical industry: New estimates of R&D costs. J Health Econ, 47:20–33, 2016.
 Drovandi et al. (2014) Drovandi, C. C., McGree, J. M., and Pettitt, A. N. A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design. J Comput Graph Stat, 23(1):3–24, 2014.
 Ette & Williams (2004) Ette, E. I. and Williams, P. J. Population pharmacokinetics i: Background, concepts, and models. Ann Pharmacother, 38:1702–1706, 2004.
 European Medicines Agency (2011) European Medicines Agency. Guideline on the use of pharmacogenetic methodologies in the pharmacokinetic evaluation of medicinal products, December 2011.
 European Medicines Agency (2016) European Medicines Agency. Guideline on the qualification and reporting of physiologically based pharmacokinetic (PBPK) modelling and simulation, July 2016.
 European Union (2017) European Union. Regulation (EU) 2017/745, April 2017.
 Field (2008) Field, R. I. Why is health care regulation so complex? Pharm Ther, 33(10):607–608, 2008.
 Goodfellow et al. (2014) Goodfellow, I. J., PougetAbadie, J., Mirza, M., Xu, B., WardeFarley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In NIPS 27, 2014.
 GPdoemd (2018) GPdoemd. Github repository. https://github.com/cogimperial/GPdoemd, 2018.
 Han et al. (2006) Han, J.J., Plawecki, M. H., Doerschuk, P. C., Ramchandani, V. A., and O’Connor, S. Ordinary differential equation models for ethanol pharmacokinetic based on anatomy and physiology. In EMBS 28, 2006.
 Heller et al. (2018) Heller, A. A., Lockwood, S. Y., Janes, T. M., and Spence, D. M. Technologies for measuring pharmacokinetic profiles. Annu Rev Anal Chem, 11(1), 2018.
 Hervey (2010) Hervey, T. The impacts of European Union law on the health care sector: Institutional overview. Eurohealth, 16(4):5–7, 2010.
 Larsson et al. (2013) Larsson, C. A., Annergren, M., Hjalmarsson, H., Rojas, C. R., Bombois, X., Mesbah, A., and Modén, P. E. Model predictive control with integrated experiment design for output error systems. In ECC 2013, pp. 3790–3795, 2013.
 Li et al. (2014) Li, T., Sun, S., Sattar, T. P., and Corchado, J. M. Fight sample degeneracy and impoverishment in particle filters: A review of intelligent approaches. Expert Syst Appl, 41(8):3944–3954, 2014.

MacKay (1992a)
MacKay, D. J. C.
Bayesian interpolation.
Neural Comput, 4:415–447, 1992a.  MacKay (1992b) MacKay, D. J. C. Informationbased objective functions for active data selection. Neural Comput, 4:590–604, 1992b.
 Mehrian et al. (2018) Mehrian, M., Guyot, Y., Papantoniou, I., Olofsson, S., Sonnaert, M., Misener, R., and Geris, L. Maximizing neotissue growth kinetics in a perfusion bioreactor: An in silico strategy using model reduction and Bayesian optimization. Biotechnol Bioeng, 115(3):617–629, 2018.
 Michalik et al. (2010) Michalik, C., Stuckert, M., and Marquardt, W. Optimal experimental design for discriminating numerous model candidates: The AWDC criterion. Ind Eng Chem Res, 49:913–919, 2010.
 Mohamed & Lakshminarayanan (2017) Mohamed, S. and Lakshminarayanan, B. Learning in implicit generative models. ArXiv eprints, 2017.

Ogungbenro & Aarons (2008)
Ogungbenro, K. and Aarons, L.
How many subjects are necessary for population pharmacokinetic experiments? Confidence interval approach.
Eur J Clin Pharmacol, 64(7):705, 2008.  Pfizer (2016) Pfizer. IBM and Pfizer to accelerate immunooncology research with Watson for drug discovery, 1 December 2016. Press release.
 Plenge (2016) Plenge, R. M. Disciplined approach to drug discovery and early development. Sci Transl Med, 8(349):349ps15, 2016.
 Prasad & Someswara Rao (1977) Prasad, K. B. S. and Someswara Rao, M. Use of expected likelihood in sequential model discrimination in multiresponse systems. Chem Eng Sci, 32:1411–1418, 1977.
 Rasmussen & Williams (2006) Rasmussen, C. E. and Williams, C. K. I. Gaussian processes for machine learning. MIT Press, 2006.
 Ryan et al. (2015) Ryan, E. G., Drovandi, C. C., and Pettitt, A. N. Fully Bayesian experimental design for pharmacokinetic studies. Entropy, 17:1063–1089, 2015.
 Ryan et al. (2016) Ryan, E. G., Drovandi, C. C., McGree, J. M., and Pettitt, A. N. A review of modern computational algorithms for Bayesian optimal design. Int Stat Rev, 84:128–154, 2016.
 Scannell & Bosley (2016) Scannell, J. W. and Bosley, J. When quality beats quantity: Decision theory, drug discovery, and the reproducibility crisis. PLOS ONE, 11(2):1–21, 2016.
 Steimer et al. (1984) Steimer, J.L., Mallet, A., Golmard, J.L., and Boisvieux, J.F. Alternative approaches to estimation of population pharmacokinetic parameters: Comparison with the nonlinear mixedeffect model. Drug Metab Rev, 15(1–2):265–292, 1984.
 Tsimring (2014) Tsimring, L. S. Noise in biology. Rep Prog Phys, 77(2):026601, 2014.
 Tuntland et al. (2014) Tuntland, T., Ethell, B., Kosaka, T., Blasco, F., Zang, R. X., Jain, M., Gould, T., and Hoffmaster, K. Implementation of pharmacokinetic and pharmacodynamic strategies in early research phases of drug discovery and development at Novartis Institute of Biomedical Research. Front Pharmacol, 5:174, 2014.
 Ulmasov et al. (2016) Ulmasov, D., Baroukh, C., Chachuat, B., Deisenroth, M. P., and Misener, R. Bayesian optimization with dimension scheduling: Application to biological systems. In Kravanja, Z. and Bogataj, M. (eds.), 26th European Symposium on Computer Aided Process Engineering, volume 38 of Computer Aided Chemical Engineering, pp. 1051 – 1056. 2016.
 U.S. Food & Drug Administration (2018) U.S. Food & Drug Administration. Food and drugs, 21 C.F.R. § 314, 25 January 2018.
 Vanlier et al. (2014) Vanlier, J., Tiemann, C. A., Hilbers, P. A. J., and van Riel, N. A. W. Optimal experiment design for model selection in biochemical networks. BMC Syst Biol, 8(20), 2014.
 Woods et al. (2017) Woods, D. C., Overstall, A. M., Adamou, M., and Waite, T. W. Bayesian design of experiments for generalized linear models and dimensional analysis with industrial and scientific application. Qual Eng, 29(1):91–103, 2017.
Appendix A Table of Notation
Symbol  Description 

Function associated with model  
Dimension of function ;  
Design variable,  
Parameters of model ,  
No. of models ;  
No. of target dimensions;  
Measurement noise covariance  
The set of experimental observations  
The set of simulated data for model  
Appendix B Design Criteria
We let and the covariance , where is the covariance of model ’s marginal predictive distribution due to model parameter uncertainty.
For a singleresponse system, Box & Hill (1967) derive the design criterion , later generalised to a multiresponse form by Prasad & Someswara Rao (1977)
BuzziFerraris et al. (1990) derive the design criterion
designed such that if , the largest difference between model predictions is too small compared to the measurement noise variance to carry out model discrimination, and design of experiments terminates.
Michalik et al. (2010) proceed from the Akaike information criterion (AIC) as the model discrimination criterion to derive a design criterion from the Akaike weights
yielding , where
is the prior probability of model
.Appendix C Case study from Vanlier et al. (2014)
There are nine chemical components with concentrations , . The system of ordinary differential equations has the form
i.e. the stoichiometry is the same for all models . But some of the fluxes differ for the different models. For all models the following fluxes are identical:
For flux the models differ in the following way:
For flux the models differ in the following way:
We assume that the only measured states are the concentrations and , because these are the states from which Vanlier et al. (2014) collect their initial data. Similarly, we use the initial concentrations and as two of our design variables, the third design variable being the time point at which to measure the concentrations.
Vanlier et al. (2014) look at times points in the range , which we also adopt. We assume the initial concentrations and fix all other initial concentrations to
We assume the model parameter space . Simulations show that sampling from this parameter space gives a wide range of model realisations.
With reference to models and being similar, we see that the only difference between them is that the term divides and for and , respectively. If is small compared to , then the models are nearly identical.
Comments
There are no comments yet.