# Multifidelity multilevel Monte Carlo to accelerate approximate Bayesian parameter inference for partially observed stochastic processes

Models of stochastic processes are widely used in almost all fields of science. Theory validation, parameter estimation, and prediction all require model calibration and statistical inference using data. However, data are almost always incomplete observations of reality. This leads to a great challenge for statistical inference because the likelihood function will be intractable for almost all partially observed stochastic processes. This renders many statistical methods, especially within a Bayesian framework, impossible to implement. Therefore, computationally expensive likelihood-free approaches are applied that replace likelihood evaluations with realisations of the model and observation process. For accurate inference, however, likelihood-free techniques may require millions of expensive stochastic simulations. To address this challenge, we develop a new method based on recent advances in multilevel and multifidelity. Our approach combines the multilevel Monte Carlo telescoping summation, applied to a sequence of approximate Bayesian posterior targets, with a multifidelity rejection sampler to minimise the number of computationally expensive exact simulations required for accurate inference. We present the derivation of our new algorithm for likelihood-free Bayesian inference, discuss practical implementation details, and demonstrate substantial performance improvements. Using examples from systems biology, we demonstrate improvements of more than two orders of magnitude over standard rejection sampling techniques. Our approach is generally applicable to accelerate other sampling schemes, such as sequential Monte Carlo, to enable feasible Bayesian analysis for realistic practical applications in physics, chemistry, biology, epidemiology, ecology and economics.

## Authors

• 5 publications
• 4 publications
• 7 publications
• 4 publications
01/17/2020

### Multifidelity Approximate Bayesian Computation with Sequential Monte Carlo Parameter Sampling

Multifidelity approximate Bayesian computation (MF-ABC) is a likelihood-...
12/28/2019

### A practical guide to pseudo-marginal methods for computational inference in systems biology

For many stochastic models of interest in systems biology, such as those...
09/14/2019

### Rapid Bayesian inference for expensive stochastic models

Almost all fields of science rely upon statistical inference to estimate...
12/15/2021

### Measuring the accuracy of likelihood-free inference

Complex scientific models where the likelihood cannot be evaluated prese...
04/01/2019

### Robust Optimisation Monte Carlo

This paper is on Bayesian inference for parametric statistical models th...
11/15/2017

### Bootstrapped synthetic likelihood

The development of approximate Bayesian computation (ABC) and synthetic ...
10/13/2011

### BAMBI: blind accelerated multimodal Bayesian inference

In this paper we present an algorithm for rapid Bayesian analysis that c...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Stochastic processes are used to model complex systems in almost all fields of science and engineering. Partially observed stochastic processes result in some of the most computationally challenging problems for Bayesian inference [1, 2, 3]. Given the ubiquity of stochastic processes for real applications it is essential that efficient methods are developed to enable the analysis of modern high resolution data sets without sacrificing accuracy.

An important application of stochastic processes occurs in the study of cellular processes [4, 5]. Here, stochastic models of biochemical reaction networks often provide a more accurate description of system dynamics than deterministic models [6]. This is largely due to intrinsic noise in the system dynamics of many biochemical processes that are significantly influenced by relatively low populations of certain chemical species [7]. For example, in eukaryotic cells, molecules that regulate gene expression occur in relatively low numbers; as a result, stochastic fluctuations have a direct effect on the production rates of proteins [4, 8]. In addition, there are other interesting phenomena that occur in biological systems, such as self-induced stochastic resonance [9, 10, 11], stochastic focussing [12], and stochastic bi-stability [13, 14] that can only be captured by stochastic models.

To quantify uncertainty in parameters or predictions, it is typical to consider an expectation of a function of an unknown vector of model parameters,

, conditional on some observational data, ,

 E[f(θ)]=∫Θf(θ)p(θ∣D)dθ, (1)

where

is the Bayesian posterior probability density

[15],

 p(θ∣D)=L(θ;D)p(θ)p(D). (2)

Here, is the a prioriprobability density of unknown parameters, is the likelihood function that maps a parameter vector to the probability of the observations for the model with this parameter vector, and is a normalising constant often referred to as the evidence.

The evidence term, given by

 p(D)=∫ΘL(θ;D)p(θ)dθ, (3)

is rarely tractable and the expectation in Equation (1

) must be numerically estimated using sampling techniques, such as Markov chain Monte Carlo (MCMC)

[16, 17] and sequential Monte Carlo (SMC) [18], that only require point-wise evaluation of the likelihood function, . For partially observed Markov processes, point-wise evaluation of the likelihood requires the solution to the forward Kolmogorov equation, which which must be computed approximately. Therefore, standard Bayesian tools cannot be applied and likelihood-free methods are needed, such as approximate Bayesian computation (ABC) [19, 20, 21], pseudo-marginal methods [22, 23, 24], and Bayesian synthetic likelihood (BSL) [25, 26, 27]. Regardless of the likelihood-free approach, realisations of the stochastic model are generated by stochastic simulation in place of likelihood function evaluation. Thus, the computational cost of evaluating Equation (1) depends on the efficiency of the stochastic simulation algorithm and the posterior sampler used as a basis for likelihood-free inference. For example, ABC can be implemented using either rejection sampling [28, 29, 30], MCMC [31] or SMC approaches [32, 33, 34]. However, if the model is even moderately expensive then such techniques are still infeasible.

Recently, there has been substantial research activity in the application of approximate model simulations and posterior samplers in combination with bias correction adjustments to accelerate likelihood-free applications that would be impractical otherwise. For example, delayed-acceptance [35, 36, 37, 38] or early-rejection approaches [39]. Techniques such as transport maps [40]

and moment-matching transforms

[41] aim to transform a set of approximate posterior samples, using a surrogate or reduced model, into posterior samples under an expensive exact model. Other approaches such as preconditioning utilise approximate models to inform efficient proposal mechanisms [41]. Various early-rejection and delayed-acceptance schemes [36, 37, 38, 39] probabilistically simulate the accurate model based on the rejection/acceptance status of an approximation; this family of methods is generalised for ABC schemes under the multifidelity framework [42, 43, 44].

There has also been substantial research in the last decade on the use of control variates to improve the rate of convergence in mean-square of Monte Carlo estimates to expectations. In particular, the multilevel Monte Carlo (MLMC) method [45, 46]

expands an expectation as a telescoping summation of bias corrections and achieve variance reductions by exploiting path-wise convergence properties of numerical schemes for solving stochastic differential equations (SDEs)

[47] or discrete-state Markov processes [48, 49, 50]. Recently, the MLMC telescoping summation approach has also been successfully applied to Bayesian inference [51, 52, 53, 54, 55], as per Equation (1), including ABC-based samplers [56, 57, 58].

Many of the above approaches use approximate simulations in ways that are not always mutually exclusive. However, it is an open question as to how these various schemes could be combined to obtain compounding effects in performance.

### 1.1 Contribution

In this work, we derive and demonstrate an effective method for approximate Bayesian inference that combines the benefits of multifidelity ABC sampling with that of variance reduction with MLMC applied to the expectations with respect to an ABC posterior density. Acknowledging the wide range of variations in these algorithm classes, we specifically consider a natural application of these approaches to rejection samplers and extend the methods of Prescott and Baker [42], and Warne et al. [58]. However, our results are equally relevant to other sampling approaches, such as SMC, using more recent developments by Prescott and Baker [43] and Jasra et al. [57]. Using biochemical reaction networks as a characteristic application area, we present key computational features of ABC methods and introduce the fundamental algorithms developed by Prescott and Baker [42] and Warne et al. [58]. We then highlight the distinct features of the inference problem exploited by each approach to accelerate ABC rejection sampling, and derive our novel approach by leveraging the complementary nature of these features. We then explore various performance results using stochastic models of Michaelis–Menten enzyme kinetics and the repressilator gene regulatory network. We then establish requirements for practical application of our approach and provide rules-of-thumb to simplify the tuning of algorithm parameters. Finally, we demonstrate the efficacy of our multifidelity MLMC approach using a challenging model of a two–step MAPK cascade reaction, which forms the basis for many real biological functions such as cell-to-cell signalling through the epithelial growth factor receptor (EGFR) [59, 60]. Our results demonstrate the potential of multilevel and multifidelity methods to accelerate ABC sampling by several orders of magnitude.

## 2 Methods

In this section we describe standard numerical methods for simulation and inference of stochastic biochemical network models [21, 61, 62]. Then we present recent developments in MLMC [21, 58] and multifidelity [42] methods for inference, and develop novel extensions that achieve the benefits from both the MLMC and multifidelity approaches. Finally, we highlight practical challenges and provide guidelines for the application and tuning of our new method.

### 2.1 Stochastic models of biochemical reaction networks

We consider biochemical reaction network models that involve a well-mixed population of chemical species, , that react via a network of chemical reactions,

 N∑i=1ν−i,jXi→N∑i=1ν+i,jXi,j=1,2,…,M,

where and are, respectively, the reactant and product stoichiometries for species in reaction . Given the state of the system at time , with denoting the copy number of species at time , it can be shown that, for a sufficiently small time interval , the probability of reaction occurring within this interval is proportional to where is the propensity function of reaction . Generally, the propensity function will take the form

 aj(Xt)=kjN∏i=1ν−i,j!(Xi,tν−i,j),

where is the kinetic rate parameter for reaction , however, other more complex forms are also possible.

Mathematically, the stochastic dynamics of a biochemical network is governed by a discrete-state, continuous-time Markov process [62, 63, 64, 65]. The transitional density function of this process, , describes the probability of the system state at time given the state at a previous time . To obtain one needs to solve the forward Kolmogorov equation, also referred to the chemical master equation (CME) [62, 66],

 dp(xt∣xs)dt=M∑j=1aj(xt−νj)p(xt−νj∣xs)−aj(xt)p(xt∣xs), (4)

where is the state change corresponding to the occurrence of reaction , that is .

#### 2.1.1 Stochastic simulation methods

The CME is intractable for all but the simplest of models [21, 61, 62]. As a result, exact stochastic simulation [63, 67, 68] is required to study the system behaviour without introducing potentially substantial bias [41]. These exact stochastic simulation schemes, such as Gillespie’s direct method [63] (Algorithm 1), simulate every reaction event, and are computationally prohibitive for systems with very large copy numbers or very high reaction rates.

Various approximate stochastic simulation schemes, such as the tau-leaping method [69] (Algorithm 2), can be applied [62, 69, 70, 71, 72] to improve the computational performance, but there will be bias incurred due to the simplifying approximations [21, 73].

#### 2.1.2 Acceleration using multilevel Monte Carlo

Often the goal of stochastic simulation is to estimate the expectation, , where is a function of the process state at time . MLMC provides a mechanism to exploit approximations that computationally accelerate the estimation [21, 47, 49]. Assume we have a stochastic process , such as biochemical reaction network model, that is computationally expensive to simulate with the Gillespie direct method (Algorithm 1) or equivalent. Now consider a sequence of stochastic processes, , that approximate . This sequence is constructed such that the bias decreases and computational cost increases with . For example, could be a tau-leaping approximation with time step and constant for all . The insight of Giles [47] was to expand the desired expectation as a telescoping summation,

 E[f(XT)]=E[f(ZT,1)]+E[f(XT)−f(ZT,L)]+L∑ℓ=2E[f(ZT,ℓ)−f(ZT,ℓ−1)], (5)

where each of the difference terms act to correct for the bias of the initial approximation. While a direct Monte Carlo approximation to Equation (5) would necessarily be more computationally expensive than the original expectation, Giles [47] demonstrated that substantial computational improvements can be obtained if positive correlations can be induced between the terms in the bias corrections, that is between and for , and between and . For SDEs and discrete-state continuous-time Markov processes, coupling schemes that induce sufficiently strong positive correlations have been well studied [21, 46, 48, 50] and performance improvements of many orders of magnitude can be obtained without any loss in accuracy in terms of mean-square error.

### 2.2 Approximate Bayesian computation

For the purposes of inference, the likelihood function for a sequences of observations, denoted by , is given by

 p(Yobs∣θ) =∫Xn+1p(Yobs,xt0,…,xtn∣θ)n∏i=0dxti =∫Xn+1p(xt0)dxt0n∏i=1g(yobs(ti)∣xti,θ)p(xti∣xti−1,θ)dxti, (6)

where is the probability density of noisy observation , given the true state, , at time , and is the solution to the CME for the state transition over the time interval . Note that the CME in Equation (4) implicitly depends on the vector of rate parameters through the propensity functions, here in Equation (6) we make this explicit. The intractability of the CME immediately implies the intractability of the Bayesian inference problem. However, depending on the structure of the observation process, Equation (6) may still be intractable even when Equation (4) has an analytical solution. Therefore, almost any practical application of partially observed continuous-time Markov processes will require some form of likelihood-free inference.

ABC, pseudo-marginal methods and BSL are all possible approaches to avoid the dependence of Bayesian inference on the likelihood (Equation (6)) by generating simulated data from the likelihood using exact stochastic simulation along with the observation noise process. Here, we focus on the ABC approach that is based on the posterior approximation,

 p(θ∣Yobs)≈p(θ∣ρ(Yobs,Ys)≤ϵ)∝P(ρ(Yobs,Ys)≤ϵ∣θ)p(θ), (7)

where is a discrepancy metric between noisy observations, , and simulated noisy observations, , and is a sufficiently small discrepancy threshold. The most direct method to implement ABC is to use rejection sampling (Algorithm 3).

Here, independent identically distributed samples of rate parameters are drawn from the prior, and the sample is accepted if a resulting stochastic simulation of the model is within of the observations under the discrepancy metric. In practice, rejection sampling may not be computationally feasible due to the prohibitively high rejection rates when the data dimensionality is high. There are many techniques that can be applied to improve the computational efficiency of ABC methods that we do not discuss here due to the wealth of available literature [19, 20, 31, 32, 34, 74]. Instead we specifically investigate recent developments in MLMC and multifidelity methods [42, 58], as they provide a new avenue to explore computational improvements for ABC inference. While we focus on these methods as applied directly to ABC rejection sampling (Algorithm 3), we note that our work is also applicable to other schemes such as SMC [43, 57].

### 2.3 Multilevel Monte Carlo and multifidelity methods for ABC inference

In this section, we describe two recent methods for acceleration of ABC inference that are foundational to the main contribution of this work. These methods are presented by Warne et al. [58] and Prescott and Baker [42], respectively. Both methods build upon the MLMC work of Giles [47] and Rhee and Glynn [75] for efficient computation of expectations with respect to a stochastic process [47, 48]. We refer the reader to Warne et al. [21], Lester et al. [49], Schnoerr et al. [61], and Peherstorfer et al. [76], for accessible introductions to simulation and inference methods including MLMC and multifidelity methods.

#### 2.3.1 Multilevel rejection sampling

For ABC inference, Warne et al. [21, 58] consider direct application of the telescoping summation (Equation (5)) to a sequence of ABC rejection samplers (Algorithm 3). That is,

 E[f(θL)]=E[f(θ1)]+L∑ℓ=2E[f(θℓ)−f(θℓ−1)], (8)

with for threshold for , where is a large discrepancy threshold leading to a high acceptance rate (typically close to the prior) and . The greatest challenge in application of MLMC for inference is the construction of a coupling to generate positively correlated sample pairs without introducing additional bias that would violate the telescoping summation. For this Warne et al. [58] apply a novel construction using the marginal empirical distribution at level and the inverse marginal distributions obtained from the bias corrections up to level . That is, given i.i.d. samples , one generates

 ~θiℓ−1=[^F−1ℓ−1,1(¯Fℓ,1(θℓ,1)),…,^F−1ℓ−1,k(¯Fℓ,k(θℓ,k))], for i=1,…,Nℓ, (9)

where is the marginal empirical distribution of the th dimension of , denoted by , and is an estimate of the marginal distribution inverse for the th dimension of . The result is a sampling procedure for with a positive correlation induced between pairs. Warne et al. [58] note that this approach only strictly satisfies the telescoping summation in each marginal rather that the full distribution of . As a result, the sequence of discrepancies, , needs to be chosen so that the correlation structures between and are similar. This requires the bias correction terms in Equation (8) to be computed in order from due to the dependence on the previous level for the estimation of the marginal distribution inverses at level (Equation (9)). The complete process is given in Algorithm 4. Given an appropriate discrepancy sequence, the sample size sequence, , can be optimised to achieve improved convergence rates in mean squared error. However, the target accuracy needs to be sufficiently small for this improved convergence rate to take effect due to an overhead computational cost in generating the trial simulations needed to optimise the sequence .

#### 2.3.2 Multifidelity rejection sampling

An alternative approach developed by Prescott and Baker [42] utilises the telescoping summation in a probabilisic manner akin to the de-biasing approach of Rhee and Glynn [75]. Instead of considering a sequence of ABC samplers defined in terms of acceptance thresholds, they consider ABC rejection samples with different simulator fidelities. That is, a high fidelity simulator that is computationally expensive, such as the Gillespie direct method, and a computationally cheaper low fidelity simulator , such as tau-leaping approximation with time step . The idea is to perform ABC rejection sampling with the low fidelity simulator with discrepancy and acceptance threshold , and then perform a probabilistic bias correction that requires ABC rejection using a high fidelity simulation with discrepancy and acceptance threshold . The resulting estimator is given by

 E[f(θ)]≈∑Ni=1w(θi)f(θi)∑Ni=1w(θi), (10)

where are samples from the prior and the weight function is

 w(θ)=1(0,~ϵ](~ρ(Yobs,~Ys))+ξ[1(0,ϵ](ρ(Yobs,Ys))−1(0,~ϵ](~ρ(Yobs,~Ys))]. (11)

Here,

is a random variable given by

 ξ=1(0,η(~Ys)](U)η(~Ys), (12)

where and is the probability of generating a high-fidelity simulation given a realisation from the low-fidelity simulation. This so-called continuation probability can take many forms, however, the method originally proposed by Prescott and Baker [42] is

 η(~Ys)=η11(0,~ϵ](~ρ(Yobs,~Ys))+η21(~ϵ,∞)(~ρ(Yobs% ,~Ys)), (13)

where and are, respectively, the continuation probabilities when the low-fidelity simulation, , is accepted and rejected. The sampler proceeds according to Algorithm 5.

Prescott and Baker [42] prove that when and the multifidelity estimator (Equation (10)) is asymptotically unbiased, and under certain conditions may be optimised such that the multifidelity ABC rejection sampler (Algorithm 5) is more computationally efficient than direct rejection sampling with the high fidelity simulator (Algorithm 3). This improvement represents a decrease in the average computational cost for a given effective sample size, but it does not improve the convergence rate. However, the multifidelity approach will not incur an significant overhead for selecting the continuation probabilities as an adaptive scheme can be applied (See Appendix A).

### 2.4 Multifidelity MLMC for ABC inference

The MLMC and multifidelity approaches to ABC inference, MLMC-ABC and MF-ABC, obtain computational improvements in distinct ways that are also complementary. MLMC-ABC (Algorithm 4) combines a sequence of ABC rejection samplers with different discrepancy thresholds [21, 58] while the stochastic simulation scheme is fixed and must be unbiased. Conversely, MF-ABC (Algorithm 5) combines two stochastic simulation schemes [42] with the other elements of the ABC sampler remaining largely unchanged. Computationally, MLMC-ABC improves the convergence rate as at the expense of a tuning step that incurs an additional cost, and MF-ABC reduces the average simulation cost without improvements in the convergence rate. Our novel contribution in this work is to show how these methods can be combined to exploit the computational advantages of both.

We now derive our new method, called multifidelity MLMC for ABC (MF-MLMC-ABC). Assume we have two sequences of ABC acceptance thresholds, and , with and for all , and a sequence of time-step lengths, . Note that there are no constraints on the relation between the two acceptance thresholds sequences, nor any requirement that the sequence of time-step lengths be strictly decreasing or even monotonic. Given a set of model parameters, , let and denote, respectively, exact stochastic simulation (i.e., using the Gillespie direct method [63]) and approximate stochastic simulation with time-step (i.e., using the tau-leaping method [69]). Finally, let and be discrepancy metrics used to compare observed data with, respectively, exact simulation output, , or approximate simulation output, .

Given the above notation, for any we can apply MF-ABC sampling using weights,

 (14)

with continuation probability function

 ητℓ(Yτℓs)=ηℓ,11(0,~ϵℓ](ρτℓ(Yobs,Yτℓs))+ηℓ,21(~ϵℓ,∞)(ρτℓ(Yobs,Yτℓs)), (15)

for constants , for each and . Based on the results of Prescott and Baker [42], an expectation estimated this way would be unbiased with respect the ABC posterior under the exact stochastic simulation and discrepancy measure, that is, . This provides a connection to the MLMC-ABC telescoping summation in Equation (4). Therefore, we can apply the MF-ABC estimator (Equation (5)) using weights defined by Equation (14) to each of the terms in the MLMC-ABC telescoping summation (Equation (4)) and thereby arrive at the MF-MLMC-ABC estimator,

 E[f(θL)]≈^f=L∑ℓ=1Nℓ∑i=1wτℓ(θiℓ)gℓ(θiℓ)∑Nℓj=1wτℓ(θjℓ), (16)

where

 gℓ(θiℓ)={f(θiℓ)if ℓ=1f(θiℓ)−f(~θiℓ−1)if ℓ>1, (17)

and is constructed from and estimated marginal distribution functions obtained from the previous terms, as given in Equation (9), to implement an approximate coupling between levels [21, 58]. Due to the properties of MF-ABC, Equation (16

) is an unbiased estimator of Equation (

8[42] and therefore an unbiased estimator of up to the approximate coupling scheme [58].

We therefore arrive at the MF-MLMC-ABC method presented in Algorithm 6

. Note that the proposed approach, just as with MLMC-ABC and MF-ABC, requires a number of algorithmic hyperparameters to be selected appropriately to ensure efficient sampling. In the next section we discuss theoretical results that guide how these parameters should be selected, and we provide examples of how they can be practically tuned in the Results section.

### 2.5 Optimal algorithm configuration

There are several important algorithmic hyperparameters that must appropriately chosen to practically apply the MF-MLMC-ABC method. Each of these have various influences on the accuracy and performance of the MF-MLMC-ABC method. In this section, we will step through each of these algorithmic hyperparameters and provide theoretical results to optimally configure the method. In the next section, practical guidelines for all components for the MF-MLMC-ABC method are provided. The algorithmic hyperparameters that require optimisation are the number of levels, , the form of the sequences and , the sequence of samples to draw from each level, , and the sequence of continuation probabilities, . Of these, and

need to be selected heuristically, however, for any given

and , the sequences and may be optimised.

For a given level and assuming is selected, the optimal and can be determined through optimising the limiting efficiency as the number of samples . Prescott and Baker [42] show that this corresponds to minimising the function

 ϕ(ηℓ,1,ηℓ,2;gℓ)=E[wτℓ(θℓ)2(gℓ(θℓ)−E[gℓ(θℓ)])2]E[Cℓ(θℓ)], (18)

where is the computational cost of computing the weight (Equation (14

)). If the acceptance state of an approximate simulation is interpreted as a classifier for the predicted acceptance state for an exact simulation, and the true positive rate exceeds the false positive rate, then Equation (

18) can be minimised for (See Lemmas 4.2 and 4.3 in Prescott and Baker [42]). The optimal continuation probabilities are given by (See Corollary 4.4 in Prescott and Baker [42])

 (η∗ℓ,1,η∗ℓ,2)=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩⎛⎜⎝ ⎷RℓpRℓ0,√RℓnRℓ0⎞⎟⎠,% if max{Rℓp,Rℓn}≤Rℓ0,(1,¯ηℓ,2),if max{Rℓp,Rℓn}>Rℓ0 and ϕ(1,¯ηℓ,2)≤ϕ(¯ηℓ,1,1),(¯ηℓ,1,1),otherwise, (19)

where

 Rℓp=pℓfpE[cτℓ(θℓ)]cℓp,Rℓn=pℓfnE[cτℓ(θℓ)]cℓn,Rℓ0=pℓtp−pℓfp,¯ηℓ,1=min⎧⎪ ⎪⎨⎪ ⎪⎩1,  ⎷Rℓp+pℓfpcℓn/cℓpRℓ0+pℓfn⎫⎪ ⎪⎬⎪ ⎪⎭,¯ηℓ,2=min⎧⎪ ⎪⎨⎪ ⎪⎩1,  ⎷Rℓn+pℓfncℓp/cℓnRℓ0+pℓfp⎫⎪ ⎪⎬⎪ ⎪⎭, (20)

and

 (21)

In Equations (20) and (21), and denote, respectively, the computational cost of generating an exact realisation, , and an approximate realisation, .

To optimise the sequence of samples , we aim to minimise the total expected computational cost of computing the MF-MLMC-ABC estimator, that is, , subject to the constraint where is the target variance. Using a Lagrange multiplier, it can be shown (See Giles [47], Lester et al. [49], and Warne et al. [58]) that the following scaling is optimal,

 Nℓ=O(h−2)√vℓcℓL∑m=1√vmcm,for ℓ=1,2,…,L, (22)

where and .

### 2.6 Practical algorithm tuning

A practical choice for the selection of most components of MF-MLMC-ABC is immediately available from the target ABC-based inference problem, for example, the exact stochastic simulation process

, the prior probability density,

, the discrepancy metric , and the target acceptance threshold . Other choices are easily motivated. The largest acceptance threshold, , can be chosen so that . The approximate stochastic simulation scheme, , could be chosen from a range of candidates, but a first order method such as tau-leaping will be an appropriate default choice in many cases. Given a tau-leaping scheme for the approximate model, it will often be appropriate to take , and therefore for all is applicable. Of course, there is freedom and flexibility in all of these choices, and the discussion section highlights some potentially useful alternative strategies.

Next the number of levels, , needs to be determined. Unfortunately, there is no general theory for this choice. However one common approach from the MLMC literature is to consider a geometric sequence then set . Some heuristics do exist to determine the appropriate scale factor . For MLMC with SDEs Giles [47] demonstrated is optimal, however, due to the approximate coupling scheme for MLMC-ABC, Warne et al., [21, 58] propose as a practical choice for inference.

Choice of the sequence of time-steps, , can be guided by Equations (19)–(21). For each level , must be small enough that , but also large enough so that the computation advantage is great enough to compensate to the mis-classifications. Ultimately, some experimentation is required compare the differences in the acceptance state between pairs of exact and approximate simulation. Fortunately, it is possible to identify poor choices of since the optimal continuation probabilities will be and the multifidelity step is less efficient then standard rejection sampling. It is also important to note that need not be a strictly decreasing or even monotonic sequence, and we demonstrate in the results section that works quite well for many applications. Furthermore, introducing a coupling scheme between the exact and approximate simulations [42, 48, 49] can reduce mis-classification rates of larger values of and lead to improved performance.

In practice, Equation (19) is solved for optimal continuation probabilities through the generation of initial trial samples [42]. In this work, we extend this through adaptive updates to while generating samples at level . For the trial samples, we generate samples using Algorithm 5 with , then initial estimates of the expectations in Equation (20) are produced through direct Monte Carlo estimates. Next, gradient descent is applied to iteratively update and toward the optimum in Equation (19) while also refining the estimates in Equations (20). While it is possible to iteratively refine the solution to Equation (19) directly as the sampling proceeds [42], we find that this is extremely sensitive to the initial estimates. Therefore we utilise our adaptive gradient descent MF-ABC sampler (See Appendix A) as a robust alternative to Algorithm 5.

Finally, to apply MF-MLMC-ABC the sequence of sample numbers, , are needed. Fortunately, we can rewrite Equation (22) as

 Nℓ∝1h−2√ϕ(ηℓ,1,ηℓ,2;gℓ)E[Cℓ(θℓ)]E[wτℓ(θℓ)]L∑m=1√ϕ(ηm,1,ηm,2;gm)E[wτm(θm)], (23)

to highlight that the optimal sequence of sample numbers, , is dependent on the optimal continuation probabilities obtained using the adaptive gradient descent MF-ABC scheme at each level. Therefore, we can estimate the terms required for optimal directly from the same trial samples used to estimate optimal .

### 2.7 Summary

The MF-ABC and MLMC-ABC methods exploit the multilevel telescoping summation, and properties of approximate stochastic simulation and ABC sampling in distinct and complementary ways. In the first instance, MF-ABC can improve on the expected cost of stochastic simulation in ABC rejection sampling using the randomized bias correction term [42, 75]. MLMC-ABC assumes exact stochastic simulation for ABC rejection sampling, but applies MLMC techniques to a sequence of correlated samplers with discrepancy thresholds with the effect of improving the convergence rate [57, 58]. We develop a new method, MF-MLMC-ABC, that results from application of MF-ABC sampling for each of the terms in the MLMC-ABC telescoping summation. In the next section we demonstrate how to tune these methods practically and show the computational benefits of both can be exploited to achieve improvements of two orders of magnitude.

## 3 Results

Using a variety of biologically relevant stochastic biochemical reaction network models, we demonstrate the substantial computational improvements using our new MF-MLMC-ABC method (Algorithm 6). First we consider the properties of the MLMC-ABC (Algorithm 4) and MF-ABC (Algorithm 5) methods for a fundamental biochemical building block, the Michaelis–Menten model, to show practically how to tune these methods. Then we apply these guidelines to tune our new MF-MLMC-ABC method for the repressilator gene regulatory network to show the performance benefits over MF-ABC and MLMC-ABC. Finally, we perform a realistic test on the two-step MAPK cascade network that is of fundamental importance in cell biology. For two computationally challenging networks we show that MF-MLMC-ABC effectively combines the advantages of both MF-ABC and MLMC-ABC to accelerate ABC rejection sampling by two orders of magnitude.

### 3.1 Initial explorations of MF-ABC and MLMC-ABC: Michaelis–Menten kinetics

Using a stochastic network of Michaelis–Menten enzyme kinetics [77, 78], we demonstrate the essential requirements and computational benefits of the MF-ABC and MLMC-ABC methods in order to inform the configuration of MF-MLMC-ABC for more challenging networks. The Michaelis–Menten enzyme kinetics model describes the catalytic conversion of a substrate, , into a product, , via an enzymatic reaction involving enzyme, ,

 E+Sk1→[ES],[ES]k2→E+S,[ES]k3→E+P, (24)

with kinetic rate parameters, , , and . Biologically, this network is of interest since many intracellular processes are built from Michaelis–Menten sub-components. Computationally, the Michaelis–Menten model is a minimal example of a network without a closed-form solution to the CME, however, with only three rate parameters and four chemical species, ABC inference is feasible even with rejection sampling [21].