# Synthetic likelihood method for reaction network inference

We propose a novel Markov chain Monte-Carlo (MCMC) method for reverse engineering the topological structure of stochastic reaction networks, a notoriously challenging problem that is relevant in many modern areas of research, like discovering gene regulatory networks or analyzing epidemic spread. The method relies on projecting the original time series trajectories onto information rich summary statistics and constructing the appropriate synthetic likelihood function to estimate reaction rates. The resulting estimates are consistent in the large volume limit and are obtained without employing complicated tuning strategies and expensive resampling as typically used by likelihood-free MCMC and approximate Bayesian methods. To illustrate run time improvements that can be achieved with our approach, we present a simulation study on inferring rates in a stochastic dynamical system arising from a density dependent Markov jump process. We then apply the method to two real data examples: the RNA-seq data from zebrafish experiment and the incidence data from 1665 plague outbreak at Eyam, England.

## Authors

• 2 publications
• 2 publications
• ### Bayesian inference of Stochastic reaction networks using Multifidelity Sequential Tempered Markov Chain Monte Carlo

Stochastic reaction network models are often used to explain and predict...
01/06/2020 ∙ by Thomas A. Catanach, et al. ∙ 0

• ### Bayesian analysis of dynamic binary data: A simulation study and application to economic index SP

It is proposed in the literature that in some complicated problems maxim...
10/06/2019 ∙ by Ali Reza Fotouhi, et al. ∙ 0

• ### Adaptive Physics-Informed Neural Networks for Markov-Chain Monte Carlo

In this paper, we propose the Adaptive Physics-Informed Neural Networks ...
08/03/2020 ∙ by Mohammad Amin Nabian, et al. ∙ 5

• ### Spectral Subsampling MCMC for Stationary Time Series

Bayesian inference using Markov Chain Monte Carlo (MCMC) on large datase...
10/30/2019 ∙ by Robert Salomone, et al. ∙ 0

• ### Exact Subsampling MCMC

Speeding up Markov Chain Monte Carlo (MCMC) for data sets with many obse...
03/27/2016 ∙ by Matias Quiroz, et al. ∙ 0

• ### Bayesian Classification of Multiclass Functional Data

We propose a Bayesian approach to estimating parameters in multiclass fu...
08/02/2018 ∙ by Xiuqi Li, et al. ∙ 0

• ### Exploiting network topology for large-scale inference of nonlinear reaction models

The development of chemical reaction models aids system design and optim...
05/12/2017 ∙ by Nikhil Galagali, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Recent developments in molecular technologies have allowed us to perform complex biological experiments aiming at learning the principles of signaling networks in living organisms. For instance, the knowledge of a biochemical network of physiological processes of living cells offers insights into developing targeted therapies for a wide range of diseases, such as cancer, diabetes and more, by suggesting possible targets in gene pathways. In addition, decoding these mechanisms in organisms with regeneration capabilities, like the zebrafish, can potentially help biologists at least partially answer questions about why humans lack these abilities. However, reverse engineering biochemical networks from observed data has proven to be challenging from both statistical and computational standpoints (oates2012network, ; daniels2015automated, ). Indeed, the high-throughput molecular technology pushes the boundary of a classical statistical inferential paradigm, as the molecular quantification methods such as next generation sequencing, cellular flow cytometry, and fluorescence microscopy (see, e.g. Perez:2004fk ; Wheeler:2008ys ), provide large amounts of high dimensional, longitudinal data from partially observed and poorly understood biochemical systems.

The goal of most statistical inference problems in this setting is either one of structure (topology) or parameter estimation (oates2014causal, )

. Although related, these two inference areas are often considered separately. The first addresses the structure of the underlying biochemical network, i.e. which chemical species directly alters the production rate of another. In this paradigm the object of interest is usually a directed graph or a binary adjacency matrix describing the gene regulation structure of the system. Bayesian networks

(pearl1985bayesian, ; pearl1986fusion, ), and, more recently, dynamic Bayesian networks, have been used for learning topology of such models since their graphical structure, which is encoded through distributional assumptions, is in essence the desired output (morrissey2010reverse, ; morrissey2011inferring, )

. Bayesian networks and other similar graphical models are appealing because they provide biologists with a simple visual representation of the network. However, the fundamental disadvantage of simple graphical approaches is their reduction of a detailed kinetic view of the system to a set of often unrealistic assumptions on the relationships between graph nodes (representing either chemical or molecular species, like genes). For instance, although it is known that the interactions of species in biological systems typically exhibit a high degree of nonlinearity, it is often assumed that nodes are hierarchically (acyclically) linearly related, with the conditional Gaussian distributions. A practical disadvantage of these methods is that their solutions are of non-polynomial time complexity, and hence algorithms used for model fitting lead to suboptimal solutions, for instance by requiring greedy algorithms. Markov chain Monte Carlo (MCMC) routines have been shown to give moderate improvement in terms of finding optimal network structure, but computational issues still persist.

In addition to network structure or topology estimation, the second related inference area is that of network parameters (e.g., reaction rates) estimation. Typically, the methods here focus on fitting detailed kinetic models to the data. Their appeal lies in the fact that they rely on more precise representations of the underlying dynamical system (fearnhead2014inference, ; golightly2005bayesian, ; golightly2012, ; oates2012network, ). Their disadvantages are numerous however, particularly of the methods based on exact likelihoods, where the inference is usually not feasible for systems of relevant sizes, i.e. hundreds to thousands of species and reactions. For that reason the corresponding Bayesian frameworks based on the exact likelihoods are usually not applicable to topology estimation (boys2008bayesian, ; choi2012inference, ). Methods based on approximate likelihood, like the linear noise approximation (or LNA, see (komorowski2009bayesian, )), fare somewhat better, but they also usually do not allow for efficient posterior sampling due to the complicated form of the likelihood approximation (see, for instance, (Linder:2015aa, )).

In this paper, we develop a fully Bayesian inference framework for stochastic reaction networks on both structure and parameters given time course trajectories from the stochastic dynamical systems under mass action kinetics. The method uses summary statistics to form a synthetic likelihood following the ideas presented by

Wood:2010aa . The advantage of the proposed methodology is that the synthetic likelihood is based on detailed kinetic models, and its form permits topology and parameter estimation simultaneously, instead of separately. Our approach is Bayesian and allows efficient computation of the posterior network structure through point mass priors on parameters. An outline of the paper beyond the current section is as follows. In Section 2 we describe the types of dynamical systems under consideration. In Section 3 we describe the synthetic likelihood, the relevant priors, and outline an efficient MCMC algorithm to sample posteriors. The model fitting is performed using in silico reaction network data in Section 4 and RNA-seq data from controlled experiments in the zebrafish in Section 5. In Section 5 we also give an example of application of our framework beyond biochemical systems by analyzing historic data from the 1665-1666 plague outbreak in Eyam, England. The technical details of the MCMC algorithm derivations and R code implementation are available in the online Supplementary Material.

## 2 Stochastic reaction network system

The reaction systems we consider consist of chemical species, along with a set of reaction channels where commonly . We denote the system state at time

as the vector

of dimension , containing molecular counts of each species. The constant gives the reaction rate or speed of the reaction, . When the reaction occurs at time , the system transitions according to the integer valued vector , where is a vector of non-negative integers representing the number of species produced by reaction and representing the number consumed, i.e.

 X(t)=X(t−)+ν′k−νk.

where is the system state at the instantaneous time before . We denote by the system volume, typically the physical volume of the container (e.g., cell) times Avagadro’s number, and by the unit Poisson processes. Our further analysis is on systems that are well-stirred and thermally equilibrated, with processes obeying the classical mass action rate laws, (see, e.g., gillespie1992rigorous, ) corresponding combinatorially to the number of different ways we can choose molecular substrates needed for the reaction to occur (ethier2009markov, , chapter 10). Defining , the rates are

 λ(n)k(x)=nκk∏iνik!(xiνik)n|νk|.

The nonhomogenous Poisson process with the above propensity function (see also (gillespie1992rigorous, )) gives the system time-evolution equation

 X(t)=X(0)+∑kYk(∫t0λ(n)k(X(s))ds)(ν′k−νk). (1)

The model in Equation (1) is often considered the most accurate representation of true system dynamics, and is in the general class of density dependent Markov jump processes (DDMJP). While the class of DDMJP models are often used to describe a wide variety of physical systems, like gene regulatory networks and stochastic epidemics, unfortunately the corresponding inference is complicated by highly intractable exact likelihoods.

Letting

we obtain species concentrations (say, in moles per unit volume or relative density). The asymptotic notion of a large volume limit represents the system’s behavior as its volume increases to infinity while the species numbers are kept at constant concentrations. This gives the classical deterministic law of mass action ordinary differential equation (ODE), which is referred in the chemical literature

(van1992stochastic, ) as the reaction rate equation

 ˙c(t)=∑kκk∏icνiki(t)(ν′k−νk). (2)

The solution of the above ODE is parameterized by a vector , a linear combination of the kinetic rates of interest, as well as the initial condition . In what follows we will focus on estimation of

under the assumption that it is a linear transformation of the

’s. It is well known that identifiability of reaction networks is a nontrivial problem and is only guaranteed when certain reaction vectors are linearly independent for each source complex (craciun2008identifiability, ). However, the reparameterization from to can be done so as to ensure that is identifiable, as is the case for the examples we consider below. Results in Remp12 show that the least squares estimator, , which minimizes the distance between the data and the solution to (2) is consistent and asymptotically normal. These asymptotic properties are also true for solutions to the so-called martingale estimating functions, which are a generalization of the least squares estimator in this case (bibby1995martingale, ). Both methods produce statistics that are easily obtained from time course trajectories of the system. In what follows, we use the asymptotic properties of these statistics and their estimating equations to form a synthetic likelihood. The synthetic likelihood serves as a surrogate for, often intractable, exact likelihood and may be used in the same way to perform the usual Bayesian inference.

## 3 Synthetic likelihood

Ideally, parametric inference would be based on the likelihood function since, under the typical regularity conditions, maximum likelihood estimates enjoy good asymptotic properties, such as consistency and efficiency. The likelihood approach also gives one the ability to perform Bayesian inference. Unfortunately, the usage of exact likelihood methods for parameter estimation in stochastic biochemical networks faces some challenges due to the need for computationally demanding routines, like, for instance, the particle filters (Golightly:2011aa, ). For that reason many authors have focused on approximate likelihood methods for reaction networks (fearnhead2014inference, ; golightly2005bayesian, ; golightly2012, ; oates2012network, ). However, major practical limitation of many such methods, for instance, approximate Bayesian computation (ABC) is their slow convergence and poor mixing in high dimensional problems (csillery2010approximate, ). To circumvent these technical complication we propose here an alternative method that is based on the idea of synthetic likelihood (Wood:2010aa, ).

To introduce some notation, consider the data or the system trajectory consisting of , which are the observed counts of species, measured at a discrete grid of timepoints, , ( not necessarily equidistant across trajectories, and with possibly different endpoints; i.e., not necessarly equal to . We assume that this observed data arrives from trajectories of the process for which the system volume is fixed and known and define the concentration values as . The LSE for the observation is then any solution of the optimization problem

 ^βj=argminβ∑tij||Cn(tij)−cβ(tij)||22 (3)

or equivalently any solution to the following estimating equation

 ∑tij∂cβ(tij)(Cn(tij)−cβ(tij))=0. (4)

Asymptotic properties, (as ), and the regularity conditions for the consistency and normality of these solutions were discussed in Remp12 , with all systems under consideration here satisfying these conditions. The expression above is similar in form to the generalized estimating equations (GEEs) (Zeger:1986aa, ). GEEs have been used extensively for correlated and longitudinal data where a parametric form of the mean is known but the likelihood function is not readily available. The appeal of GEE estimates is that they exhibit many of the same properties of maximum likelihood estimates (Zeger:1986aa, ), even when the correlation structure of the dependent observations is misspecified.
The ideas from GEEs were extended to discretely observed diffusion processes in bibby1995martingale by considering so-called martingale estimating functions (MEF). Defining the filtration , scaling the process by , consider all zero-mean -martingale estimating functions of the form

 Gj(β)=∑tijg(i−1)j(β)(Cn(tij)−Fi(Cn(ti−1j),β)) (5)

where is measurable and . It was also shown in bibby1995martingale that the optimal estimating function, in the sense of the smallest asymptotic dispersion and where the data is assumed to arise from discrete observations of a diffusion, is of the form with . The optimal estimating function may be approximated by, , with the substitutions as well as . Fitting may then be done iteratively by first updating the weights in (5) for the current value of with and then updating by solving (5) and repeating this process until convergence, or by replacing with its empirical estimate.

### 3.1 Form of the Synthetic Likelihood

Given the partially observed trajectory, , we denote by the solution of either (4) or (5). In the mass action setting, the ODE coefficients are linear combinations of the reaction rates and can be written as where is a matrix (see, e.g., (Linder:2013aa, )). Some specific examples of and are given in Section 4 below. It is straightforward to show that is asymptotically Gaussian; i.e., as previously mentioned, see Supplementary Material. The normality of allows to express the synthetic likelihood for the trajectory (replicate) as

 SLj(κ,Σ|^βj):=f(^βj|Qκ,Σ)=(2π)−d/2|Σ/n|−1/2exp{−12(Qκ−^βj)⊤(Σ/n)−1(Qκ−^βj)} (6)

where is the corresponding limiting covariance matrix. We have maintained the standard likelihood notational convention by writing it as a function of parameters, given data. This is in sharp contrast to the majority of ABC type methods, where such data summaries are typically chosen in an ad-hoc fashion. Unfortunately, the Pitman-Koopman-Darmois (PKD) theorem essentially guarantees the failure of ABC type methods, since it states that the existence of finite dimensional sufficient statistics is a unique property of the exponential family. The implications of PKD are thus quite disappointing in the context of ABC methods, where typically the interesting (non-analytic) likelihoods are outside of the exponential family. Consequently, ABC methods based on finite dimensional summary statistics are typically guaranteed to suffer information loss vis a vis exact likelihood inference.

Consider the trajectory and the vector of stacked species concentrations,

; i.e., by stacking the concentration vectors at each timepoint. The central limit theorem for DDMJP then gives

, so that the trajectory data likelihood, converges as to the Gaussian likelihood

. The law of large numbers and consistency of

imply that , so that

 L(κ|vec(Xj)/n)≈(2π)−dmj/2|Σκ/n|−1/2exp{−1/2(c^βj−cκ)⊤(Σκ/n)−1(c^βj−cκ)}.

The above approximation is seen to hint at a notion of asymptotic sufficiency (AS) in the sense of le1956asymptotic . AS is essentially an asymptotic Neyman-Fisher factorization, and it implies that, at least asymptotically, the chosen statistics contain meaningful information, thus offering some notion of efficiency, even in light of the PKD theorem. However, establishing AS property more formally requires careful analysis of specific inferential problems on case by case basis. For further discussion, see, for instance, frazier2016 .

It is important to note that in the above arguments, is the process covariance matrix for the stacked concentration vector in the approximate data likelihood and not the covariance of the summary statistics () in the synthetic likelihood. In fact, since also depends on , its usage in the likelihood function above would break down the conjugacy and the efficient MCMC via Gibbs sampling would be no longer available. Thus, when we have a single trajectory, as in the Eyam plague example below, we use the asymptotic covariance matrix of the summary statistic, for , to form the synthetic likelihood, although we have suppressed the explicit subscript notation for simplicity. When multiple trajectories are available, we additionally have the ability to assess the between trajectory variation of the summary statistics. See below for details.

The transition from the original likelihood to the synthetic likelihood shifts our analysis into the setting of a classical linear model. Thus, this approach is vastly different from most of the currently used ones. Specifically, methods based on the reaction rate equation use ODE solutions as the means of corresponding Gaussian likelihoods (girolami2008bayesian, ), for instance, those based on the diffusion approximation and the LNA (komorowski2009bayesian, ; fearnhead2014inference, ). These approaches, while being approximations, still face serious computational challenges. The main bottleneck for inference, particularly Bayesian one, in these models is non-conjugacy, since the ODE means are not linear in the parameters. As such, each iteration of MCMC requires solving complex systems of nonautonomous ODEs at each proposal value, like in girolami2008bayesian . This can make tuning proposal distributions with good acceptance properties difficult, leading to chains with poor mixing. The ABC methods do not fare much better, since they require summary statistic, distance measure, and tolerance selection that are often ad-hoc. These problems severely limit the applicability and scalability of the current approximate procedures. In contrast, our synthetic likelihood approach circumvents the need to choose distance measures and tolerance levels by using data summaries that are well understood, and allow for their analysis via standard MCMC. It also only involves solving ODE systems once (to compute initial summary statistics and covariances) thus avoiding the iterative usage of the ODE solver. Finally, the synthetic likelihood form leads itself to the efficient formulation of the MCMC computation steps via a Metropolis-within-Gibb’s procedure.

### 3.2 Prior Specification

The Bayesian approach to network estimation can be addressed by using specialized priors that allow coefficients to be in the model or out of the model during iterations of MCMC, leading to positive posterior probabilities of zero value.Various mixtures of mutually singular distributions, each dominated by

-finite measures are a natural choice. Here, we assume a discrete mixture of the point mass at zero, , and a continuous distribution, , supported on the positive reals and dominated by the Lebesgue measure . Restricting the support of to positive reals is necessary to convey the fact that kinetic parameters are non-negative. It was shown in gottardo2008markov

that when the prior probability of non-zero rate for reaction

is , the corresponding density for is of the form

 π(κk):=dΠd(δ0+μ)=(1−ωk)I0(κk)+ωkf(κk)IR+(κk) a.e δ0+μ (7)

where . Priors of the form (7) lead to minimax rates of estimation and posterior contraction on sparse sets, provided the tails of are exponential or heavier (see, e.g., johnstone2004needles ; castillo2012needles ). Thus, priors of the form (7) are optimal under certain criteria and are usually considered the theoretical gold standard for variable selection in the Bayesian setting. To that end, we assume that , which is the exponential, or one-sided Laplace, distribution. Our choice of the exponential is motivated by several important factors: it naturally restricts the support of to positive reals, it satisfies the tail requirements mentioned previously, and from the information theoretic perspective it is the maximum entropy prior with mean (see (robert2007bayesian, ), Chapter 3). It may also be rewritten hierarchically as , with , via the identity (west1987scale, ; andrews1974scale, ). By writing the prior as a scaled mixture of truncated normals, the form of ’s full conditionals becomes analytically tractable, as is demonstrated in the Supplementary Material. This gives significant computational advantage to our framework, since priors like (7) often lead to non-conjugacy, in which case full Metropolis-Hastings (MH) is required for sampling. Although adaptive MH-based MCMC algorithms have been designed to produce chains with desirable acceptance rates (ji2013adaptive, ), when the likelihood is complicated or not analytic, as in the present situation, such tuning is not straightforward. Thus when appropriate tuning cannot be done, the resulting chains may exhibit poor mixing and require extremely long run times for sufficient exploration of parameter space. In what follows, we detail the Gibbs sampling procedure, which performs tuning automatically, for posterior sampling under our hierarchical model. Our primary interest is in obtaining the posterior probability that reactions are true, which allows one to infer the reaction network structure. In order to accommodate varying experimental conditions, such as differences in measurement error or experiments with data collected at different timepoints, we place a Wishart prior on the covariance matrix, . This is in contrast to the common assumption of the inverse-Wishart, see daniels1999nonconjugate ; bouriga2013estimation ; alvarez2014bayesian for some examples. Our approach is similar to chung2015weakly , in that we assume a Wishart (not inverse-Wishart) prior for

that leads to desirable modal properties of the posterior. We select the hyperparameter

to be the empirical covariance of , and when this estimate is not full rank we add a small regularization term, , to its diagonals. The hierarchical model under consideration is then

 SL(κ,Σ|D)= N∏jSLj(κ,Σ|^βj) π1(κ|λ)= ∏k((1−ωk)I0(κk)+ωkλkexp{−λkκk}IR+(κk)) π2(Σ|Ψ)= |Ψ|−v/2|Σ|(v−d−1)/2exp{−12trΨ−1Σ}2vd/2Γd(v2) (8)

where indexes the independent trajectories of the process. The model contains a covariance term, , and this parameter may represent intrinsic stochastic noise, as well as measurement error which will dominate in the large volume limit. This parameter is not of particular interest for network or kinetic rate estimation, and a clear advantage of the Bayesian framework is the ability to marginalize this nuisance parameter out of the posterior. Importantly, we show that the marginal synthetic posterior distribution, is unimodal when and (see Supplementary Material). This is a key property of the proposed method that not only guarantees identifiability of the reaction network but also contributes to the observed rapid mixing of the MCMC procedure.

### 3.3 Posterior Computation

Here we describe the algorithm to efficiently sample from the posterior distribution with a Metropolis-within-Gibbs sampler. To simplify notation, define and . The term is the prior probability that reaction channel is true (non-zero). The posterior computation can then be performed with the following steps.
Algorithm 1.
Step 1. For , compute where

 Mk=2√1τ2k(ukk+1/τ2k)exp{(sk−∑i≠kuikκi)22(ukk+1/τ2k)}(1−Φ(0,(sk−∑i≠kuikκi)(ukk+1/τ2k),(ukk+1/τ2k)−1)). (9)

With probability , sample from the truncated Gaussian and then from inverse Gaussian,

 κk∼ N((sk−∑i≠kuikκi)/(ukk+1/τ2k),(ukk+1/τ2k)−1)IR+(κk) 1τ2k∼ IG(λkκk,λ2k)

Else, set with probability .
Step 2. Given the current sample , propose from Wishart,
Step 3. Accept with probability

In the above notation , is the element of , is a Gaussian random variate with mean

and variance

, is the Wishart density evaluated at with scale matrix , is an inverse Gaussian random variate with mean and scale , and

is the Gaussian cumulative distribution function with mean

and variance evaluated at . We have found that a proposal degrees of freedom, , gives relatively good acceptance rates, between % in our empirical studies. Derivations of the full conditionals may be found in the Supplementary Material. Hence, sampling from the full conditional of is done by sampling from each ’s second mixture with probability and from the degenerate component with corresponding probability . Expressions for the individual parameters’ and weights’ full conditionals, and , allow for sampling from the target distribution by local parameter-wise updates. Although global moves can lead to optimal acceptance rates, tuning proposals that must be absolutely continuous with respect to measures like is not straightforward, and even less so for likelihood-free methods. Additionally, the scheme allows inference about posterior reaction probabilities to be improved via Rao-Blackwellization (gottardo2008markov, ). In the remaining sections we illustrate the usage and performance of Algorithm 1 with both simulated and real data examples.

## 4 Simulation Study

To illustrate network topology estimation using the proposed synthetic likelihood approach, we consider a molecular reaction network partially motivated by the heat shock response. Heat shock transcription factors and protein chaperones are critical to ensure proteins fold into specific three-dimensional structures. Newly formed proteins and proteins within cells that have been challenged with damage risk protein misfoldings that may effect their functional activity hartl2011molecular . Accumulation of such toxic species (misfolded proteins) has been implicated in the progression of certain neurodegenerative diseases neef2011heat ; hartl2011molecular , and has lead to research into development of theraputic targets that restore proteostasis calamini2012small . Hence, modeling the cells ability to employ this chaperone machinery to acheive proteostasis may reveal theraputic targets. As a toy in silico, we consider the following reaction network that has transcriptional and chaperone components, along with redundant reactions, to compare the proposed methodology with existing ones via simulation.

 ∅→κ1 P1∅→κ2P2P1→κ3R1P2→κ4R1 P1→κ5P1+P2 (10) R1+P2→κ9∅

Here and are representing proteins and represents gene RNA expression, with the transitions in/out of indicating loss/creation of a molecule. For the system of reactions (4) the mass action ODE (2) parameterized by specializes to

 dP1dt= β1−β2P1+β3P2 dP2dt= β4+β5R1+β6P1−β7P2−β8R1P2 (11) dR1dt= −β9R1+β10P1+β11P2−β8R1P2.

We note that in this particular case , where

 Q=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝1000000000000010000000100000010000000100000000000000001000000000100000000001000000010000000010000000001−10100001000000000000100000000⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠

In our present setting , and . For our simulation study we generated trajectories from the pure jump process of the system of reactions (4) via Gillespie’s algorithm (see, e.g., (van1992stochastic, )) with parameters and initial molecular copy numbers of 50 for each of the three species. Note that under this set of kinetic parameters enters the system from an external source and acts as a transcription factor for and a chaperone for , which we model by reactions 1, 3, and 6 (these are labeled by their respective subscripts in (4)). acts as a suppressor of the transcription of through reaction 9 and all species have a natural degradation rate through reactions 10, 11, and 12 respectively. All others reactions are superfluous.

We calculated the required LSE and MEF -based statistics by fitting the mass action ODE in (4) to simulated stochastic trajectories from (4). We set the degrees of freedom hyperparameter to , for equal a priori probability that a reaction channel is true or false, and , which in combination with guarantees a unimodal posterior. For comparison, we perform analysis using the adaptive MCMC routine of vihola2012robust with the LNA likelihood approximation and uniform priors on the logarithm of parameter values finkenstadt2013quantifying . Additionally, we implemented the particle filtering routine of Golightly:2011aa , which computes unbiased likelihood estimates within MCMC using 100 particles generated via Gillespie’s algorithm and assigned uniform priors on the logarithm of parameter values. Tables 1 and 2 contain the posterior median estimates from chains of 50,000 MCMC samples from the MEF-based and LSE-based synthetic likelihood method. For the class of point mass mixture priors, the posterior median has been proven to be a legitimate thresholding rule, see johnstone2004needles , so it may be used for both variable selection and estimation simultaneously in our setup. Tables 3 and 4 give posterior means from LNA analysis with 50,000 samples and 10,000 samples from the particle marginal Metropolis-Hastings algorithm respectively. All algorithms are coded in R and run on a personal desktop computer with 2.7 GHz clock speed.

The results in Tables 1 and 2 indicate that both synthetic likelihood methods have performed reasonably well in the example, with the MEF-based synthetic likelihood analysis performing the best. The improvement of the MEF-based synthetic likelihood over the LSE-based one is likely due to a generally better efficiency of MEF over that of LSE. There is also apparently some degree of bias in the LSE estimate of , even with increasing . MEF-based likelihood analysis assigned to the true reactions non-zero posterior medians for all sample sizes, while assigning zero posterior medians to nearly all the false reactions for two or more trajectories, with the exception of . MEF-based analysis performed better than LSE-based analysis for all reactions and for each number of trajectories. Both LSE and MEF-based analysis gave high posterior probability to the protein being a transcription factor for (reaction 3) as well as acting as a suppressor of transcription of (reaction 9), which was indeed the case in the simulation. Since in actual experiments such reactions often indicate drug targets, the fact that our method was able to correctly identify them is of practical relevance. The results for both the LNA and particle filtering, located in Tables 3 and 4, show that their corresponding run times are already unacceptable in this moderate sized system. Indeed, we found that obtaining just 10,000 samples from the posterior distribution with the particle marginal Metropolis-Hastings implementation required more than two weeks of CPU time. Collecting 50,000 samples from the posterior distribution using the LNA, with adaptive MCMC for optimal acceptance rates of 0.234, took approximately 12 days of CPU time. The bottleneck of computation for the particle filtering is that unbiased likelihood estimates require sampling many trajectories, in our case 100, for each likelihood evaluation. For the LNA based analysis, each likelihood evaluation requires solving a system of non-autonomous ODEs. The particular examples in Tables 3 and 4 illustrate the generally accepted view that, at least until now, most of the current methods that rely on detailed system modeling do not scale well. Not only did the methods perform poorly in terms of long run times, they also produced estimates that appear biased away from the true values of certain parameters, even as the number of trajectories increases. For the particle filtering routine, estimates for and have a high degree of bias, whereas the LNA-based analysis appears to incorrectly infer as true (non zero) and as false (zero), even with all 5 trajectories data provided. The synthetic likelihood methods perform better, in terms of inference and computation, by projecting a high dimensional noisy trajectory into a lower dimensional statistic that captures the important dynamical information with less noise. It appears that for the full Metropolis type methods, the variable selection priors, like the point mass ones used in our synthetic likelihood methods, would likely pose even greater computational challenges than the continuous priors applied in our examples here since the routines for tuning the necessary proposals are not straightforward for the LNA, the particle filter, or any Metropolis type sampling with intractable likelihoods.

The plots in Figure 1 indicate that the chains resulting from the LNA and particle filtering have a much higher degree of auto-correlation as compared to the synthetic likelihood methods. Thus, in addition to the increased computational time of each MCMC sample, more samples are required in order for the LNA and particle filter to sufficiently explore parameter space in our current setting. We conclude by these plots that although adaptive MCMC was used, at least for the LNA likelihood approximation, the resulting chains exhibit poor mixing. Although theoretically Metropolis-Hastings type samplers can be tuned to produce optimal acceptance properties, the results here highlight the general difficulty of tuning in the presence of complicated or intractable likelihoods.

## 5 Data Examples

### 5.1 RNA-Seq Data

We now compare the performance of our synthetic method to that of the algebraic statistical model (ASM). The method was introduced in craciun2009algebraic to learn biochemical network topology from the empirical patterns of the reaction stoichiometries (). To facilitate the comparison, we re-analyze a dataset introduced in Linder:2013aa and consisting of the longitudinal RNA-seq measurements from the retinal tissue in the zebrafish (Danio rerio). The study was performed to probe the regenerative properties of the zebrafish retina after it sustained cell-specific damage. One interest of the study was in analyzing a particular sub-system, consisting of the following species: heat shock protein transcription factor (Hsp70), signal transducer and activator of transcription 3 (Stat3), and the suppressor of cytokine signaling 3 (Socs3). For more details on the experiment, see Linder:2013aa . The network of interest has the form

 ∅ κ1−→Stat3Stat3κ4−→2Stat3 ∅ κ2−→Socs3bStat3→κ5Socs3b ∅ κ3−→hsp70Stat3κ6−→hsp70 S tat3+Socs3bκ7−→Socs3b S tat3κ8−→∅ S ocs3bκ9−→∅ h sp70κ10−−→∅. (12)

In the above network, we are especially interested in the possible activation of the heat shock response via Stat3. The detailed analysis via ASM based on all 8 trajectories of the experiment was presented in Linder:2013aa , where the topology of the conic (i.e., single-source) sub-network in Figure 2 was learned. We may thus compare the proposed synthetic method’s results based on LSE with the results based on ASM for the same dataset. As previously mentioned, the proposed method also allows for computation of posterior probabilities via empricial estimates of the posterior weights; i.e., since the dominating measure is and not merely . To this end, we compute and report the posterior probabilities by simulating 50,000 MCMC samples from the model in (3.2) under the same hyperparameter assumptions, as in the previous section.

The results in Table 5 indicate that reactions 4, 5, and 6 are likely true, while reaction 8 may only occur on a much longer time scale. Since both methods produce similar network topologies, we mention some advantages of the proposed model over ASM. While the appeal of the ASM is that it exploits the geometry of the stoichiometric matrix, the proposed method based on synthetic likelihood does so as well, in a sense, through the entries of matrix. A practical limitation of the ASM is that it enforces the cone-wise assumption that exactly reactions are true, which will typically not be the case. Similarly to the synthetic method, ASM also tacitly assumes a large volume setting, , however, unlike for the synthetic method, the ASM inference problem is only asymptotically well-posed and only on the set of posterior probabilities for . Thus ASM is strictly a topology learning routine, and not capable of kinetic parameter estimation. In contrast, (even though we did not present the results in this section for brevity) the parameter estimation may be easily carried out with the proposed synthetic approach by analyzing the posterior distribution and selecting the point estimates of , as was done in the previous section. For illustration, we present the bivariate contour plots of the posterior distribution for the reaction rates from the sub-system of interest in Figure 3. Our main observation is that the empirical plots indeed agree with our theoretical results on the unimodality of the posterior distribution.

### 5.2 The Plague at Eyam

In the seventeeth century, following the Great Plague of London the village of Eyam, Derbyshire, England, experienced an outbreak of plague, caused by the bacterium Yersinia pestis. In this section we analyze data from this outbreak that occurred at Eyam in 1665-1666. See also whittles2016epidemiological ; dean2018human for further discussion of the dataset and the relevant historic context.

Several features about the plague outbreak at Eyam make its study somewhat unique. The first of these features is that the village rector, a Reverand William Mompesson, reportededly convinced the villagers to self-quarantine. Although recent evidence suggests that a few of the wealthy residents may have fled (it is reported that Mompesson sent his children away before the quarantine), we may effectively treat the plague at Eyam as an outbreak in closed population. The names and burial dates of plague victims were recorded by Mompesson. Further, the parish records combined with the hearth tax record for Eyam in 1664 provide detailed information on the villagers, such as their sex, approximate date of birth, date of burial, and household information. This curated version of the Eyam parish register has lead to a newly revised estimate of a total village population of around 700, from an initially reported 350.

As the account goes, a tailor at Eyam received a shipment of cloth from London that was carrying plague infected rat fleas, and the first infected victim is believed to have come in contact with this cloth. As the infected flea’s digestive system becomes blocked by the bacterium, the flea vomits into the bite wound, thus transmitting Y. pestis. This transmission mechanism is now medically confirmed as giving rise to the bubonic form of plague. On September 7th 1665 the first burial due to plague was for a George Viccars. Over the next nine months, a somewhat constrained outbreak occurred in the Eyam villagers, of which 77 deaths have been attributed to plague. Around mid-May 1666 a second wave of the outbreak began to spread, and during the ensuing months from June 1666 through October 1666 had decimated the village, killing some 257 villagers.

While the rodent-to-human transmission route via the rat flea is understood to be critical for the initial outbreak dynamics, this particular mode of transmission alone does not fully explain the observed rapidity of the various plague outbreaks throughout Europe. This was also argued, at least qualitatively, based on the empirical differences in the early outbreak dynamics compared to the latter months at Eyam, raggett1982stochastic . An apparent lack of recorded rat falls (large scale rat deaths) during these outbreaks provides further evidence that additional transmission mechanisms were also critical for disease spread. Rat falls are generally considered necessary to cause sufficient flea jumpings from rat corpses onto humans. While human-to-human contact has been recognized as a component of the plague transmission process, through plague pneumonia and more recently via ectoparasites, such as lice and the human flea, recent analyses suggest that this transmission route may be far more important than previously recognized (whittles2016epidemiological, ; dean2018human, ).

We set out to analyze the Eyam plague data that was reported in raggett1982stochastic , which we have augmented to account for the more recent information on the total population reported in whittles2016epidemiological .

Figure 4 illustrates the compartmental SIR model that we consider for analysis of the Eyam data. The , , and compartments represents the number of susceptibles, infectious, and removed individuals, which we denote at time by , , and . While we have labeled the compartment removed as is standard in the typical SIR notation, the Eyam plague was almost universally fatal for infected. There were only three alleged recoveries, of which none were reported in the data sources, so that effectively represents deaths. We represent the rodent-flea compartments and its contribution to the infectious pressure on susceptibles as . This is essentially an assumption that the infectious pressure from non-human interaction is proportional to the number of susceptibles. While this may not be a completely accurate description, the outbreak period that we analyze was during early to late summer, leading one to suspect that the infectious pressure from the compartment may have been approximately constant. According to the SIR model, susceptibles () make infectious contact and then transition to compartment at rate . Finally, infected individuals die and transition to compartment at rate .

The compartment specific prevalence estimates, reported in raggett1982stochastic , were updated with the new population total and are displayed in Table 6. We have focused attention on the second phase of the outbreak that occurred in the summer of 1666 to assess the evidence on whether a particular transmission route was more important than another. Since the Eyam data analysis assumes a closed population one may readily recover the compartment at time by the formula , with .

The corresponding mass action ODE then for the above system has the form

 dSdt= −(κ1I+κ3)S dIdt= −(κ2−κ1S)I (13)

We use the plague data in Table 6 to compute the MEF statistic using the approach described above. The MEF statistic is computed by minimizing the weighted sum of squared distances between the observed trajectory and ODE solution of (5.2), weighted by the asymptotic process covariance at each timepoint. The initial condition is given as the compartment specific prevalence estimates in May 18 1666. For the Eyam data we have the single outbreak trajectory; i.e., no replicates, and we thus use the asymptotic covariance estimate of the MEF statistic for the fixed covariance term to construct the synthetic likelihood. Also,

, indicating that the unknown rate parameters are directly related to the summary statistics through the identity matrix. We collected 50,000 MCMC samples via the synthetic likelihood method described above with a burn-in of 5,000. The corresponding posterior medians and 95% credible intervals for the parameters are

, , and .

We note that the results from our analysis agree qualitatively with the results in whittles2016epidemiological ; dean2018human concerning the role of human-to-human transmission of plague. While we have not made explicit assumptions about what the exact form of the human-to-human transmission mechanism (i.e, plague pneumonia, ectoparasites or some other form) the data from the latter months of the outbreak at Eyam nonetheless suggest that in our simple modified SIR model human-to-human contact was important. This was already suggested early on from the historical accounts that the plague at Eyam could be transmitted from the cough of a plague victim, suggesting plague pneumonia transmission. Further, the posterior median for of zero indicates the corresponding link in our modified SIR model, that accounts for infectious pressure from the rodent-flea route, could be negligible, at least during the latter part of the outbreak.

There are several limitations of this analysis that should be noted here. The first is that we have restricted our attention to the latter months of the outbreak at Eyam, during which it was apparent that the dynamics had changed from those of the initial outbreak. By doing this we are potentially missing information about the nature of the initial dynamics, which may point to a different transmission route as being important early on. Indeed, results from whittles2016epidemiological suggest that approximately 27% of infections were caused by rodents and 73% from human-to-human transmission by using the full outbreak data. This leads to another limitation, in that we have relied on the prevalence estimates reported in raggett1982stochastic , updated with new population totals. These compartment prevalences were estimated from the historical and death records, so are likely subject to measurement error, which we have not accounted for. Further, we have not used data on household structure and composition, although this is part of planned future work. Finally, while the augmented SIR type model we have used is somewhat similar to the SEIR model used in whittles2016epidemiological , it does not consider explicit plague pneumonia vs. ectoparasite driven human-to-human transmission separately, as was done in dean2018human . Hence, our analysis only adds to the evidence that some form of human-to-human transmission, which we modeled with a generic SIR framework, was important but does not distinguish between particular forms of this transmission.

## 6 Conclusions

We have described a method which can be used to perform estimation of biochemical networks as often considered in the context of dynamic gene regulatory networks and stochastic epidemic models. It is well established that this is a notoriously difficult problem, due to the intractability of the likelihood under partially observed trajectories. The underlying theme in most of the popular approaches in this area is to use likelihood approximations to perform approximate inference, such as in golightly2005bayesian ; girolami2008bayesian ; komorowski2009bayesian ; golightly2012 ; finkenstadt2013quantifying ; fearnhead2014inference . While our approach adopts this theme, it is fundamentally different than the standard approximate and likelihood-free inferential techniques. The most important of these differences is that the data summary statistics used here (LSE or MEF estimates) have properties that are well understood and are directly related to the unknown kinetic parameters. These properties justify, via the asymptotic normality, a parametric form for the synthetic likelihood, in the spirit of Wood:2010aa , that is linear in the parameters of interest. Hence, we have demonstrated that projections of the species’ trajectories into sets of dynamically informative statistics allows for highly efficient posterior sampling and a procedure that should scale well in large systems.

## Acknowledgements

The first author would like to thank the Mathematical Biosciences Institute (MBI) at Ohio State University, for partially supporting this research through an Early Career Award. MBI receives its funding through the National Science Foundation grant DMS 1440386.

## Appendix A Proofs of Propositions

###### Proposition 1.

The statistics, , of Equations 4 and 5 computed from the trajectory of data arising from the DDMJP in Equation 1 are asymptotically sufficient for as .

###### Proof.

Let be the trajectory arising from the DDMJP in Equation 1. Assume that , and hence , is the true parameter and let be a solution to in Equation 5, which necessarily satisfies

 0= ∑tij∂c^β(tij)(Var^β(Cn(tij)))−1(Cn(tij)−c^β(tij)) = ∑tij∂c^β(tij)(Var^β(Cn(tij)))−1(Cn(tij)−cβ(tij)+cβ(tij)−c^β(tij)) = ∑tij∂c^β(tij)(Var^β(Cn(tij)))−1(Cn(tij)−cβ(tij)) +∑tij∂c^β(tij)(Var^β(Cn(tij)))−1(cβ(tij)−c^β(tij))

implying that

 ∑tij∂c^β(tij)(Var^β(Cn(tij)))−1(c^β(tij)−cβ(tij))=∑tij∂c^β(tij)(Var^β(Cn(tij)))−1(Cn(tij)−cβ(tij)) (14)

Taylor’s expansion of the left hand side of Equation (14) about and scaling by gives

 √n(^β−Qκ)=(B^β)−1∑tij∂c^β(tij)(Var^β(Cn(tij)))−1√n(Cn(tij)−cβ(tij))+oP(1) (15)

where and the higher order terms in the expansion vanish since is consistent for , under the regularity conditions in Remp12 . Consistency and the asymptotic normality of , imply that , where is the limiting covariance, see Linder:2013aa ; Linder:2015aa . ∎

###### Proposition 2.

The point mass mixture prior, , in Equation 9 with
and is logarithmically concave in , and hence is unimodal.

###### Proof.

We prove this result component-wise and the result for the full vector follows. The exponential density, belongs to the class of log-concave densities; i.e., for any we have for . Thus, if and are both positive we have

 π1(αx+(1−α)y)=ωf(αx+(1−α)y) ≥ωf(x)αf(y)1−α =(ωf(x))α(ωf(y))1−α =π1(x)απ1(y)1−α.

When both and are zero we have . When only one is zero, say , then

 π(αx+(1−α)y)=ωf(αx+(1−α)y) ≥ωf(x)αf(0)1−α =(ωf(x))α(ωf(0))1−α =π(x)α(ω1−ωω)1−α =π(x)απ(0)1−α.

Thus, the prior is logarithmically concave in and hence is also unimodal. ∎

###### Proposition 3.

If and , the marginal synthetic posterior distribution from the synthetic likelihood model, , is unimodal.

###### Proof.

The synthetic posterior is proportional to

 π(κ,Σ|D) ∝SL(κ,Σ|D)π1(κ|λ)π2(Σ|Ψ) ∝|Σ|(v−N−d−1)/2exp{−12tr(Ψ−1Σ)}exp