Improved estimations of stochastic chemical kinetics by finite state expansion

06/12/2020 ∙ by Tabea Waizmann, et al. ∙ IMT School for Advanced Studies Lucca 0

Quantitative mechanistic models based on reaction networks with stochastic chemical kinetics can help elucidate fundamental biological process where random fluctuations are relevant, such as in single cells. The dynamics of such models is described by the master equation, which provides the time course evolution of the probability distribution across the discrete state space consisting of vectors of population levels of the interacting biochemical species. Since solving the master equation exactly is very difficult in general due to the combinatorial explosion of the state space size, several analytical approximations have been proposed. The deterministic rate equation (DRE) offers a macroscopic view of the system by means of a system of differential equations that estimate the average populations for each species, but it may be inaccurate in the case of nonlinear interactions such as in mass-action kinetics. Here we propose finite state expansion (FSE), an analytical method that mediates between the microscopic and the macroscopic interpretations of a chemical reaction network by coupling the master equation dynamics of a chosen subset of the discrete state space with the population dynamics of the DRE. This is done via an algorithmic translation of a chemical reaction network into a target expanded one where each discrete state is represented as a further distinct chemical species. The translation produces a network with stochastically equivalent dynamics, but the DRE of the expanded network can be interpreted as a correction to the original ones. Through a publicly available software implementation of FSE, we demonstrate its effectiveness in models from systems biology which challenge state-of-the-art techniques due to the presence of intrinsic noise, multi-scale population dynamics, and multi-stability.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

Abstract

Quantitative mechanistic models based on reaction networks with stochastic chemical kinetics can help elucidate fundamental biological process where random fluctuations are relevant, such as in single cells. The dynamics of such models is described by the master equation, which provides the time course evolution of the probability distribution across the discrete state space consisting of vectors of population levels of the interacting biochemical species. Since solving the master equation exactly is very difficult in general due to the combinatorial explosion of the state space size, several analytical approximations have been proposed. The deterministic rate equation (DRE) offers a macroscopic view of the system by means of a system of differential equations that estimate the average populations for each species, but it may be inaccurate in the case of nonlinear interactions such as in mass-action kinetics. Here we propose finite state expansion (FSE), an analytical method that mediates between the microscopic and the macroscopic interpretations of a chemical reaction network by coupling the master equation dynamics of a chosen subset of the discrete state space with the population dynamics of the DRE. This is done via an algorithmic translation of a chemical reaction network into a target expanded one where each discrete state is represented as a further distinct chemical species. The translation produces a network with stochastically equivalent dynamics, but the DRE of the expanded network can be interpreted as a correction to the original ones. Through a publicly available software implementation of FSE, we demonstrate its effectiveness in models from systems biology which challenge state-of-the-art techniques due to the presence of intrinsic noise, multi-scale population dynamics, and multi-stability.

Author summary

Many biological systems exhibit random fluctuations which are of fundamental importance, for example, at the level of single cells. The elucidation of such behavior can be assisted by quantitative mechanistic models that are typically described by reaction networks with stochastic kinetics, yielding a microscopic description that tracks discrete changes in the populations of the interacting biochemical species. In many circumstances, however, exact analytical solutions of these models are not available, and simulation methods can be too demanding computationally. On the other hand, macroscopic descriptions that are based on deterministic approximations can yield inaccurate estimates because they fail to account for the inherent stochasticity in the system. In this article we present finite state expansion, a method that interpolates between the microscopic and macroscopic views by explicitly tracking a subset of the original discrete configurations and consistently coupling their dynamics with deterministic variables. Using a software implementation of our method on models from the literature that challenge other state-of-the-art approximation techniques, we show that finite state expansion improves deterministic estimates when the stochasticity of the system cannot be neglected.

Introduction

Chemical reaction networks are a fundamental model to analyze species that interact stochastically through reaction channels according to dynamics governed by the well-known master equation [42]. This provides a microscopic description in terms of a set of coupled linear differential equations, each defining the time course of a discrete state of the system as a vector of population counts of the chemical species involved. It is widely understood, however, that the analysis of the master equation is intractable in general, since analytical solutions are available only in special cases and direct numerical integration is hindered by the combinatorial growth of the state space as a function of the abundances of the species. Networks with large numbers of species and reactions, and the correspondingly huge state spaces that they typically subsume, also have a considerable impact on the computational cost of stochastic simulation methods [11]. In addition, forgoing an analytical treatment in favor of simulation may preclude other important studies such as stability, perturbation analysis, bifurcation, and parameter inference [32, 8]. In all these cases it is useful to consider analytical approximations of the master equation that trade off precision with cost.

The deterministic rate equation (DRE) provides a macroscopic dynamical view of a chemical reaction network by associating one ordinary differential equation with each species. The DRE solution gives the exact mean population levels as a function of time if each reaction’s propensity function is linear, as occurs, for instance, in monomolecular chemical reaction networks 

[11]. With nonlinear propensity functions, under mild conditions the DRE does give the true expectations only in the thermodynamic limit [23]. Away from this asymptotic regime, nonlinear propensities lead to DREs that provide only an approximation to the true mean dynamics. This is the case, for example, in models of cell regulation which depend on low-abundance species (in the order of a few units) to describe the behavior of genes [14]. Processes such as activation and deactivation that vary with time as a result of various interactions may introduce significant variability in gene expression [7], caused by inherent stochasticity in the bio-molecular processes involved [29, 38]. Since such forms of noise are not accounted for in the DRE, approximation errors may be large.

Here we present finite state expansion (FSE), a method which offers a mediation between the discrete and continuous representations of the master equation and the DRE, respectively. The former can be interpreted as corresponding to a situation where every discrete state is tracked; the latter, on the other hand, corresponds to the situation where no discreteness is kept and the chemical species are observed only through their approximate average populations. In essence, FSE bridges these two descriptions by keeping track discretely of only a user-defined subset of the state space, while collapsing the rest as a continuous approximation. In particular, the state space to be kept discretely is determined by a parameter which specifies the maximum population level for each species to be tracked explicitly.

FSE is realized by means of a systematic translation of a chemical reaction network (with arbitrary propensity functions) into an expanded one which features additional species and modified reactions. Specifically, each tracked discrete state is represented as a new auxiliary species; the original set of reactions is transformed such that the dynamics of the auxiliary species are coupled with those of the original species, whose role is to buffer the probability mass that falls out of the state space that is tracked. In this respect, FSE can be seen as a mass-preserving variation to the well-known finite state projection method, which truncates the state space [28].

FSE enjoys two useful properties. The first concerns its soundness

, in the sense that any expanded chemical reaction network is stochastically equivalent to the original one. In other words, their master equations can be put in exact correspondence with each other. This is formally done by proving a result of aggregation for Markov chains known as ordinary lumpability 

[4]. It essentially states that the state space of the expanded network can be projected onto a lower-dimensional one which still satisfies the Markov property and which turns out to correspond to the original network. Importantly, such exact correspondence does not carry over to the respective DREs of the original and the expanded networks. Indeed, any expansion arising from a strict subset of the discrete state space will lead to a DRE with more equations, which can be interpreted as refining terms for the mean estimates. We demonstrate this experimentally with a number of case studies taken from the systems biology literature. Our second theoretical contribution is a result of asymptotic correctness, stating that if every discrete state is tracked then the DRE of the expanded network corresponds to the master equation.

There are several approaches for improving the accuracy of the DRE. These include moment-closure approximations 

[22], the effective mesoscopic rate equation, which adds correction terms to van Kampen’s well-known system size expansion [12, 39], and hybrid techniques [15]; ref. [32] offers an up-to-date review. However, these methods are applicable under certain assumptions such as smoothness of the propensity functions [1, 12, 2], mass-action kinetics [25, 37, 10, 35, 36], specific structure of the chemical reaction network, e.g., to describe gene regulatory systems [40], and species that can be partitioned into low- and high-abundance classes [18, 27, 15]. FSE, instead, can be applied to any chemical reaction network in principle. Additionally, the case studies presented in this paper were chosen as representative instances that may challenge the quality of the approximations by state-of-the-art methods. With these models we show that FSE improves mean estimates when implementations of moment-closure approximations give unphysical results or when the hybrid method from ref. [15] experiences numerical difficulties. Otherwise, FSE may outperform the effective mesoscopic rate equation, and provide more accurate approximations than finite state projection when both methods track the same subset of the discrete state space.

Results

In this section we use a number of case studies from the literature to show that FSE can refine the accuracy of the approximation of mean estimates even with modest expansions. Analytical solutions of the master equations for the chosen models are not known, and numerical solutions are difficult because the models give rise to Markov chains with infinite state spaces. For these reasons, we considered ground-truth mean trajectories computed by stochastic simulation via Gillespie’s algorithm [11]. The numerical experiments herein reported were performed with an implementation of FSE publicly available with the software tool ERODE [6].

Schlögl model

The well-known Schlögl system is an autocatalytic process for a single species  [30]. The DRE of the original Schlögl model has two equilibrium points, owing to its strong (cubic) nonlinearity [3], deterministically converging only to one [43]. Its discrepancy with respect to the average mean trajectory computed by stochastic simulation has been observed for a long time [45]. Fig 1 provides a fully worked application of our FSE as a function of the upper bound for species , denoted by , which determines the largest population level to be tracked explicitly in the expansion. The solutions to the DRE of the expanded networks show that larger values of such upper bound increasingly improve the accuracy of the mean estimates.

Fig 1: The FSE method applied to the Schlögl system [30]. (A) Mass-action reactions with kinetic parameters taken from ref. [26]: , , , . (B) Stochastic simulations show the well-known bimodality of the steady-state probability distribution of species . (C) For a given upper bound on the population of to be tracked explicitly, FSE yields the auxiliary species denoted by , , …, . The original species acts as buffer which collects untracked populations levels. For example, reaction R1.1 derives from reaction R1 when the autocatalytic formation of a new molecule occurs when the system tracks the discrete state , thus requiring to increase the buffer species by one element. Even when the system tracks a discrete state which does not require buffering (R1.2), the propensity function of the reaction effectively considers an overall kinetics of mass-action type, since the factor models the total rate due to number of possible collisions between pairs of indistinguishable molecules. Intuitively, the factor conditions these events to the system tracking discrete molecules. (D) The original state space counts the number of copies of . The state space in the expanded network consists of the pair tracked discrete state/population level of the buffer species. Here, our result of soundness (see Methods) states that the sum of the probabilities across all pairs that have the same overall population matches the corresponding probability in the original Markov chain (as exemplified by matching colors of the states). (E) The single-dimensional DRE of the original Schlögl model is expanded into a DRE with variables (where denotes the indicator function); an estimate of the total mean population at time can be computed as . F) Starting from a population of 200 elements of , the original bi-stable DRE converges to one equilibrium at ca. 85.50 (blue line). FSE achieves excellent agreement with an upper bound (with respect to the average computed by stochastic simulation with 100000 repetitions).

Genetic feedback switch

Let us now consider a model for a genetic feedback switch taken from Refs. [17, 13]:

(1)

Species and represent the state of a single gene when its promoter region is unbound (respectively, bound) to a protein . The reaction propensities obey the law of mass action. This is a basic model for negative autoregulation, a well-known motif appearing in more than 40% of the known transcription factors in E.coli [34]. Here, a natural choice of upper bounds for the gene species is , by which the DRE of the expanded network can be interpreted as the solution of the conditional expectation of the protein population based on the gene state. Small values of yield a significant correction of the protein levels as well as of the marginal probability distribution of the gene state (Fig 2).

Fig 2: Genetic feedback switch in Eq 1. Numerical simulations comparing stochastic simulation (1E+06 repetitions), DRE and FSEs for fixed and different upper bounds . The resulting DRE from FSE has equations. Kinetic parameters were set as follows: , , , , , . The initial state is .

Genetic toggle switch

The toggle switch network is a fundamental regulatory system of two mutually repressing genes [9]. Its mathematical modeling is challenging because of multimodality [41, 40], as well as stochastic noise due to the species such as mRNA present in low molecular abundances [19]. Here we study the reaction scheme analyzed in ref. [16], consisting of a mass-action variant presented in ref. [9]:

(2)

where and denote the precursor mRNA and the mRNA for target protein . The last two reactions model mutual inhibition by means of a precursor of one protein repressing the mRNA of the other.

When protein production is controlled by low populations of precursor mRNA, the stochastic fluctuations are not adequately approximated with DRE. By explicitly observing few copies of mRNA (up to tens) our method provides precise estimates of the time courses of the mean populations (Fig 3). The resulting equations, of size at most 2310 in our tests, can be analyzed effectively, as opposed to time-consuming stochastic simulations using hybrid approaches such as those reported in ref. [16].

Fig 3: Toggle switch network in Eq 2. Numerical simulations comparing stochastic simulation (500000 repetitions), DRE and FSE by fixing while using different upper bounds and ( in short) for the number of copies of / and / (as indicated in the legend), respectively. Initial condition was the zero state. The size of the DRE for the tested choices of upper bounds is equal to (corresponding to 150, 1095 and 2310 equations for , , and , respectively). Kinetic parameters were chosen as follows: , , , , , , . Protein production (right plot) is controlled by a low population of precursor mRNA (left plot), which causes significant underestimation errors with DRE. Increasing the upper bounds of FSE improves the accuracy of the mean estimate. Corrections for species and , not reported here, are similar.

Comparison with related work

Using the models presented in the previous section, here we compare FSE with state-of-the-art techniques which can be used to obtain approximate estimates of mean population levels in stochastic reaction networks. Specifically, we considered the following methods:

  • Moment-closure approximation (MCA). We considered the second-order low-dispersion moment closure [20, 33]

    , in which variance and covariance are the highest observed moments and all higher-order central moments are set to zero; in all models considered in this paper, computing approximations with higher-order moments did not improve the quality of the approximation.

  • The effective mesoscopic rate equation (EMRE), which adds mean-correction terms to the linear-noise approximation under the assumption of an underlying Gaussian process [12].

  • The method of conditional moments (MCM), a hybrid analytical technique combining a discrete representation of low-abundance species and a moment-based approximation of high-abundance ones [15].

  • Finite state projection (FSP), which truncates the state space of a Markov chain by redirecting transitions toward unobserved states into an absorbing state with provable bounds [28].

For this study, we used an implementation of the techniques as available on the software tool CERENA [20].

The Schlögl model is known to stress MCA because of their reported difficulties with multimodal distributions [24, 31]. Fig 4 shows that MCA behaves similarly to DRE in this case, while EMRE tends to overestimate the mean population of species at longer time horizons. Similar results were obtained on the toggle switch network (Fig 5). Here we additionally confirm physically meaningless moment-closure estimates due to the presence of low-abundance species, as already reported in Ref. [35].

Fig 4: Comparison with related techniques on the Schlögl model. The average population of computed by stochastic simulation (100000 repetitions) is compared against DRE, MCA, EMRE and FSE with .
Fig 5: Comparison with related techniques on the toggle switch network. Stochastic simulation to compute the average populations of species (500000 repetitions) is compared against DRE, MCA, EMRE, and FSE. FSE is run with upper bounds , and . MCA estimates population levels approaching 75000 (out of scale in this plot to improve readability) before dropping to zero.

In the genetic feedback switch model, species and describe the distinct binary states of a single gene. Hence they represent the natural candidates of the low-abundance class when applying MCM. On this model, however, the method could not return valid results as early as time point . We further tested a gene regulatory model with an inhibition feedback loop taken from Ref. [44], which however showed similar difficulties, thus confirming already reported numerical issues. [32] (The numerical results of this analysis, not reported here, are replicable using the supporting data of this article.)

Fig 6: Analysis by FSP of the Schlögl model. With parameter settings as in Fig 1, for state spaces of equal size as in Fig 1F (450 and 650), FSE estimates the average population of more precisely than finite state projection (stochastic simulation performed with 100000 repetitions). In particular, while FSE requires requires a state space with 650 states to accurately match stochastic simulations, FSP needs 750 states.

Defining incoming and outgoing transitions with respect to the buffer species maintained in the expanded network represents a crucial difference with FSP, where transitions toward unobserved state are collapsed into a sink state that absorbs the probability mass. Experimentally, this results in increased accuracy of mean estimates by FSE when tracking the same subset of the state space in both methods (Fig 6).

The solution by FSP is a lower bound on the true probability distribution, and increasing the set of observed states tightens that bound. Instead, although FSE ensures that the expansion coincides with the master equation when the whole state space is tracked, it does not give theoretical guarantees on the degree of accuracy, nor does it guarantee monotonically increasing accuracy with larger observation bounds. Indeed, experimentally we confirmed that monotonicity of the error is model dependent. For instance, the relative percentage error between the mean population predicted by FSE and the estimated mean by stochastic simulation is not monotonic in the Schlögl system (Fig 7), while it is monotonic in the case of the genetic feedback switch and toggle switch models (Figs. 8-9).

Fig 7: Sensitivity analysis of FSE in the Schlögl model. Using parameters as in Fig 1, the relative percentage error on the estimate of the mean population of computed by stochastic simulation (100000 repetitions) shows non-monotonic behavior of the accuracy of FSE with increasing observation bound .
Fig 8: Sensitivity analysis of FSE in the genetic feedback switch model. (Top) Monotonic behavior of the time-dependent accuracy of FSE against stochastic simulation (1E+06 repetitions) for increasing observation bounds of , fixing . (Bottom) Error behavior in the steady state (estimated at time point ) shows a significant impact of explicitly tracking the discrete states , of the gene.
Fig 9: Sensitivity analysis of FSE in the genetic toggle switch model. Monotonic behavior of the accuracy of FSE against stochastic simulation (500000 repetitions) for increasing observation bounds of and at time point , representative of steady-state, fixing .

Methods

Preliminaries

In order to introduce the FSE method, we first require some preliminary mathematical theory and notation. Consider a set of species . Then, and are the sets of all integer and real-valued vectors, respectively, with coordinates represented by the elements in . For a given vector (or ), we denote by the value of the component corresponding to species . We define the following operations for any two .

Minimum

is such that for all .

Saturated subtraction

is such that for all .

Projection

Given , is such that for all .

Mapping

Given , and a function , is such that , for all .

We generalize binary operations to the case where operands and are such that and , with : each binary operation treats them as elements of .

Formally we denote a reaction network as a pair , where is a set of reactions. Each reaction is provided as a triple denoted by

(3)

where are the reactants, are the products, and is the propensity function, , with arbitrary form. We will use the standard notation for reactants and products whereby only the nonzero components are written out, separated by the plus sign. For instance, given the species , the reaction

(4)

corresponds to Eq 3 with , , and , .

A discrete state of a reaction network is described by a vector , where denotes the population of species in that state. Then,

is a non-negative real which gives the parameter of the exponential distribution that governs the firing time of that reaction. Upon firing, the system may transition from state

to , thus defining a stochastic dynamics in terms of a Markov jump process.

Stochastically, the behavior of a reaction network is defined by the (chemical) master equation. It gives the probability of finding the Markov chain in state at time :

It is worth remarking that the master equation is defined for all . However, it solution will be nonzero only for those states that are reachable from the states that have nonzero probability at time . The reachable set of states, also called the state space, can be defined as the smallest set such that the following hold:

  1. is in the reachable set if ;

  2. is in the reachable set if is in the reachable set and there exists a reaction such that .

For simplicity (and without loss of generality) we can consider networks where the initial probability distribution is concentrated in one state only, which is called the initial state. Additionally we shall restrict to well-defined reaction networks where each propensity function evaluates to zero for all multisets that do not have the minimum population counts described by the reactants. Formally, a reaction network is well-defined if every reaction is such that if (the inequalities shall be intended component-wise from now on). This guarantees that the Markov chain does not reach states with negative population counts.

The state space, hence the number of equations required for stochastic analysis, may be finite or infinite depending on the network stoichiometries. Even in the case of finite state spaces, its size may grow combinatorially large with the population counts of the initial state. This may practically preclude exact analysis in most models of interest. The DRE provides a compact model with variables. Each variable approximates the expected population level of each species at time , denoted by the vector , as the solution of the following system of differential equations:

Notice that the true expected population counts, denoted by , are known to satisfy the system

which is not self-consistent because there are no equations for the expected values of the propensity functions appearing in the right-hand sides. Essentially, the DRE closes the true equations for the expected values by replacing with , introducing an approximation error if the propensity functions are not linear. Such error is known to vanish asymptotically when the initial population levels go to infinity and the DRE is understood as a system of re-scaled equations for the concentrations of species, rather than absolute population counts [23].

Finite state expansion

The FSE approach rests on the idea of obtaining DRE-type equations that mediate between the discreteness of the state space in the master equation and the continuous interpretation of the DRE, through an expansion of the reaction network where these two representations can be seen as “limit cases”. Specifically, given a reaction network, the original set of species is meant to represent the continuous dynamics. This is expanded with a set of auxiliary species, each representing a specific discrete state of the reaction network. The reaction set is then modified by replacing each original reaction with a set of reactions that account for the interaction between the continuous and the discrete parts. The core idea of FSE is that the standard DRE for the expanded system carry additional information that enables a more precise estimate of the mean. In the following, we discuss how to formally extend the reaction network and some consistency results of the transformation. All proofs are reported in S1 Appendix..

The auxiliary set of species is defined by the user through a vector of parameters that stipulate an upper bound to the population count to be tracked discretely for each species. Thus, in effect FSE yields a lattice of expansions depending on the choice of the upper bounds. Let us denote by the upper bound, where each component gives the maximum abundance of the species that is to be tracked in the expansion. For each discrete state , we denote by the corresponding auxiliary species that is considered in the expansion. Thus we may define to be the set of species in the expanded network as

For example, in a network with the single reaction as in Eq 4, one may choose . Then, the expanded network will have auxiliary species , , , , , , , and , where the last species denotes the zero vector being tracked. We remark that, similarly to the definition of the master equation, it is convenient to consider all states within the upper bound when describing the theory. However, also in this case, not all discrete states may be reached depending on the stoichiometry of the reaction network, hence they can be removed in practice during the analysis.

The expanded set of reactions is built by replacing each reaction in the original network with a set of reactions for each tracked state as follows:

(5)

where , and are defined componentwise by

(6)

Intuitively, for each original reaction as in Eq 3, Eq 5 conditions its dynamics with respect to being the discrete state being tracked. Any expanded reaction maintains the same overall counts of reactants and products as the originating reaction, with a product tracked state that results from the addition of products and removal of reactants within the upper bound ; and are vectors of original species of . Essentially, they act as buffer species for populations that are not explicitly tracked. The propensity function is derived from the original one as

(7)

This modification accounts for the fact that the tracked species encodes additional population counts, as given by .

For example, let us consider an expansion for the reaction in Eq 4 assuming that it evolves with mass-action kinetics. In general, for a reaction with reagents and kinetic parameter , the propensity function by mass-action kinetics for state is given by . Here the propensity function reads:

Assuming that the upper bounds are , the expansion for the tracked state is given by

Since the tracked state does not have enough copies of , one further copy is used by the buffer species. The product of this reaction does not involve any of the buffer species because is within the chosen bounds. By Eq 7, the propensity function in the expanded reaction becomes

We denote by the set of reactions in the expanded networks where the transformation in Eq 5 is applied to every reaction in .

Every expansion is stochastically equivalent to the original network, in the sense that there is a unique marginal probability distribution for the overall population of each species at every time point. This equivalence can be stated in the sense of ordinary lumpability for Markov chains [4]. Using the master equation, we show that the probability of being in a state in the original reaction network equals the sum of the probabilities across all states in the expanded network with the same overall abundances for each species. This relation holds at all time points, provided that it is satisfied for the respective probability distributions at time 0.

Theorem 1.

Let and denote the solutions of the master equation in the original and expanded network, respectively. Then it holds that

for all .

The main idea behind the proof is to show the following equivalence of the derivatives of the master equation:

The previous result says that when is finite any expansion is stochastically equivalent to the original reaction network. By construction, if then the original and expanded networks coincide. Now we consider the other limit case, namely when the auxiliary set of species contains all discrete states, corresponding to a fully expanded reaction network. In this case, the DRE of the expanded network corresponds to the master equation of the original network, hence no approximation occurs. In order to do so, we state two preliminary results.

Lemma 2.

The expansion of a well-defined reaction network is well-defined.

The following statement proves that the expansion preserves the overall population jumps. That is, for each original reaction, every expanded reaction is such that each species is subject to the same change of its abundance level.

Lemma 3.

Let be a reaction and its expansion according to Eq 5. Then it holds that:

  1. ;

  2. ;

  3. , for all such that .

With these two lemmata, we are now ready to state and prove our main result of asymptotic correctness for a fully expanded reaction network.

Theorem 4.

Consider a well-defined reaction network and let be its expansion where

Let be the DRE solution of the expanded network and the solution of the master equation of the original network at time . Then it holds that:

  1. if then for all and ;

  2. if then , for all and .

The proof of this theorem is based on showing that the right hand side of the DRE associated with a fully expanded reaction network coincides with the master equation, based on the fact that the DRE for an expanded network reads

(8)

Conclusion

We have presented finite state expansion (FSE) as a novel analytical method that offers a trade off between the exactness of the solution of the master equation and the approximation errors introduced by the deterministic rate equation (DRE) for chemical reaction networks. FSE maintains a user-defined subset of the discrete state space and couples this with whole-population continuous dynamics to account for the behavior of states that are not explicitly tracked. By an algorithmic translation of a chemical reaction network into an expanded one with auxiliary species and modified reactions, FSE leads to equations that can be interpreted as a refinements of the original DRE. A theoretical result of asymptotic correctness increases the confidence as to the effectiveness of the method, since it shows that the DRE of the expanded network corresponds to the original master equation.

The performance of FSE in correcting the original DRE when tracking a strict subset of the discrete state space has been shown numerically in models that turn out to be challenging for related state-of-the-art techniques. The effective mesoscopic rate equation relies on perturbation arguments around the linear-noise approximation, hence it inherently assumes a limiting regime, unlike FSE. Experimentally, we found that this resulted in less accurate mean estimates than FSE. With respect to analytical approximations of the master equation based on moment closure, the case studies proved difficult since the analyses returned unphysical results or exhibited numerical issues, as also reported in the literature. The use of buffer species makes it possible for FSE to outperform finite state projection when using the same observation bounds for the tracked state space, because with FSE the probability mass does not absorb into a sink state. However, this inhibits the possibility to obtain error bounds between the FSE solution and the original master equation, unlike with finite state projection. Still, numerically we found excellent accuracy when the observed state space is large enough, both during the transient evolution and in the steady state. Overall, these findings make FSE a useful tool to study chemical reaction networks for which exact stochastic analysis through the master equation is not accessible.

Despite these encouraging results, the applicability of FSE may not always be feasible. Since it is based on an enumeration of the discrete state space—albeit up to the given observation bound—it too may suffer from combinatorial complexity, such that the number of equations can grow rapidly large. If significant probability mass falls outside the tracked state space, the performance of FSE may not be adequate, as the Schlögl system shows when small enough bounds are used.

There are a number of methods that are worth investigating in the future in order to tackle these challenges. Model-reduction techniques could help relieve the computational cost of the analysis of the DRE by providing a lower-order approximation that preserves the dynamics of interest [5, 21]. In principle, there might be other expansions than the one presented here, which give rise to different correction behavior of the DRE while still preserving the stochastic dynamics. A further line of improvement might consist in devising variants of FSE where the tracked state space can be arbitrarily fixed, instead of being dependent on an upper bound for the population counts. This might allow the fine-tuning of the choice of the discrete region where the probability mass is mostly concentrated. For such expansions, smaller observation bounds (hence lower computational cost) may suffice to obtain the same degree of accuracy as in this paper, thus potentially extending the practical applicability of FSE to models of higher complexity.

Supporting information

S1 Appendix.

Proofs of the results presented in the paper.

Acknowledgments

This work has been partially supported by the Italian Ministry for Education and Research under grant SEDUCE no. 2017TWRCNB and by the IMT School for Advanced Studies Lucca under grant PROCOPE.

References

  • [1] G. A. and V. C. (2007) Mass fluctuation kinetics: capturing stochastic effects in systems of chemical reactions through coupled mean-variance computations. The Journal of Chemical Physics 126 (2), pp. 024109. External Links: Document, Link Cited by: Introduction.
  • [2] Ale,Angelique, Kirk,Paul, and S. P. H. (2013) A general moment expansion method for stochastic kinetic models. The Journal of Chemical Physics 138 (17), pp. 174101. External Links: Document, Link Cited by: Introduction.
  • [3] L. M. Bishop and H. Qian (2010) Stochastic bistability and bifurcation in a mesoscopic signaling system with autocatalytic kinase. Biophysical Journal 98 (1), pp. 1–11. External Links: Document, ISSN 0006-3495, Link Cited by: Schlögl model.
  • [4] P. Buchholz (1994) Exact and Ordinary Lumpability in Finite Markov Chains. Journal of Applied Probability 31 (1), pp. 59–75. External Links: ISSN 00219002 Cited by: Introduction, Finite state expansion.
  • [5] L. Cardelli, M. Tribastone, M. Tschaikowski, and A. Vandin (2017) Maximal aggregation of polynomial dynamical systems. Proceedings of the National Academy of Sciences 114 (38), pp. 10029–10034. External Links: Document Cited by: Conclusion.
  • [6] L. Cardelli, M. Tribastone, A. Vandin, and M. Tschaikowski (2017) ERODE: a tool for the evaluation and reduction of ordinary differential equations. In Tools and Algorithms for the Construction and Analysis of Systems — 23rd International Conference, TACAS, External Links: Link Cited by: Results.
  • [7] M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain (2002) Stochastic gene expression in a single cell. Science 297 (5584), pp. 1183–1186. Cited by: Introduction.
  • [8] F. Fröhlich, P. Thomas, A. Kazeroonian, F. J. Theis, R. Grima, and J. Hasenauer (2016-07) Inference for stochastic chemical kinetics using moment equations and system size expansion. PLOS Computational Biology 12 (7), pp. 1–28. External Links: Document, Link Cited by: Introduction.
  • [9] T. S. Gardner, C. R. Cantor, and J. J. Collins (2000/01/20/print) Construction of a genetic toggle switch in escherichia coli. Nature 403 (6767), pp. 339–342. Note: 10.1038/35002131 External Links: ISBN 0028-0836, Link Cited by: Genetic toggle switch.
  • [10] C.S. Gillespie (2009-01) Moment-closure approximations for mass-action models. IET Systems Biology 3 (1), pp. 52–58. External Links: Document, ISSN 1751-8849 Cited by: Introduction.
  • [11] D.T. Gillespie (1977-12) Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry 81 (25), pp. 2340–2361. Cited by: Introduction, Introduction, Results.
  • [12] R. Grima (2010) An effective rate equation approach to reaction kinetics in small volumes: theory and application to biochemical reactions in nonequilibrium steady-state conditions. The Journal of Chemical Physics 133 (3), pp. 035101. External Links: Document Cited by: Introduction, 2nd item.
  • [13] Grima,R., Schmidt,D. R., and Newman,T. J. (2012) Steady-state fluctuations of a genetic feedback loop: an exact solution. The Journal of Chemical Physics 137 (3), pp. 035104. External Links: Document, Link Cited by: Genetic feedback switch.
  • [14] P. Guptasarma (1995) Does replication-induced transcription regulate synthesis of the myriad low copy number proteins of escherichia coli?. BioEssays 17 (11), pp. 987–997. External Links: Document Cited by: Introduction.
  • [15] J. Hasenauer, V. Wolf, A. Kazeroonian, and F. J. Theis (2014-09-01) Method of conditional moments (MCM) for the chemical master equation. Journal of Mathematical Biology 69 (3), pp. 687–735. External Links: Document, ISSN 1432-1416, Link Cited by: Introduction, 3rd item.
  • [16] Hepp,Benjamin, Gupta,Ankit, and Khammash,Mustafa (2015) Adaptive hybrid simulations for multiscale stochastic reaction networks. The Journal of Chemical Physics 142 (3), pp. 034118. External Links: Document, Link Cited by: Genetic toggle switch, Genetic toggle switch.
  • [17] J. E. M. Hornos, D. Schultz, G. C. P. Innocentini, J. Wang, A. M. Walczak, J. N. Onuchic, and P. G. Wolynes (2005-11) Self-regulating gene: an exact solution. Phys. Rev. E 72, pp. 051907. External Links: Document, Link Cited by: Genetic feedback switch.
  • [18] T. Jahnke (2011) On reduced models for the chemical master equation. Multiscale Modeling & Simulation 9 (4), pp. 1646–1676. External Links: Document, Link Cited by: Introduction.
  • [19] M. Kærn, T. C. Elston, W. J. Blake, and J. J. Collins (2005/05/10/online) Stochasticity in gene expression: from theories to phenotypes. Nature Reviews Genetics 6, pp. 451 EP –. External Links: Link Cited by: Genetic toggle switch.
  • [20] A. Kazeroonian, F. Fröhlich, A. Raue, F. J. Theis, and J. Hasenauer (2016-01) CERENA: ChEmical REaction Network Analyzer—A Toolbox for the Simulation and Analysis of Stochastic Chemical Kinetics. PLOS ONE 11 (1), pp. 1–15. External Links: Document Cited by: 1st item, Comparison with related work.
  • [21] J. K. Kim and E. D. Sontag (2017-06) Reduction of multiscale stochastic biochemical reaction networks using exact moment derivation. PLOS Computational Biology 13 (6), pp. 1–24. External Links: Document, Link Cited by: Conclusion.
  • [22] C. Kuehn (2016) Moment closure—a brief review. In Control of Self-Organizing Nonlinear Systems, E. Schöll, S. H. L. Klapp, and P. Hövel (Eds.), pp. 253–271. External Links: ISBN 978-3-319-28028-8 Cited by: Introduction.
  • [23] T. G. Kurtz (1972) The relationship between stochastic and deterministic models for chemical reactions. The Journal of Chemical Physics 57 (7), pp. 2976–2978. Cited by: Introduction, Preliminaries.
  • [24] E. Lakatos, A. Ale, P. D. W. Kirk, and M. P. H. Stumpf (2015) Multivariate moment closure techniques for stochastic kinetic models. The Journal of Chemical Physics 143 (9), pp. 094107. External Links: Document Cited by: Comparison with related work.
  • [25] C. H. Lee, K. Kim, and P. Kim (2009) A moment closure method for stochastic reaction networks. The Journal of Chemical Physics 130 (13), pp. 134107. External Links: Document, Link Cited by: Introduction.
  • [26] H. Li, Y. Cao, L. R. Petzold, and D. T. Gillespie (2008) Algorithms and software for stochastic simulation of biochemical reacting systems. Biotechnology Progress 24 (1), pp. 56–61. External Links: Document, Link Cited by: Fig 1.
  • [27] S. Menz, J.C. Latorre, Ch. Schütte, and W. Huisinga (2012) Hybrid stochastic-deterministic solution of the chemical master equation. SIAM Interdisciplinary Journal Multiscale Modeling and Simulation 10 (4), pp. 1232–1262. Cited by: Introduction.
  • [28] B. Munsky and M. Khammash (2006) The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics 124 (4), pp. 044104. External Links: Document Cited by: Introduction, 4th item.
  • [29] J. Paulsson (2005) Models of stochastic gene expression. Physics of Life Reviews 2 (2), pp. 157–175. External Links: Document, ISSN 1571-0645, Link Cited by: Introduction.
  • [30] F. Schlögl (1972-04-01)

    Chemical reaction models for non-equilibrium phase transitions

    .
    Zeitschrift für Physik 253 (2), pp. 147–161. Cited by: Fig 1, Schlögl model.
  • [31] D. Schnoerr, G. Sanguinetti, and R. Grima (2014) Validity conditions for moment closure approximations in stochastic chemical kinetics. The Journal of Chemical Physics 141 (8), pp. 084103. Cited by: Comparison with related work.
  • [32] D. Schnoerr, G. Sanguinetti, and R. Grima (2017) Approximation and inference methods for stochastic biochemical kinetics—a tutorial review. Journal of Physics A: Mathematical and Theoretical 50 (9), pp. 093001. Cited by: Introduction, Introduction, Comparison with related work.
  • [33] Schnoerr,David, Sanguinetti,Guido, and Grima,Ramon (2015) Comparison of different moment-closure approximations for stochastic chemical kinetics. The Journal of Chemical Physics 143 (18), pp. 185101. External Links: Document, Link Cited by: 1st item.
  • [34] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon (2002) Network motifs in the transcriptional regulation network of escherichia coli. Nature Genetics 31 (1), pp. 64–68. External Links: Document, ISBN 1546-1718, Link Cited by: Genetic feedback switch.
  • [35] A. Singh and J. P. Hespanha (2011) Approximate moment dynamics for chemically reacting systems. IEEE Transactions on Automatic Control 56 (2), pp. 414–418. External Links: Document, ISSN 0018-9286 Cited by: Introduction, Comparison with related work.
  • [36] P. Smadbeck and Y. N. Kaznessis (2013) A closure scheme for chemical master equations. Proceedings of the National Academy of Sciences 110 (35), pp. 14261–14265. External Links: Document, ISSN 0027-8424, Link Cited by: Introduction.
  • [37] V. Sotiropoulos and Y. N. Kaznessis (2011) Analytical derivation of moment equations in stochastic chemical kinetics. Chemical Engineering Science 66 (3), pp. 268 – 277. External Links: Document, ISSN 0009-2509, Link Cited by: Introduction.
  • [38] P. S. Swain, M. B. Elowitz, and E. D. Siggia (2002) Intrinsic and extrinsic contributions to stochasticity in gene expression. Proceedings of the National Academy of Sciences 99 (20), pp. 12795–12800. External Links: Document, ISSN 0027-8424 Cited by: Introduction.
  • [39] P. Thomas, H. Matuschek, and R. Grima (2012-10) Computation of biochemical pathway fluctuations beyond the linear noise approximation using iNA. In IEEE International Conference on Bioinformatics and Biomedicine, pp. 1–5. External Links: Document Cited by: Introduction.
  • [40] P. Thomas, N. Popović, and R. Grima (2014) Phenotypic switching in gene regulatory networks. Proceedings of the National Academy of Sciences 111 (19), pp. 6994–6999. External Links: Document, ISSN 0027-8424, Link Cited by: Introduction, Genetic toggle switch.
  • [41] T. Tian and K. Burrage (2006) Stochastic models for regulatory networks of the genetic toggle switch. Proceedings of the National Academy of Sciences 103 (22), pp. 8372–8377. External Links: Document, ISSN 0027-8424, Link Cited by: Genetic toggle switch.
  • [42] N. G. Van Kampen (2007) Stochastic Processes in Physics and Chemistry. Elsevier. Cited by: Introduction.
  • [43] M. Vellela and H. Qian (2009) Stochastic dynamics and non-equilibrium thermodynamics of a bistable chemical system: the schlögl model revisited. Journal of The Royal Society Interface 6 (39), pp. 925–940. External Links: Document, ISSN 1742-5689, Link Cited by: Schlögl model.
  • [44] Winkelmann,Stefanie and Schütte,Christof (2017) Hybrid models for chemical reaction networks: multiscale theory and application to gene regulatory systems. The Journal of Chemical Physics 147 (11), pp. 114115. External Links: Document, Link Cited by: Comparison with related work.
  • [45] Zheng,Qiang and Ross,John (1991) Comparison of deterministic and stochastic kinetics for nonlinear systems. The Journal of Chemical Physics 94 (5), pp. 3644–3648. External Links: Document, Link Cited by: Schlögl model.

Appendix S1 Proofs of statements

Theorem 1

Let and denote the solutions of the master equation in the original and expanded network, respectively. Then it holds that

for all .

Proof.

We prove the following equivalence for the derivatives of the solutions of the respective master equations

from which the statement holds under the assumption of consistent initial conditions.

Lemma 2

The expansion of a well-defined reaction network is well-defined.

Proof.

Let be a reaction of a well-defined network and its expansion according to Eq 5 in the main text. Let us take such that and separately consider the two cases for which this holds. If , then propensity function in the expanded reaction is equal to zero by definition, keeping with the requirement for the reaction being well-defined. If , then we have that because and are both members of . This implies that , and since, the reaction is well-defined we have that , from which . ∎

Lemma 3

Let be a reaction and its expansion according to Eq 5 in the main text. Then it holds that:

  1. ;

  2. ;

  3. , for all such that .

Proof.

For case (1):

For case (2):

For case (3):

(SE9)
(SE10)

where Eq. (SE10) follows from Eq. (SE9) because of the relations:

Theorem 4

Consider a well-defined reaction network and let be its expansion where

Let be the DRE solution of the expanded network and the solution of the master equation of the original network at time . Then it holds that:

  1. if then for all and ;

  2. if then , for all and .

Proof.

Case i). This statement holds if, whenever , then for all . The DRE for the expanded reaction network can be written as follows:

(SE11)

Since , every expanded reaction will be such that for each , hence in Eq. (SE11). Let us now assume toward a contradiction that for . This must hold only if both and for a reaction expanded from . For a given auxiliary species , the propensity function is in the form . Since for each this reduces to . As the reaction network is well-defined, implies that . In this case, from Eq 6 in the main text it follows that must be in the form , that is, for all , closing this case by contradiction.

Case ii). For each , the DRE can be written as: