Tropical Abstraction of Biochemical Reaction Networks with Guarantees

Biochemical molecules interact through modification and binding reactions, giving raise to a combinatorial number of possible biochemical species. The time-dependent evolution of concentrations of the species is commonly described by a system of coupled ordinary differential equations (ODEs). However, the analysis of such high-dimensional, non-linear system of equations is often computationally expensive and even prohibitive in practice. The major challenge towards reducing such models is providing the guarantees as to how the solution of the reduced model relates to that of the original model, while avoiding to solve the original model. In this paper, we have designed and tested an approximation method for ODE models of biochemical reaction systems, in which the guarantees are our major requirement. Borrowing from tropical analysis techniques, dominance relations among terms of each species' ODE are exploited to simplify the original model, by neglecting the dominated terms. As the dominant subsystems can change during the system's dynamics, depending on which species dominate the others, several possible modes exist. Thus, simpler models consisting of only the dominant subsystems can be assembled into hybrid, piecewise smooth models, which approximate the behavior of the initial system. By combining the detection of dominated terms with symbolic bounds propagation, we show how to approximate the original model by an assembly of simpler models, consisting in ordinary differential equations that provide time-dependent lower and upper bounds for the concentrations of the initial models species. Our method provides sound interval bounds for the concentrations of the chemical species, and hence can serve to evaluate the faithfulness of tropicalization-based reduction heuristics for ODE models of biochemical reduction systems. The method is tested on several case studies.

READ FULL TEXT VIEW PDF

Authors

page 1

page 2

page 3

page 4

04/24/2020

CLUE: Exact maximal reduction of kinetic models by constrained lumping of differential equations

Motivation: Detailed mechanistic models of biological processes can pose...
07/17/2020

Graphical Conditions for Rate Independence in Chemical Reaction Networks

Chemical Reaction Networks (CRNs) provide a useful abstraction of molecu...
01/04/2019

Quantum spectral methods for differential equations

Recently developed quantum algorithms address computational challenges i...
10/17/2019

Embedded discrepancy operators in reduced models of interacting species

In many applications of interacting systems, we are only interested in t...
01/11/2021

On-the-fly Reduced Order Modeling of Passive and Reactive Species via Time-Dependent Manifolds

One of the principal barriers in developing accurate and tractable predi...
05/17/2022

Chemical Case Studies in KeYmaera X

Safety-critical chemical processes are the backbone of multi-billion-dol...
06/01/2017

Efficient Low-Order Approximation of First-Passage Time Distributions

We consider the problem of computing first-passage time distributions fo...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

As biology becomes a data intensive science, due to advancements in high throughput molecular biology, the importance of in silico dynamical models of complex biological systems that are able to reproduce intricate behaviors observed in experimental settings increases. As such, modeling becomes a part of biological reasoning, but turns out to be a particularly challenging task. In particular, in models of biochemical networks, the number of possible chemical species is often subject to combinatorial explosion, due to the large number of species that may arise as a result of protein bindings and post-translational modifications [17]. As a consequence, mechanistic models of signaling pathways easily become very combinatorial. A common modeling approach describes the time-dependent evolution of concentrations of each of the modeled species through a system of coupled ordinary differential equations (ODEs). The combinatorial explosion of species and rich interaction scheme renders solving such a system of ODEs often prohibitory in practice, let alone the fact that is it already an approximation of its stochastic counter-part [20], as well as that the equations themselves do not transparently reflect the underlying mechanisms. Addressing the latter, formalisms allowing to write the mechanistic hypothesis in form of discrete transition steps have been proposed: Boolean networks[30], logical networks[32], Petri Nets[8], cellular automata[16], rule-based languages[10], to name the most common. Languages such as Kappa[10, 11] and BNGL[3] provide compact ways of describing models prone to combinatorial explosion, of simulating them [9], and even transforming them into ODEs [3]

. However, the curse of dimensionality once again rises when trying to compute the system behavior.

A strategy to cope with such complexity is model reduction, in which certain properties of biochemical models are exploited in order to obtain simpler versions of the original complex model; these simpler models should preserve the important behavioral aspects of the initial system. An example of such a property is the multiscaleness of biochemical networks, with respect to both time-scales and species’ abundance. In the case of the former, it is known that biochemical processes governing network dynamics span over many well separated timescales: while protein complex formation occurs on the seconds scale, post-translational protein modification takes minutes, and changing gene expression can take hours, or even days. As for the latter, multiscaleness also applies to the abundance of various species in biochemical networks: the DNA molecule has one to a few copies, while mRNA copy numbers can vary from a few to tens of thousands. On the one hand, these widely different time- and concentration scales represent challenges for the estimation of rate constants, for the measurement of low-concentration species, and even for the numerical integration. On the other hand, they represent a feature that can be exploited for model reduction purposes, allowing to approximate the complete mechanistic description with simpler rate expressions, retaining the essential features of the full problem on the time scale or in the concentration range of interest. The dynamics of multiscale, large biochemical systems can be reduced to those of simpler models, called

dominant subsystems [26], which contain less parameters and are easier to analyze. Dominant subsystems are chosen by comparing the time-scales of the large system. For example, the classical quasi steady-state (QSS) [5] and quasi-equilibrium (QE) approximations [23, 18] are conditions that lead to dominance, and represent popular methods for the computation of “first approximations” to the slow invariant manifold. Classical QSS is based on the small concentrations of highly reactive intermediate species (i.e., atoms, ions, enzymes and substrate-enzyme complexes)[4], while in the QE approximation the reduction of the full mechanism is done based on the existence of fast and slow reactions.

The multiscaleness property of biochemical network is by definition closely linked to the mathematical notion of dominance, captured in the framework of tropical analysis[1, 21]. Recently, a class of semi-formal methods for reducing and hybridizing models of biochemical networks has been developed, based on ideas from tropical analysis [26, 27, 28, 29]. These methods exploit the multiscaleness of biochemical networks, in order to deduce dominance relations among parameters and/or reaction rates, which can then be used to obtain a system of truncated ODEs (by eliminating the dominated terms). One of the advantages of using dominance relations in multi-scale networks is that it helps cope with parameter uncertainty: parameter values are replaced with their orders of magnitude, which are easier to determine. However, providing guarantees as to how the solution of the reduced model relates to the original one remains a challenge. Such is also the case in [2], where we proposed a reduction framework for rule-based models, based on time-scale separation. While this time-scale separation technique is justified by asymptotic convergence results, for any concrete parameter values, there is no information on the accuracy of the trajectories obtained by executing the reduced model.

In this paper, we design an approximation method for ODE models of biochemical networks, in which the guarantees are our major requirement. Our method combines abstraction and numerical approximation, and aims at providing better understanding/evaluation of tropical reduction methods. We abstract the solution of the original system of ODEs by a box, that over-approximates the state of the original system and provides lower and upper bounds for the value of each variable of the system in its current state. The simpler equations (which we call tropicalized) defining the hyperfaces of the box are obtained by combining the dominance concept borrowed from tropical analysis with symbolic bounds propagation. Mass invariants of the initial system of ODEs are used to refine the computed bounds, thus improving the accuracy of the method. The resulting (simplified) system provides a posteriori time-dependent lower and upper bounds for the concentrations of the initial model’s species, and thus bounds on numerical errors stemming from tropicalization. This means that no information on the original system’s trajectory is needed - the most important advantage of our approach. By contrast, the main difficulty of applying the classical QSS and QE reductions to biochemical models is that QE reactions and QSS species need to be specified a priori, which implies that some knowledge about the initial system’s behavior is necessary. This, in turn, means that significantly high-dimensional, non-linear systems cannot benefit from these reductions, as their analysis can be prohibitive in practice. An approach similar with respect to providing a posteriori time-dependent lower and upper bounds has been proposed in [7], where the differential semantics of rule-based models with non-contracting dynamics and unbounded sets of variables are treated. Rather than using dominance relations between ODE terms, a finite set of patterns is used in order to bound the number of occurrences of each pattern. Further related works, similar in the sense that they provide automatizible reduction methods with strong reduction guarantees are described in [13, 14]. However, both of these works are designed specifically for rule-based models, where they exploit the site-graph encoding of species’ structure, rather than the dominance regions.

Depending on the chosen granularity of mass-invariant-derived bounds, the method presented in this paper can either be used to reduce models of biochemical networks, or to quantify the approximation error of tropicalization reduction methods that do not involve guarantees. The guarantees of our method are obtained by formalizing the soundness relation between the original system of equations and the abstract system of ordinary differential equations operating on the coordinates of the hyper-faces of the box. The solution of a sound abstraction of an original system of differential equations, starting from a box that contains the initial state of the original system, defines a sound abstraction of the solution(s) of the original system. We apply our method to several case studies.

Outline The rest of this article is organized as follows. In Sect.2 we define the setting and concepts used in our approach, as well as introduce motivating examples. We then formally present and justify the method for deriving the system of reduced ODEs over the lower and upper bounds of species’ concentrations in Sect. 3. Also in Sect. 3, we present the two possible uses of our approach. We then discuss and conclude in Sect. 4.

2 Definitions and Motivating Example(s)

2.1 General Setting and Definitions

We define a dynamic reaction network over a set of species as a reaction system of the form:

(1)

where is the reaction number, and for each reaction , are the non-negative reaction rate constants of the forward, respectively backward reaction.

In this paper, we focus on the ordinary differential equations (ODE) semantics of models of biochemical networks. The underlying assumptions are that the various species of the chemical network are highly abundant, that stochastic fluctuations are negligible, and that the reaction system is well-mixed. In these conditions, the state of the dynamic system (1) can be represented as a multiset of the species’ concentrations , and its dynamics is described by a system of ODEs:

(2)

For each reaction , denotes the reaction (forward and backward) rate, and is a non-linear function of the concentrations. For example, the mass-action law reads:

(3)

which in turn means that is a multivariate polynomial of the species’ concentrations. In other words, the ODE of the -th species, , under mass action kinetics reads as a sum of monomials, which can be split into production and consumption terms, according to the sign that precedes their occurence in the equation:

(4)

where are Laurent polynomials with positive coefficients:

(5)

For convenience purposes, we will denote , where represent the production, respectively the consumption, monomials.

The reduction heuristics that use ideas from tropical analysis exploit the concept of dominance, which we borrow for our method. Let and be two (positive) monomials. We define -dominance as the following partial order relation on the set of multivariate monomials defined on subsets of :

(-dominance) For an , we say that dominates at a time point , denoted by , if .

In multiscale biochemical systems, the various monomials that compose the polynomials have different magnitude orders, such that at any given time there is only one or a few dominating monomials.

(Dominant monomial of a polynomial) For a given , the dominant monomial of a polynomial is defined as .

By using the max-plus algebra idea that the sum of positive, well separated terms, can be replaced by the maximum term, each of the two polynomials of (4) can be replaced by their dominant monomials. The result is a reduced model, consisting of a piecewise smooth function. As the dominant monomials of the can change from one concentration domain to another, the reduced model is a piecewise-smooth hybrid model.

(Two-term tropicalization of the smooth ODE system) We call two-term tropicalization of the smooth ODE system (4) the following piecewise-smooth system:

(6)

We note that a one-term tropicalization of the smooth ODE system, , is also possible, but choosing only one dominant momomial instead of the production-consumption pair of dominant monomials leads to a less precise model reduction (as more information is discarded in the one-term tropicalization). Thus, in this paper, we choose to deal with the two-term method.

2.2 Motivating example: Michaelis-Menten

In Sect.1, we mentioned that the classical QSS[5] and QE[18, 23] approximations represent popular methods for the simplification of biochemical networks.

As such, our motivating example is the Michaelis-Menten mechanism. This enzymatic model illustrates how these two simple methods that use the idea of dominance can be useful for model reduction of nonlinear models with multiple timescales. The Michaelis-Menten mechanism consists of an enzyme that reversibly binds a substrate to form a complex, which in turn releases a product, while preserving the enzyme:

(7)

The ODE system describing the evolution of the species’ concentration writes as:

(8)

The Michaelis-Menten equation relates the rate of product formation to the substrate concentration:

(9)

where is the total enzyme concentration, and is the Michaelis-Menten constant. Eq.(9)can be interpreted as the reaction rate of a reduced reactive system, equivalent to system (7), in which the intermediary complex has been eliminated:

(10)

The approximation of (8) by (9) is generally considered to be sufficiently good if the QSS assumption holds, that is if the total initial enzyme concentration is much lower than the total initial concentration of substrate: . In this case, the complex is a low concentration fast species, whose concentration is dominated by that of the substrate; the value of almost instantly relaxes to a value determined by . Thus, one can set , and exploit this relation to pool the two reactions of the initial system (7) into an unique irreversible reaction (10). The QSS condition can also be stated as [26].

The original MM analysis used the complementary QE approximation, which considers the complex formation reaction to be fast and reversible: . Thus, the term dominates the term in Eq.(8), meaning the latter can be discarded from the ODE system, allowing for pooling of species, and resulting once again in a single step approximation that reads:

(11)

with , if indeed . We note that if the QE assumption is indeed valid, .

One of the main difficulties of applying QSS and QE reductions to biochemical models is that the QE reactions and QSS species need to be specified a priori. Thus, simulation of the original model is sometimes 111In [27], the authors propose a formal method for the identification of QSS species and QE conditions, which follows from the calculation of the tropicalized system, and which does not require simulation of trajectories. needed in order to detect dominated species, which are either QSS species, or participate in QE reactions [26]. For high-dimensional non-linear systems, this requirement can represent an obstacle towards model reduction.

The issue regarding simulation of the initial system also arises when trying to quantify the efficiency of model reduction methods: ideally, the approximation errors resulting from the reduction should be computed without executing the original system.

Thus, in this paper we propose an approximation method for biochemical networks, in which no prior knowledge about the original system’s behavior is required. Our method combines the dominance concept with mass invariants of the original ODE system in order to compute inequality constraints on the species’ concentrations. These constraints are then combined with the original system of equations, in order to obtain a reduced system of ODEs that provides time-dependent lower and upper bounds on the species’ concentrations. Depending on the coarseness of detail we choose to incorporate in the mass invariant-generated inequalities, our approach can serve either as a reduction method, or to quantify the approximation errors of tropicalization reduction heuristics.

To achieve this, we abstract the original system by a box, the hyper-faces of which provide lower and upper bounds for the concentrations of the species. The two equations of the hyper-faces of a species represent simplified versions of the original differential equation of the species, in which only the dominant positive and negative monomials are considered. We refer to these equations as being tropicalized. Then, instead of interpreting the differential equations over the state of the original system, we will lift this interpretation conservatively over each hyper-face of the box. To do this, we will bound, for every hyper-face, the derivative of the corresponding coordinate in the solution of the original differential equation over the whole hyper-face. Our method should allow for formal evaluation of tropicalization approaches, and as such the bounds are derived using the dominance relations between monomials of the original ODE. Mass invariants of the original system will then be used to refine the bounds, and thus increase the accuracy of our method. By construction, the maximal solutions of the original, respectively tropicalized (i.e., abstracted) equations are related by the following soundness criterion: when both defined at time t, the state of the original system is within the hyper-box of the abstract system.

Let us consider the equations (2) of the Michaelis-Menten mechanism, under the QSS assumption: , i.e. , for an . From (2.1), it follows that one can write (by extension): . Then, we can deduce the following lower and upper bounds (that we call tropicalized) on the concentration of :

(12)

For convenience purposes, we will use the notation for the species’ concentrations, . We propose to approximate the state of the system by a box of . A box of is a set of the form , where are pairs of numbers satisfying , . Intuitively, the real number provides a lower bound to the value of the variable , and denotes the face . We will denote this face as . The other faces are defined in the same way, and the same reasoning applies to , which provides an upper bound to the same variable. For ease of notation, we shall use

to denote the vector

.

Next, let us consider the following functions:

(13)

The abstraction of the concrete system of equations is then defined as ,

.

If we fix the same initial conditions for both the concrete and the abstracted system, , we can relate the solution of the abstract system to that of the original one. For every , the real number provides a lower bound to the value of the function over the face , whereas the real number provides an upper bound to the value of the function over the face . That is to say, we have , for every pair , and , for every pair . Then, using the results of [19], we can conclude that, for every time point , and , the bounds:

(14)

are satisfied. Thus, the solution of the abstract system of equations provides lower and upper bounds for the value of the variables of the original system of equations.

Figure 1: Bounds on the species’ concentration with respect to simulation time, with , rate constants , and initial concentrations . For each of the 4 species, and denote the lower, respectively upper bounds on its concentration. The depicted results were obtained without using the mass invariants of the original system to constraint the bounds, and as such are suboptimal.

In the above example, in order to obtain safe lower/upper bounds on ’s concentration, we make the variables range over the hyper-faces. One notices that the variable is treated specifically in the derivatives of the variables - any of its occurences is replaced by the variable corresponding to the hyper-face we want to bound. By contrast, the other variables, , are replaced according to the sign of their occurence:

  • in , is replaced with

  • in , is replaced with

This comes from the fact that the derivative on is evaluated on the corresponding hyper-face, which allows for greatly reducing the loss of precision. For a formal proof of the soundness of this approach, the reader is referred to [19]. Intuitively, it is justified by the intermediate value theorem: given a family of functions over the real field, if one function does not take the highest value at time , whereas it is the case at time , then necessarily, there exists a time such that in which takes the highest value while crossing another function of the family.

In Example 2.2, the inequality constraints on the concentrations of species were determined based on reaction rates constants that verify the QSS condition. In Fig.1, we show the time-evolution of the bounds on the concentration of the 4 species in the Michaelis-Menten system, for an arbitrarily chosen set of reaction rate constants and initial concentrations that satisfy the QSS condition (i.e., , and at time ). Nonetheless, our model reduction is sound no matter the value of initial concentrations and reaction rate constants. The equations have been integrated using the solver of Matlab[22]. Strictly speaking, numerical errors stemming from numerical integration may accumulate throughout the simulation, but herein we choose to ignore them.

In Fig.1, we notice that the bounds diverge at a fast rate from the original trajectory, despite the restriction of the derivative’s evaluation on the hyper-face of the box (as explained in Note 1). A way to improve accuracy is to take into account the original system’s mass invariants, when computing the bounds. In general, a biochemical system can have several conservation laws/mass invariants, which are linear functions of the concentrations, that are constant in time. These equality constraints can be used to refine the bounds on the initial system’s species’ concentrations. We can safely further restrict the evaluation of the derivative of each coordinate to the intersection of the corresponding hyperface with the subspace delimited by the conservation laws containing the variable itself. Because a variable can appear in more than one mass invariant, we choose to keep the most optimistic bound that can be computed by intersecting the hyper-face with the mass invariant subspace: the greatest lower bound, respectively the smallest upper bound.

In the Michaelis-Menten system, the total number of enzymes is constant, and so is the overall number of substrates and product. The two conservation laws can be written as: , with , and .

Assuming once more that , by substituting by or into 12, three equivalent tropicalized upper bounds on the concentration of are obtained:

(15)

Lifting the interpretation of the differential equations over the hyper-face corresponding to results in three different expressions for the upper bound on , of possibly different accuracies:

(16)

The most accurate sound upper bound on then writes as:

(17)

The choice to introduce and operations in the expressions of the computed bounds is accounted for by our initial motivation: because existing tropicalization reduction heuristics are not justified by rigourous estimates, we aim to provide a method for quantifying errors stemming from such tropicalization reduction approaches, at the same time creating a tropicalization approach with guarantees222We nonetheless stress that our goal is not to correct the faults of existing tropicalization-inspired reduction methods, but rather quantify them by proposing a more rigorous tropicalization approach, in which the dominated monomials are bounded, rather than discarded from the ODEs. As such, we aim at computing error bounds that are as precise as possible, hence the choice of using and operations for bound refinement, albeit with the disadvantage of using functions that are not , thus introducing non-smooth vector fields. The trade-off between smoothness and precision can be tuned according to the desired goal: less precise bounds can be obtained by choosing to use smooth functions. Moreover, smoothness of vector fields is generally not guaranteed during the numerical simulation of biochemical models: as the model variables represent biochemical species’ concentrations, a good practice is to call the numerical solvers used to approximate the system’s behavior using with the ’Non-Negative’ option, which amounts to introducing a operation into the equations (i.e., ), in order to prevent negative values of variables. The same reasoning can be applied to all variables appearing in the expression of , in order to obtain the most accurate upper bound:

(18)
Figure 2: Left: bounds on the concentration of P, obtained by simulating the ODE system in Example2.2, for different values of . Rate constants and initial concentration as in Fig.1: . Right: For different values of , the accuracy of the resulting bounds is computed as the difference between the upper and the lower bound.

In (15), when computing the third bound, instead of substituting by its conservation law expression, , we choose to bound its value by an expression not containing . We do so in order to avoid introducing supplementary variables w.r.t. those present in the tropicalized original bound (i.e., , and ). This method, in which mass invariant partial refinement is introduced after the tropicalized bounds have been computed, can be considered as a per se model reduction method, as no supplementary information/complexity is introduced by incorporating the conservation laws. By contrast, the approach in which all the information contained by the conservation laws is exploited in order to derive the most accurate bounds can constitute a method of error-estimation for tropicalization based reduction heuristics. We present the two different methods formally in Section 3.

The issue of specifying QSS species and QE reactions a priori, when performing model reductions, is circumvented by our method. Instead, the notion of region is used in order to eliminate monomials from the species’ ODEs. Our method uses static inspection of each ODE, in order to partition the state space into different regions according to which production, respectively consumption terms dominate the others. Using this partitioning, simplified expressions bounding the species concentrations are derived for each region of the state space, allowing symbolic simplification and limiting numerical approximations.

In the case of the Michaelis-Menten mechanism, there are three possible dominance regions for leading to three possible pairs of lower and upper bounds:

  1. Region 1, if dominates :

  2. Region 2, if dominates :

  3. Region 3, if there is no dominant rate (i.e., and are of comparable magnitude):

The complete system of equations obtained using mass invariants refinement of bounds, for all the possible dominance regions, can be found in Example 2.2. The improvement of bound accuracy via mass invariants can be observed in Fig. 2. As expected, one can also observe in Fig.2 that results become more precise as the value of increases, i.e. as and become more separated.

For convenience purposes, denote the species concentrations, , using . Then, the derivatives of the lower and upper bounds of the original system’s species’ concentrations write as:

  • , with

  • , with

  • , with

  • ,with

The Michaelis-Menten system represents a particular, simple case study: the choice of reaction rate constants fixes the dominance region in which the system evolves. In general, the state of a biochemical network can traverse multiple such regions, as the dominant monomials can change from one concentration domain to another. Thus, we next introduce a case study in which the dominant monomials are concentration-dependent, which in turn means that the dominance region is no longer fixed. Our method is designed with this more general situation in mind: having computed the most accurate bounds for each region of the state space partitioning, and having no information regarding the region in which the original system evolves at a given time , our approach chooses the least accurate local bound, in order to ensure global soundness.

2.3 Motivating example: A DNA model

We construct a simple extension of the Michaelis-Menten system, in which the product formation reaction is catalazyed by a dimer of an enzyme . The reaction system and its ODE system333once again, we denote the species by read:

(19)

The mass invariants write:

(20)

Dominance regions become concentration dependent: for example, the dominant positive monomial in is determined by the dominance relations between both the concentrations of and , and between reaction rate constants and .

This DNA example will serve as a case study for the remainder of our paper.

3 Combining ODEs and mass invariants

3.1 Model reduction using conservative numerical approximations

The guarantees of our method are a consequence of a carefully designed symbolic propagation of inequality constraints on the species’ concentrations. Thus, symbolic transformations have to be applied on numerical expressions, of which we introduce a syntax and semantics. We also introduce an alternative definition of a biochemical model to that presented in Sect. 2, which is then used to define and justify our approximation method.

(Syntax of expressions) Let be a set of variables. We define an -expression inductively, as follows444the syntactic operators are written using a superscript dot, in order to distinguish them from their associated mathematical functions:

  1. each positive real number is an -expression;

  2. each variable is an -expression;

  3. if is an -expression, then is an -expression;

  4. if and are -expressions, then , , , are all -expressions;

The set of -expressions is denoted as . Given an -expression , we define its support, denoted , as the set of variables it contains.

(Semantics of expressions) Let be a set of variables and be an -expression. The semantics of the expression is the function , defined inductively as follows: