1. Introduction
This paper is concerned with a mathematical model for a gene regulatory network involved in the regulation of DNA transcription. DNA transcription is part of the mechanism by which a sequence of the nuclear DNA is translated into the corresponding protein. The transcription is initiated by the binding of a transcription factor, which is usually another protein, onto the gene’s DNA-binding domain. Once bound, the transcription factor promotes the transcription of the nuclear DNA into a messenger RNA (further denoted by mRNA), which, once released, is converted into the corresponding protein by the ribosomes. This process is subject to a high level of noise due to the large variability of the conditions that prevail in the cell and the nucleus at the moment of the transcription. Yet, a rather stable amount of the final protein is needed for the good operation of the cell. The processes that regulate noise levels and maintain cell homeostasis have been scrutinized for a long time. Recently, micro RNA’s (further referred to as
RNA’s) have occupied the front of the scene. These are very short RNA’s which do not code for proteins. Many different sorts of RNA’s are involved in various epigenetic processes. But one of their roles seems precisely the reduction of noise level in DNA transcription. In this scenario, the RNA’s are synthesized together with the mRNA’s. Then, some of the synthesized RNA’s bind to the mRNA’s and de-activate them. These RNA-bound mRNA become unavailable for protein synthesis. It has been proposed that this paradoxical mechanism which seems to reduce the efficiency of DNA transcription may indeed have a role in noise regulation (see [9, 17, 10] and the review [19]). The goal of the present contribution is to propose a mathematical model of the RNA-mRNA interaction and to use it to investigate the role of RNA’s as potential noise regulators.Specifically, in this paper, we propose a stochastic chemical kinetic model for the mRNA and RNA content in a cell. The production of mRNA’s by the transcription factor and their inactivation through RNA binding are taken into account. More precisely, our model is a simplified version of the circuit used in [25, Fig. 2A and 2A’]. We consider a ligand involved in the production of both an mRNA and a RNA, the RNA having the possibility to bind to the mRNA and deactivate it. By contrast to [25], we disregard the way the ligand is produced and consider that the ligand is such that there is a constant production rate of both mRNA and RNA. A second difference to [25] is that we disregard the transcription step of the mRNA into proteins. While [25] proposes to model the RNA as acting on the transcription rate of the mRNA into proteins, we assume that the RNA directly influences the number of mRNA available for transcription. Therefore, we directly relate the gene expression level to the number of
RNA-free mRNA also referred to as the number of unbound mRNA. In order to model the stochastic variability in the production of the RNAs, a multiplicative noise is added to the production rate at all time. From the resulting system of stochastic differential equations, we introduce the joint probability density for mRNA and µRNA which solves a deterministic Fokker-Planck equation. The mathematical object of interest is the stationary density solving the Fokker-Planck equation and more precisely the marginal density of the mRNA. The coefficient of variation (also called cell-to-cell variation) of this mRNA density, which is its standard deviation divided by its the expectation, is often considered as the relevant criteria for measuring the robustness of gene expression (see for instance
[25]).Our main goal in this contribution is to provide theoretical and numerical evidences that the robustness of the gene expression is increased in the presence of RNA. At the theoretical level we derive a number of analytical formulas either for particular subsets of parameters of the model or under some time-scale separation hypotheses. From these formulas we can easily compute the cell to cell variation numerically and verify the increased robustness of gene expression when binding with microRNA happens in the model. For general sets of parameters, the solution cannot be computed analytically. However we can prove well-posedness of the model and solve the PDE with a specifically designed numerical scheme. From the approximate solution, we compute the coefficient of variation and verify the hypothesis of increased gene expression.
Another classical approach to the study of noise in gene regulatory networks is through the chemical master equation [27] which is solved numerically by means of Gillespie’s algorithm [18], see e.g. [12, 25]. Here, we use a stochastic chemical kinetic model through its associated Kolmogorov-Fokker-Planck equation. Chemical kinetics is a good approximation of the chemical master equation when the number of copies of each molecule is large. This is not the case in a cell where sometimes as few as a 100 copies of some molecules are available. Specifically, including a stochastic term in the chemical kinetic approach is a way to retain some of the randomness of the process while keeping the model complexity tractable. This ultimately leads to a Fokker-Planck model for the joint distribution of mRNAs and RNAs. In [16]
, a similar chemical kinetic model is introduced with a different modelling of stochasticity. The effect of the noise is taken into account by adding some uncertainty in the (steady) source term and the initial data. The authors are interested in looking at how this uncertainty propagates to the mRNA content and in comparing this uncertainty between situations including µRNA production or not. The uncertainty is modeled by random variables with given probability density functions. Compared to
[16], the Fokker-Planck approach has the advantage that the random perturbations do not only affect the initial condition and the source term, but are present at all times and varies through time. We believe that this is coherent with how stochasticity in a cell arises through time-varying ecological or biological factors.While Fokker-Planck equations are widely used models in mathematical biology [26], their use for the study of gene regulatory network is, up to our knowledge, scarce (see e.g. [22]). Compared to other approaches, the Fokker-Planck model enables us to derive analytical formulas for solutions in certain cases. This is particularly handy for understanding the role of each parameter in the model, calibrating them from real-world data and perform fast numerical computations. Nevertheless, in the general case, the theoretical study and the numerical simulation of the model remains challenging because of the unboundedness of the drift and diffusion coefficients. We believe that we give below all the tools for handling these difficulties, and that our simple model provide a convincing mathematical interpretation of the increase of gene expression in the presence of RNAs.
The paper is organized as follows. In Section 2, we introduce the system of SDEs and the corresponding Fokker-Planck models. In Section 3, we discuss the well-posedness of the Fokker-Planck equations and derive analytical formulas for solutions under some simplifying hypotheses. In Section 4, we use the analytical formulas for solutions to give mathematical and numerical proofs of the decrease of cell-to-cell variation in the presence of RNA. In Section 5, we propose a numerical scheme for solving the main Fokker-Planck model and gather further evidences confirming the hypothesis of increased gene expression from the simulations. Finally, in Section 6 we discuss the particular choice of multiplicative noise (i.e.
the diffusion coefficient in the Fokker-Planck equation) in our model. In the appendix, we derive weighted Poincaré inequalities for gamma and inverse-gamma distributions which are useful in the analysis of Section
3.2. Presentation of the models
In this section, we introduce three steady Fokker-Planck models whose solutions describe the distribution of unbound mRNA and RNA within a cell. The solutions to these equations can be interpreted as the probability density functions associated with the steady states of stochastic chemical kinetic systems describing the production and destruction of mRNA and RNA. In Section 2.1 we introduce the main model for which the consumption of RNAs is either due to external factors in the cell (transcription, etc.) or to binding between the two types of mRNA and RNA. Then, for comparison, in Section 2.2 we introduce the same model without binding between RNAs. Finally in Section 2.3, we derive an approximate version of the first model, by considering that reactions involving RNAs are infinitely faster than those involving mRNAs, which amplifies the binding phenomenon and mathematically allows for the derivation of analytical formulas for solutions. The latter will be made explicit in Section 3.
2.1. Dynamics of mRNA and RNA with binding
We denote by the number of unbound mRNA and the number of unbound microRNA of a given cell at time t. The kinetics of unbound mRNA and RNA is then given by the following stochastic differential equations
(1) |
with , , , , , being some given positive constants and being a given non-negative constant. Let us detail the meaning of each term in the modeling. The first term of each equation models the constant production of mRNA (resp. RNA) by the ligand at a rate (resp. ). The second term models the binding of the RNA to the mRNA. Unbound mRNA and RNA are consumed by this process at the same rate. The rate increases with both the number of mRNA and RNA. In the third term, the parameters and are the rates of consumption of the unbound mRNA or
RNA by various decay mechanisms. The last term in both equations represents stochastic fluctuations in the production and destruction mechanisms of each species. It relies on a white noise
where is a bi-dimensional standard Brownian motion. The intensity of the stochastic noise is quantified by the parameters and . Such a choice of multiplicative noise ensures that and remain non-negative along the dynamics.In this paper we are interested in the invariant measure of (1) rather than the time dynamics described by the above SDEs. From the modelling point of view, we are considering a large number of identical cells and we assume that mRNA and RNA numbers evolve according to (1). Then we measure the distribution of both RNAs among the population, when it has reached a steady state . According to Itô’s formula, the steady state should satisfy the following steady Fokker-Planck equation
(2) |
where the Fokker-Planck operator is given by
(3) |
Since we do not model the protein production stage, we assume that the observed distribution of gene expression level is proportional to the marginal distribution of mRNA, i.e.
By integration of (2) in the variable, satisfies the equation
(4) |
The quantity is the conditional expectation of the number of RNA within the population in the presence of molecules of mRNA and it is given by
(5) |
2.2. Dynamics of free mRNA without binding
In the case where there is no RNA binding, namely when , the variables and are independent. Thus, the densitites of the invariant measures satisfying (2) are of the form
where is the density of the marginal distrubution of RNA. From the modelling point of view, it corresponds to the case where there is no feed-forward loop from RNA. Therefore, only the dynamics on mRNA, and thus , is of interest in our study. It satisfies the following steady Fokker-Planck equation obtained directly from (4),
(6) |
It can be solved explicitly as we will discuss in Section 3.2.
2.3. Dynamics with binding and fast Rna
The Fokker-Planck equation (2) cannot be solved explicitly. However, one can make some additional assumptions in order to get an explicit invariant measure providing some insight into the influence of the binding mechanism with RNA. This is the purpose of the model considered hereafter.
Let us assume the RNA-mRNA binding rate, the RNA decay and the noise on RNA are large. Since the sink term of the RNA equation is large, it is also natural to assume that the RNA content is small. Mathematically, we assume the following scaling
for some small constant . Then satisfies
whose corresponding steady Fokker-Planck equation for the invariant measure then writes, dropping the tilde,
In the limit case where , one may expect that at least formally, the density converges to a limit density satisfying
As is only a parameter in the previous equation and since the first marginal of still satisfies (4) for all , one should have (formally)
(7) |
3. Well-posedness of the models and analytical formulas for solutions
In this section, we show that the three previous models are well-posed. For the Fokker-Planck equations (6) and (7), we explicitely compute the solutions. They involve inverse gamma distributions.
3.1. Gamma and inverse gamma distributions
The expressions of the gamma and inverse gamma probability densities are respectively
(8) |
and
(9) |
for . The normalization constant is given by where is the Gamma function. Observe that by the change of variable one has
which justifies the terminology. Let us also recall that the first and second moments of the inverse gamma distribution are
(10) |
(11) |
Interestingly enough, we can show (see Appendix A for details and additional results) that inverse gamma distributions with finite first moment () satisfy a (weighted) Poincaré inequality. The proof of the following proposition is done in Appendix A among more general considerations.
Proposition 3.1.
Let and . Then, for any function such that the integrals make sense, one has
(12) |
where for any probability density and any function on , the notation denotes .
3.2. Explicit mRNA distribution without binding
In the case of free mRNAs, a solution to (6) can be computed explicitely and takes the form of an inverse gamma distribution.
Lemma 3.2.
Proof.
First observe that
Therefore a solution of (8) must be of the form
for some constants . The first term decays like at infinity, thus the only probability density of this form is obtained for and .
∎
The Poincaré inequality (12) tells us that the solution of Proposition 3.2 is also the only (variational) solution in the appropriate weighted Sobolev space. Indeed, we may introduce the natural Hilbert space associated with Equation (6),
with a squared norm given by
Then the following uniqueness result holds.
Lemma 3.3.
The classical solution is the only solution of (6) in .
Proof.
Another consequence of the Poincaré inequality is that if we consider the time evolution associated with the equation (6) then solutions converge exponentially fast towards the steady state . This justifies our focus on the stationary equations. The transient regime is very short and equilibrium is reached quickly. We can quantify the rate of convergence in terms of the parameters.
Proposition 3.4.
Let solve the Fokker-Planck equation
starting from the probability density . Then for all ,
Proof.
Observe that solves the unsteady Fokker-Planck equation, so that by multiplying the equation by and integrating in one gets
Then by using the Poincaré inequality (12) and a Gronwall type argument, one gets the result. ∎
3.3. Explicit mRNA distribution in the presence of fast Rna
Now we focus on the resolution of (7). The same arguments than those establishing Lemma 3.3 show that the only function satisfying (7) is the following inverse gamma distribution
(14) |
Then an application of (10) yields
(15) |
It remains to find which is a probability density solving the Fokker-Planck equation
Arguing as in the proof of Lemma 3.2, one observe that integrability properties force to actually solve
which yields
(16) |
where is a normalizing constant making a probability density function.
3.4. Well-posedness of the main Fokker-Planck model
Now we are interested in the well-posedness of (2), for which we cannot derive explicit formulas anymore. Despite the convenient functional framework introduced in Section 3.2
, classical arguments from elliptic partial differential equation theory do not seem to be adaptable to the case
. The main obstruction comes from an incompatibility between the natural decay of functions in the space and the rapid growth of the term when .However, thanks to probabilistic methods detailed in [21] and focused specifically on Fokker-Planck equations, we are able to prove well-posedness of the steady Fokker-Planck equation (2). The method is based on finding a Lyapunov function for the adjoint of the Fokker-Planck operator and relies on an integral identity proved by the same authors in [20].
First of all let us specify the notion of solution. A weak solution to (2) is an integrable function such that
(17) |
where the adjoint operator is given by
(18) |
A reformulation and combination of [21, Theorem A and Proposition 2.1] provides the following result.
Proposition 3.5 ([21]).
Assume that there is a smooth function , called Lyapunov function with respect to , such that
(19) |
and
(20) |
where . Then there is a unique satisfying (17). Moreover .
Lemma 3.6.
Proof.
Proposition 3.7.
There is a unique weak solution to the steady Fokker-Planck equation (2). Moreover, .
4. Noise reduction by binding : the case of fast Rna
In this section we focus on the comparison between the explicit distributions (13) and (16). We are providing theoretical and numerical evidences that the coefficient of variation (which is a normalized standard deviation) of (16) is less than that of (13). This quantity called cell to cell variation in the biological literature [25] characterizes the robustness of the gene expression level (the lower the better). We start by performing a rescaling in order to extract the dimensionless parameters which characterize the distributions.
4.1. Dimensional analysis
In order to identify the parameters of importance in the models, we rescale the variable around a characteristic value chosen to be
(21) |
This choice is natural in the sense that it corresponds to the steady state of the mRNA dynamics without binding nor stochastic effects, that is . It is also the expectation of . By rescaling into both and are rescaled into dimensionless densities
(22) | ||||
(23) |
where and are normalizing constants depending on the parameters of the model and , and are dimensionless parameters. The first parameter
(24) |
only depends on constants that are independent of the dynamics of RNAs. The two other dimensionless parameters are
(25) |
and
(26) |
Let us give some insight into the biological meaning of these parameters. The parameter measures the relative importance of the two mechanisms of destruction of RNAs, namely the binding with mRNAs versus the natural destruction/consumption. A large means that the binding effect is strong and conversely. The parameter compares the production rate of RNAs with that of mRNAs. Large values of mean that there are much more RNAs than mRNAs produced per unit of time.
4.2. Cell to cell variation (CV)
For any integrable non-negative function , let us denote by
its -th moment. The coefficient of variation or cell to cell variation (CV) is defined by
(27) |
where and
denote the expectation and variance. One has the following result.
Proposition 4.1.
Consider the dimensionless distributions defined in (22) and (23). Then one has that
(28) |
where the variance and coefficient of variation are well-defined only for . Moreover, for all , one has
and
Thus for all and one has
Finally one has the bound
(29) |
which holds for all , and . Observe that but asymptotically
Proof.
The formulas for the moments follow from (10) and (11) and the limits can be taken using dominated convergence. The bound (29) is a consequence of the Prékopa-Leindler inequality (see [13] and references therein) which states that if are three functions satisfying for some and for all ,
(30) |
then
(31) |
We use it with , if and if , and . The condition (30) is then equivalent to
which is satisfied as the term between brackets is always greater than and the function , is bounded from below by , where is given in (29). Then with the change of variable in the integrals of (31), one recovers (29). ∎
4.3. Exploration of the parameter space
Now, we explore the space of parameters in order to compare the cell to cell variation in the case of fast RNA and in the case of free mRNA.
In order to evaluate numerically the cell to cell variation we need to compute , for . Observe that after a change of variable these quantities can be rewritten (up to an explicit multiplicative constant depending on parameters)
with . For the numerical computation of these integrals, we use a Gauss-Laguerre quadrature
which is natural and efficient as we are dealing with functions integrated against a gamma distribution. We refer to [24] and references therein for the definition of the coefficients and quadrature points . The truncation order is chosen such that the numerical error between the approximation at order and is inferior to the given precision when . For , the function may take large values and it is harder to get the same numerical precision. In the numerical results below the mean error for the chosen sets of parameters with large values of is around and the maximal error is . This is good enough to comment on qualitative behavior.
We plot the relative cell to cell variation with respect to and for two different values of . The results are displayed on Figure 1. Then, on Figure 2, we draw the explicit distributions for various sets of parameters and compare it with .
![]() |
![]() |
The numerical simulations of Figure 1 suggest that the bound (29) is non-optimal. Actually, we conjecture that for all , and one has
(32) |
Observe that from Proposition 4.1 we know that for the inequality becomes an equality when and . For it saturates when and we conjecture that the coefficient of variation tends to when tends to infinity.
From a modeling point of view, these simulations confirm that for any choice of parameter, the presence of (fast) RNA makes the cell to cell variation decrease compared to the case without RNA. Moreover, the qualitative behavior with respect to the parameters makes sense. Indeed we observe that whenever enough RNA is produced (), the increase of the binding phenomenon () makes the cell to cell variation decay drastically.
![]() |
![]() |
5. Noise reduction by binding for the main Fokker-Planck model: numerical evidences
In this section, we compute the gene expression level of the main model described by equation (2). In this case, as there is no explicit formula for the solution, we will compute an approximation of it using a discretization of the Fokker-Planck equation. In order to compute the solution in practice, we restrict the domain to the bounded domain . Because of the truncation, we add zero-flux boundary conditions in order to keep a conservative equation. It leads to the problem
(33) |
5.1. Dimensional analysis and reformulation of the equation
In order for the numerical scheme to be more robust with respect to the size of the parameters, we start by rewriting the equation in a dimensionless version. It will also allow for comparisons with numerical experiments of the previous sections.
We introduce
where the characteristic numbers of mRNA and RNA are, as before respectively defined by
After some computations one obtains that equation (33) can be rewritten
(34) |
with the corresponding no-flux boundary conditions and normalization. The parameters , and are those of the previous sections and the two new parameters are
(35) |
and
(36) |
The parameter compares consumption of RNA versus that of mRNA by mechanisms which are not the binding between the two RNAs. The parameter compares the amplitude of the noise in the dynamics of RNA versus that of the mRNA.
Remark 5.1.
As the coefficients in the advection and diffusion parts of (34) grow rapidly in , and degenerate when and , an efficient numerical resolution of (34) is not straightforward. Moreover a desirable feature of the scheme would be a preservation of the analytically known solution corresponding to . Because of these considerations we will discretize a reformulated version of the equation in which the underlying inverse gamma distributions explicitly appear. It will allow for a better numerical approximation when and are either close to or large. The reformulation is the following
(37) |
with the associated no-flux boundary conditions and where the functions and are given by
(38) |
and
(39) |
5.2. Presentation of the numerical scheme
We use a discretization based this reformulation (37). It is inspired by [8] and is fairly close to the so-called Chang-Cooper scheme [15].
We use a finite-volume scheme. The rectangle is discretized with a structured regular mesh of size and in each respective direction. The centers of the control volumes are the points with and for and . We also introduce the intermediate points with and with defined w