
A Fokker-Planck approach to the study of robustness in gene expression

We study several Fokker-Planck equations arising from a stochastic chemical kinetic system modeling a gene regulatory network in biology. The densities solving the Fokker-Planck equations describe the joint distribution of the messenger RNA and micro RNA content in a cell. We provide theoretical and numerical evidence that the robustness of the gene expression is increased in the presence of micro RNA. At the mathematical level, increased robustness shows in a smaller coefficient of variation of the marginal density of the messenger RNA in the presence of micro RNA. These results follow from explicit formulas for solutions. Moreover, thanks to dimensional analyses and numerical simulations we provide qualitative insight into the role of each parameter in the model. As the increased robustness of the gene expression level comes from the underlying stochasticity in the models, we finally discuss the choice of noise in our models and its influence on our results.

1. Introduction

This paper is concerned with a mathematical model for a gene regulatory network involved in the regulation of DNA transcription. DNA transcription is part of the mechanism by which a sequence of the nuclear DNA is expressed as the corresponding protein. The transcription is initiated by the binding of a transcription factor, which is usually another protein, onto the gene’s DNA-binding domain. Once bound, the transcription factor promotes the transcription of the nuclear DNA into a messenger RNA (further denoted by mRNA), which, once released, is translated into the corresponding protein by the ribosomes. This process is subject to a high level of noise due to the large variability of the conditions that prevail in the cell and the nucleus at the moment of the transcription. Yet, a rather stable amount of the final protein is needed for the proper functioning of the cell. The processes that regulate noise levels and maintain cell homeostasis have been scrutinized for a long time. Recently, micro RNAs (further referred to as µRNAs) have come to the fore. These are very short RNAs which do not code for proteins. Many different sorts of µRNAs are involved in various epigenetic processes. But one of their roles seems to be precisely the reduction of the noise level in DNA transcription. In this scenario, the µRNAs are synthesized together with the mRNAs. Then, some of the synthesized µRNAs bind to the mRNAs and deactivate them. These µRNA-bound mRNAs become unavailable for protein synthesis. It has been proposed that this paradoxical mechanism, which seems to reduce the efficiency of DNA transcription, may indeed have a role in noise regulation (see [9, 17, 10] and the review [19]). The goal of the present contribution is to propose a mathematical model of the µRNA-mRNA interaction and to use it to investigate the role of µRNAs as potential noise regulators.

Specifically, in this paper, we propose a stochastic chemical kinetic model for the mRNA and µRNA content in a cell. The production of mRNAs by the transcription factor and their inactivation through µRNA binding are taken into account. More precisely, our model is a simplified version of the circuit used in [25, Fig. 2A and 2A’]. We consider a ligand involved in the production of both an mRNA and a µRNA, the µRNA having the possibility to bind to the mRNA and deactivate it. In contrast to [25], we disregard the way the ligand is produced and consider that the ligand is such that there is a constant production rate of both mRNA and µRNA. A second difference from [25] is that we disregard the translation step of the mRNA into proteins. While [25] proposes to model the µRNA as acting on the translation rate of the mRNA into proteins, we assume that the µRNA directly influences the number of mRNA available for translation. Therefore, we directly relate the gene expression level to the number of µRNA-free mRNA, also referred to as the number of unbound mRNA. In order to model the stochastic variability in the production of the RNAs, a multiplicative noise is added to the production rate at all times. From the resulting system of stochastic differential equations, we introduce the joint probability density for mRNA and µRNA, which solves a deterministic Fokker-Planck equation. The mathematical object of interest is the stationary density solving the Fokker-Planck equation and, more precisely, the marginal density of the mRNA. The coefficient of variation (also called cell-to-cell variation) of this mRNA density, which is its standard deviation divided by its expectation, is often considered as the relevant criterion for measuring the robustness of gene expression (see for instance [25]).

Our main goal in this contribution is to provide theoretical and numerical evidence that the robustness of the gene expression is increased in the presence of µRNA. At the theoretical level we derive a number of analytical formulas, either for particular subsets of parameters of the model or under some time-scale separation hypotheses. From these formulas we can easily compute the cell-to-cell variation numerically and verify the increased robustness of gene expression when binding with µRNA happens in the model. For general sets of parameters, the solution cannot be computed analytically. However, we can prove well-posedness of the model and solve the PDE with a specifically designed numerical scheme. From the approximate solution, we compute the coefficient of variation and verify the hypothesis of increased robustness of gene expression.

Another classical approach to the study of noise in gene regulatory networks is through the chemical master equation [27], which is solved numerically by means of Gillespie’s algorithm [18], see e.g. [12, 25]. Here, we use a stochastic chemical kinetic model through its associated Kolmogorov-Fokker-Planck equation. Chemical kinetics is a good approximation of the chemical master equation when the number of copies of each molecule is large. This is not the case in a cell, where sometimes as few as 100 copies of some molecules are available. Including a stochastic term in the chemical kinetic approach is a way to retain some of the randomness of the process while keeping the model complexity tractable. This ultimately leads to a Fokker-Planck model for the joint distribution of mRNAs and µRNAs. In [16], a similar chemical kinetic model is introduced with a different modelling of stochasticity. The effect of the noise is taken into account by adding some uncertainty in the (steady) source term and the initial data. The authors are interested in looking at how this uncertainty propagates to the mRNA content and in comparing this uncertainty between situations including µRNA production or not. The uncertainty is modeled by random variables with given probability density functions. Compared to [16], the Fokker-Planck approach has the advantage that the random perturbations do not only affect the initial condition and the source term, but are present at all times and vary through time. We believe that this is consistent with how stochasticity in a cell arises through time-varying ecological or biological factors.

While Fokker-Planck equations are widely used models in mathematical biology [26], their use for the study of gene regulatory networks is, to our knowledge, scarce (see e.g. [22]). Compared to other approaches, the Fokker-Planck model enables us to derive analytical formulas for solutions in certain cases. This is particularly handy for understanding the role of each parameter in the model, calibrating them from real-world data and performing fast numerical computations. Nevertheless, in the general case, the theoretical study and the numerical simulation of the model remain challenging because of the unboundedness of the drift and diffusion coefficients. We believe that we give below all the tools for handling these difficulties, and that our simple model provides a convincing mathematical interpretation of the increased robustness of gene expression in the presence of µRNAs.

The paper is organized as follows. In Section 2, we introduce the system of SDEs and the corresponding Fokker-Planck models. In Section 3, we discuss the well-posedness of the Fokker-Planck equations and derive analytical formulas for solutions under some simplifying hypotheses. In Section 4, we use the analytical formulas for solutions to give mathematical and numerical proofs of the decrease of cell-to-cell variation in the presence of µRNA. In Section 5, we propose a numerical scheme for solving the main Fokker-Planck model and gather further evidence from the simulations confirming the hypothesis of increased robustness of gene expression. Finally, in Section 6 we discuss the particular choice of multiplicative noise (i.e. the diffusion coefficient in the Fokker-Planck equation) in our model. In the appendix, we derive weighted Poincaré inequalities for gamma and inverse-gamma distributions which are useful in the analysis of Section 3.

2. Presentation of the models

In this section, we introduce three steady Fokker-Planck models whose solutions describe the distribution of unbound mRNA and µRNA within a cell. The solutions to these equations can be interpreted as the probability density functions associated with the steady states of stochastic chemical kinetic systems describing the production and destruction of mRNA and µRNA. In Section 2.1 we introduce the main model, for which the consumption of RNAs is either due to external factors in the cell (translation, etc.) or to binding between the two types of RNA, mRNA and µRNA. Then, for comparison, in Section 2.2 we introduce the same model without binding between RNAs. Finally, in Section 2.3, we derive an approximate version of the first model by considering that reactions involving µRNAs are infinitely faster than those involving mRNAs, which amplifies the binding phenomenon and mathematically allows for the derivation of analytical formulas for solutions. The latter will be made explicit in Section 3.

2.1. Dynamics of mRNA and µRNA with binding

We denote by $r_t$ the number of unbound mRNA and by $\mu_t$ the number of unbound µRNA of a given cell at time $t$. The kinetics of unbound mRNA and µRNA is then given by the following stochastic differential equations

$$
\begin{cases}
\mathrm{d}r_t = \left(c_r - c\, r_t\, \mu_t - k_r\, r_t\right)\mathrm{d}t + \sqrt{2\sigma_r}\; r_t\, \mathrm{d}B^1_t,\\[2pt]
\mathrm{d}\mu_t = \left(c_\mu - c\, r_t\, \mu_t - k_\mu\, \mu_t\right)\mathrm{d}t + \sqrt{2\sigma_\mu}\; \mu_t\, \mathrm{d}B^2_t,
\end{cases}
\tag{1}
$$

with $c_r$, $c_\mu$, $k_r$, $k_\mu$, $\sigma_r$, $\sigma_\mu$ being some given positive constants and $c$ being a given non-negative constant. Let us detail the meaning of each term in the modeling. The first term of each equation models the constant production of mRNA (resp. µRNA) by the ligand at a rate $c_r$ (resp. $c_\mu$). The second term models the binding of the µRNA to the mRNA. Unbound mRNA and µRNA are consumed by this process at the same rate. The rate increases with both the number of mRNA and µRNA. In the third term, the parameters $k_r$ and $k_\mu$ are the rates of consumption of the unbound mRNA or µRNA by various decay mechanisms. The last term in both equations represents stochastic fluctuations in the production and destruction mechanisms of each species. It relies on a white noise $\mathrm{d}B_t/\mathrm{d}t$, where $B_t = (B^1_t, B^2_t)$ is a bi-dimensional standard Brownian motion. The intensity of the stochastic noise is quantified by the parameters $\sqrt{2\sigma_r}$ and $\sqrt{2\sigma_\mu}$. Such a choice of multiplicative noise ensures that $r_t$ and $\mu_t$ remain non-negative along the dynamics.
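To get a feel for the dynamics (1), one can simulate sample paths with an Euler-Maruyama discretization. The sketch below is only an illustration and is not the method used in this paper, which works with the Fokker-Planck equation. It assumes the reading of (1) in which `c_r`, `c_mu` are production rates, `c` is the binding rate, `k_r`, `k_mu` are decay rates and `sqrt(2*sigma_r)`, `sqrt(2*sigma_mu)` are the noise intensities. The function name, the default arguments and the positivity clamp are hypothetical choices.

```python
import math
import random


def simulate_mrna_murna(c_r, c_mu, c, k_r, k_mu, sigma_r, sigma_mu,
                        r0=1.0, mu0=1.0, t_end=50.0, dt=1e-3, seed=0):
    """Euler-Maruyama path of the mRNA / muRNA system; returns the final (r, mu)."""
    rng = random.Random(seed)
    r, mu = r0, mu0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(t_end / dt))):
        # Independent Brownian increments over one time step.
        db1 = rng.gauss(0.0, sqrt_dt)
        db2 = rng.gauss(0.0, sqrt_dt)
        binding = c * r * mu  # same sink term in both equations
        r_new = r + (c_r - binding - k_r * r) * dt + math.sqrt(2.0 * sigma_r) * r * db1
        mu_new = mu + (c_mu - binding - k_mu * mu) * dt + math.sqrt(2.0 * sigma_mu) * mu * db2
        # The explicit scheme does not preserve positivity exactly, so we clamp
        # (an ad hoc safeguard of this sketch, not part of the model).
        r, mu = max(r_new, 1e-12), max(mu_new, 1e-12)
    return r, mu
```

With the noise switched off and no binding (`sigma_r = sigma_mu = 0`, `c = 0`), the path relaxes to the deterministic steady state `(c_r / k_r, c_mu / k_mu)`. Turning binding on lowers the steady mRNA level.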

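In this paper we are interested in the invariant measure of (1) rather than the time dynamics described by the above SDEs. From the modelling point of view, we are considering a large number of identical cells and we assume that mRNA and µRNA numbers evolve according to (1). Then we measure the distribution of both RNAs among the population, when it has reached a steady state $f(r,\mu)$. According to Itô’s formula, the steady state should satisfy the following steady Fokker-Planck equation

$$
L f(r,\mu) = 0, \qquad (r,\mu)\in(0,\infty)^2,
\tag{2}
$$

where the Fokker-Planck operator is given by

$$
L f = \partial_r\Big[\partial_r\big(\sigma_r r^2 f\big) - \big(c_r - c\,r\mu - k_r r\big) f\Big] + \partial_\mu\Big[\partial_\mu\big(\sigma_\mu \mu^2 f\big) - \big(c_\mu - c\,r\mu - k_\mu \mu\big) f\Big].
\tag{3}
$$

Since we do not model the protein production stage, we assume that the observed distribution of gene expression level is proportional to the marginal distribution of mRNA, i.e.

$$
\rho(r) = \int_0^\infty f(r,\mu)\,\mathrm{d}\mu .
$$

By integration of (2) in the $\mu$ variable, $\rho$ satisfies the equation

$$
\partial_r\Big[\partial_r\big(\sigma_r r^2 \rho\big) - \big(c_r - c\,r\,j_f(r) - k_r r\big)\rho\Big] = 0 .
\tag{4}
$$

The quantity $j_f(r)$ is the conditional expectation of the number of µRNA within the population in the presence of $r$ molecules of mRNA and it is given by

$$
j_f(r) = \frac{1}{\rho(r)}\int_0^\infty \mu\, f(r,\mu)\,\mathrm{d}\mu .
\tag{5}
$$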

2.2. Dynamics of free mRNA without binding

In the case where there is no µRNA binding, namely when $c = 0$, the variables $r_t$ and $\mu_t$ are independent. Thus, the densities of the invariant measures satisfying (2) are of the form

$$
f(r,\mu) = \rho_0(r)\,\nu(\mu),
$$

where $\nu$ is the density of the marginal distribution of µRNA. From the modelling point of view, it corresponds to the case where there is no feed-forward loop from µRNA. Therefore, only the dynamics on mRNA, and thus $\rho_0$, is of interest in our study. It satisfies the following steady Fokker-Planck equation obtained directly from (4),

$$
\partial_r\Big[\partial_r\big(\sigma_r r^2 \rho_0\big) - \big(c_r - k_r r\big)\rho_0\Big] = 0 .
\tag{6}
$$

It can be solved explicitly as we will discuss in Section 3.2.

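2.3. Dynamics with binding and fast µRNA

The Fokker-Planck equation (2) cannot be solved explicitly. However, one can make some additional assumptions in order to get an explicit invariant measure providing some insight into the influence of the binding mechanism with µRNA. This is the purpose of the model considered hereafter.

Let us assume that the µRNA-mRNA binding rate, the µRNA decay and the noise on µRNA are large. Since the sink term of the µRNA equation is large, it is also natural to assume that the µRNA content is small. Mathematically, we assume the following scaling

$$
c = \frac{\tilde c}{\varepsilon},\qquad k_\mu = \frac{\tilde k_\mu}{\varepsilon},\qquad \sigma_\mu = \frac{\tilde\sigma_\mu}{\varepsilon},\qquad \mu_t = \varepsilon\,\tilde\mu_t,
$$

for some small constant $\varepsilon > 0$. Then $(r_t, \tilde\mu_t)$ satisfies

$$
\begin{cases}
\mathrm{d}r_t = \left(c_r - \tilde c\, r_t\, \tilde\mu_t - k_r\, r_t\right)\mathrm{d}t + \sqrt{2\sigma_r}\; r_t\, \mathrm{d}B^1_t,\\[2pt]
\mathrm{d}\tilde\mu_t = \dfrac{1}{\varepsilon}\left(c_\mu - \tilde c\, r_t\, \tilde\mu_t - \tilde k_\mu\, \tilde\mu_t\right)\mathrm{d}t + \sqrt{\dfrac{2\tilde\sigma_\mu}{\varepsilon}}\; \tilde\mu_t\, \mathrm{d}B^2_t,
\end{cases}
$$

whose corresponding steady Fokker-Planck equation for the invariant measure $f^\varepsilon$ then writes, dropping the tildes,

$$
\partial_r\Big[\partial_r\big(\sigma_r r^2 f^\varepsilon\big) - \big(c_r - c\,r\mu - k_r r\big) f^\varepsilon\Big] + \frac{1}{\varepsilon}\,\partial_\mu\Big[\partial_\mu\big(\sigma_\mu \mu^2 f^\varepsilon\big) - \big(c_\mu - c\,r\mu - k_\mu \mu\big) f^\varepsilon\Big] = 0 .
$$

In the limit case where $\varepsilon \to 0$, one may expect that, at least formally, the density $f^\varepsilon$ converges to a limit density $f_{\mathrm{fast}}$ satisfying

$$
\partial_\mu\Big[\partial_\mu\big(\sigma_\mu \mu^2 f_{\mathrm{fast}}\big) - \big(c_\mu - c\,r\mu - k_\mu \mu\big) f_{\mathrm{fast}}\Big] = 0 .
$$

As $r$ is only a parameter in the previous equation and since the first marginal of $f^\varepsilon$ still satisfies (4) for all $\varepsilon > 0$, one should have (formally)

$$
\begin{cases}
f_{\mathrm{fast}}(r,\mu) = \rho_{\mathrm{fast}}(r)\, M(r,\mu), \qquad \displaystyle\int_0^\infty M(r,\mu)\,\mathrm{d}\mu = 1,\\[4pt]
\partial_\mu\Big[\partial_\mu\big(\sigma_\mu \mu^2 M\big) - \big(c_\mu - c\,r\mu - k_\mu \mu\big) M\Big] = 0,\\[4pt]
\partial_r\Big[\partial_r\big(\sigma_r r^2 \rho_{\mathrm{fast}}\big) - \big(c_r - c\,r\,j_{\mathrm{fast}}(r) - k_r r\big)\rho_{\mathrm{fast}}\Big] = 0, \qquad j_{\mathrm{fast}}(r) = \displaystyle\int_0^\infty \mu\, M(r,\mu)\,\mathrm{d}\mu .
\end{cases}
\tag{7}
$$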

3. Well-posedness of the models and analytical formulas for solutions

In this section, we show that the three previous models are well-posed. For the Fokker-Planck equations (6) and (7), we explicitly compute the solutions. They involve inverse gamma distributions.

3.1. Gamma and inverse gamma distributions

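The expressions of the gamma and inverse gamma probability densities are respectively

$$
g_{\alpha,\beta}(x) = C_{\alpha,\beta}\, x^{\alpha-1}\, e^{-\beta x}, \qquad x > 0,
\tag{8}
$$

and

$$
h_{\alpha,\beta}(y) = C_{\alpha,\beta}\, y^{-1-\alpha}\, e^{-\beta/y}, \qquad y > 0,
\tag{9}
$$

for $\alpha > 0$ and $\beta > 0$. The normalization constant is given by $C_{\alpha,\beta} = \beta^\alpha/\Gamma(\alpha)$, where $\Gamma$ is the Gamma function. Observe that by the change of variable $y = 1/x$ one has

$$
g_{\alpha,\beta}(x)\,\mathrm{d}x = h_{\alpha,\beta}(y)\,\mathrm{d}y,
$$

which justifies the terminology. Let us also recall that the first and second moments of the inverse gamma distribution are

$$
\int_0^\infty y\, h_{\alpha,\beta}(y)\,\mathrm{d}y = \frac{\beta}{\alpha-1} \quad \text{if } \alpha > 1,
\tag{10}
$$

$$
\int_0^\infty y^2\, h_{\alpha,\beta}(y)\,\mathrm{d}y = \frac{\beta^2}{(\alpha-1)(\alpha-2)} \quad \text{if } \alpha > 2.
\tag{11}
$$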
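The moment formulas (10) and (11) are easy to sanity-check numerically. The following sketch is an illustration under the standard parametrization of the inverse gamma density, $h_{\alpha,\beta}(y) = \beta^\alpha \Gamma(\alpha)^{-1} y^{-1-\alpha} e^{-\beta/y}$. It compares the closed form $\beta^k\,\Gamma(\alpha-k)/\Gamma(\alpha)$ with a midpoint-rule integration performed after the change of variable $x = 1/y$, which turns the heavy-tailed integral into an integral against a gamma density. The function names and the truncation rule are hypothetical choices.

```python
import math


def inv_gamma_moment(k, alpha, beta):
    """Closed-form k-th moment of the inverse gamma law, finite only for alpha > k."""
    if alpha <= k:
        raise ValueError("the k-th moment is infinite when alpha <= k")
    return beta ** k * math.gamma(alpha - k) / math.gamma(alpha)


def numeric_moment(k, alpha, beta, n=20000):
    """Midpoint rule for the same moment, written as the integral of x^(-k) against the gamma density."""
    x_max = (alpha + 40.0) / beta  # the gamma tail beyond this point is negligible for moderate alpha
    dx = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        # log of the gamma density, computed in log form to avoid overflow
        log_g = alpha * math.log(beta) - math.lgamma(alpha) + (alpha - 1.0) * math.log(x) - beta * x
        total += x ** (-k) * math.exp(log_g) * dx
    return total
```

For instance, with `alpha = 4` and `beta = 3` the first moment is `beta / (alpha - 1) = 1` and the second moment is `beta**2 / ((alpha - 1) * (alpha - 2)) = 1.5`.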

Interestingly enough, we can show (see Appendix A for details and additional results) that inverse gamma distributions with finite first moment ($\alpha > 1$) satisfy a (weighted) Poincaré inequality. The following proposition is proved in Appendix A as part of more general considerations.

Proposition 3.1.

Let $\alpha > 1$ and $\beta > 0$. Then, for any function $u$ such that the integrals make sense, one has

(12)

where for any probability density $\nu$ and any function $u$ on $(0,\infty)$, the notation $\langle u\rangle_\nu$ denotes $\int_0^\infty u\,\nu$.

3.2. Explicit mRNA distribution without binding

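In the case of free mRNAs, a solution to (6) can be computed explicitly and takes the form of an inverse gamma distribution.

Lemma 3.2.

The following inverse gamma distribution

$$
\rho_0(r) = h_{1+\frac{k_r}{\sigma_r},\,\frac{c_r}{\sigma_r}}(r) = C_{1+\frac{k_r}{\sigma_r},\,\frac{c_r}{\sigma_r}}\; r^{-\left(2+\frac{k_r}{\sigma_r}\right)} \exp\!\left(-\frac{c_r}{\sigma_r\, r}\right)
\tag{13}
$$

is the only classical solution to (6) among probability densities.

Proof.

First observe that

$$
\partial_r\big(\sigma_r r^2 \rho\big) - \big(c_r - k_r r\big)\rho = \sigma_r\, r^2\, \rho_0\; \partial_r\!\left(\frac{\rho}{\rho_0}\right).
$$

Therefore a solution of (6) must be of the form

$$
\rho(r) = C_1\, \rho_0(r)\int_1^r \frac{\mathrm{d}y}{y^2\,\rho_0(y)} + C_2\, \rho_0(r)
$$

for some constants $C_1, C_2$. The first term decays like $1/r$ at infinity, hence is not integrable, and thus the only probability density of this form is obtained for $C_1 = 0$ and $C_2 = 1$. ∎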
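As a quick numerical cross-check of Lemma 3.2, one can verify that an inverse gamma profile with exponent $2 + k_r/\sigma_r$ and scale $c_r/\sigma_r$ makes the flux $\sigma_r\,\partial_r(r^2\rho) - (c_r - k_r r)\rho$ vanish. The sketch below assumes this form of the density and of the flux, with the symbol names `c_r`, `k_r`, `sigma_r` standing for the mRNA production rate, decay rate and noise parameter. It uses a centered finite difference, and the helper names are hypothetical.

```python
import math


def rho0_unnormalized(r, c_r, k_r, sigma_r):
    """Inverse gamma profile r^-(2 + k_r/sigma_r) * exp(-c_r / (sigma_r * r)), without its constant."""
    return r ** (-(2.0 + k_r / sigma_r)) * math.exp(-c_r / (sigma_r * r))


def steady_flux(r, c_r, k_r, sigma_r, rel_step=1e-5):
    """Diffusion flux minus drift flux at r; it should vanish for the steady state."""
    h = rel_step * r

    def diffusion_potential(s):
        return sigma_r * s * s * rho0_unnormalized(s, c_r, k_r, sigma_r)

    # Centered difference approximation of d/dr (sigma_r * r^2 * rho0).
    diffusion = (diffusion_potential(r + h) - diffusion_potential(r - h)) / (2.0 * h)
    drift = (c_r - k_r * r) * rho0_unnormalized(r, c_r, k_r, sigma_r)
    return diffusion - drift
```

The residual returned by `steady_flux` is at the level of the finite-difference error, several orders of magnitude below the size of the density itself.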

The Poincaré inequality (12) tells us that the solution of Lemma 3.2 is also the only (variational) solution in the appropriate weighted Sobolev space. Indeed, we may introduce the natural Hilbert space associated with Equation (6),

with a squared norm given by

Then the following uniqueness result holds.

Lemma 3.3.

The classical solution is the only solution of (6) in .

Proof.

If and are two solutions of (6), a straightforward consequence of (12) is that . This is obtained by integrating the difference between the equation on and against . ∎

Another consequence of the Poincaré inequality is that if we consider the time evolution associated with the equation (6), then solutions converge exponentially fast towards the steady state $\rho_0$. This justifies our focus on the stationary equations: the transient regime is short and equilibrium is reached quickly. We can quantify the rate of convergence in terms of the parameters.

Proposition 3.4.

Let solve the Fokker-Planck equation

starting from the probability density . Then for all ,

Proof.

Observe that solves the unsteady Fokker-Planck equation, so that by multiplying the equation by and integrating in one gets

Then by using the Poincaré inequality (12) and a Gronwall type argument, one gets the result. ∎

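3.3. Explicit mRNA distribution in the presence of fast µRNA

Now we focus on the resolution of (7). The same arguments as those establishing Lemma 3.3 show that, for each $r$, the only probability density in $\mu$ satisfying the $\mu$-equation in (7) is the following inverse gamma distribution

$$
M(r,\mu) = h_{1+\frac{c\,r + k_\mu}{\sigma_\mu},\,\frac{c_\mu}{\sigma_\mu}}(\mu).
\tag{14}
$$

Then an application of (10) yields

$$
j_{\mathrm{fast}}(r) = \int_0^\infty \mu\, M(r,\mu)\,\mathrm{d}\mu = \frac{c_\mu}{c\,r + k_\mu}.
\tag{15}
$$

It remains to find $\rho_{\mathrm{fast}}$, which is a probability density solving the Fokker-Planck equation

$$
\partial_r\Big[\partial_r\big(\sigma_r r^2 \rho_{\mathrm{fast}}\big) - \Big(c_r - \frac{c\,c_\mu\, r}{c\,r + k_\mu} - k_r r\Big)\rho_{\mathrm{fast}}\Big] = 0 .
$$

Arguing as in the proof of Lemma 3.2, one observes that integrability properties force $\rho_{\mathrm{fast}}$ to actually solve

$$
\partial_r\big(\sigma_r r^2 \rho_{\mathrm{fast}}\big) - \Big(c_r - \frac{c\,c_\mu\, r}{c\,r + k_\mu} - k_r r\Big)\rho_{\mathrm{fast}} = 0,
$$

which yields

$$
\rho_{\mathrm{fast}}(r) = C\,\Big(1 + \frac{k_\mu}{c\,r}\Big)^{\frac{c\,c_\mu}{\sigma_r k_\mu}}\; r^{-\left(2+\frac{k_r}{\sigma_r}\right)} \exp\!\left(-\frac{c_r}{\sigma_r\, r}\right),
\tag{16}
$$

where $C$ is a normalizing constant making $\rho_{\mathrm{fast}}$ a probability density function.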
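For the reader's convenience, here is a sketch of the integration step behind (16). It assumes $c > 0$ and writes $c_r, k_r, \sigma_r$ for the mRNA production rate, decay rate and noise parameter, $c$ for the binding rate, and $c_\mu, k_\mu$ for the µRNA production and decay rates. Dividing the first-order equation by $\sigma_r r^2 \rho$ and using a partial fraction decomposition gives:

```latex
\begin{align*}
\frac{\partial_r\!\left(r^2\rho\right)}{r^2\rho}
  &= \frac{1}{\sigma_r}\left(\frac{c_r}{r^2} - \frac{k_r}{r} - \frac{c\,c_\mu}{r\,(c\,r+k_\mu)}\right),
  \qquad
  \frac{1}{r\,(c\,r+k_\mu)} = \frac{1}{k_\mu}\left(\frac{1}{r} - \frac{c}{c\,r+k_\mu}\right),\\[4pt]
\ln\!\left(r^2\rho\right)
  &= -\frac{c_r}{\sigma_r\,r} - \frac{k_r}{\sigma_r}\ln r
     + \frac{c\,c_\mu}{\sigma_r k_\mu}\,\ln\!\left(\frac{c\,r+k_\mu}{r}\right) + \text{const},\\[4pt]
\rho(r)
  &= C\left(1+\frac{k_\mu}{c\,r}\right)^{\frac{c\,c_\mu}{\sigma_r k_\mu}}
     r^{-\left(2+\frac{k_r}{\sigma_r}\right)}\exp\!\left(-\frac{c_r}{\sigma_r\,r}\right).
\end{align*}
```

In the last line the factor $c^{\,c\,c_\mu/(\sigma_r k_\mu)}$ has been absorbed into the constant $C$. As $c \to 0$ the prefactor tends to $1$ and one recovers the inverse gamma profile of the case without binding.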

3.4. Well-posedness of the main Fokker-Planck model

Now we are interested in the well-posedness of (2), for which we cannot derive explicit formulas anymore. Despite the convenient functional framework introduced in Section 3.2, classical arguments from elliptic partial differential equation theory do not seem to be adaptable to the case $c > 0$. The main obstruction comes from an incompatibility between the natural decay of functions in the corresponding weighted space and the rapid growth of the binding term $c\,r\mu$ when $r$ and $\mu$ are large.

However, thanks to probabilistic methods detailed in [21] and focused specifically on Fokker-Planck equations, we are able to prove well-posedness of the steady Fokker-Planck equation (2). The method is based on finding a Lyapunov function for the adjoint of the Fokker-Planck operator and relies on an integral identity proved by the same authors in [20].

First of all let us specify the notion of solution. A weak solution to (2) is an integrable function such that

(17)

where the adjoint operator is given by

(18)

A reformulation and combination of [21, Theorem A and Proposition 2.1] provides the following result.

Proposition 3.5 ([21]).

Assume that there is a smooth function , called Lyapunov function with respect to , such that

(19)

and

(20)

where . Then there is a unique satisfying (17). Moreover .

Lemma 3.6.

Choose any two constants and . Then, the function defined by

is a Lyapunov function with respect to (i.e. it is positive on and it satisfies (19) and (20)).

Proof.

First observe that condition (19) is clearly satisfied. Also, is minimal at where it takes the value and thus it is positive on . Finally a direct computation yields

and (20) follows. ∎

A combination of Proposition 3.5 and Lemma 3.6 provides the following result.

Proposition 3.7.

There is a unique weak solution to the steady Fokker-Planck equation (2). Moreover, .

4. Noise reduction by binding: the case of fast µRNA

In this section we focus on the comparison between the explicit distributions (13) and (16). We provide theoretical and numerical evidence that the coefficient of variation (which is a normalized standard deviation) of (16) is less than that of (13). This quantity, called cell to cell variation in the biological literature [25], characterizes the robustness of the gene expression level (the lower the better). We start by performing a rescaling in order to extract the dimensionless parameters which characterize the distributions.

4.1. Dimensional analysis

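In order to identify the parameters of importance in the models, we rescale the variable $r$ around a characteristic value $\bar r$ chosen to be

$$
\bar r = \frac{c_r}{k_r}.
\tag{21}
$$

This choice is natural in the sense that it corresponds to the steady state of the mRNA dynamics without binding nor stochastic effects, that is $\mathrm{d}r/\mathrm{d}t = c_r - k_r r$. It is also the expectation of $\rho_0$. By rescaling $r$ into $r/\bar r$, both $\rho_0$ and $\rho_{\mathrm{fast}}$ are rescaled into dimensionless densities

$$
\rho_0^{\mathrm{ad}}(r) = C_{\delta}\; r^{-\left(2+\frac{1}{\delta}\right)} \exp\!\left(-\frac{1}{\delta\, r}\right),
\tag{22}
$$

$$
\rho_{\mathrm{fast}}^{\mathrm{ad}}(r) = C_{\delta,\gamma,p}\,\Big(1 + \frac{1}{\gamma\, r}\Big)^{\frac{p\,\gamma}{\delta}}\; r^{-\left(2+\frac{1}{\delta}\right)} \exp\!\left(-\frac{1}{\delta\, r}\right),
\tag{23}
$$

where $C_{\delta}$ and $C_{\delta,\gamma,p}$ are normalizing constants depending on the parameters of the model, and $\delta$, $\gamma$ and $p$ are dimensionless parameters. The first parameter

$$
\delta = \frac{\sigma_r}{k_r}
\tag{24}
$$

only depends on constants that are independent of the dynamics of µRNAs. The two other dimensionless parameters are

$$
\gamma = \frac{c\, c_r}{k_r\, k_\mu}
\tag{25}
$$

and

$$
p = \frac{c_\mu}{c_r}.
\tag{26}
$$

Let us give some insight into the biological meaning of these parameters. The parameter $\gamma$ measures the relative importance of the two mechanisms of destruction of µRNAs, namely the binding with mRNAs versus the natural destruction/consumption. A large $\gamma$ means that the binding effect is strong, and conversely. The parameter $p$ compares the production rate of µRNAs with that of mRNAs. Large values of $p$ mean that many more µRNAs than mRNAs are produced per unit of time.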
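The passage from dimensional to dimensionless parameters can be summarized in a few lines of code. The sketch below assumes the definitions $\bar r = c_r/k_r$, $\delta = \sigma_r/k_r$, $\gamma = c\,c_r/(k_r k_\mu)$ and $p = c_\mu/c_r$, which are consistent with the verbal descriptions above. The container and function names are hypothetical.

```python
from typing import NamedTuple


class DimensionlessParams(NamedTuple):
    r_bar: float   # characteristic mRNA level c_r / k_r
    delta: float   # mRNA noise relative to mRNA decay
    gamma: float   # muRNA loss by binding relative to muRNA decay
    p: float       # muRNA production relative to mRNA production


def dimensionless_parameters(c_r, c_mu, c, k_r, k_mu, sigma_r):
    """Map the rate constants of the kinetic model to (r_bar, delta, gamma, p)."""
    return DimensionlessParams(
        r_bar=c_r / k_r,
        delta=sigma_r / k_r,
        gamma=c * c_r / (k_r * k_mu),
        p=c_mu / c_r,
    )
```

With these definitions the exponent `p * gamma / delta` of the binding prefactor equals `c * c_mu / (sigma_r * k_mu)`, so it does not depend on the mRNA rates `c_r` and `k_r`.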

4.2. Cell to cell variation (CV)

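For any integrable non-negative function $\nu$ on $(0,\infty)$, let us denote by

$$
m_k(\nu) = \int_0^\infty y^k\, \nu(y)\,\mathrm{d}y
$$

its $k$-th moment. The coefficient of variation or cell to cell variation (CV) is defined by

$$
\mathrm{CV}(\nu) = \frac{\sqrt{\mathrm{Var}(\nu)}}{\mathrm{E}(\nu)},
\tag{27}
$$

where

$$
\mathrm{E}(\nu) = \frac{m_1(\nu)}{m_0(\nu)} \qquad\text{and}\qquad \mathrm{Var}(\nu) = \frac{m_2(\nu)}{m_0(\nu)} - \left(\frac{m_1(\nu)}{m_0(\nu)}\right)^2
$$

denote the expectation and variance. One has the following result.

Proposition 4.1.

Consider the dimensionless distributions $\rho_0^{\mathrm{ad}}$ and $\rho_{\mathrm{fast}}^{\mathrm{ad}}$ defined in (22) and (23). Then one has that

$$
\mathrm{E}\big(\rho_0^{\mathrm{ad}}\big) = 1, \qquad \mathrm{Var}\big(\rho_0^{\mathrm{ad}}\big) = \frac{\delta}{1-\delta}, \qquad \mathrm{CV}\big(\rho_0^{\mathrm{ad}}\big) = \sqrt{\frac{\delta}{1-\delta}},
\tag{28}
$$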

where the variance and coefficient of variation are well-defined only for . Moreover, for all , one has

and

Thus for all and one has

Finally one has the bound

(29)

which holds for all , and . Observe that but asymptotically

Proof.

The formulas for the moments follow from (10) and (11) and the limits can be taken using dominated convergence. The bound (29) is a consequence of the Prékopa-Leindler inequality (see [13] and references therein) which states that if are three functions satisfying for some and for all ,

(30)

then

(31)

We use it with , if and if , and . The condition (30) is then equivalent to

which is satisfied as the term between brackets is always greater than and the function , is bounded from below by , where is given in (29). Then with the change of variable in the integrals of (31), one recovers (29). ∎

The bound (29) does not confirm at this point that , which is the theoretical result one would hope for. However observe that is fairly close to for large . In the next section we provide numerical evidences that it should be possible to improve (29) to .

4.3. Exploration of the parameter space

Now, we explore the space of parameters in order to compare the cell to cell variation in the case of fast µRNA and in the case of free mRNA.

In order to evaluate numerically the cell to cell variation we need to compute , for . Observe that after a change of variable these quantities can be rewritten (up to an explicit multiplicative constant depending on parameters)

with . For the numerical computation of these integrals, we use a Gauss-Laguerre quadrature

which is natural and efficient as we are dealing with functions integrated against a gamma distribution. We refer to [24] and references therein for the definition of the coefficients and quadrature points . The truncation order is chosen such that the numerical error between the approximation at order and is inferior to the given precision when . For , the function may take large values and it is harder to get the same numerical precision. In the numerical results below the mean error for the chosen sets of parameters with large values of is around and the maximal error is . This is good enough to comment on qualitative behavior.
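The quadrature step can be sketched as follows. This is an illustration under assumptions and not the authors' implementation. It takes the dimensionless fast-µRNA density to be proportional to $(1 + 1/(\gamma r))^{p\gamma/\delta}\, r^{-(2+1/\delta)}\, e^{-1/(\delta r)}$ and uses the change of variable $x = 1/(\delta r)$. Up to a constant common to all $k$, the $k$-th moment then becomes $\delta^{-k}\int_0^\infty (1+\delta x/\gamma)^{p\gamma/\delta}\, x^{1/\delta - k}\, e^{-x}\,\mathrm{d}x$. The code uses NumPy's standard Gauss-Laguerre rule (weight $e^{-x}$) with a fixed number of nodes and keeps the power of $x$ in the integrand. A generalized rule such as `scipy.special.roots_genlaguerre` would absorb that power into the weight and is preferable when $1/\delta$ is not an integer. The function names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.laguerre import laggauss


def cv_free(delta):
    """Closed-form CV of the free mRNA density, valid for 0 < delta < 1."""
    return math.sqrt(delta / (1.0 - delta))


def cv_fast(delta, gamma, p, n_nodes=40):
    """CV of the fast-muRNA density from Gauss-Laguerre approximations of m_0, m_1, m_2."""
    x, w = laggauss(n_nodes)  # nodes and weights for the weight function exp(-x)
    binding_factor = (1.0 + delta * x / gamma) ** (p * gamma / delta)
    moments = []
    for k in range(3):
        integrand = binding_factor * x ** (1.0 / delta - k)
        moments.append(delta ** (-k) * float(np.sum(w * integrand)))
    m0, m1, m2 = moments
    mean = m1 / m0
    variance = m2 / m0 - mean ** 2
    return math.sqrt(variance) / mean
```

For `delta = 0.5`, `gamma = 1`, `p = 1` the integrands are polynomials, so the quadrature is exact up to rounding: the moments are proportional to 14, 9 and 10, which gives a CV of `sqrt(59) / 9`, below the free-mRNA value `cv_free(0.5) = 1`.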

We plot the relative cell to cell variation $\mathrm{CV}(\rho_{\mathrm{fast}}^{\mathrm{ad}})/\mathrm{CV}(\rho_0^{\mathrm{ad}})$ with respect to $p$ and $\gamma$ for two different values of $\delta$. The results are displayed in Figure 1. Then, in Figure 2, we draw the explicit distributions $\rho_{\mathrm{fast}}^{\mathrm{ad}}$ for various sets of parameters and compare them with $\rho_0^{\mathrm{ad}}$.

Figure 1. Exploration of the parameter space. Relative cell to cell variation for various parameters $p$, $\gamma$ and $\delta$. On the horizontal axis, left means more production of mRNA and right means more production of µRNA. On the vertical axis, top means more destruction of mRNA by binding and bottom means more destruction/consumption of mRNA by other mechanisms.

The numerical simulations of Figure 1 suggest that the bound (29) is non-optimal. Actually, we conjecture that for all , and one has

(32)

Observe that from Proposition 4.1 we know that for the inequality becomes an equality when and . For it saturates when and we conjecture that the coefficient of variation tends to when tends to infinity.

From a modeling point of view, these simulations confirm that for any choice of parameters, the presence of (fast) µRNA makes the cell to cell variation decrease compared to the case without µRNA. Moreover, the qualitative behavior with respect to the parameters makes sense. Indeed, we observe that whenever enough µRNA is produced (large $p$), the increase of the binding phenomenon (increasing $\gamma$) makes the cell to cell variation decay drastically.

Figure 2. Marginal distributions of mRNAs for fast µRNAs compared to the free mRNA distribution (black solid curve) for different parameters and . Left: , and varies. Right: , and varies.

5. Noise reduction by binding for the main Fokker-Planck model: numerical evidence

In this section, we compute the gene expression level of the main model described by equation (2). In this case, as there is no explicit formula for the solution, we will compute an approximation of it using a discretization of the Fokker-Planck equation. In order to compute the solution in practice, we restrict the domain to the bounded domain . Because of the truncation, we add zero-flux boundary conditions in order to keep a conservative equation. It leads to the problem

(33)

5.1. Dimensional analysis and reformulation of the equation

In order for the numerical scheme to be more robust with respect to the size of the parameters, we start by rewriting the equation in a dimensionless version. It will also allow for comparisons with numerical experiments of the previous sections.

We introduce

where the characteristic numbers of mRNA and RNA are, as before respectively defined by

After some computations one obtains that equation (33) can be rewritten

(34)

with the corresponding no-flux boundary conditions and normalization. The parameters , and are those of the previous sections and the two new parameters are

(35)

and

(36)

The parameter defined in (35) compares the consumption of µRNA with that of mRNA by mechanisms other than the binding between the two RNAs. The parameter defined in (36) compares the amplitude of the noise in the dynamics of µRNA with that of the mRNA.

Remark 5.1.

Observe that the approximation of fast RNA leading to the model discussed in Section 2.3 and in Section 4 in its dimensionless form amounts to taking and let tend to .

As the coefficients in the advection and diffusion parts of (34) grow rapidly in , and degenerate when and , an efficient numerical resolution of (34) is not straightforward. Moreover a desirable feature of the scheme would be a preservation of the analytically known solution corresponding to . Because of these considerations we will discretize a reformulated version of the equation in which the underlying inverse gamma distributions explicitly appear. It will allow for a better numerical approximation when and are either close to or large. The reformulation is the following

(37)

with the associated no-flux boundary conditions and where the functions and are given by

(38)

and

(39)

5.2. Presentation of the numerical scheme

We use a discretization based on the reformulation (37). It is inspired by [8] and is fairly close to the so-called Chang-Cooper scheme [15].

We use a finite-volume scheme. The rectangle is discretized with a structured regular mesh of size and in each respective direction. The centers of the control volumes are the points with and for and . We also introduce the intermediate points with and with defined w