 # Identifying stochastic governing equations from data of the most probable transition trajectories

Extracting the governing stochastic differential equation model from elusive data is crucial to understand and forecast dynamics for various systems. We devise a method to extract the drift term and estimate the diffusion coefficient of a governing stochastic dynamical system, from its time-series data for the most probable transition trajectory. By the Onsager-Machlup theory, the most probable transition trajectory satisfies the corresponding Euler-Lagrange equation, which is a second order deterministic ordinary differential equation involving the drift term and diffusion coefficient. We first estimate the coefficients of the Euler-Lagrange equation based on the data of the most probable trajectory, and then we calculate the drift and diffusion coefficient of the governing stochastic dynamical system. These two steps involve sparse regression and optimization. We finally illustrate our method with an example.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Discovering relationships between the observable features and responses is a regression problem of supervised learning. We formulate the response variable as a deterministic expression of the features, i.e. linear, polynomial, logarithmic, together with a random error with zero mean and independent of the features



. The physical dynamics is usually governed by a ordinary or partial differential equation. With time-series observations, Brunton et al. and Rudy et al.

[1, 17] devised a method to find the governing equations, by a sparse regression algorithm.

To determine a governing equation, we face two significant issues — selecting appropriate measurable features as basis and sparsing the coefficients of the features . There is no universal basis that performs well for all the cases. However, based on the theories of Taylor expansion and trigonometric expansion, polynomial and trigonometric terms are naturally selected as features [1, 17]. The remaining sparse regression is aimed at making some or more coefficients zero to eliminate the uncorrelated features and prevent overfitting caused by minimizing least-square loss. The

norm, which counts the number of the non-zero coefficients, is naturally added to the loss function as a penalty. However, it makes the loss function nonconvex and the optimization problem becomes intractable to solve

. Consequently, the penalty is then relaxed to the less shrinkage but resolvable

(Ridge) regression or the

norm (Lasso) [4, 20]. Comparatively, is sparser than . In order to be closer proximate the sparsity of the norm, some properties and iterative algorithms of the nonconvex norm are studied [5, 23, 26].

The observations are unavoidably contaminated by external noise, no matter the data are for the response variables  or for time-series trajectories of processes [1, 17]. Facing the undesirable noise, one way [1, 17] is to filter it, and employe a total variation regularization to denoise the derivatives and then construct a data matrix. However, stronger noise would generally result in greater deviation for the sparse solution, a better penalty is employed or a logical sparse regression framework is proposed, for example Sparse Identification of Nonlinear Dynamics (SINDy) [1, 17], which incessantly vanish the estimators with absolute values smaller than a threshold, and reestimate the non-zero coefficients until a stable solution is reached. The threshold is selected by cross validation as a value minimizing the loss function.

Without denoising, we here attempt to extract a governing stochastic differential equation with the drift term and diffusion coefficient, from a time-series observation for its most probable transition trajectory. Here, the additive diffusion term characterizes multiple unmeasurable external environment influences. Practically, the evolutionary history of transition pathways or the most probable transition pathways may be obtained in different fields. For example, the metabolic pathways on a prokaryotic genome , and the primary absorbance and fluorescence signals for gene expression . Technically, the one-dimensional diffusion has also been predicted by extension (time) domain averages, which match the dominant path shape for transition pathways 

. The most probable transition trajectory, as substitution of all the pathways in the state space, has been used to approximate the transition probability density function

. The most probable pathways between different brain regions have been simulated by a probabilistic fiber tractography algorithm, called ConTrack algorithm, from measurable pathways . Theoretically, a statistic for a process of a stochastic differential equation satisfies a determined differential equation . The most probable transition trajectory in a finite time interval with fixed boundary points, of a diffusion process (e.g., a solution of a stochastic differential equation) can be characterized by a second-order ordinary differential equation (ODE) with boundary conditions . This is attained by expressing the transition probability of the diffusion process as an integral, whose integrand is called Onsager-Machlup (OM) function. And then a variational principle reveals that the OM function satisfies the Euler-Lagrange equation, which furthermore leads to a desired second-order ODE. The corresponding boundary value problem is numerically solved by a shooting method [11, 12]. The OM function and most probable transition pathway for a jump-diffusion process has been studied recently by Chao and Duan .

In this present paper, we first present the background of OM function for a diffusion system, and the Euler-Lagrange equation for the most probable transition trajectory, i.e., the second-order ODE with boundary conditions, in section 2. Then in section 3.1, we devise a method to determine the drift and diffusion coefficient for a governing stochastic differential equation, with the observation data on the most probable transition trajectory. To this end, we need a function basis for the drift, and we pick a polynomial basis up to fifth order (or any finite order), together with an appropriate cost with penalty. This optimization problem is set up in section 3.2. We devise an algorithm to recursively estimate the coefficients in the diffusion model by feat of the covnex optimization package cvx in Matlab, and then sparse the estimators by vanishing the ones with absolute value smaller than a threshold, in section 3.3. In section 4, we make some applications and present the results.

## 2 The Onsager-Machlup function and the most probable transition trajectory

We start from a scalar stochastic differential equation (SDE)  as the model for the governing stochastic law

 dx(t) = f(x(t))dt+εdBt, (2.1)

where is a Brownian motion defined on a sample space with probability , and is the noise intensity (i.e., diffusion coefficient). The random sample trajectories satisfy the two-point boundary conditions

 x(0)=x0∈R,x(T)=xT∈R,

with a finite transition time .

The most probable transition trajectory, denoted by , for this stochastic differential equation, from state to within a time period , is the pathway which maximizes the Onsager-Machlup (OM) action functional, . Here the integrand (like a Lagrangian function in classical mechanics) is known as [7, 15, 21]:

 OM(˙z,z) = (f(z)−˙zε)2+f′(z). (2.2)

We recall the notions and . By the variational principle, we obtain the Euler-Lagrange equation  to be satisfied by the most probable transition trajectory :

 ddt∂OM(˙z,z)∂˙z = ∂OM(˙z,z)∂z. (2.3)

Thus, we get a second order ODE for the most probable transition trajectory , with two-point boundary conditions

 ¨z(t) = ε22f′′(z(t))+f′(z(t))f(z(t)),z(0)=x0,z(T)=xT. (2.4)

As discussed in the previous section, the most probable transition trajectory is sometimes observable. With the observed most probable transition trajectory, we will extract the underlying governing stochastic model in the next section. That is, we will determine the drift and the diffusion coefficient .

## 3 Sparse identification of the governing stochastic differential equation

### 3.1 The equation for the most probable transition trajectory

Based on a time-series data collected with time step , that is, , we want to identify the drift term and the diffusion coefficient .

We construct a basis library consisting of polynomial of observational feature, i.e. . To illustrate our method, we take . The drift term can then be expressed as

 f(x)=β0+β1x+β2x2+β3x3+β4x4+β5x5. (3.1)

Putting this expression into (2.4), we then have the second-order ODE

 ¨z = b0+b1z+b2z2+b3z3+⋯+b9z9 (3.2)

with coefficients

 b0 = β0β1+ε2β2,b5=6β1β5+6β2β4+3β23, b1 = 2β0β2+β21+3ε2β3,b6=7β2β5+7β3β4, b2 = 3β0β3+3β1β2+6ε2β4,b7=8β3β5+4β24, b3 = 4β0β4+4β1β3+2β22+10ε2β5,b8=9β4β5, b4 = 5β0β5+5β1β4+5β2β3,b9=5β25.

Denote the vectors

, , where means transposition, and the r.h.s of the above equalities as , . Additionally, denote . Then the coefficient vector in (3.1) is nonlinear related to the coefficients in (3.2), and we indicate this fact by

 b=G(β,ε). (3.3)

### 3.2 The optimization problem

We use simulated data for the most probable transition trajectory . That is, we collect a time history of observation with time step , and denote , with , and . Taking center-difference to approximate the second-order accurate solution of , we get a series of , with , and , . Each pair of data , satisfies the ODE (3.2), that is

 ¨zobi=b0+b1zobi+b2(zobi)2+b3(zobi)3+⋯+b9(zobi)9. (3.4)

Define

 ¨Z=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝¨zob1¨zob2⋮¨zobN+1⎞⎟ ⎟ ⎟ ⎟ ⎟⎠ and Θ=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝1zob1(zob1)2⋯(zob1)91zob2(zob2)2⋯(zob2)9⋮⋮⋮⋱⋮1zobN+1(zobN+1)2⋯(zobN+1)9⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠. (3.5)

The matrix formulation for the ‘unknown’ is

 ¨Z = Θb. (3.6)

This is an overdetermined system consists of equalities and 10 unknown coefficients.

To determine the SDE system, we need to distill the sparse coefficients and determine the diffusion coefficient . Starting from the most probable trajectory satisfying (2.4), we initially estimate the coefficients in (3.6), and then work out and from the relationship (3.3).

Finally, the sparse problem is

 min||β||0 s.t.{¨Z=Θb,b=G(β,ε). (3.7)

For , we denote the norm , the norm and the norm the number of non-zero . The sparse problem can be transformed into the following regularized least-square problem with an penalty 

 minβ,ε,bκ1||¨Z−Θb||22+κ2||b−G(β,ε)||22+κ3||β||0, (3.8)

with , , and small . The first two terms in this optimization problem is a weighted sum of the inconsistency of the two constraint functions in (3.2). To make the optimization problem solvable, we relax in (3.8) to , and take

 κ1||¨Z−Θb||22+κ2||b−G(β,ε)||22+κ3||β||1 (3.9)

as loss function.

### 3.3 An implementable algorithm for the optimization problem

Note that the first constraint condition in (3.2) is linear with respect to , and the second constraint condition is linear with respect to but nonlinear with respect to . For linear system , the analytical solution of the least-square problem is generally used as optimization solution. But the backslash, i.e. is more prone to result in sparse solution .

Our destination is sparse , and estimate which are nonlinear related to . As appears linearly in the first two terms of (3.8), while and present nonlinearly in the latter two terms, we attempt to split the loss functions into two parts. We numerically estimate and (, ), respectively, and update the estimators cyclically . Totally, we take an initial estimation and cycle an improvement to implement this procedure.

In the initial estimation step, we leverage as an estimator of , and then compute the real values and one by one from . Note that, the error of results in that of and . Moreover, not all the 10 equalities are used to get and , those not used equalities bring about differences between and . The comparatively bigger difference means a reduced confidence for the estimators, especially for and with small .

In the improvement step, we update by minimizing . This is implemented by the convex optimization package cvx. We then solve for real numbers and with the updated . Note that we have added a penalty term in the estimation of , as the sparsity of , especially the nonexistent higher order terms, causes that of .

We loop the improvement step until a stable solution is reached. For each updated estimator, we manage with sequence threshold as in [1, 17]. The threshold parameters and the weight parameters are chosen by cross validation. The exact procedures of the estimation are as follows:

1. For a set of and , we simulate a time-series data with equal time interval from (2.4) by shooting method, then approximate the time-series data of from these observations by center-difference, and get .

2. Construct a basis library of power functions , and develop the basis to get the matrix of time history as in (3.5).

3. Split the observations randomly to training part (80% data) and testing part (the left 20%). Accordingly, and are divided into , and , , respectively.

4. Get an initial estimator and then calculate the real values and one by one from .

5. Calculate the test error of the loss function .

6. Set , and .

7. Calculate , and then let if . Leverage cvx package to update nonzero which minimizes .

8. Work out and from (3.3), and let if .

9. Loop steps 3b and 3c, until the last two set of estimators are close enough. Denote the estimators as , and .

10. Calculate the test error as in step 2b with the estimators and the test data, and compare it with . If , increase by a step and replace the best estimators, else take back two steps of , half the step and put forward a new step.

11. Loop steps 3-5, find the best and minimizing the loss function and take the corresponding results as estimators.

In this algorithm, we make preparations in step 1, get initial estimators in step 2, and loop step 3 to improve the estimators with weight parameters , , , thresholds and . Recall that our destination is getting sparse and estimate , while the coefficients in (3.2) are the results of and as in (3.3). In step 3b, we vanish the absolute small coefficients of the calculated , update the nonzero s with the estimated , and the aspiration of sparsing . As results from and , it can be close to but not 0, we update the threshold and compare the test errors result from different thresholds in step 5 in order to determine a more preciser . We compare the test error with different weights and in step 6, and take the results with minimum test error as the estimators.

## 4 Numerical experiments

Example

We consider to reconstruct the stochastic differential equation

 dXt=f(Xt) dt+ε dBt,X(0)=0,X(1)=√2, (4.1)

where the drift and diffusion coefficient . We are given the observation (simulated data) of the most probable transition trajectory , which theoretically satisfies the Euler-Lagrange equation

 {¨z=ε22(2β2+6β3z)+(β1+2β2z+3β3z2)(β0+β1z+β2z2+β3z3),z(0)=0,z(1)=√2. (4.2)

Given a group of true values as in Table 1, we can simulate a most probable trajectory from the 2nd-order ODE system (4.2) by shooting method, and get a time-series of observations , with , , and . Based on the time-series data, we reversely reconstruct the system (4.1), namely, discover the drift and determine the diffusion coefficient .

We initially assume to be a higher-order polynomial, such as in (3.1). Starting from the linear system (3.2) and combining with (3.3), we extrapolate the sparse and estimate according to the algorithm in section 3.3. Some groups of true values and the corresponding estimators are illustrated in Table 1.

For this black box problem, the small test error with and , listed in Table 2, gives us confidence to the estimators. Note that, are real values, we firstly get or if , otherwise, . The error in the estimator causes that of . And then get from if , otherwise, from . The error of results from that of , or together with that of . We similarly solve out real values one by one from in (3.3), with takes value successively. The latter solved out real value and have more error sources. The value of should at last make . In fact, not all the 10 equalities have been used to solve out the 7 coefficients, the unused equalities make the error . The bigger means errors in the estimators, and it reduces our confidence to and solved out latter, which have more error sources. The relatively bigger in the 6th group and 2nd group indicate relatively bigger error of than in the other groups. Moreover, the estimator may also not as good as in the other groups.

## 5 Discussion

Based on the theoretical relationship between the SDE governing system and the Euler-Lagrange equation (a second order ODE) for its most probable transition trajectory, we have devised an algorithm to discover the drift term and the diffusion coefficient of the SDE, from a time-series data for the most probable transition trajectory.

We have employed polynomials as the basis of the drift term owning to the fact that a function can be approximated by a Taylor expansion. For more complicated cases, the trigonometric function may be added to the basis polynomials [1, 17]. Moreover, for a governing stochastic model with possibly rational drift, we may multiply a polynomial on the both sides of the model and convert the rational drift to a polynomial drift 

. Nevertheless, there is not a one-size-fit-all basis in data science.

Anyway, once the basis is selected, the following work is recursively estimating the coefficients in the system, and then eliminating the uncorrelated features by vanishing the estimators close to 0. The test error can be used to measure the accuracy of the estimators. In our estimation, we employ part of the loss function involving or to estimate and improve the corresponding coefficients. Noting that the sparsity of causes that of , we have added an penalty of in the loss function to improve the estimation of . Nevertheless, though we want to sparse , we can only solve from one by one, as they are nonlinear related to , the adding of a penalty of can complicate the solving process more than lead to a sparse solution. We choose a threshold to vanish the absolute small elements in by cross validation, as in fact should be , and true can result in very small elements of . We fix a small threshold to sparse the coefficients owning to the fact that the estimator is not that sensitive to the threshold as that for .

## Acknowledgements

This work was partially supported by NSFC grants 11531006, 11601491 and 11771449.

## References

•  S. L. Brunton, J. L. Proctor and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 113 3932-3937 (2016).
•  X. Chang, Z. Xu, H. Zhang, J. Wang and Y. Liang, Rubust regularization theory based on regularization: the asymptotic distribution and variable selection consistence of solutions. Sci. Sin. Math. 40(10) 985-998 (2010).
•  Y. Chao and J. Duan, The Onsager-Machlup function as Lagrangian for the most probable path of a jump-diffusion process. Nonlinearity 32 3715-3741 (2019).
•  S. Chen, D. L. Donoho and M. A. Saunders, Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1) 33-61 (2006).
•  X. Chen, D. Yang, Q. Zhang and J. Liang, regulariztion based numerical method for effective reconstruction of bioluminescence tomography. J. Appl. Phys. 115 184702 (2014).
•  J. Duan, An Introduction to Stochastic Dynamics. Cambridge University Press, 2015.
•  D. Drr and A. Bach, The Onsager-Machlup function as lagrangian for the most probable path of a diffusion process. Commun. Math. Phys. 60 153-170 (1978).
•  T. Hadtie, R. Tibshirani and J. Friedman, The Elemnets of Statistical Learning. Springer, 2009.
•  N. Q. Hoffer, K. Neupane, A. G. T. Pyo and M. T. Woodside, Measuring the average shape of the transition paths during the folding of a single biological molecule. Proc. Natl. Acad. Sci. 116(17) 8125-8130 (2019).
•  W. Iwasaki and T. Takagi, Rapid pathway evolution facilitated by horizontal gene transfers across prokaryotic lineages. PLoS Genet. 5(3) e1000402 (2009).
•  R. Kress, Numerical analysis. Springer, 1998.
•  J. N. Kutz, Data-driven modeling scientific computation. Oxford University Press, 2013.
•  N. M. Mangan, S. L. Brunton, J. L. Proctor and J. N. Kutz, Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Transcation on Molecular, Biological, and Multi-scale Communications 2(1) 52-63 (2016).
•  B. K. Natraajan, Sparse approximate solutions to liner systems. SIAM J. Comput. 24(2) 227-234 (1995).
•  L. Onsager and S. Machlup, Fluctuations and irreversible processes. Phys. Rev. 91(6) 1505-1512 (1953).
•  A. F. Psaros, I. A. Kougioumtzoglou and I. Petromichelakis, Sparse representations and compressive sampling for enhancing the computational efficiency of the Wiener path integral technique. Mech. Syst. Signal Pr. 111 87-101 (2018).
•  S. H. Rudy, S. L. Brunton, J. L. Proctor and J. N. Kutz, Data-driven discovery of partial differential equations. Sci. Adv. 3 e1602614 (2017).
•  A. J. Sherbondy, R. F. Dougherty, M. Ben-Shachar, S. Napel and B. A. Wandell, ConTrack: Finding the most likely pathways between brain regions using diffusion tractography. Journal Vision 8(9):15 1-16 (2008).
•  D. Stefan, C. Pinel, S. Pinhal, E. Cinquemani, J. Geiselmann and H. de Jong, Inference of quantitative models of bacterial promoters from time-series reporter gene data. PLoS Comput. Biol. 11(1) e1004028 (2015).
•  R. Tibshirani, Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58(1) 267-288 (1996).
•  L. Tisza and I. Manning, Fluctuations and irreversible thermodynamics. Phys. Rev. 105 1695-1705 (1957).
•  D. Wu, M. Fu and J. Duan, Discovering mean residence time and escape probability from data of stochastic dynamical systems. Chaos 29 093122 (2019).
•  Z. Xu, H. Zhang, Y. Wang and X. Chang, regualrization. Sci. China Inf. Sci. 53 1159-1169 (2010).
•  Z. Xu, X. Chang, F. Xu and H. Zhang, regualrization: A thresholding representation theory and a fast solver. IEEE Tran. Neural Netw. Learn. Syst. 23(7) 1013-1027 (2012).
•  O. Yair, R. Talmon, R. R. Coifman and I. G. Kevrekidis, Reconstruction of normal forms by learning informed observation geometries from data. Proc. Nat. Acad. Sci. 114(38) E7865-E7874 (2017).
•  J. Zeng, S. Lin, Y. Wang and Z. Xu, regularization: convergence iterative half thresholding algorithm. IEEE Trans. Signal Process. 62(9) 2317-2329 (2014).
•  P. Zheng, T. Askham, S. L. Brunton, J. N. Kutz and A. Y. Aravkin, A unified framework for sparse relaxed regularized regression: SR3. IEEE Access 7 1404-1423 (2019).