1 Introduction
In the study of complex systems such as fluid dynamics, soft materials, and biological evolution, we often introduce empirical formulas and parameters into differential equations to model complex macroscopic phenomena induced by microscopic behaviors. Examples include the subgrid stress in turbulence models, high-order moment closures for moment equations, the drag coefficient expressed as a function of porosity in particle-fluid problems, and a variety of empirical formulas in biological and economic models. However, modeling such complex systems with empirical formulas and parameters has two drawbacks: 1) for systems with nonseparable scales, it is difficult to deduce accurate macroscopic equations directly from microscopic mechanisms; 2) models with fixed parameters cannot reflect the inherent randomness of the system, and hence can only crudely approximate the system in an average sense. To address these issues, we propose a machine learning framework that learns both the expression of the differential equations and the distribution of the associated random parameters simultaneously.
1.1 Related Work
We start with a review of inverse problems for deterministic systems. Under the premise that the explicit form of the differential equation is known, the optimal parameters can be learned from observed data using classical system identification and tools from inverse problems. Several works lin2008learning ; liu2010learning ; long2018pde ; patel2018nonlinear solved a class of inverse problems under weaker assumptions based on the idea of unrolling the dynamics of the numerical integration in the time direction. In raissi2018hidden , the authors proposed to use a neural network as a surrogate to learn optimal parameters. When the explicit formula of the equation is unknown, brunton2016discovering constructed a dictionary consisting of simple terms that are likely to appear in the equations and employed sparse regression to select candidates for the expression of the equations. A series of works bongard2007automated ; schmidt2009distilling ; xu2020dlga used genetic algorithms to discover the underlying terms of the differential equations. In long2019pde , the authors proposed a deep symbolic neural network, called SymNet, to estimate the unknown expression of the differential equations, which has relatively low memory requirements and computational complexity in many cases.
For problems with randomness, uncertainty quantification methods smith2013uncertainty are often adopted to study forward uncertainty propagation (e.g., liu1986probabilistic ; zhang2001stochastic ; ghanem2003stochastic ; xiu2010numerical ). However, inverse problems with uncertainty are generally much more difficult to study than forward uncertainty propagation. In earlier work, kennedy2001bayesian proposed a four-step modular Bayesian approach to calibrate model parameters. In chen2019learning , neural networks were used to estimate the modes of the Karhunen-Loève expansion in the study of forward and inverse problems of stochastic advection-diffusion-reaction equations. Based on PINNs raissi2018hidden , yang2020b estimated the posterior of model parameters by Hamiltonian Monte Carlo and variational inference. The work most related to ours is yang2018physics , where the authors introduced a generative adversarial network (GAN) goodfellow2014generative ; arjovsky2017wasserstein ; gulrajani2017improved to estimate the distribution of data snapshots. These works further advanced the development of inverse uncertainty quantification. However, the Bayesian-based framework kennedy2001bayesian ; chen2019learning ; yang2020b
often suffers from the curse of dimensionality. Although the use of a GAN in yang2018physics can overcome the curse of dimensionality, it requires relatively strong prior knowledge of the differential equation to be identified. Furthermore, the GAN of yang2018physics was not specifically designed to estimate the distribution of the parameters of the differential equations.
In this paper, we propose a new machine learning framework for inverse uncertainty quantification. We introduce a new model, called RODENet, to estimate random ordinary differential equations (RODEs) from observed data by combining SymNet long2019pde for system identification and a GAN goodfellow2014generative ; arjovsky2017wasserstein ; gulrajani2017improved for parameter distribution estimation. The high expressive power of SymNet enables the proposed RODENet to assume only minor prior knowledge of the form of the RODE to be identified. A particular novelty of the proposed model is that, unlike existing inverse uncertainty quantification methods, RODENet estimates the distribution of the parameters and uses the GAN as a data-driven regularization.
We note that using a GAN as a data-driven regularization is common in image restoration bora2017compressed ; shah2018solving ; ledig2017photo . For example, bora2017compressed ; shah2018solving put forward the idea of solving image restoration problems within the range of a well-trained GAN. In ledig2017photo , the authors used the discriminator as a regularization in image super-resolution problems. These recent studies in computer vision inspired us to use a GAN as a data-driven regularization, which is new to inverse uncertainty quantification.
The rest of the paper is organized as follows. In Section 2, we introduce the architecture of RODENet. Details on training and the loss functions are introduced in Section 3. Experiments are presented in Section 4, and we conclude the paper in Section 5.
2 The RODENet
Given a set of observed time series in R^d, with d being the dimension of the observable quantities, we aim to discover the governing RODE from the set of time series. We assume that the RODE to be discovered takes the following form:

du/dt = f(u; η),  (1)

where f is a d-dimensional vector function and η is the random parameter of f. The RODE describes an infinite set of ODEs determined by the distribution of the random vector η. For each realization of the random variable η, denoted as η_i, we call the associated ODE "ODE_{η_i}". We note that the RODEs in this paper are different from those considered in han2017random ; neckel2013random . To obtain solutions with higher regularity, the latter consider system parameters as random processes (e.g., solutions of SDEs) rather than random variables.
Consider the data set {u_i}, where each u_i is a trajectory of an instance ODE_{η_i}. In practice, the data set would be measured from the real world. As a proof of concept and to validate the proposed RODENet, in our experiments in Section 4 we generate data by solving each ODE_{η_i} using the fourth-order Runge-Kutta method with different initial values.
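As a concrete illustration, the data-generation step can be sketched as follows. The 3-dimensional quadratic right-hand side and the parameter distribution below are illustrative stand-ins, not the exact system and distributions used in Section 4:

```python
import numpy as np

def rk4_step(f, u, dt):
    """One classical fourth-order Runge-Kutta step for du/dt = f(u)."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(f, u0, dt, n_steps):
    """Trajectory u_0, ..., u_{n_steps} of one ODE instance ODE_eta."""
    traj = [np.asarray(u0, dtype=float)]
    for _ in range(n_steps):
        traj.append(rk4_step(f, traj[-1], dt))
    return np.stack(traj)

# A hypothetical 3-d quadratic instance: eta is one realization of the
# random parameters (illustrative values, not the paper's system).
rng = np.random.default_rng(0)
eta = rng.normal(loc=[2.0, 1.0, 1.0], scale=[1.0, 2.0, 1.0])
f = lambda u: np.array([eta[0] * (u[1] - u[0]),
                        eta[1] * u[0] - u[1] - u[0] * u[2],
                        u[0] * u[1] - eta[2] * u[2]])
data = simulate(f, u0=[1.0, 1.0, 1.0], dt=0.01, n_steps=100)
```

Sampling a fresh `eta` for each instance and repeating this for several initial points yields a data set of the form described above.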
Our proposed RODENet (Fig. 1) is designed to identify each individual instance ODE_{η_i} and, at the same time, to estimate the distribution of η from the data. To achieve both objectives, the RODENet framework consists of two main components:

A set of SymNet-based ODENets, each of which aims to estimate one individual ODE instance;

A GAN to learn the distribution of the random parameter η.
The trained GAN represents the distribution of the RODE and is able to generate new ODE instances. Moreover, we note that the GAN also serves as a good regularization in the training of the ODENets.
2.1 The ODENet Component
For each instance ODE_{η_i}, the associated ODENet is a reduced version of PDENet 2.0 long2019pde , constructed by unrolling the forward Euler discretization in time. One block of the ODENet estimating ODE_{η_i} can be written as:

ũ_{j+1} = ũ_j + Δt · SymNet(ũ_j; θ_i),  (2)

where ũ_0 = u_i(0) is the initial value of ODE_{η_i}, and SymNet(·; θ_i) is a network with several hidden layers that takes a d-dimensional vector as input, can approximate polynomials of the input variables, and is able to output the expression of the equation through symbolic computation. Here, θ_i denotes the trainable parameters of the ODENet, and SymNet(·; θ_i) is an estimate of f(·; η_i). Fig. 2 (a) shows the architecture of the ODENet.
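A minimal sketch of such an unrolled block is given below. The `PolyNet` class is a simplified stand-in for SymNet (not the actual PDENet 2.0 implementation): each hidden layer appends the product of two linear combinations of the current features, so the output is a polynomial of the input.

```python
import numpy as np

class PolyNet:
    """SymNet-style polynomial network (illustrative sketch).

    Each of the `hidden` layers appends the product of two learned linear
    combinations of the current features, so `hidden` layers can represent
    polynomials of degree up to 2**hidden in the input.
    """
    def __init__(self, dim, hidden, rng):
        self.W = []
        d = dim + 1  # input features plus a constant bias channel
        for _ in range(hidden):
            self.W.append(rng.normal(scale=0.1, size=(2, d)))
            d += 1  # each layer adds exactly one new feature
        self.w_out = rng.normal(scale=0.1, size=d)

    def __call__(self, u):
        feat = np.concatenate([u, [1.0]])
        for W in self.W:
            a, b = W @ feat          # two linear combinations
            feat = np.append(feat, a * b)  # append their product
        return self.w_out @ feat

def odenet_block(nets, u, dt):
    """One unrolled forward-Euler block: u_{j+1} = u_j + dt * SymNet(u_j)."""
    return u + dt * np.array([net(u) for net in nets])
```

Stacking many such blocks reproduces a multi-step forward prediction that can be matched against the observed trajectory.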
2.2 The GAN Component
It is natural to apply a GAN directly to realizations of η to estimate its distribution. However, it is difficult (and unnecessary) to differentiate through the symbolic computation. Therefore, in RODENet, we choose to use the GAN to approximate the distribution of the ODENet parameters θ instead. By doing so, we can still obtain a viable approximation of the distribution of η by simply applying the symbolic computation after estimating the distribution of θ.
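A GAN over the parameter vectors can be sketched with small fully connected networks; all layer widths and the latent dimension below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def mlp(sizes, rng):
    """Weights and biases of a small fully connected network."""
    return [(rng.normal(scale=0.3, size=(m, n)), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with tanh activations and a linear final layer."""
    for i, (W, b) in enumerate(params):
        x = W @ x + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
p_dim = 12                        # size of one ODENet parameter vector theta
G = mlp([4, 32, 32, p_dim], rng)  # generator: latent z -> fake theta
D = mlp([p_dim, 32, 32, 1], rng)  # discriminator/critic: theta -> scalar score

theta_fake = forward(G, rng.normal(size=4))
score = forward(D, theta_fake)
```

The essential point is that the generator's output dimension and the discriminator's input dimension both equal the size of a parameter vector θ, so that the GAN operates directly in parameter space.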
In RODENet, we adopt the Wasserstein GAN (WGAN) goodfellow2014generative ; arjovsky2017wasserstein to estimate the distribution of the ODENet parameters θ by solving the following constrained minimax problem:

min_G max_{‖D‖_L ≤ 1} E_{θ∼P_r}[D(θ)] − E_{z∼N(0,I)}[D(G(z))],  (3)

where D is a discriminator constrained to be 1-Lipschitz and P_g is the distribution implicitly defined by the generator G through θ̃ = G(z), z ∼ N(0, I). Since the true parameters are not directly observable, the training samples from P_r are approximated by the parameters {θ_i} of the ODENets, which can be learned from the observed data (Sec. 2.1). In RODENet, G and D are neural networks with multiple fully connected hidden layers and nonlinear activation functions. The architecture of the generator and the discriminator is shown in Fig. 2 (b). The dimension of the output of the generator and that of the input of the discriminator both equal the size of each θ_i.
3 Training
The training of RODENet proceeds in three stages: two warm-up stages and an alternating stage. In the first stage (warm-up 1), we train the ODENets from the observed data. In the second stage (warm-up 2), we train the GAN using the estimated parameters of the ODENets. In the third stage (the alternating stage), the GAN is used as a regularization to further improve the estimates of the ODENet parameters, updating the parameters of the GAN and the ODENets alternately.
Warm-up 1:
Given each observed trajectory u_i, the parameters θ_i of the associated ODENet can be learned by solving the following minimization problem:

min_{θ_i} L_data(θ_i) + λ L_huber(θ_i).  (4)

Here, the data loss term L_data measures how accurately the unrolled ODENet matches the data u_i. The regularization term L_huber is defined through

ℓ_δ(x) = x²/(2δ) if |x| ≤ δ, and |x| − δ/2 otherwise,  (5)

applied to the coefficients of the SymNet. This is known as the Huber function, which is a smoothed version of the absolute value. During the training of the ODENets, we adopt a strategy of gradually increasing the number of unrolled steps from 1 to its maximum value. This strategy ensures the accuracy of long-term prediction and keeps the model stable during training. We use Adam kingma2014adam to minimize the above loss function with a learning rate of 0.01.
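The Huber regularizer and the step-count schedule can be sketched as follows; the threshold `delta` and the linear ramp of the schedule are assumptions for illustration, since the paper's exact hyperparameters are not shown here:

```python
import numpy as np

def huber(x, delta=0.1):
    """Smoothed absolute value used as a sparsity-promoting regularizer.

    Quadratic for |x| <= delta, linear beyond; the value of delta is an
    illustrative assumption.
    """
    a = np.abs(x)
    return np.where(a <= delta, a**2 / (2 * delta), a - delta / 2)

def unroll_schedule(n_iters, max_steps):
    """Gradually increase the number of unrolled steps from 1 to max_steps
    over the course of training (a simple linear ramp as an example)."""
    return [min(max_steps, 1 + i * max_steps // n_iters)
            for i in range(n_iters)]
```

With such a schedule, early iterations fit single-step predictions, while later iterations enforce long-term accuracy by unrolling the full number of steps.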
Warm-up 2:
The parameters {θ_i} learned in warm-up 1 serve as training data for the WGAN. To improve the training of the WGAN, we adopt the gradient penalty method gulrajani2017improved , i.e., we solve the following WGAN-GP problem, a penalized version of the original WGAN problem:

min_D E_{θ̃∼P_g}[D(θ̃)] − E_{θ∼P_r}[D(θ)] + μ E_{θ̂}[(‖∇_{θ̂} D(θ̂)‖₂ − 1)²],  (6)

where θ̂ is sampled uniformly along straight lines between pairs θ ∼ P_r and θ̃ ∼ P_g, i.e., θ̂ = εθ + (1 − ε)θ̃ with ε ∼ U[0, 1], and P_r is approximated by the empirical distribution of {θ_i}. We update G and D alternately by optimizing the WGAN-GP loss (6).
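The gradient penalty term can be sketched as below. For this numpy sketch the gradient of the critic is taken by central differences; in practice one would use automatic differentiation, and the penalty weight `lam` is an illustrative default following gulrajani2017improved:

```python
import numpy as np

def grad_penalty(D, theta_real, theta_fake, rng, lam=10.0, h=1e-5):
    """WGAN-GP penalty lam * (||grad D(theta_hat)|| - 1)^2 at a point
    sampled uniformly on the segment between real and fake parameters."""
    eps = rng.uniform()
    theta_hat = eps * theta_real + (1 - eps) * theta_fake
    g = np.zeros_like(theta_hat)
    for i in range(theta_hat.size):
        e = np.zeros_like(theta_hat)
        e[i] = h
        # central-difference approximation of dD/dtheta_i
        g[i] = (D(theta_hat + e) - D(theta_hat - e)) / (2 * h)
    return lam * (np.linalg.norm(g) - 1.0) ** 2
```

Adding this term to the critic loss softly enforces the 1-Lipschitz constraint of (3) without weight clipping.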
Alternating stage:
In the warm-up 1 stage, each ODENet is estimated using only its own data. This has two potential problems: 1) some instances of the underlying RODE may be harder to estimate than others; 2) the estimate of each ODE does not exploit the fact that it is an instance of a RODE. Therefore, to provide estimates of the ODE instances with uniform quality control and to take advantage of the entire data set, we use the GAN as a regularization and update the parameters of the GAN and the ODENets alternately.
In addition to the data loss and the Huber regularization introduced in (4), we introduce another loss term, the GAN regularization L_GAN, which applies the learned discriminator to the parameters θ_i of the ODENets. This regularization can be interpreted as switching the roles of the generator and the ODENets: it enforces the parameters of the ODENets to be similar to those that can be generated by the learned generator. The full loss used in this stage is

min_{θ_i} L_data(θ_i) + λ L_huber(θ_i) + ρ L_GAN(θ_i).  (7)
The alternating update of the ODENets and the GAN is a mixed procedure of warm-up 1 and warm-up 2, except that the loss for training the ODENets is modified as in (7). The full training of RODENet, consisting of warm-up 1, warm-up 2, and the alternating stage, is detailed in Algorithm 1.
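The three-stage schedule can be summarized structurally as follows; the stub classes are placeholders standing in for the actual ODENet and WGAN optimizers, which are not reproduced here:

```python
class StubODENet:
    """Placeholder for a real ODENet (illustrative only)."""
    def __init__(self):
        self.params, self.calls = [0.0], 0
    def fit_step(self, traj, regularizer=None):
        self.calls += 1  # a real implementation would take an Adam step here

class StubGAN:
    """Placeholder WGAN whose critic can act as a regularizer."""
    def __init__(self):
        self.calls = 0
    def critic(self, params):
        return 0.0  # a real critic would score the parameter vector
    def fit_step(self, params_list):
        self.calls += 1  # a real implementation would take a WGAN-GP step

def train_rodenet(odenets, gan, data, n_warm1, n_warm2, n_alt):
    """Structural sketch of the three training stages."""
    for _ in range(n_warm1):                      # warm-up 1: fit each ODENet
        for net, traj in zip(odenets, data):
            net.fit_step(traj)
    for _ in range(n_warm2):                      # warm-up 2: fit the WGAN
        gan.fit_step([net.params for net in odenets])
    for _ in range(n_alt):                        # alternating stage
        for net, traj in zip(odenets, data):
            net.fit_step(traj, regularizer=gan.critic)
        gan.fit_step([net.params for net in odenets])
```

The key design point is the last loop: each ODENet update sees the critic, while the GAN update sees the refreshed parameter vectors, so the two components co-evolve.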
4 Experiments
In this section, we show the effectiveness of the proposed RODENet on simulation data. We first describe how simulation data is generated. Then, we show how well the trained RODENet estimates the analytical form and the parameter distribution of the unknown RODE and generates reliable predictions.
4.1 Simulation Data
We select the following RODE, a quadratic system with three random parameters, to generate the observed data:
(8) 
The observed variable is a 3-dimensional vector. We consider two different settings for the randomness of the parameters in this RODE:

RODE_ind: ;

RODE_dep: .
We sample groups of random parameters to form sampled ODE instances of this RODE, and for each instance we randomly choose initial points from a uniform cube. For each initial point, we evolve the dynamics using the fourth-order Runge-Kutta method for a fixed number of time steps. Moreover, noise is injected into the numerical solutions of these ODE instances at three different noise levels.
In this section, the prediction error at time t is defined as the relative error between the predicted data, generated by the trained RODENet from a given initial value, and the true data.
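A relative error of this kind is simply the Euclidean distance between prediction and truth, normalized by the norm of the truth:

```python
import numpy as np

def relative_error(u_pred, u_true):
    """Relative Euclidean error ||u_pred - u_true|| / ||u_true||."""
    u_pred = np.asarray(u_pred, dtype=float)
    u_true = np.asarray(u_true, dtype=float)
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)
```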
4.2 The Performance of RODENet
This subsection presents the performance of the learned RODENet in terms of prediction and identification of the random parameters.
To validate the prediction performance of RODENet, we use the learned GAN to generate several ODE instances and inspect whether they are distributed similarly to instances sampled from the true RODE. Fig. 3 shows the predicted trajectories of 500 ODE instances generated by the RODENet (labeled RODENet) and 500 sampled instances of the RODE (8) (labeled RODE) using three randomly selected initial points. At each time step, we calculate the Euclidean distance between each trajectory and the mean trajectory, and plot the 0-99% and 0-75% bands of these distances in Fig. 3 (a-c). The bands of the two groups largely overlap, indicating that the distribution of the predictions of RODENet is similar to that of the RODE. This conclusion is also supported by the remaining plots in Fig. 3, where the data points on the sampled RODENet and RODE trajectories are distributed in similar regions. The solid lines in Fig. 3 (a1)-(c2) represent the means of the trajectories. The closeness of the two solid lines in each of these subfigures indicates that the bias between the predictions of the RODENet and those of the true RODE is relatively small.
We also compare the estimated distribution of the parameters from RODENet with that of the RODE by calculating the mean and standard deviation of the coefficients of 100 ODE instances generated by RODENet. The comparisons for the case RODE_ind are shown in Table 1. We can see that RODENet successfully differentiates between the terms with and without randomness. For the terms that exist in the RODE, the means and standard deviations estimated by RODENet are close to those of the RODE, while the estimated coefficients of terms that do not exist in the RODE have relatively small means and standard deviations.

term | coefficients of | coefficients of | coefficients of
 | RODENet | RODE_ind | RODENet | RODE_ind | RODENet | RODE_ind
1 | 0.029(0.764) | 0 | 0.069(0.527) | 0 | 0.075(0.904) | 0
 | 1.841(0.921) | 2.000(1.000) | 0.770(1.725) | 1.000(2.000) | 0.025(0.414) | 0
 | 1.729(0.873) | 2.000(1.000) | 0.988(0.304) | 1.000(0) | 0.051(0.506) | 0
 | 0.002(0.051) | 0 | 0.007(0.140) | 0 | 0.904(0.834) | 1.000(1.000)
 | 0.035(0.027) | 0 | 0.805(0.180) | 1.000(0) | 0.003(0.070) | 0
 | 0.006(0.058) | 0 | 0.035(0.267) | 0 | 0.077(0.149) | 0
 | 0.001(0.065) | 0 | 0.020(0.198) | 0 | 0.799(0.397) | 1.000(0)
 | estimated | true | estimated | true | estimated | true
RODE_ind | 0.075 | 0 | 0.019 | 0 | 0.061 | 0
RODE_dep | 1.659 | 2 | 0.900 | 1 | 1.700 | 2
Moreover, we show that RODENet can also identify a RODE with a jointly distributed set of parameters. Table 2 shows the covariances of the coefficients of 100 ODE instances generated by RODENet, where the coefficients of selected terms of the three components are used to estimate the random coefficients in (8). In both cases, the covariances of the coefficients from RODENet are close to the true values.
4.3 The Regularization Effect of GAN in RODENet
In this subsection, we demonstrate the regularization effect of the trained GAN in improving the estimates of the ODENets. We compare the performance of the RODENet (i.e., ODENets with GAN regularization, trained by Algorithm 1) with ODENets trained without GAN regularization (i.e., by warm-up 1 of Algorithm 1 only) and find that the GAN regularization helps reduce prediction errors and correct the learned formulas of the ODEs. For convenience, we refer to the ODENets without GAN regularization simply as ODENets.
Let e_i (resp. ẽ_i) be the median of the relative prediction errors of the i-th ODE instance of the RODENet (resp. ODENets) at time t over 100 initial points, and collect these medians into error vectors. We compare the histograms of the two error vectors in Fig. 4. For a better visual comparison, we crop the vectors to a fixed range. The numbers of ODE instances with small errors are comparable for the RODENet and the ODENets, while the number of ODE instances with large errors is smaller for the RODENet than for the ODENets.
As one can see, the GAN regularization of RODENet notably improves the overall prediction errors in comparison with the ODENets, and for some ODE instances the improvement is significant. In Fig. 5, we plot the quantiles of the relative error vectors for a particular instance of the ODENets with and without GAN regularization.
GAN regularization in RODENet also helps with the estimation of the expression of the RODE. In Fig. 5 (a), we compare the distributions of the errors associated with the ODENets with and without GAN regularization; the regularization indeed improves the estimated expressions. Some of the estimates by RODENet are notably more accurate than those of the ODENets without GAN regularization. An example is shown in the table in Fig. 5 (b): the estimated coefficients of the primary terms of the ODE are closer to the true values when trained with GAN regularization.

5 Conclusion and Future Work
In this paper, we proposed a new framework, called RODENet, to identify RODEs from data. The RODENet combines ODENets for system identification with a GAN for parameter distribution estimation. Numerical experiments demonstrated that the trained RODENet was not only able to identify ODEs with minor prior knowledge of the dynamics, but was also able to estimate the distribution of the RODE well. This enabled us to use the trained RODENet to simulate trajectories that are distributed similarly to those of ODEs sampled from the true RODE. Moreover, the trained GAN in the RODENet served as an effective regularization that enabled the RODENet to obtain more accurate predictions and parameter estimates than the ODENets without GAN regularization. One limitation, and a possible extension, of the RODENet is that, due to the high dimensionality of the parameters of the ODENets, one may need to include more prior knowledge to reduce the dimension. Furthermore, we would like to apply the proposed framework to more practical examples, such as particle-fluid systems. It is also worth studying how to design better data-driven regularizations based on, or beyond, GANs.
6 Broader Impact
The proposed RODENet differs from earlier works in two major ways, and hence may lead to further impact. 1) RODENet estimates the distribution of the parameters of the RODE rather than the single best set of parameters in the average sense. While most classical methods focus on forward uncertainty propagation and parameter calibration, RODENet may encourage the investigation of alternative ways to model systems with inherent randomness. 2) The learned GAN is further used as a data-driven regularization. This approach can also be generalized to inverse problems of partial differential equations, and it is worth exploring how to use GANs to improve or even replace commonly used empirical regularization methods. However, due to the non-transparency of deep neural networks, the learned generator of a GAN and the corresponding data-driven regularization are not always reliable. One should be cautious when deploying GAN-based regularization in practice, especially for systems with low fault tolerance.
Zichao Long is supported by The Elite Program of Computational and Applied Mathematics for PhD Candidates of Peking University. Bin Dong is supported in part by Beijing Natural Science Foundation (No. 180001) and Beijing Academy of Artificial Intelligence (BAAI).
References
 (1) Zichao Long, Yiping Lu, and Bin Dong. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. Journal of Computational Physics, 399:108925, 2019.
 (2) Zhouchen Lin, Wei Zhang, and Xiaoou Tang. Learning partial differential equations for computer vision. Peking Univ., Chin. Univ. of Hong Kong, 2008.
 (3) Risheng Liu, Zhouchen Lin, Wei Zhang, and Zhixun Su. Learning PDEs for image restoration via optimal control. In European Conference on Computer Vision, pages 115–128. Springer, 2010.
 (4) Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-Net: Learning PDEs from data. In International Conference on Machine Learning, pages 3214–3222, 2018.
 (5) Ravi G Patel and Olivier Desjardins. Nonlinear integrodifferential operator regression with neural networks. arXiv preprint arXiv:1810.08552, 2018.
 (6) Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.
 (7) Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, page 201517384, 2016.
 (8) Josh Bongard and Hod Lipson. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.
 (9) Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.
 (10) Hao Xu, Haibin Chang, and Dongxiao Zhang. DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm. arXiv preprint arXiv:2001.07305, 2020.
 (11) Ralph C Smith. Uncertainty quantification: theory, implementation, and applications, volume 12. SIAM, 2013.
 (12) Wing Kam Liu, Ted Belytschko, and A Mani. Probabilistic finite elements for nonlinear structural dynamics. Computer Methods in Applied Mechanics and Engineering, 56(1):61–81, 1986.
 (13) Dongxiao Zhang. Stochastic methods for flow in porous media: coping with uncertainties. Elsevier, 2001.
 (14) Roger G Ghanem and Pol D Spanos. Stochastic finite elements: a spectral approach. Courier Corporation, 2003.
 (15) Dongbin Xiu. Numerical methods for stochastic computations: a spectral method approach. Princeton university press, 2010.
 (16) Marc C Kennedy and Anthony O’Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001.
 (17) Xiaoli Chen, Jinqiao Duan, and George Em Karniadakis. Learning and meta-learning of stochastic advection-diffusion-reaction systems from sparse measurements. arXiv preprint arXiv:1910.09098, 2019.
 (18) Liu Yang, Xuhui Meng, and George Em Karniadakis. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. arXiv preprint arXiv:2003.06097, 2020.
 (19) Liu Yang, Dongkun Zhang, and George Em Karniadakis. Physics-informed generative adversarial networks for stochastic differential equations. SIAM Journal on Scientific Computing, 42(1):A292–A317, 2020.
 (20) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
 (21) Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 214–223, 2017.
 (22) Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
 (23) Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 537–546. JMLR.org, 2017.
 (24) Viraj Shah and Chinmay Hegde. Solving linear inverse problems using gan priors: An algorithm with provable guarantees. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4609–4613. IEEE, 2018.

 (25) Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4681–4690, 2017.
 (26) Xiaoying Han and Peter E Kloeden. Random ordinary differential equations and their numerical solution. Springer, 2017.
 (27) Tobias Neckel and Florian Rupp. Random differential equations in scientific computing. Walter de Gruyter, 2013.
 (28) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.