### PSI-PDE

A robust method of learning PDEs from dynamical systems.

view repo

Robust physics (e.g., governing equations and laws) discovery is of great interest for many engineering fields and explainable machine learning. A critical challenge compared with general training is that the term and format of governing equations is not known as a prior. In addition, significant measurement noise and complex algorithm hyperparameter tuning usually reduces the robustness of existing methods. A robust data-driven method is proposed in this study for identifying the governing Partial Differential Equations (PDEs) of a given system from noisy data. The proposed method is based on the concept of Progressive Sparse Identification of PDEs (PSI-PDE or ψ-PDE). Special focus is on the handling of data with huge uncertainties (e.g., 50% noise level). Neural Network modeling and fast Fourier transform (FFT) are implemented to reduce the influence of noise in sparse regression. Following this, candidate terms from the prescribed library are progressively selected and added to the learned PDEs, which automatically promotes parsimony with respect to the number of terms in PDEs as well as their complexity. Next, the significance of each learned terms is further evaluated and the coefficients of PDE terms are optimized by minimizing the L2 residuals. Results of numerical case studies indicate that the governing PDEs of many canonical dynamical systems can be correctly identified using the proposed ψ-PDE method with highly noisy data. One great benefit of proposed algorithm is that it avoids complex algorithm modification and hyperparameter tuning in most existing methods. Limitations of the proposed method and major findings are presented.

READ FULL TEXT VIEW PDFA robust method of learning PDEs from dynamical systems.

view repo

Despite that many dynamical systems can be well characterized by PDEs derived mathematically/physically from basic principles such as conservation laws, lots of other systems have unclear or elusive underlying mechanisms (e.g., ones in neuroscience, finance, and ecology). Thus, the governing equations are usually empirically formulated rudy2017data. Data-driven physics discovery of dynamical systems gradually became possible in recent years due to the rapid development and extensive application of sensing technologies and computational power long2019pde. Over the past years, extensive efforts have been devoted into discovering representative PDEs for complex dynamical systems of which limited prior knowledge are available schmidt2009distilling; raissi2018hidden; rudy2017data; long2019pde.

Among all the methods investigated for PDE identification schmidt2009distilling; raissi2018hidden; rudy2017data; long2019pde; maslyaev1903data; atkinson2019data; hasan2020learning; xu2020dlga, sparse regression gains the most attention in recent studies due to its inherent parsimony-promoting advantage. Considering a nonlinear PDE of the general form , in which the subscripts denote partial differentiation with respect to temporal or spatial coordinate(s), is an unknown expression on the right hand side of the PDE. It is usually a nonlinear function of the spatial coordinate , the measured quantity , and its spatial derivatives ,, etc. Given time series measurements of at certain spatial locations, the above equation can be approximated as , in which is the discretized form of , is a library matrix with each column corresponding to a candidate term in . A key assumption in sparse identification is that consists of only a few term for a real physical system, which requires the solution of regression (i.e.,

) to be a sparse vector with only a limited number of nonzero elements. This assumption promotes a parsimonious form of the learned PDE instead of overfitting the measured data with a complex model containing redundant nonlinear higher-order terms.

As pioneering researchers in sparse PDE learning, Rudy et al. rudy2017data; rudy2019data

modified the ridge regression method by imposing hard thresholding which recursively eliminates certain terms with coefficient values below a learned threshold. As pointed out in Limitations of

rudy2017data; rudy2019data (Section 4 in Supplementary Materials) and following studies raissi2018hidden; chen2020deep; both2020sparsely, the identification quality is very sensitive to data quantity and quality. For example, the terms of the reaction diffusion equation cannot be correctly identified using the data with only 0.5% random noise. Furthermore, as indicated in fuentes2020equation, the identification results using this method are susceptible to the selection of hyperparameters of the algorithm, including the regularizer and the initial tolerance which is also the tolerance increment. The hyperparameter tuning is especially critical for cases with noisy measurements. This limitation most probably comes from the hard thresholding in the modified algorithm (STRidge). A hard thresholding tends to suppress small coefficients that may not correspond to the most trivial terms of the intermediately learned PDEs.

To overcome the challenge of numerical differentiation with scarce and noisy data in sparse regression methods, deep learning techniques were incorporated by generating a large quantity of meta-data and adopting the automatic differentiation function in deep learning frameworks (Tensorflow, PyTorch, etc.)

xu2019dl; berg2019data. The intermediately learned PDE can be treated as a physics loss term in physics-informed deep learning raissi2019physics; wang2020towards; haghighat2020sciann, and constrained neural networks were developed to improve the performance of PDE identification recursively hasan2020learning; chen2020deep; both2020sparsely. Long et al. long2018pde; long2019pde used a convolutional architecture and symbolic regression to replace the numerical differentiation and sparse regression procedures, respectively. A comprehensive review of the state of the art of PDE learning can be found in chen2020deep. Despite improved performance of PDE identification using these methods, the identification results (both PDE forms and coefficients) are lacking robustness in most studies mentioned above. For example, approaches using constrained neural networks introduced more hyperparameters into the algorithms in addition to those in the used STRidge algorithm, which further increases the challenge of identifying the correct PDE forms since PDE learning problems are sensitive to the hyperparameter tuning. This issue is amplified under noisy data, especially under high noise levels. Complete different identification results may be obtained under different noise levels using same hyperparameter settings. Thus, a sound robust PDE learning needs to produce stable identification results with respect to different noise levels.Considering the gaps of existing studies in discovering PDEs from complex dynamical systems, a robust method for correctly identifying PDEs is needed to discover the underlying physics of the measured systems that lack prior knowledge of the governing principles. Thus, this study attempts to develop a robust method of PDE identification within the framework of sparse regression. The key idea is to address both sparsity and accuracy of the learned PDE. Special focus is on the automatic and progressive selection of learned PDE forms without complex algorithms with hard-to-tune hyperparameters schmidt2009distilling. The proposed scheme automatically promotes sparsity in addition to simplicity of the learned model. Finally, the representativeness of each model will be further evaluated by solving its corresponding PDE with given/extracted initial and/or boundary conditions. The coefficients of each term are optimized by minimizing the error of model prediction with the measured data taken as the ground truth. In this way, the PDE that is most likely to represent the intrinsic mechanisms underlying the observed system will be determined. Since the proposed methodology progressively yields a sparse identification of the governing PDE(s) of a given system, it is named the progressive and sparse identification method of PDEs (PSI-PDE or -PDE method).

The remaining part of this paper is structured as follows. Section 2 establishes the framework of the -PDE method; section 3 presents and discusses the results of discovering govern equations for a variety of dynamical systems using the -PDE method; section 4 concludes this study with remarks and recommendations for future work.

Figure 1 illustrates the framework of the proposed -PDE method for discovering the governing equations using measured data from a dynamical system. This framework starts from the red noisy curve on the upper left corner, which denotes the measured signals containing all the information one can directly obtain from the instrumented system. For preprocessing the measured data, a neural network (NN) model is built following the practices in xu2019dl; berg2019data, setting the independent variables (i.e., , , etc.) as inputs and the measured quantity (e.g., ) as the output. The measured data are split into training and validation sets and the early stopping strategy is devised in the model training to prevent the NN model from overfitting the noise components contained in the measured data. A smoothed series of signals is expected from this preprocessing, which will be subsequently used to calculate the numerical derivatives (i.e., , , , etc.) and then construct the library matrix

for sparse regression. Numerical methods such as finite difference method (FMD) and polynomial interpolation are used to calculate the temporal and spatial derivatives. With

and established, to further reduce the influence of noise in sparse regression, fast Fourier transform (FFT) is applied to transform andto their frequency domain counterparts

and , respectively. Following FFT, a frequency cutoff is implemented to preserve only the low frequency components that are expected to be less susceptible to noise. Moreover, this step converts the regression problem from the temporal-spatial domain to the frequency domain, which does not change the form of learned PDEs cao2020machine.With and from FFT with frequency cutoff, sparse regression is conducted in a progressive manner. Algorithms that automatically yield a sparse solution by imposing norm regularization (such as LASSO) or hard thresholding (such as STRidge) are not adopted to avoid the lemma of hyperparameter tuning in PDE learning. Instead, a least squares regression is implemented via in MATLAB to recursively examine the importance of each term in the prescribed library by evaluating the resulting regression error and model complexity. In this way, the most important term(s) are step-by-step identified and added to the PDE model until the effects of adding more terms diminish. The details of this procedure are elaborated in the -algorithm in Algorithm 1. This algorithm probably yields more than one candidate PDE models that are hard to compare from the perspective of regression accuracy and model complexity. Finally, all candidate PDEs are solved numerically given sufficient initial/boundary conditions, and the solutions (the blue smooth curve at the upper left corner of Figure 1) are compared with the measured data. In this step, the importance of each term can be further evaluated by eliminating certain terms and optimizing their coefficients. The final PDE for the given system is determined as the one capturing the most intrinsic mechanism represented by the essential terms. The rest of this section demonstrates each step of the -PDE method using an example of discovering the Burgers equation from simulated data.

Burgers equation is used to describe the dynamics of a dissipative system. A 1D viscous Burgers equation is used to explain the steps of the -PDE method in detail. It has the expression of with the initial and boundary conditions and , in which denotes the diffusion coefficient. Figure 2 (a) shows its solution within the range and . The library for sparse regression is built with polynomials of to the power of 3, spatial derivatives to the order, and their products. As a result, it contains 16 terms in total, i.e., .

To demonstrate the effect of NN denoising step in the -PDE method, 10% white Gaussian noise is added to the numerical solution of the Burgers equation, which significantly varies the values of the solution (as shown in Figure 2

(b)) and poses challenge to calculating the numerical derivatives. In this study, noise level is quantified by the percentage of the standard deviation of the measured variable. For example, if 10% noise is added to

, then the outcome is where generates white Gaussian noise of the specified dimension. Without much prior knowledge about the noise characteristics, NN modeling is applied to denoise the noisy measurements, and the processed data is visualized in Figure 2 (c). Comparing the three plots in Figure 2, one can observe that NN denoising largely reduces the noise level in the collected data (from 10% to 2%) and makes the solution curve much smoother than the noisy one, which has the potential of improving the accuracy of subsequent numerical differentiation.With numerical derivatives calculated and the library matrix established, FFT is conducted to further reduce the influence of noise in the subsequent sparse regression. Figures 3 (a) to (d) compare the relative errors in the spatial-temporal domain and frequency domain after taking 2D FFT. Figures 3 (a) and (b) show the difference between the polluted data with 50% noise and the simulated clean data. It can be observed that after taking 2D FFT, the low-frequency components are less affected by the added noise. In addition, Figures 3 (c) and (d) show that NN denoising can largely reduces not only the relative error in the spatial-temporal domain, as can be predicted from Figure 2, but also the relative error in the frequency domain. Therefore, it can be expected that the performance of PDE learning can be considerably improved by implementing these preprocessing procedures.

Following the preprocessing steps that prepared the vector and the library matrix , progressive sparse identification of the PDE is conducted using the -algorithm presented in Algorithm 1. The form of the Burgers equation is identified progressively until when adding more terms to the learned PDE no longer considerably improves the regression accuracy. Without loss generality, the rest of this section uses the simulated clean data, and the results of cases with various levels of noise will be presented in section 3 with other systems. First, the most contributive term(s) with index/indices are determined by comparing the increase of regression errors ( and ) when dropping a certain term from the full list of terms in the library. Figures 4 and 5 (a) plot the distributions of errors( and respectively) when dropping each term in the library. It can be observed that deleting the term causes the largest regression error which is much larger than that when dropping any other term. This finding is further verified by comparing the mean errors as shown in Figures 6 and 7 (a). Hence, the term is first added to the PDE form with .

With selected, the importance of other terms in the library are examined by adding each of them separately into the selected list and evaluating the improvement of regression. The results are shown in Figures 4 and 5 (b) for the error distributions and Figures 6 and 7 (b) for the mean errors. Comparison shows that adding the term () or the term () can largely reduce the regression errors. Considering the complexity of the term with higher nonlinearity at the same order of spatial derivative, the term () is added to the PDE and the selected index list is updated to . The and terms are not added simultaneously in this step since adding one of them may affect the importance of adding the other term regarding improving the regression accuracy. However, all possible outcomes of sparse regression with the -algorithm will be investigated in the PDE solving/optimization step of the -PDE method, which reevaluates the importance of each possible terms and finalizes the learned PDE of the observed system.

The same procedure is run to evaluate the importance of other terms, and the results (Figures 4 to 7 (c)) show that adding more terms to the PDE does not significantly improve the performance of regression but increases the complexity of the resulting model. Therefore, the -algorithm for sparse regression will terminate with terms and selected to formulate the PDE for the given system, and the identified PDE in this step is with a least squares regression.

With its form determined in sparse regression using the -algorithm, the coefficients of each term of the identified PDE will be optimized in the last step of the -PDE method by solving it with given/extracted initial/boundary conditions and comparing the solution with the measured data. The steepest descent method is used in this optimization. Figure 8 (a) shows the resulting optimized PDE, its solution plotted together with the simulated data, and their difference .The optimized PDE has exactly the same form with the ground truth PDE and accurate coefficients of both terms on the right side, and its solution is very close to the simulated data with the largest error below 5%. Hence, the governing equation for the simulated system can be correctly identified using the proposed -PDE method.

Considering that the -algorithm does not necessarily yield a single solution especially in cases with noisy data, other possible solutions are analyzed to further investigate the importance of the selected/dropped terms as well as the robustness of the -PDE method. In the sparse regression step, after selecting the term , both the term and the term are competitive candidates in terms of reducing the regression errors. Figures 8 (b) and (c) shows the results of other possible solutions in the PDE solving/optimization step. In the second round in sparse regression, if the term is selected instead of , the -algorithm will end with the PDE . Figure 8 (b) shows the results of optimization. It can be observed that the solution of the identified PDE has considerably large error at around (>50%) though it matches well with the simulated data elsewhere. Additionally, one may suggest adding both and to the PDE, which yields the equation in sparse regression. Figure 8 (c) shows the results of optimization in this scenario. It shows that with the sacrifice of model parsimony, adding the term to the learned PDE does not largely reduce the difference of its solution from the simulated data. Moreover, one can observe that after optimization, the coefficient of the term becomes nearly ten times smaller, which further proves its insignificance in the governing equation of this system. This analysis with all candidate solutions of PDE learning ensures that the -PDE method finally yields an equation that best captures the intrinsic underlying physics among all candidate solutions.

This section presents and discusses the results of PDE learning using the -PDE method with simulated noisy data from several canonical systems covering a number of scientific domains. 0 to 50% noise is added to the numerically simulated clean data to demonstrate the effects of preprocessing and the robustness of the -PDE method in cases with noisy measurements. Two 1D systems are investigated in section 3.1: (1) the dissipative system characterized by the 1D Burgers equation (as shown in section 2)); (2) the traveling waves described by the Korteweg–de Vries (KdV) equation. To highlight the advantage of the -PDE method in robustness, section 3.1 first presents the results of learning the 1D Burgers equation using the PDE-FIND method developed in rudy2017data. The lemma of hyperparameter tunning and the challenge in cases with noisy data in existing methods are demonstrated. Section 3.2 presents the results of two 2D systems: (1) an extended dissipative system characterized by the 2D Burgers equation; (2) the lid-driven cavity flow corresponding to the 2D Navier Stokes equation. Due to the limited space in this article, the intermediate results are not presented (as done in section 2) for the cases studied in this section. Codes of demonstrated examples are available on the website: https://github.com/ymlasu.

Table 1 compares the results of learning the 1D Burgers equation () using the PDE-FIND method rudy2017data considering various noise levels and different combinations of hyperparameters. The STRidge algorithm in the PDE-FIND method mainly has two hyperparameters, i.e., the regularizer and the tolerance increment/ initial tolerance . From the authors’ viewpoint, the hyperparameter tuning can be challenging in PDE learning where the ground truths and noise level are not known a priori. Therefore, a method may not be robust enough if the learning outcome is susceptible to the variation of hyperparameters.

In cases with 0% or 10% noise, the PDE-FIND method yields identical PDE though with different hyperparameter combinations, as shown in the last column of Table 1. Comparing the results with the ground truth PDE, one can find that correct terms are identified. However, without the PDE solving/optimization procedure, their coefficients cannot be optimized in this method. When the noise level increases to 20%, the results become very sensitive to the variation of hyperparameters. With a different hyperparameter combination, the identified terms on the right hand side of PDE can be very different. Moreover, at this noise level, the terms of the ground truth PDE cannot be correctly identified though exhaustive trials of hyperparameter adjustment. Considering this lemma of hyperparameter tuning, especially in cases with a large level of noise, a robust method for PDE learning is needed when limited prior information is available for an unknown system.

Noise level | identified PDE | ||
---|---|---|---|

0% | - | - | |

10% | - | - | |

20% | 1.0 | ||

20% | 1.0 | ||

20% | 0.1 | ||

20% | 0.1 |

Table 2 lists the results of learning the Burgers equation () from noisy data using the proposed -PDE method. Without complex and tricky hyperparameter setting/tuning, the -PDE method yields identically correct PDE form even with data containing up to 50% noise. Moreover, by virtue of the critical PDE solving/optimization step in the -PDE method, the values of coefficients are very close to that of the ground truth PDE. It should be noted that the efficiency of the -PDE method in learning correct PDEs holds beyond the noise levels investigated in this study. Figures 9 (a) to (c) compare the solutions of learned PDEs with the measured noisy data. It can be observed that even in cases with significant noise, the -PDE method never overfits the noise components in the measured data with redundant high-order nonlinear terms. Instead, this method always yields a governing equation that captures the most intrinsic invariants underlying the data.

noise level | identified PDE |
---|---|

0% | |

10% | |

20% | |

50% |

The second example of learning PDE from a 1D dynamical system examines the effectiveness of the -method in correctly identifying governing equations containing higher-order spatial derivatives. This section considers a mathematical model of traveling waves on shallow water surfaces, i.e., the KdV equation with the form . The KdV equation can be used to characterize the evolution of many long 1D waves such as the ion acoustic waves in a plasma and acoustic waves on a crystal lattice raissi2019physics. This study investigates the system described by the following KdV equation: with the initial condition and periodic boundary conditions.

noise level | identified PDE |
---|---|

0% | |

10% | |

20% | |

50% |

Table 3 summarizes the results of learning PDEs from the simulated traveling waves containing 0% to 50% noise. It shows that the correct PDE form with accurate coefficients can be identified using the -PDE method in all cases. When implementing the -PDE method, it was noted that the form of the KdV equation is easier to identify than that of the Burgers equation, because the -algorithm for sparse regression does not yield more than one candidate PDE even for very noisy cases. The details can be found by running the code for this example available on the website: https://github.com/ymlasu. Figures 10 and 11 show the measured data and the its snapshot with the solution to the learned PDE at sec. It can be found that even in case with 50% noise, the -PDE method can discover the intrinsic physics underlying the noisy data. Therefore, this example not only highlights the capability of the -method of learning higher-order spatial derivatives but also further proves its the robustness in extracting the underlying physics from noisy measurements.

This section investigates the effectiveness of the -PDE method in learning efficient governing equation(s) of 2D dynamical systems. First, the -method is used to discover the physics of a 2D dissipative system characterized by the Burgers equation with the initial condition and periodic boundary conditions. Figure 12 shows the snapshots of simulated data for this system. It should be noted that with the increase of dimensionality, the knowledge discovery of dynamical systems becomes more complex. Hence, in the sparse regression scheme of PDE learning, the library matrix should no longer be built in an exhaustive manner considering all possible combinations of polynomials to a certain power and spatial derivatives to a certain order, which will make the sparse regression problem intractable. Instead, is built with representative terms in multi-dimensional nonlinear dynamical systems (e.g., convective derivative , advective acceleration , and the Laplacian ) and their products with polynomials. Table 4 summarizes the results of learning PDEs from the 2D dissipative system using simulated clean and noisy data. It shows that the correct equation form with accurate coefficients can be successfully identified using the -PDE method with data containing as much as 40% random noise. When the noise level increases to 50%, the -algorithm fails to extract the most contributive terms from the library due to the severe influence of noise to the accuracy of calculated numerical derivatives. Advanced techniques of numerical differentiation such as automatic differentiation using graphical neural networks (GNN) baydin2017automatic will be investigated in future studies.

noise level | identified PDE |
---|---|

0% | |

10% | |

20% | |

30% | |

40% | |

50% |

Figure 13 compares the solution to the learned PDE with the measured data containing 40% noise and the solution to the ground truth PDE. One can observe that the proposed -PDE method can discover the underlying truth behind the noisy measurements from the 2D dissipative system.

Another 2D system investigated in this study is the lid-driven cavity flow which is a benchmark problem for viscous incompressible fluid flow Zienkiewicz2005finite. This study uses a geometry of a square cavity that is comprised of a lid on the top moving with a tangential unit velocity and three no-slip rigid walls. The velocity and pressure distributions are numerically simulated for a Reynolds number of 100. Figure 14 (a) visualizes the solultion to this sytem at sec, containing the velocity (small arrows) and pressure (color map with contour lines) distributions as well as the streamlines.

Table 5 lists the results of PDE learning using the -PDE method with simulated data containing various levels of noise. The correct PDE form can be learned for cases with as much as 50% noise. Figure 14 (b) illustrates the solution to the learned PDE in case with 20% noise, which is almost identical to the solution to the ground truth PDE (Figure 14 (a)). This observation further validates the effectiveness of the -PDE method in distilling the underlying intrinsic physics. When the noise level increases to 50%, the coefficient of the advective acceleration term (i.e.,() is about 14% different from the real value, as shown in the bottom row of Table 5. As a result, its solution especially the pressure distribution is different from that of the ground truth PDE, as shown in Figure 14 (c). This difference demonstrates the challenge of PDE learning for a multidimensional system especially in the presence of large noise, which will be further investigated in the future work.

noise level | identified PDE |
---|---|

0% | |

10% | |

20% | |

50% |

In this study, a robust data-driven method (i.e., the -PDE method) is proposed for discovering the underlying physics of a given system from measured data. Investigating and improving its robustness is critical for effectively distilling the intrinsic law underlying a complex novel dynamical system. The -PDE method has been tested on various systems in sufficiently challenging scenarios regarding the model complexity and noise intensity, and the identification results approve its effectiveness and generality. Compared with the state-of-the-art methods in the literature, the -PDE method is advantageous in its robustness since it requires the least effort in hyperparameter tuning which should be obviated in identifying the governing equation(s) of an unknown system.

This work was inspired by the pioneering work presented in schmidt2009distilling which discourages automatic discovery of natural laws. After examining the challenge of automatic identification via sparse regression as presented in rudy2017data and its following works (such as berg2019data and chen2020deep), this study borrows the “nonautomatic” idea in schmidt2009distilling and yields more than one candidate solutions through sparse regression with the -algorithm; finally, the representativeness of all candidate models are evaluated through solving and optimizing their respective PDE forms taking the measured data as the ground truth and objective. It has been demonstrated that the -PDE method always yields equations that capture the most intrinsic physics of the observed system. In comparison, most existing methods yield a unique PDE for a given system and do not allow evaluating its effectiveness in characterizing the system or optimizing its representativeness.

While it increases the robustness of PDE learning, the PDE solving/optimization step in the -PDE method considerably increases the computational cost especially for a high-dimensional system. However, this challenge can be solved by parallel computing and/or surrogate modeling, which will be investigated in future studies for more complex systems. The future work may also include applying the -PDE method real operating dynamical systems to further examine or improve its effectiveness in knowledge discovery.

The research reported in this paper was supported by funds from NASA University Leadership Initiative Program (Contract No. NNX17AJ86A, Project Officer: Dr. Anupa Bajwa, Principal Investigator: Dr. Yongming Liu). The support is gratefully acknowledged.

Comments

There are no comments yet.