PiDL with Sparse Regression for PDE Discovery
We consider a multi-dimensional spatiotemporal system whose governing equations can be described by a set of nonlinear, coupled, parameterized PDEs in the general form given by
where is the multi-dimensional latent solution (dimension ) while is the first-order time derivative term; denotes time and specifies the space; is a complex nonlinear functional of and its spatial derivatives, parameterized by ; is the gradient operator with respect to ; is the source term (note that, in many common cases, represents no source input to the system). The PDEs are also subjected to initial and boundary conditions (I/BCs), if known, denoted by and . For systems that obey the Newton’s second law of motion (e.g., the acceleration term is present), the governing PDEs can be written in a state-space form similar to Eq. . Our objective is to find the closed form of from available spatiotemporal measurements which are assumed to be incomplete, scarce and noisy commonly seen in real-world applications (e.g., when data capture is very costly or the data itself is sparse in nature). We assume that the physical law is governed by only a few important terms which can be selected from a large-space library of candidate functions, where sparse regression can be applied (Brunton3932, Rudye1602614, Schaeffer2017). Inherent in this assumption leads to reformulation of Eq.  in the following (assuming zero source for simplicity):
Here, is an extensive library of symbolic functions consisting of many candidate terms, e.g., constant, polynomial, and trigonometric terms with respect to each spatial dimension (Rudye1602614, Schaeffer2017)
, assembled in a row vector given by
where represents the element-wise Hadamard product; denotes the total number of candidate terms in the library; the subscripts in the context of depict the derivatives; is the sparse coefficient matrix (only the active candidate terms in have non-zero values), e.g., for . If there is unknown source input, the candidate functions for can also be incorporated into for discovery (see SI Appendix, Section C3). Thus, the discovery problem can then be stated as: given the spatiotemporal measurement data , find sparse such that Eq.  holds.
We present an interpretable PiDL paradigm with sparse regression to simultaneously model the system response and identify the parsimonious closed form of the governing PDEs. The innovative mathematical architecture of this method is shown in Fig. 1. To begin with, we interpret the latent solution by a deep dense neural network (denoted by ), namely, , where
represents the DNN trainable parameters including weights and biases. The DNN essentially plays a role as a nonlinear functional to approximate the latent solution with the data loss function expressed as:
where is the measurement data, is the total number of data points, and denotes the Frobenius norm. With graph-based automatic differentiation where derivatives on are evaluated at machine precision, the library of candidate functions can be computed from . Thus, the sparse representation of the reconstructed PDEs can be written in a residual form, namely, , where denotes the PDE residuals. The basic concept is to adapt both the DNN trainable parameters and the PDE coefficients such that the neural network can fit the measurement data while satisfying the constraints defined by the underlying PDEs. The PDE residuals will be evaluated on a large number of collocation points , randomly sampled in the spatiotemporal space, leading to the residual physics loss function given by
where and denote respectively the discretization of the first-order time derivative term and the library of candidate functions evaluated on the collocation points; is the total number of spatiotemporal collocation points. Note that the I/BCs, if measured or known, can also be added as part of the physics loss in Eq. .
The total loss function for training the PiDL network is thus composed of the data loss , the residual physics loss and a regularization term, expressed as:
where is the relative weighting of the residual physics loss function; is the regularization parameter; represents the norm. Optimizing the total loss function can produce a DNN that can not only predict the data-driven full-field system response, but also uncover the parsimonious closed-form PDEs.
Noteworthy, the total loss function has an implicit complex form, and thus, directly solving the optimization problem is highly intractable since the regularization makes this problem -hard. Though relaxation of the term by the less rigorous regularization improves the well-posedness and enables the optimization in a continuous space, false positive identification occurs (BERG2019239, Both2019). To address this challenge, we present an alternating direction optimization (ADO) algorithm that divides the overall optimization problem into a set of tractable subproblems to sequentially optimize and within a few alternating iterations (denoted by ), namely,
The fundamental concept of the ADO algorithm shares similarity with the alternating direction methods of multipliers (boyd2011). In each alternating iteration , the sparse PDE coefficients in Eq. [7a] are updated (denoted by ) via STRidge [a sequential thresholding regression process that serves as a proxy for regularization (Brunton3932, Rudye1602614)], based on the DNN parameters from the previous iteration (e.g., ). The DNN parameters in the current iteration are then updated (denoted by ) through a standard neural network training algorithm [in particular, the combined Adam (adam2014) + L-BFGS (Byrd1995) optimizer], taking as known. The alternations between the sub-optimal solutions will lead to a high-quality optimization solution. The algorithm design of ADO as well as the implementation details and specifications are given in SI Appendix, Algorithm 1 and Algorithm 2.
|PDE name||Error (noise 0%)||Error (noise 1%)||Error (noise 10%)||Description of data discretization|
|Burgers||0.010.01%||0.190.11%||1.151.20%||, , subsample 1.95%|
|KS||0.070.01%||0.610.04%||0.710.06%||, , subsample 12.3%|
|Schrödinger||0.090.04%||0.650.29%||2.310.28%||, , subsample 37.5%|
|RD||0.070.08%||0.250.30%||4.783.66%||, , subsample 0.29%|
|NS||0.660.72%||0.860.63%||1.401.83%||, , , subsample 0.22%|
The error is defined as the average relative error of the identified non-zero coefficients w.r.t. the ground truth. The subscript denotes the number of discretization. Our method is also compared with SINDy [the PDE-FIND approach presented in (Rudye1602614)] as illustrated in SI Appendix, Table S1 . It is noted that much less measurement data polluted with a higher level of noise are used in our discovery. Gaussian white noise is added to the synthetic response with the noise level defined as the root-mean-square ratio between the noise and exact solution.
. It is noted that much less measurement data polluted with a higher level of noise are used in our discovery. Gaussian white noise is added to the synthetic response with the noise level defined as the root-mean-square ratio between the noise and exact solution.
We demonstrate the efficacy and robustness of our methodology on a group of canonical PDEs used to represent a wide range of physical systems with nonlinear, periodic and/or chaotic behaviors. In particular, we discover the closed forms of Burgers, Kuramoto-Sivashinsky (KS), nonlinear Schrödinger, Reaction-Diffusion (RD), and Navier-Stokes (NS) equations from scarce and noisy time-series measurements recorded by a number of sensors at fixed locations (data are polluted with Gaussian white noise). Pre-training of PiDL is conducted before running the ADO algorithm for discovery, by simply replacing in Eq.  with where brute-force gradient-based optimization for both and becomes applicable. The -regularized pre-training can accelerate the convergence of ADO by providing an admissible “initial guess”. Results are presented in Table 1 which shows quite accurate discovery and demonstrates satisfactory performance of the proposed method and its robustness to measurement data scarcity and noise. Full details of each example are presented as follows. We also compare our method with SINDy considering different levels of data scarcity and noise (SI Appendix, Section B6).
We first consider a dissipative system with the dynamics governed by a 1D viscous Burgers equation expressed as , where (equal to 0.1) denotes the diffusion coefficient. The equation describes the decaying stationary viscous shock of a system after a finite period of time, commonly found in simplified fluid mechanics, nonlinear acoustics and gas dynamics. We test the PiDL approach on the recorded traveling shock waves from the solution to Burgers equation subjected to a Gaussian initial condition. In particular, 5 sensors are randomly placed at fixed locations among the 256 spatial grids and record the wave for 101 time steps, leading to 1.95% of the dataset used in (Rudye1602614). A full description of the dataset, design of the library of candidate functions (16 terms) and model training is given in SI Appendix, Section B1. Fig. 2 shows the discovered Burgers equation for a dataset with 10% noise. The evolution of the coefficients illustrates robust convergence to the ground truth (error about 1.2%), resulting in accurate discovery. The trained PiDL properly reproduces the dynamical response from noisy measurements (e.g., the full-field prediction error is 2.02%). The ADO algorithm converges only after the first alternating iteration and shows capacity to recover the correct sparsity pattern of the PDE. Simultaneous discovery of the Burgers equation and the unknown source is further discussed in SI Appendix, Section C3.
Kuramoto-Sivashinsky (KS) Equation
Another dissipative system with intrinsic instabilities is considered, governed by the 1D Kuramoto-Sivashinsky (KS) equation , where the reverse diffusion term leads to the blowup behavior while the fourth-order derivative introduces chaotic patterns as shown in Fig. 3B, making an ideal test problem for equation discovery. The KS equation is widely used to model the instabilities in laminar flame fronts and dissipative trapped-ion modes among others. We randomly choose 320 points as fixed sensors and record the wave response for 101 time steps, resulting in 12.3% of the dataset used in (Rudye1602614). A total of 36 candidate functions are employed to construct the underlying PDE. Detail description of this example is found in SI Appendix, Section B2. It is notable that the chaotic behavior poses significant challenges in approximating the full-field spatiotemporal derivatives, especially the high-order , from poorly measured data for discovery of such a PDE. Existing methods [e.g., the family of SINDy methods (Rudye1602614, Schaeffer2017)] eventually fail in this case given very coarse and noisy measurements. Nevertheless, PiDL successfully distils the closed form of the KS equation from subsampled sparse data with 10% noise, shown in Fig. 3. The evolution of the coefficients in Fig. 3A illustrates that both the candidate terms and the corresponding coefficients are correctly identified (close to the original parameters; error around 0.7%) within a few ADO iterations. The predicted full-field wave by the trained PiDL also coincides with with the exact solution with a relative error of 1.87% (Fig. 3B).
Nonlinear Schrödinger Equation
In the third example, we discover the nonlinear Schrödinger equation, , where is a complex field variable. This well-known equation is widely used in modeling the propagation of light in nonlinear optical fibers, Bose-Einstein condensates, Langmuir waves in hot plasmas, and so on. We take 37.5% subsamples (e.g., randomly selected from the spatial grids) of the dataset as shown in Table 1 to construct the PDE using 40 candidate functions . Since the function is complex valued, we model separately the real part () and the imaginary part () of the solution in the output of the DNN, assemble them to obtain the complex solution , and construct the complex valued candidate functions for PDE discovery. To avoid complex gradients in optimization, we use the modulus , instead of the norm shown in Eq. , for the residual physics loss (see SI Appendix, Section B3 for more details). Fig. 4 shows the discovered Schrödinger equation for the case of 10% noise. The evolution history of the sparse coefficients clearly shows the convergence to the actual values (Fig. 4A; error about 4.14%) resulting in accurate closed-form identification of the PDE, while the reconstructed full-field response, for both real and imaginary parts, matches well the exact solution with a slight relative error of 1% (Fig. 4B and C).
Reaction-Diffusion (RD) Equations
The examples discussed previously are low-dimensional (1D) models with limited complexity. We herein consider a - reaction-diffusion (RD) system in a 2D domain with the pattern forming behavior governed by two coupled PDEs: and , where and are the two field variables, , , and
. The RD equations exhibit a wide range of behaviors including wave-like phenomena and self-organized patterns found in chemical and biological systems. The particular RD equations considered here display spiral waves subjected to periodic boundary conditions. Full details on the dataset, selection of candidate functions and hyperparameter setup of the PiDL model are given inSI Appendix, Section B4. Fig. 5A shows the evolution of the sparse coefficients for 110 candidate functions , given a dataset with 10% noise. Both the sparse terms and the associated coefficients are precisely identified (as depicted in Fig. 5A and 5B, and the closed-form equations in Fig. 5
). Due to the complexity of the PDEs and the high dimension, slightly more epochs are required in ADO to retain reliable convergence. The predicted response snapshots (e.g., at) by the trained PiDL in Fig. 5B are close to the ground truth. This example shows especially the great ability and robustness of our method for discovering governing PDEs for high-dimensional systems from highly noisy data.
Navier-Stokes (NS) Equation
As a final example, we consider a 2D fluid flow passing a circular cylinder with the local rotation dynamics governed by the well-known Navier-Stokes vorticity equation , where is the spatiotemporally variant vorticity, denotes the fluid velocities, and is the kinematic viscosity ( at Reynolds number 100). We leverage the open simulation data (Rudye1602614) and subsample a dataset of the flow response at 500 spatial locations randomly picked within the indicated region in SI Appendix, Fig. S1, which record time series for 60 time steps. The resulting dataset is only 10% of that used in (Rudye1602614). A comprehensive discussion of this example is found in SI Appendix, Section B5. Fig. 6 summarizes the result of the discovered NS equation for a dataset with 10% noise. It is encouraging that the uncovered PDE expression is almost identical to the ground truth, for both the derivative terms and their coefficients, even under 10% noise corruption. The coefficients , corresponding to 60 candidate functions , converge very quickly to the correct values with precise sparsity right after the first ADO iteration (Fig. 6A). The vorticity patterns and magnitudes are also well predicted as indicated by the snapshot (at ) shown in Fig. 6B (the full-field error about 2.57%). This example provides a compelling test case for the proposed PiDL approach which is capable of discovering the closed-form NS equation with scarce and noisy data.
In summary, we have presented a novel interpretable deep learning method for discovering physical laws, in particular parsimonious closed-form PDE(s), from scarce and noisy data (commonly seen in scientific investigations and real-world applications) for multi-dimensional nonlinear spatiotemporal systems. This approach combines the strengths of DNNs for rich representation learning of nonlinear functions, automatic differentiation for accurate derivative calculation as well as sparse regression to tackle the fundamental limitation faced by existing sparsity-promoting methods that scale poorly with respect to data noise and scarcity. An alternating direction optimization strategy is proposed to simultaneously train the DNN and determine the optimal sparse coefficients of selected candidate terms for reconstructing the PDE(s). The synergy of interpretable DNN and sparse PDE representation results in the following outcome: the DNN provides accurate modeling of the solution and its derivatives as a basis for constructing the governing equation(s), while the sparsely represented PDE(s) in turn informs and constraints the DNN which makes it generalizable and further enhances the discovery. The overall approach is rooted in a comprehensive integration of bottom-up (data-driven) and top-down (physics-informed) processes for scientific discovery, with fusion of physics-informed deep learning, sparse regression and optimization. We demonstrate this method on a number of dynamical systems exhibiting nonlinear spatiotemporal behaviors governed by multi-dimensional canonical PDEs. Results highlight that the approach is capable of accurately discovering the exact form of the governing equation(s), even in a data-rich information-poor space where the high-dimensional measurements are scarce and noisy.
There still remain some potential limitations associated with the present PiDL framework for physical law discovery. For example, although the fully connected DNN used in this work has advantage of analytical approximation of the PDE derivatives via automatic differentiation, directly applying it to model the solution of higher dimensional systems (e.g., long-short term response evolution in a 3D domain) results in computational bottleneck and optimization challenges. Advances in discrete DNNs with spatiotemporal discretization [e.g., the convolutional long-short term memory network (ConvLSTM)(Shi2015) or similar] have the potential to help resolve this challenge, which will be demonstrated in our future work. Several other aspects, such as the design of library of candidate functions, the role of collocation points, and discovery with unknown source terms, are further discussed in SI Appendix, Section C1–C4.
We acknowledge the partial support by the Engineering for Civil Infrastructure program at National Science Foundation under grant CMMI-2013067. All the used data and codes are publicly available on GitHub at https://github.com/zhaochenneu/EQDiscovery.