Deep learning of physical laws from scarce data

by   Zhao Chen, et al.
Northeastern University

Harnessing data to discover the underlying governing laws or equations that describe the behavior of complex physical systems can significantly advance our modeling, simulation and understanding of such systems in various science and engineering disciplines. Recent advances in sparse identification show encouraging success in distilling closed-form governing equations from data for a wide range of nonlinear dynamical systems. However, the fundamental bottleneck of this approach lies in the robustness and scalability with respect to data scarcity and noise. This work introduces a novel physics-informed deep learning framework to discover governing partial differential equations (PDEs) from scarce and noisy data for nonlinear spatiotemporal systems. In particular, this approach seamlessly integrates the strengths of deep neural networks for rich representation learning, automatic differentiation and sparse regression to approximate the solution of system variables, compute essential derivatives, as well as identify the key derivative terms and parameters that form the structure and explicit expression of the PDEs. The efficacy and robustness of this method are demonstrated on discovering a variety of PDE systems with different levels of data scarcity and noise. The resulting computational framework shows the potential for closed-form model discovery in practical applications where large and accurate datasets are intractable to capture.



There are no comments yet.


page 1

page 7

page 25

page 28


DL-PDE: Deep-learning based data-driven discovery of partial differential equations from discrete and noisy data

In recent years, data-driven methods have been utilized to learn dynamic...

APIK: Active Physics-Informed Kriging Model with Partial Differential Equations

Kriging (or Gaussian process regression) is a popular machine learning m...

DySMHO: Data-Driven Discovery of Governing Equations for Dynamical Systems via Moving Horizon Optimization

Discovering the governing laws underpinning physical and chemical phenom...

Deep-Learning Discovers Macroscopic Governing Equations for Viscous Gravity Currents from Microscopic Simulation Data

Although deep-learning has been successfully applied in a variety of sci...

Model discovery in the sparse sampling regime

To improve the physical understanding and the predictions of complex dyn...

Discovering Sparse Interpretable Dynamics from Partial Observations

Identifying the governing equations of a nonlinear dynamical system is k...

Extracting Interpretable Physical Parameters from Spatiotemporal Systems using Unsupervised Learning

Experimental data is often affected by uncontrolled variables that make ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

PiDL with Sparse Regression for PDE Discovery

We consider a multi-dimensional spatiotemporal system whose governing equations can be described by a set of nonlinear, coupled, parameterized PDEs in the general form given by


where is the multi-dimensional latent solution (dimension ) while is the first-order time derivative term; denotes time and specifies the space; is a complex nonlinear functional of and its spatial derivatives, parameterized by ; is the gradient operator with respect to ; is the source term (note that, in many common cases, represents no source input to the system). The PDEs are also subjected to initial and boundary conditions (I/BCs), if known, denoted by and . For systems that obey the Newton’s second law of motion (e.g., the acceleration term is present), the governing PDEs can be written in a state-space form similar to Eq. [1]. Our objective is to find the closed form of from available spatiotemporal measurements which are assumed to be incomplete, scarce and noisy commonly seen in real-world applications (e.g., when data capture is very costly or the data itself is sparse in nature). We assume that the physical law is governed by only a few important terms which can be selected from a large-space library of candidate functions, where sparse regression can be applied (Brunton3932, Rudye1602614, Schaeffer2017). Inherent in this assumption leads to reformulation of Eq. [1] in the following (assuming zero source for simplicity):


Here, is an extensive library of symbolic functions consisting of many candidate terms, e.g., constant, polynomial, and trigonometric terms with respect to each spatial dimension (Rudye1602614, Schaeffer2017)

, assembled in a row vector given by


where represents the element-wise Hadamard product; denotes the total number of candidate terms in the library; the subscripts in the context of depict the derivatives; is the sparse coefficient matrix (only the active candidate terms in have non-zero values), e.g., for . If there is unknown source input, the candidate functions for can also be incorporated into for discovery (see SI Appendix, Section C3). Thus, the discovery problem can then be stated as: given the spatiotemporal measurement data , find sparse such that Eq. [2] holds.

We present an interpretable PiDL paradigm with sparse regression to simultaneously model the system response and identify the parsimonious closed form of the governing PDEs. The innovative mathematical architecture of this method is shown in Fig. 1. To begin with, we interpret the latent solution by a deep dense neural network (denoted by ), namely, , where

represents the DNN trainable parameters including weights and biases. The DNN essentially plays a role as a nonlinear functional to approximate the latent solution with the data loss function expressed as:


where is the measurement data, is the total number of data points, and denotes the Frobenius norm. With graph-based automatic differentiation where derivatives on are evaluated at machine precision, the library of candidate functions can be computed from . Thus, the sparse representation of the reconstructed PDEs can be written in a residual form, namely, , where denotes the PDE residuals. The basic concept is to adapt both the DNN trainable parameters and the PDE coefficients such that the neural network can fit the measurement data while satisfying the constraints defined by the underlying PDEs. The PDE residuals will be evaluated on a large number of collocation points , randomly sampled in the spatiotemporal space, leading to the residual physics loss function given by


where and denote respectively the discretization of the first-order time derivative term and the library of candidate functions evaluated on the collocation points; is the total number of spatiotemporal collocation points. Note that the I/BCs, if measured or known, can also be added as part of the physics loss in Eq. [5].

Figure 1: Schematic architecture of the framework of PiDL with sparse regression for data-driven discovery of PDEs. The network consists of two components: a DNN governed by the trainable parameters , which maps the spatiotemporal coordinates to the latent solution , and the physical law described by a set of nonlinear PDEs, which are formed by the derivative candidate functions parameterized by the unknown sparse coefficients . The total loss function is composed of the data loss , the physics loss , and the regularization term that promotes the sparsity. Here, and denote the relative weighting of the loss functions, while and represent the measurement data and collocation samples respectively. Note that the physics loss, in a residual form, is only evaluated on the spatiotemporal collocation samples. The colored dots in the sparse coefficients matrix (or vector) on the right denote non-zero values. Simultaneous optimization of the unknown parameters leads to both the trained DNN for inference of the data-driven full-field solution and the discovered parsimonious closed-form PDEs.

The total loss function for training the PiDL network is thus composed of the data loss , the residual physics loss and a regularization term, expressed as:


where is the relative weighting of the residual physics loss function; is the regularization parameter; represents the norm. Optimizing the total loss function can produce a DNN that can not only predict the data-driven full-field system response, but also uncover the parsimonious closed-form PDEs.

Noteworthy, the total loss function has an implicit complex form, and thus, directly solving the optimization problem is highly intractable since the regularization makes this problem -hard. Though relaxation of the term by the less rigorous regularization improves the well-posedness and enables the optimization in a continuous space, false positive identification occurs (BERG2019239, Both2019). To address this challenge, we present an alternating direction optimization (ADO) algorithm that divides the overall optimization problem into a set of tractable subproblems to sequentially optimize and within a few alternating iterations (denoted by ), namely,


The fundamental concept of the ADO algorithm shares similarity with the alternating direction methods of multipliers (boyd2011). In each alternating iteration , the sparse PDE coefficients in Eq. [7a] are updated (denoted by ) via STRidge [a sequential thresholding regression process that serves as a proxy for regularization (Brunton3932, Rudye1602614)], based on the DNN parameters from the previous iteration (e.g., ). The DNN parameters in the current iteration are then updated (denoted by ) through a standard neural network training algorithm [in particular, the combined Adam (adam2014) + L-BFGS (Byrd1995) optimizer], taking as known. The alternations between the sub-optimal solutions will lead to a high-quality optimization solution. The algorithm design of ADO as well as the implementation details and specifications are given in SI Appendix, Algorithm 1 and Algorithm 2.

PDE name Error (noise 0%) Error (noise 1%) Error (noise 10%) Description of data discretization
Burgers 0.010.01% 0.190.11% 1.151.20% , , subsample 1.95%
KS 0.070.01% 0.610.04% 0.710.06% , , subsample 12.3%
Schrödinger 0.090.04% 0.650.29% 2.310.28% , , subsample 37.5%
RD 0.070.08% 0.250.30% 4.783.66% , , subsample 0.29%
NS 0.660.72% 0.860.63% 1.401.83% , , , subsample 0.22%

The error is defined as the average relative error of the identified non-zero coefficients w.r.t. the ground truth. The subscript denotes the number of discretization. Our method is also compared with SINDy [the PDE-FIND approach presented in (Rudye1602614)] as illustrated in SI Appendix, Table S1

. It is noted that much less measurement data polluted with a higher level of noise are used in our discovery. Gaussian white noise is added to the synthetic response with the noise level defined as the root-mean-square ratio between the noise and exact solution.

Table 1: Summary of the PiDL discovery results in the context of accuracy for a wide range of canonical models.
Figure 2: Discovered Burgers equation for data with 10% noise. (A) Evolution of the sparse coefficients for 16 candidate functions used to form the PDE, where the color represents the coefficient value. (B) The predicted response in comparison with the exact solution with the prediction error.


We demonstrate the efficacy and robustness of our methodology on a group of canonical PDEs used to represent a wide range of physical systems with nonlinear, periodic and/or chaotic behaviors. In particular, we discover the closed forms of Burgers, Kuramoto-Sivashinsky (KS), nonlinear Schrödinger, Reaction-Diffusion (RD), and Navier-Stokes (NS) equations from scarce and noisy time-series measurements recorded by a number of sensors at fixed locations (data are polluted with Gaussian white noise). Pre-training of PiDL is conducted before running the ADO algorithm for discovery, by simply replacing in Eq. [6] with where brute-force gradient-based optimization for both and becomes applicable. The -regularized pre-training can accelerate the convergence of ADO by providing an admissible “initial guess”. Results are presented in Table 1 which shows quite accurate discovery and demonstrates satisfactory performance of the proposed method and its robustness to measurement data scarcity and noise. Full details of each example are presented as follows. We also compare our method with SINDy considering different levels of data scarcity and noise (SI Appendix, Section B6).

Figure 3: Discovered the KS equation for data with 10% noise. (A) Evolution of the sparse coefficients for 36 candidate functions used to reconstruct the PDE. (B) The predicted response compared with the exact solution.

Burgers Equation

We first consider a dissipative system with the dynamics governed by a 1D viscous Burgers equation expressed as , where (equal to 0.1) denotes the diffusion coefficient. The equation describes the decaying stationary viscous shock of a system after a finite period of time, commonly found in simplified fluid mechanics, nonlinear acoustics and gas dynamics. We test the PiDL approach on the recorded traveling shock waves from the solution to Burgers equation subjected to a Gaussian initial condition. In particular, 5 sensors are randomly placed at fixed locations among the 256 spatial grids and record the wave for 101 time steps, leading to 1.95% of the dataset used in (Rudye1602614). A full description of the dataset, design of the library of candidate functions (16 terms) and model training is given in SI Appendix, Section B1. Fig. 2 shows the discovered Burgers equation for a dataset with 10% noise. The evolution of the coefficients illustrates robust convergence to the ground truth (error about 1.2%), resulting in accurate discovery. The trained PiDL properly reproduces the dynamical response from noisy measurements (e.g., the full-field prediction error is 2.02%). The ADO algorithm converges only after the first alternating iteration and shows capacity to recover the correct sparsity pattern of the PDE. Simultaneous discovery of the Burgers equation and the unknown source is further discussed in SI Appendix, Section C3.

Kuramoto-Sivashinsky (KS) Equation

Another dissipative system with intrinsic instabilities is considered, governed by the 1D Kuramoto-Sivashinsky (KS) equation , where the reverse diffusion term leads to the blowup behavior while the fourth-order derivative introduces chaotic patterns as shown in Fig. 3B, making an ideal test problem for equation discovery. The KS equation is widely used to model the instabilities in laminar flame fronts and dissipative trapped-ion modes among others. We randomly choose 320 points as fixed sensors and record the wave response for 101 time steps, resulting in 12.3% of the dataset used in (Rudye1602614). A total of 36 candidate functions are employed to construct the underlying PDE. Detail description of this example is found in SI Appendix, Section B2. It is notable that the chaotic behavior poses significant challenges in approximating the full-field spatiotemporal derivatives, especially the high-order , from poorly measured data for discovery of such a PDE. Existing methods [e.g., the family of SINDy methods (Rudye1602614, Schaeffer2017)] eventually fail in this case given very coarse and noisy measurements. Nevertheless, PiDL successfully distils the closed form of the KS equation from subsampled sparse data with 10% noise, shown in Fig. 3. The evolution of the coefficients in Fig. 3A illustrates that both the candidate terms and the corresponding coefficients are correctly identified (close to the original parameters; error around 0.7%) within a few ADO iterations. The predicted full-field wave by the trained PiDL also coincides with with the exact solution with a relative error of 1.87% (Fig. 3B).

Figure 4: Discovered nonlinear Schrödinger equation for a dataset with 10% noise. (A) Evolution of the sparse coefficients for the candidate functions used to reconstruct the PDE. (B and C) The predicted real-part (B) and imaginary-part (C) responses compared with the exact solution.

Nonlinear Schrödinger Equation

In the third example, we discover the nonlinear Schrödinger equation, , where is a complex field variable. This well-known equation is widely used in modeling the propagation of light in nonlinear optical fibers, Bose-Einstein condensates, Langmuir waves in hot plasmas, and so on. We take 37.5% subsamples (e.g., randomly selected from the spatial grids) of the dataset as shown in Table 1 to construct the PDE using 40 candidate functions . Since the function is complex valued, we model separately the real part () and the imaginary part () of the solution in the output of the DNN, assemble them to obtain the complex solution , and construct the complex valued candidate functions for PDE discovery. To avoid complex gradients in optimization, we use the modulus , instead of the norm shown in Eq. [5], for the residual physics loss (see SI Appendix, Section B3 for more details). Fig. 4 shows the discovered Schrödinger equation for the case of 10% noise. The evolution history of the sparse coefficients clearly shows the convergence to the actual values (Fig. 4A; error about 4.14%) resulting in accurate closed-form identification of the PDE, while the reconstructed full-field response, for both real and imaginary parts, matches well the exact solution with a slight relative error of 1% (Fig. 4B and C).

Figure 5: Discovered RD equations for a dataset with 10% noise. (A and B) Evolution of the sparse coefficients (A) and (B) for 110 candidate functions used to reconstruct the -equation and the -equation, respectively. (C and D) The predicted response snapshots (C) and (D) at compared with the exact solution.

Reaction-Diffusion (RD) Equations

The examples discussed previously are low-dimensional (1D) models with limited complexity. We herein consider a - reaction-diffusion (RD) system in a 2D domain with the pattern forming behavior governed by two coupled PDEs: and , where and are the two field variables, , , and

. The RD equations exhibit a wide range of behaviors including wave-like phenomena and self-organized patterns found in chemical and biological systems. The particular RD equations considered here display spiral waves subjected to periodic boundary conditions. Full details on the dataset, selection of candidate functions and hyperparameter setup of the PiDL model are given in

SI Appendix, Section B4. Fig. 5A shows the evolution of the sparse coefficients for 110 candidate functions , given a dataset with 10% noise. Both the sparse terms and the associated coefficients are precisely identified (as depicted in Fig. 5A and 5B, and the closed-form equations in Fig. 5

). Due to the complexity of the PDEs and the high dimension, slightly more epochs are required in ADO to retain reliable convergence. The predicted response snapshots (e.g., at

) by the trained PiDL in Fig. 5B are close to the ground truth. This example shows especially the great ability and robustness of our method for discovering governing PDEs for high-dimensional systems from highly noisy data.

Navier-Stokes (NS) Equation

As a final example, we consider a 2D fluid flow passing a circular cylinder with the local rotation dynamics governed by the well-known Navier-Stokes vorticity equation , where is the spatiotemporally variant vorticity, denotes the fluid velocities, and is the kinematic viscosity ( at Reynolds number 100). We leverage the open simulation data (Rudye1602614) and subsample a dataset of the flow response at 500 spatial locations randomly picked within the indicated region in SI Appendix, Fig. S1, which record time series for 60 time steps. The resulting dataset is only 10% of that used in (Rudye1602614). A comprehensive discussion of this example is found in SI Appendix, Section B5. Fig. 6 summarizes the result of the discovered NS equation for a dataset with 10% noise. It is encouraging that the uncovered PDE expression is almost identical to the ground truth, for both the derivative terms and their coefficients, even under 10% noise corruption. The coefficients , corresponding to 60 candidate functions , converge very quickly to the correct values with precise sparsity right after the first ADO iteration (Fig. 6A). The vorticity patterns and magnitudes are also well predicted as indicated by the snapshot (at ) shown in Fig. 6B (the full-field error about 2.57%). This example provides a compelling test case for the proposed PiDL approach which is capable of discovering the closed-form NS equation with scarce and noisy data.

Figure 6: Discovered NS equation for data with 10% noise. (A) Evolution of the sparse coefficients for 60 candidate functions used to form the PDE. (B) Predicted vorticity snapshot at compared with the exact solution.


In summary, we have presented a novel interpretable deep learning method for discovering physical laws, in particular parsimonious closed-form PDE(s), from scarce and noisy data (commonly seen in scientific investigations and real-world applications) for multi-dimensional nonlinear spatiotemporal systems. This approach combines the strengths of DNNs for rich representation learning of nonlinear functions, automatic differentiation for accurate derivative calculation as well as sparse regression to tackle the fundamental limitation faced by existing sparsity-promoting methods that scale poorly with respect to data noise and scarcity. An alternating direction optimization strategy is proposed to simultaneously train the DNN and determine the optimal sparse coefficients of selected candidate terms for reconstructing the PDE(s). The synergy of interpretable DNN and sparse PDE representation results in the following outcome: the DNN provides accurate modeling of the solution and its derivatives as a basis for constructing the governing equation(s), while the sparsely represented PDE(s) in turn informs and constraints the DNN which makes it generalizable and further enhances the discovery. The overall approach is rooted in a comprehensive integration of bottom-up (data-driven) and top-down (physics-informed) processes for scientific discovery, with fusion of physics-informed deep learning, sparse regression and optimization. We demonstrate this method on a number of dynamical systems exhibiting nonlinear spatiotemporal behaviors governed by multi-dimensional canonical PDEs. Results highlight that the approach is capable of accurately discovering the exact form of the governing equation(s), even in a data-rich information-poor space where the high-dimensional measurements are scarce and noisy.

There still remain some potential limitations associated with the present PiDL framework for physical law discovery. For example, although the fully connected DNN used in this work has advantage of analytical approximation of the PDE derivatives via automatic differentiation, directly applying it to model the solution of higher dimensional systems (e.g., long-short term response evolution in a 3D domain) results in computational bottleneck and optimization challenges. Advances in discrete DNNs with spatiotemporal discretization [e.g., the convolutional long-short term memory network (ConvLSTM)

(Shi2015) or similar] have the potential to help resolve this challenge, which will be demonstrated in our future work. Several other aspects, such as the design of library of candidate functions, the role of collocation points, and discovery with unknown source terms, are further discussed in SI Appendix, Section C1–C4.

We acknowledge the partial support by the Engineering for Civil Infrastructure program at National Science Foundation under grant CMMI-2013067. All the used data and codes are publicly available on GitHub at