A Deep Learning-based Collocation Method for Modeling Unknown PDEs from Sparse Observation

11/30/2020 · Priyabrata Saha et al., Georgia Institute of Technology

Deep learning-based modeling of dynamical systems driven by partial differential equations (PDEs) has become quite popular in recent years. However, most existing deep learning-based methods either assume a strong physics prior, depend on specific initial and boundary conditions, or require data on a dense regular grid, making them unsuitable for modeling unknown PDEs from sparsely-observed data. This paper presents a deep learning-based collocation method for modeling dynamical systems driven by unknown PDEs when data sites are sparsely distributed. The proposed method is spatial dimension-independent and geometrically flexible, learns from sparsely-available data, and the learned model does not depend on any specific initial and boundary conditions. We demonstrate our method in the forecasting task for the two-dimensional wave equation and the Burgers-Fisher equation in multiple geometries with different boundary conditions.


1 Introduction

Complex physical phenomena are often modeled with a set of differential equations (DEs). Traditionally, model equations are constructed based on physical laws and empirical observations. However, in many practical scenarios, mathematical modeling is very challenging and/or can only describe the system behavior partially. Furthermore, even if the involved differential equations are known, their analytical solutions are difficult to obtain for most practical problems and therefore must be computed numerically [mai2001numerical]. Traditional accurate numerical solvers are computationally intensive. The success of deep learning in various sequence prediction tasks, for example natural language and speech modeling [sutskever2014sequence, vaswani2017attention, chung2014empirical] and video prediction [xingjian2015convolutional, finn2016unsupervised, lee2017desire], has motivated many researchers to approach both of the aforementioned problems using deep learning. Consequently, a number of deep learning models that incorporate knowledge from physics or applied mathematics have been introduced for modeling complex dynamical systems involving partial differential equations (PDEs) [raissi2019physics, long2018pde, long2018hybridnet, de2019deep, saha2020phicnet, sirignano2018dgm, khoo2017solving, geneva2020modeling, sirignano2020dpm, bar2019learning].

Deep learning methods typically require a huge amount of data to obtain a generalized solution. However, in many practical scenarios, data or observations can be obtained only at a few sites (for example, sensor measurements). In this paper, we consider the problem of modeling unknown PDE-driven spatiotemporal dynamics from sparsely-observed data. Given a set of $N$ distinct scattered spatial locations $\{x_1, \dots, x_N\} \subset \Omega$ and corresponding time series of measurements of a physical quantity $u$ at those locations, we want to learn a model for the unknown PDE that governs the data. We assume the underlying PDE belongs to a family of PDEs described by the following generic form:

$$\frac{\partial^m u}{\partial t^m} = F\big(u, \mathcal{D}^1 u, \mathcal{D}^2 u, \dots, \mathcal{D}^n u\big), \tag{1}$$

where $\partial^m / \partial t^m$ denotes the $m$-th order partial differential operator with respect to time and $\mathcal{D}^k u$ denotes all possible combinations of $k$-th order spatial (partial) derivatives of $u$. We assume that the temporal order $m$ of the underlying PDE is known a priori, while the form of the function $F$, nonlinear in general, and its arguments are unknown. Though we observe the system at a few discrete locations, we seek to predict the evolution of $u$ at any generic location or over the entire domain $\Omega$. We want the learned model to construct the solution of the underlying PDE in a time-marching fashion, given any initial and boundary conditions.

Our method integrates the radial basis function (RBF) collocation method with deep learning to model unknown PDEs from sparsely-observed data. The RBF collocation method is widely used as a numerical solver for PDEs because of its inherent meshfree nature, which provides geometric flexibility with scattered node layouts [bollig2012solution]. Another advantage of RBF-based methods is that they can be used to solve high-dimensional problems, since they convert problems involving multiple space dimensions into virtually one-dimensional ones [fasshauer2007meshfree, wu2004meshless]. We propose to learn the application of the differential operators on a chosen RBF using deep neural networks, which makes the method independent of initial and boundary conditions while also utilizing the meshfree, geometrically flexible, and spatial dimension-independent nature of the RBF collocation method. By choosing infinitely-smooth RBFs (e.g., Gaussian, multiquadric) and tuning their shape parameters, our method can be employed in scenarios where data sites are sparsely distributed. We first show how the RBF collocation method for time-dependent linear PDEs can be integrated with a deep neural network to model unknown linear PDEs. Next, we extend the architecture to model unknown nonlinear PDEs of form (1), and finally to model a coupled system of multiple nonlinear PDEs. We evaluate our method in the forecasting task for the two-dimensional wave equation and the Burgers-Fisher equation, and demonstrate the capability of transferring the learned model to different geometries with different boundary conditions.

Related work.

Deep learning methods for learning or solving PDE-driven systems can be broadly categorized into two types. The first kind of methods directly approximates the PDE solution with deep neural networks (DNNs) as a function of space and time [raissi2019physics, sirignano2018dgm, raissi2018deep]. These methods are meshfree, geometrically flexible, and suitable for high-dimensional PDEs. However, since they directly approximate the PDE solution as a function of space and time, the learned model depends on the specific initial and boundary conditions used in training.

The second type of methods learns the differential operators using convolutional neural networks (CNNs) [long2018pde, de2019deep, long2019pde, bar2019learning]. These methods are independent of initial and boundary conditions, but they are not suitable for high-dimensional PDEs and can only be used when data is available on a dense regular grid. Furthermore, both types of methods often assume a strong physics prior, either in the model structure [long2018hybridnet, de2019deep, saha2020phicnet, sirignano2020dpm] or in the loss function [raissi2019physics, sirignano2018dgm, geneva2020modeling], limiting their applicability to scenarios where the dynamics are unknown.

One promising approach for solving or learning PDEs from irregular scattered data is to utilize graph neural networks (GNNs) [scarselli2008graph]. Seo and Liu [seo2019differentiable] proposed differentiable physics-informed graph networks (DPGN), which incorporate differentiable physics equations in a GNN to model complex dynamical systems. However, difference operators on a graph can extract spatial variations only when the nodes are close to each other, and they incur large numerical errors when data sites are sparsely distributed [seo2019physics].

2 Background

2.1 Radial Basis Function (RBF) Interpolation

In the standard scattered data interpolation problem, we are given a set of $N$ distinct data points $\{x_1, \dots, x_N\} \subset \mathbb{R}^d$ and corresponding real values (measurements) $y_1, \dots, y_N \in \mathbb{R}$. The goal is to find a (continuous) function $s : \mathbb{R}^d \to \mathbb{R}$ that satisfies the interpolation conditions

$$s(x_i) = y_i, \quad i = 1, \dots, N. \tag{2}$$

A common approach to solving this scattered data interpolation problem is to consider the function $s$ as a linear combination of basis functions of a certain class. Often these basis functions are formed using radial functions. The value of a radial function $\Phi$ at each point depends only on the distance between that point and the origin, i.e., $\Phi(x) = \varphi(\lVert x \rVert)$, where $\lVert \cdot \rVert$ denotes some norm in $\mathbb{R}^d$, usually the Euclidean norm. Effectively, the function $\Phi$ becomes a univariate function $\varphi : [0, \infty) \to \mathbb{R}$.

In radial basis function interpolation, the basis functions are formed using the given data points as centers or origins. In terms of these basis functions $\varphi(\lVert x - x_j \rVert)$, the interpolant takes the following form:

$$s(x) = \sum_{j=1}^{N} c_j\, \varphi(\lVert x - x_j \rVert), \tag{3}$$

where $c_1, \dots, c_N$ are the unknown coefficients to be determined. Plugging the interpolation conditions (2) into (3) leads to a system of linear equations

$$A\, \mathbf{c} = \mathbf{y}, \tag{4}$$

where $A$ is the interpolation matrix whose entries are given by $A_{ij} = \varphi(\lVert x_i - x_j \rVert)$, $\mathbf{c} = [c_1, \dots, c_N]^\top$ is the coefficient vector, and $\mathbf{y} = [y_1, \dots, y_N]^\top$.

A solution to the system (4) exists and is unique provided the interpolation matrix is non-singular. For a large class of radial functions, including (inverse) multiquadrics and Gaussians, the matrix is non-singular if the data points are all distinct [fasshauer2007meshfree]. Details regarding the invertibility of the interpolation matrix for various choices of radial functions can be found in [buhmann2003radial, fasshauer2007meshfree].
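As a concrete illustration, the interpolation system (4) with a Gaussian radial function can be sketched in a few lines of NumPy. The function names and the shape-parameter value here are our own choices for illustration, not from the paper:

```python
import numpy as np

def gaussian_rbf(r, eps):
    """Gaussian radial function phi(r) = exp(-(eps * r)^2)."""
    return np.exp(-(eps * r) ** 2)

def fit_rbf(centers, values, eps):
    """Solve the interpolation system A c = y with A_ij = phi(||x_i - x_j||)."""
    r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return np.linalg.solve(gaussian_rbf(r, eps), values)

def eval_rbf(x, centers, coeffs, eps):
    """Evaluate s(x) = sum_j c_j phi(||x - x_j||) at query points x."""
    r = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
    return gaussian_rbf(r, eps) @ coeffs

# Interpolate a smooth function from 50 scattered 2D points
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = np.sin(np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1])
c = fit_rbf(X, y, eps=3.0)
# The interpolant reproduces the data exactly at the centers
assert np.allclose(eval_rbf(X, X, c, eps=3.0), y, atol=1e-6)
```

Note that the interpolant is exact at the data sites by construction; accuracy away from the sites depends on the shape parameter, which is exactly why the paper treats it as trainable.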

2.2 Solving time-dependent PDEs using RBFs

Radial basis functions are widely used in meshfree methods for solving PDE problems. Here we describe how RBFs, particularly Kansa's unsymmetric collocation method [kansa1990multiquadrics], are used to solve time-dependent linear PDEs; we base our method on this approach.

Consider the following time-dependent linear PDE:

$$\frac{\partial u(x, t)}{\partial t} = \mathcal{L} u(x, t), \quad x \in \Omega, \tag{5}$$

with initial and boundary conditions

$$u(x, 0) = f(x), \quad x \in \Omega, \tag{6}$$
$$u(x, t) = g(x, t), \quad x \in \partial\Omega, \tag{7}$$

where $\mathcal{L}$ is some linear (spatial) differential operator and $\partial\Omega$ denotes the boundary of $\Omega$.

In order to obtain an approximate solution of the above PDE, consider a set of $N$ collocation nodes $\{x_1, \dots, x_N\}$. Furthermore, let us assume that out of these $N$ collocation nodes, the last $N_b$ nodes are boundary nodes, i.e., the subset $\{x_{N - N_b + 1}, \dots, x_N\} \subset \partial\Omega$. Using radial basis functions centered at the collocation nodes, an approximate solution to the PDE can be defined as follows:

$$\hat{u}(x, t) = \sum_{j=1}^{N} c_j(t)\, \varphi(\lVert x - x_j \rVert), \tag{8}$$

where $c_j(t)$ are the unknown time-dependent coefficients to be determined at each time step. Plugging (8) into (5), (6), and (7), we get the following ordinary differential equation (ODE)

$$\sum_{j=1}^{N} \dot{c}_j(t)\, \varphi(\lVert x - x_j \rVert) = \sum_{j=1}^{N} c_j(t)\, \mathcal{L}\varphi(\lVert x - x_j \rVert), \tag{9}$$

with initial and boundary conditions

$$\sum_{j=1}^{N} c_j(0)\, \varphi(\lVert x - x_j \rVert) = f(x), \quad x \in \Omega, \tag{10}$$
$$\sum_{j=1}^{N} c_j(t)\, \varphi(\lVert x - x_j \rVert) = g(x, t), \quad x \in \partial\Omega. \tag{11}$$

In (9), the spatial differential operator is applied to the radial basis function. Approximating the time derivative by a first-order finite difference, we get

$$\sum_{j=1}^{N} c_j^{t + \Delta t}\, \varphi(\lVert x - x_j \rVert) = \sum_{j=1}^{N} c_j^{t}\, \big(\varphi(\lVert x - x_j \rVert) + \Delta t\, \mathcal{L}\varphi(\lVert x - x_j \rVert)\big), \tag{12}$$

where $\Delta t$ is the step size in time. The coefficients can be determined by applying (11) and (12) at the $N_b$ boundary collocation nodes and the $N - N_b$ interior collocation nodes, respectively. In that case, (10), (11), and (12) can be written in the following compact form:

$$\Phi\, \mathbf{c}^{0} = \mathbf{u}^{0}, \qquad \Phi\, \mathbf{c}^{t + \Delta t} = H\, \mathbf{c}^{t} + \mathbf{g}^{t + \Delta t}, \tag{13}$$

where $\Phi$ is the RBF matrix whose entries are given by $\Phi_{ij} = \varphi(\lVert x_i - x_j \rVert)$, and $\mathbf{c}^{t}$ is the coefficient vector at time $t$. The elements of the matrix $H$ are given by

$$H_{ij} = \begin{cases} \varphi(\lVert x_i - x_j \rVert) + \Delta t\, \mathcal{L}\varphi(\lVert x_i - x_j \rVert), & x_i \notin \partial\Omega, \\ 0, & x_i \in \partial\Omega, \end{cases} \tag{14}$$

and $\mathbf{g}^{t}$ is the boundary condition vector at time $t$, whose entries are $g(x_i, t)$ at the boundary nodes and zero at the interior nodes; $\mathbf{u}^{0}$ is the vector of initial values at the collocation nodes. The coefficient vectors are obtained by iteratively solving (13) and are then used to compute the numerical solution via (8).
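The time-marching scheme of (13)-(14) can be sketched for a known operator, for example the 1D heat equation with $\mathcal{L} = \alpha\, \partial^2 / \partial x^2$, for which $\mathcal{L}\varphi$ of a Gaussian RBF is available in closed form. This is a minimal sketch with arbitrarily chosen node count, shape parameter, and step size, not the paper's implementation:

```python
import numpy as np

eps, alpha, dt = 6.0, 0.1, 5e-4
x = np.linspace(0.0, 1.0, 21)            # collocation nodes; x[0], x[-1] are boundary nodes
d = x[:, None] - x[None, :]
Phi = np.exp(-(eps * d) ** 2)            # Phi_ij = phi(|x_i - x_j|)
LPhi = alpha * (4 * eps**4 * d**2 - 2 * eps**2) * Phi  # L phi = alpha * phi''

interior = (x > 0) & (x < 1)
H = np.zeros_like(Phi)
H[interior] = Phi[interior] + dt * LPhi[interior]  # interior rows of (14)
g = np.zeros(len(x))                     # homogeneous Dirichlet boundary data

def solve(A, b):
    # least-squares solve; more robust than direct inversion when Phi is ill-conditioned
    return np.linalg.lstsq(A, b, rcond=None)[0]

c = solve(Phi, np.sin(np.pi * x))        # Phi c^0 = u^0 (initial condition)
for _ in range(200):
    c = solve(Phi, H @ c + g)            # Phi c^{t+dt} = H c^t + g^{t+dt}
u = Phi @ c                              # approximate solution at t = 0.1
# u should roughly follow sin(pi*x) * exp(-alpha * pi^2 * t), decaying over time
```

In the paper's setting, the closed-form `LPhi` above is exactly the quantity that is unknown and is therefore replaced by a neural network.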

3 Deep RBF Collocation

Figure 1: (a) Proposed model for a linear PDE; $\Phi_\varepsilon$ is the RBF interpolation matrix with trainable shape parameter $\varepsilon$, and the unknown linear differential operator applied to the RBF is modeled using a feed-forward neural network with parameter vector $\theta$. (b) Proposed model for a nonlinear PDE; the network is the same as in (a) except that it has $n$ output neurons instead of just one, and a second feed-forward network (with parameter vector $\psi$) represents the unknown function $F$. (c) Proposed model for a system of coupled PDEs.

We will first describe our approach for an unknown linear PDE and then extend it to nonlinear PDEs of form (1). Finally, we will show how the proposed model can be used to learn an unknown system of coupled nonlinear PDEs.

3.1 Modeling unknown linear PDE

Consider the linear time-dependent PDE of (5) where the linear (spatial) differential operator $\mathcal{L}$ is unknown. Learning the application of $\mathcal{L}$ directly on $u$ requires fine-grid measurements for convolution operations. Instead, we propose to learn the application of $\mathcal{L}$ on a radial basis function, which allows scattered sparse measurements independent of any specific geometry. Specifically, we propose to learn a feed-forward neural network with parameter vector $\theta$ that approximates $\mathcal{L}\varphi(\lVert x - x_j \rVert)$ given any two spatial locations $x, x_j$ and the corresponding value of the RBF $\varphi(\lVert x - x_j \rVert)$. We train the neural network using scattered time-series measurements of $u$ by minimizing an objective function derived from (13). Let $\{x_1, \dots, x_N\}$ be the set of measurement sites and $\mathbf{u}^{t}$ be the measurement vector at time $t$. Since our goal is to learn a model for the spatial differential operator, and boundary values are not affected by $\mathcal{L}$, we only consider internal measurement sites during training. For a radial basis function $\varphi_\varepsilon$ with shape parameter $\varepsilon$, the coefficient vector at time $t$, $\mathbf{c}^{t}$, can be obtained by solving the linear system

$$\Phi_\varepsilon\, \mathbf{c}^{t} = \mathbf{u}^{t}, \tag{15}$$

where $\Phi_\varepsilon$ is the RBF matrix whose entries are given by $(\Phi_\varepsilon)_{ij} = \varphi_\varepsilon(\lVert x_i - x_j \rVert)$. Replacing $\mathcal{L}\varphi_\varepsilon$ with the neural network approximation, ignoring boundary nodes, and plugging (15) into (13), we get

$$\hat{\mathbf{u}}^{t + \Delta t} = \big(I + \Delta t\, \hat{D}\, \Phi_\varepsilon^{-1}\big)\, \mathbf{u}^{t}, \tag{16}$$

where $I$ is the identity matrix of order $N$, $\hat{D}$ denotes the approximated spatial derivative matrix produced by the neural network, i.e., $\hat{D}_{ij} \approx \mathcal{L}\varphi_\varepsilon(\lVert x_i - x_j \rVert)$, and $\varepsilon$ is the trainable shape parameter of the RBF. So, for a time series of measurements $\mathbf{u}^{0}, \mathbf{u}^{\Delta t}, \dots$, the neural network is trained with the following objective:

$$\min_{\theta, \varepsilon} \sum_{t} \big\lVert \mathbf{u}^{t + \Delta t} - \big(I + \Delta t\, \hat{D}\, \Phi_\varepsilon^{-1}\big)\, \mathbf{u}^{t} \big\rVert_2^2. \tag{17}$$

Note that the term $\hat{D}\, \Phi_\varepsilon^{-1}\, \mathbf{u}^{t}$ in (16) multiplies the approximated spatial derivative matrix with the coefficient vector to obtain the radial approximation of $\mathcal{L}u$. Once the model is trained, it can be used to get the value of $\mathcal{L}u$ at any generic location (Figure 1a) and to finally predict $u$. During the prediction phase, the boundary condition is applied in a similar fashion as in (13). Considering the last $N_b$ nodes as boundary nodes and incorporating the corresponding boundary condition of (11) in (16), we get

$$\mathbf{u}^{t + \Delta t} = G\, \mathbf{u}^{t} + \mathbf{g}^{t + \Delta t}, \tag{18}$$

where the rows of the matrix $G$ are given by

$$G_{i\cdot} = \begin{cases} \big(I + \Delta t\, \hat{D}\, \Phi_\varepsilon^{-1}\big)_{i\cdot}, & x_i \notin \partial\Omega, \\ \mathbf{0}^{\top}, & x_i \in \partial\Omega. \end{cases} \tag{19}$$

Once the model is trained, during prediction the matrix $G$ is solely determined by the configuration of measurement sites. Under a fixed Dirichlet boundary condition, the iterative scheme of (18) is basically a linear discrete-time dynamical system, and its convergence to an equilibrium point can be analyzed from the spectral properties of $G$: the system converges to a stable equilibrium point from any arbitrary initialization if and only if $|\lambda_i| < 1$ for all $i$, where $\lambda_1, \dots, \lambda_N$ are the eigenvalues of $G$.
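The spectral-radius criterion is easy to check numerically once the iteration matrix is assembled. The following toy sketch uses hand-made matrices rather than a learned operator, purely to illustrate the test:

```python
import numpy as np

def converges(G):
    """A linear iteration u_{k+1} = G u_k + b reaches a fixed point from any
    initialization iff the spectral radius of G is below one."""
    return np.max(np.abs(np.linalg.eigvals(G))) < 1.0

stable = np.array([[0.5, 0.1], [0.0, 0.8]])    # triangular: eigenvalues 0.5, 0.8
unstable = np.array([[1.2, 0.0], [0.3, 0.4]])  # triangular: eigenvalues 1.2, 0.4
assert converges(stable) and not converges(unstable)
```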

3.2 Modeling unknown nonlinear PDE

In order to model an unknown nonlinear PDE of form (1), we use the proposed model for linear PDEs as the base module. The input arguments of the nonlinear function $F$ in (1) are basically various unknown linear spatial derivatives of the measurement variable $u$. Therefore, we first use the linear model to generate a spatial derivative feature vector and then pass that through another feed-forward neural network, with parameter vector $\psi$, that approximates the function $F$; we denote this network by $F_\psi$. The first neural network is modified to generate a vector of length $n$ (instead of a scalar as in the linear case) whose entries represent spatial derivative features of the RBF; we refer to it as the spatial derivative feature extractor. Its output is multiplied with the coefficient vector $\mathbf{c}^{t}$ to get the spatial derivative features of $u$, which are then used by $F_\psi$ to obtain the final output. Figure 1b shows the overall model for the nonlinear PDE. In this case, for a time series of measurements (at internal nodes) $\mathbf{u}^{0}, \mathbf{u}^{\Delta t}, \dots$, the two neural networks are trained jointly with the following objective:

$$\min_{\theta, \psi, \varepsilon} \sum_{t} \big\lVert \mathbf{u}^{t + \Delta t} - \mathbf{u}^{t} - \Delta t\, F_\psi\big(\hat{\mathcal{D}}\, \Phi_\varepsilon^{-1}\, \mathbf{u}^{t}\big) \big\rVert_2^2, \tag{20}$$

where $\hat{\mathcal{D}}$ is a 3D tensor containing the $n$ spatial derivative matrices of the RBF, and $F_\psi$ is applied row-wise to the resulting feature vectors. Note that the objective (20) considers a nonlinear PDE of form (1) where the temporal order is $m = 1$. In general, for $m > 1$, we use the finite difference approximation of $\partial^m u / \partial t^m$ and the objective is modified as

$$\min_{\theta, \psi, \varepsilon} \sum_{t} \Big\lVert \frac{1}{\Delta t^{m}} \sum_{k=0}^{m} (-1)^{k} \binom{m}{k}\, \mathbf{u}^{t + (m - k)\Delta t} - F_\psi\big(\hat{\mathcal{D}}\, \Phi_\varepsilon^{-1}\, \mathbf{u}^{t}\big) \Big\rVert_2^2. \tag{21}$$

The training objective of (20) or (21) uses measurements from all sites to predict the values of $u$ at those sites, but it does not incorporate any constraint that promotes generalization at locations where measurements are not available. The accuracy of RBF-based methods at any generic location strongly depends on the shape parameter of the RBF. Generally the shape parameter is chosen using the leave-one-out cross-validation method [rippa1999algorithm, fasshauer2007choosing]. Note that, as the spatial differential operators are functions of the shape parameter, the parameters of the neural networks also depend on it. We adopt the leave-one-out strategy in our training algorithm. At each training step, we randomly leave out one measurement site, and the model uses the remaining sites for prediction. The left-out measurement site is used only for computing the loss function. The overall training procedure is summarized in Algorithm 1.

Input: Training samples {measurement sites, measurement sequences}, a radial function
1 Arbitrarily initialize the parameters
2 while not converged do
3     Randomly select a sample sequence
4     Randomly choose one measurement site to leave out
5     Compute the RBF matrix from the remaining sites
6     Forward pass to get the spatial derivative features
7     Forward pass to get the prediction at the left-out site
8     Update the parameters in the gradient descent direction of objective (21)
9 end while
Output: Learned parameters
Algorithm 1 Training Procedure

Once the two networks are trained, they can be used to get the approximated right-hand side of (1) at any generic location (Figure 1b) and to finally predict $u$.
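A single leave-one-out step of Algorithm 1 can be sketched as follows. For brevity, the learned networks are replaced here by plain RBF reconstruction, so this illustrates only the hold-out mechanics, not the paper's model; the site count and shape parameter are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_predict(x_query, sites, u, eps):
    """Reconstruct u at a query point from the retained sites via RBF
    interpolation (a stand-in for the learned one-step predictor)."""
    Phi = np.exp(-(eps * np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)) ** 2)
    c = np.linalg.lstsq(Phi, u, rcond=None)[0]
    phi_q = np.exp(-(eps * np.linalg.norm(x_query - sites, axis=-1)) ** 2)
    return phi_q @ c

sites = rng.uniform(0, 1, size=(30, 2))
u = np.sin(np.pi * sites[:, 0])              # stand-in measurement vector

# Leave out site k: the model sees only the remaining sites, and the
# held-out residual provides the training loss
k = int(rng.integers(len(sites)))
keep = np.delete(np.arange(len(sites)), k)
u_hat = rbf_predict(sites[k], sites[keep], u[keep], eps=3.0)
loss = float((u_hat - u[k]) ** 2)
```

In the actual algorithm, this loss would be backpropagated through the two networks and the trainable shape parameter.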

3.3 Modeling unknown system of coupled nonlinear PDEs

Consider a coupled system of $K$ nonlinear PDEs associated with measurement variables $u_1, \dots, u_K$. In order to learn such a system, we require a radial approximation for each of these measurement variables. Therefore, given $N$ measurement sites, we consider an RBF coefficient matrix $C^{t} \in \mathbb{R}^{N \times K}$ (at time $t$) where each column corresponds to the coefficient vector of one measurement variable. Since the spatial derivative feature extractor operates only on the spatial locations, its parameters are shared across all measurement variables. Its output, the spatial derivative features of the RBF, is multiplied with the coefficient matrix to obtain spatial derivative features of length $n$ for each of the $K$ measurement variables. These spatial derivative features are concatenated and finally passed through the second neural network to get the final vector-valued ($K$-dimensional) output. Figure 1c shows the overall model for the coupled system of nonlinear PDEs.

4 Experimental Evaluation

We evaluate the proposed method on the two-dimensional wave equation and the Burgers-Fisher equation. The first is a linear PDE, whereas the latter is a coupled system of two nonlinear PDEs. Though we show results for two (spatial) dimensional systems, the method is scalable to higher-dimensional systems due to the RBF's inherent dimension-independent formulation.

4.1 Training and testing settings

For both experiments, we train the models with data from discrete measurement sites on a square geometry. The measurement sites are chosen using K-means clustering. For testing, we consider four different settings:

  1. the same square geometry with the same number of measurement sites (though their locations differ)

  2. the same square geometry with a different number of measurement sites

  3. a circular geometry with a different boundary condition

  4. an annular geometry with a different boundary condition

Different geometries with example measurement site configurations are shown in Figure 2.

Figure 2: (a): Training geometry, (b, c): two additional geometries used for testing. Red markers show example measurement sites.
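Measurement-site selection by K-means, as described above, can be sketched with plain Lloyd iterations over a dense candidate cloud, which spreads sites roughly uniformly over the geometry. The cloud size, site count, and iteration count here are arbitrary (scikit-learn's `KMeans` would work equally well):

```python
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.uniform(0, 1, size=(2000, 2))   # dense candidate points in the unit square
k = 32                                      # desired number of measurement sites

# Lloyd iterations: assign points to nearest site, then move sites to cluster means
sites = cloud[rng.choice(len(cloud), k, replace=False)]
for _ in range(20):
    labels = np.argmin(((cloud[:, None] - sites[None]) ** 2).sum(-1), axis=1)
    sites = np.array([cloud[labels == j].mean(0) if (labels == j).any() else sites[j]
                      for j in range(k)])
```

For a non-square geometry, the same procedure applies after restricting the candidate cloud to the domain (e.g., rejecting points outside a circle or annulus).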

4.2 Model and optimization hyperparameters

We use the Gaussian function $\varphi_\varepsilon(r) = e^{-(\varepsilon r)^2}$ as the radial function, where $\varepsilon$ is the shape parameter. For the spatial derivative feature extractor and the nonlinear function approximator, we use fully connected networks with ReLU activations. $d$ is the dimension of the domain; $d = 2$ for both experiments. $n$ denotes the length of the spatial derivative feature vector. $K$ is the number of PDEs present in the system; $K = 1$ for the wave equation and $K = 2$ for the Burgers-Fisher equation.

We use one set of sequences for training and validation; additional sequences are used for testing. We train the models in mini-batches using the Adam optimizer [kingmaB14]. During prediction, at each timestep, the linear system (15) is solved using $\ell_2$-regularized least squares instead of direct inversion, to avoid blowing up the coefficients due to noisy estimates. Models are implemented, trained, and tested in PyTorch [paszke2019pytorch].
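The $\ell_2$-regularized solve mentioned above can be sketched via the normal equations; the regularization value and kernel settings below are arbitrary choices for illustration, not the paper's:

```python
import numpy as np

def solve_coeffs_ridge(Phi, u, lam):
    """l2-regularized solve for c in Phi c = u:
    c = argmin ||Phi c - u||^2 + lam ||c||^2 = (Phi^T Phi + lam I)^{-1} Phi^T u."""
    n = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ u)

# A nearly flat Gaussian kernel matrix is severely ill-conditioned;
# the ridge term keeps the coefficients bounded where direct inversion would blow up
x = np.linspace(0, 1, 40)
Phi = np.exp(-(1.0 * (x[:, None] - x[None, :])) ** 2)
u = np.sin(2 * np.pi * x)
c = solve_coeffs_ridge(Phi, u, lam=1e-8)
assert np.isfinite(c).all()
```

Forming `Phi.T @ Phi` squares the condition number, so an SVD-based solver (e.g., `numpy.linalg.lstsq` with a cutoff) is a reasonable alternative for very ill-conditioned systems.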

4.3 Evaluation metrics

We use the Signal-to-Noise Ratio (SNR) as the metric to evaluate prediction quality. Depending on the scenario, we may seek predictions only at the measurement sites or on the entire domain $\Omega$. Therefore, we define two SNR metrics as follows,

$$\mathrm{SNR}_{s} = 10 \log_{10} \frac{\sum_{i} |u(x_i, t)|^2}{\sum_{i} |u(x_i, t) - \hat{u}(x_i, t)|^2}, \tag{22}$$
$$\mathrm{SNR}_{\Omega} = 10 \log_{10} \frac{\int_{\Omega} |u(x, t)|^2\, dx}{\int_{\Omega} |u(x, t) - \hat{u}(x, t)|^2\, dx}, \tag{23}$$

where $\hat{u}(x, t)$ denotes the predicted value of $u$ at time $t$ at location $x$.
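Assuming the standard definition $10 \log_{10}(\text{signal power} / \text{error power})$ for both metrics, the site-based SNR can be computed as:

```python
import numpy as np

def snr_db(u_true, u_pred):
    """Signal-to-noise ratio in dB: 10 log10(||u||^2 / ||u - u_hat||^2).
    Higher is better; a perfect prediction gives infinite SNR."""
    return 10.0 * np.log10(np.sum(u_true**2) / np.sum((u_true - u_pred)**2))

u = np.array([1.0, 2.0, 3.0])
assert snr_db(u, u + 1e-3) > snr_db(u, u + 1e-1)  # better prediction -> higher SNR
```

The domain-wide variant replaces the sums over measurement sites with integrals over the domain, approximated in practice by a dense evaluation grid.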

Figure 3: Quantitative analysis for the wave equation. (a, b): SNR comparison in setting (i) with different numbers of measurement sites; the same number of measurement sites is used in both training and testing. (d, e): SNR comparison in setting (ii). (c): SNR comparison in setting (iii). (f): SNR comparison in setting (iv). $\mathrm{SNR}_{s}$ and $\mathrm{SNR}_{\Omega}$ respectively denote the SNR computed only at the measurement sites and on the entire domain $\Omega$.
Figure 4: Visualization of the true and predicted states (on entire domain ) of wave equation over time in different settings.
Figure 5: Quantitative analysis for Burgers-Fisher equation. (a, b): SNR comparison in settings (i) with different number of measurement sites; same number of measurement sites is used in both training and testing. (d, e): SNR comparison in settings (ii). (c): SNR comparison in settings (iii). (f): SNR comparison in settings (iv).
Figure 6: Visualization of the true and predicted states (on entire domain ) of one measurement variable of 2D Burgers-Fisher equation over time in different settings.

4.4 Wave equation

We consider the two-dimensional wave equation given by

$$\frac{\partial^2 u}{\partial t^2} = c^2\, \nabla^2 u, \tag{24}$$

where $c$ is the (constant) wave propagation speed and $\nabla^2$ denotes the Laplacian operator. The dataset is generated using the finite element method in the PDE Toolbox of MATLAB. We use a fixed wave speed and the following initial condition,

(25)

where the parameters are chosen randomly for each sequence from uniform distributions. For the training dataset and test settings (i) and (ii), we use a Dirichlet boundary condition. For test settings (iii) and (iv), the following boundary condition is used:

(26)

Figure 3 shows the quantitative comparison of the proposed method in different settings in the forecasting task. Increasing the number of measurement sites improves the SNR (Figure 3(a,b)). Comparing Figures 3a and 3b, it can be seen that the difference between the prediction accuracy at the measurement sites only and the prediction accuracy on the entire domain decreases with time. Initially, the neural networks compensate for the spatial approximation error of the RBFs to some extent and provide better accuracy at the measurement sites, since they are trained only at those sites. However, as the prediction quality of the neural networks degrades over time, both SNRs become very similar. Figures 3(d,e) show that prediction quality is maintained even if we use a different number of measurement sites than in the training scenario. However, a larger change in the number of measurement sites would require retuning the RBF shape parameter. Figures 3(c,f) show that the learned model can be used in different settings (different geometries with different boundary conditions). The model is trained in just one geometry with one boundary condition, resulting in an accuracy drop when tested in other settings; the generalizability of the learned model could be improved by training in multiple settings. A qualitative comparison of the predicted maps in different settings, along with the ground truth, is depicted in Figure 4.

4.5 Burgers-Fisher equation

We consider the two-dimensional coupled Burgers-Fisher equations given by

(27)

where the equation parameters are constant and non-negative, and $\nabla$ denotes the gradient operator. We use $[\,\cdot\,]$ to denote a column vector, whereas $[a, b]$ represents a closed interval. The dataset is generated using the finite element method in the PDE Toolbox of MATLAB. We use the following initial condition,

(28)

where the parameters are chosen randomly for each sequence from a normal distribution. For the training dataset and test settings (i) and (ii), we use a Dirichlet boundary condition. For test settings (iii) and (iv), we use the same boundary condition as (26). Measurements in real environments are often noisy; we therefore add zero-mean Gaussian noise to the generated data, with standard deviation proportional to that of the clean training data.

Figure 5 shows the quantitative comparison of the proposed method in different settings in the forecasting task. In this case, the improvement in prediction accuracy with an increasing number of measurement sites is relatively small (Figure 5(a,b)). The convergence of the system to a smooth state (as opposed to the wave system) can explain the saturation of the RBF approximation with relatively few measurement sites. A qualitative comparison of the predicted maps in different settings, along with the ground truth, is depicted in Figure 6.

5 Conclusion

By integrating the RBF collocation method for solving PDEs with deep learning, we developed a framework for modeling unknown PDEs from sparse observations. The RBF formulation enables meshfree computation and makes the method viable for learning high-dimensional PDEs. Learning the application of the differential operators in a time-stepping fashion allows the learned model to be transferred to different settings of initial and boundary conditions.

Although numerical experiments show promising results, the proposed method has a few limitations that we would like to address in future work. First, even though the current model can deal with small changes in the number of measurement sites without retraining, a larger change would require a very different value of the shape parameter. Since the parameters of the neural networks depend on the shape parameter of the RBF, they would also require retuning to work in a setting where the number of measurement sites differs greatly from the training setting. Developing a neural network model that is independent of the RBF shape parameter would be an interesting problem to solve.

Secondly, the accuracy of the current method drops when testing in a different setting involving a different geometry and different boundary and initial conditions. Purely data-driven models generally do not perform well when the testing scenario is very different from the training scenario. Incorporating knowledge about the system, or constraining the learned operators to maintain physical properties, could increase the generalizability of the models.

Another limitation of the current method is that it assumes the physical parameters (e.g., diffusion coefficient, wave speed) of the underlying system are uniform. However, in real scenarios, these physical parameters often vary across space and time. Adapting the model to changes in the physical parameters of the system would be another interesting challenge to address.

References