1 Introduction
Complex physical phenomena are often modeled with a set of differential equations (DEs). Traditionally, model equations are constructed based on physical laws and empirical observation. However, in many practical scenarios, mathematical modeling is very challenging and/or can only describe the system behavior partially. Furthermore, even if the involved differential equations are known, their analytical solutions are difficult to obtain for most practical problems and therefore, must be solved numerically [mai2001numerical]. Traditional accurate numerical solvers are computationally intensive. The success of deep learning in various sequence prediction tasks, for example natural language and speech modeling [sutskever2014sequence, vaswani2017attention, chung2014empirical], video prediction [xingjian2015convolutional, finn2016unsupervised, lee2017desire], has motivated many researchers to approach both the aforementioned problems using deep learning. Consequently, a number of deep learning models which incorporate knowledge from physics or applied mathematics, have been introduced for modeling complex dynamical systems involving partial differential equations (PDEs) [raissi2019physics, long2018pde, long2018hybridnet, de2019deep, saha2020phicnet, sirignano2018dgm, khoo2017solving, geneva2020modeling, sirignano2020dpm, bar2019learning].
Typically deep learning methods requires huge amount of data to obtain generalized solution. However, in many practical scenarios, data or observation can be obtained only in few sites (for example, sensor measurements). In this paper, we consider the problem of modeling unknown PDEdriven spatiotemporal dynamics from sparselyobserved data. Given a set of distinct scattered spatial locations and corresponding time series of measurements of a physical quantity at those locations , we want to learn a model for the unknown PDE that governs the data. We assume the underlying PDE belongs to a family of PDEs described by the following generic form:
(1) 
denotes the order partial differential operator with respect to time and denotes all possible combination of order spatial (partial) differential operators. We assume that the temporal order () of the underlying PDE is known a priori while the form of the function , nonlinear in general, and its arguments are unknown. Though we observe the system at few discrete locations, we seek to predict the evolution of at any generic location or over the entire . We want the learned model to construct the solution of the underlying PDE in time marching fashion, given any initial and boundary conditions.
Our method integrates radial basis function (RBF) collocation method with deep learning for modeling unknown PDEs from sparselyobserved data. RBF collocation method is widely used as numerical solver for PDEs because of its inherent meshfree nature that provides geometric flexibility with scattered node layout [bollig2012solution]. Another advantage of RBFbased methods is that they can be used to solve high dimensional problem since they convert problems involving multiple space dimensions into virtually onedimensional [fasshauer2007meshfree, wu2004meshless]
. We propose to learn the application of the differential operators on a chosen RBF using deep neural networks which makes the method independent of initial and boundary conditions while also utilizing the meshfree, geometrically flexible and spatial dimensionindependent nature of RBF collocation method. By choosing infinitelysmooth RBFs (e.g. Gaussian, multiquadric etc.) and tuning their shape parameters, our method can be employed in scenarios where data sites are sparsely distributed. We first show how RBF collocation method for timedependent linear PDEs can be integrated with deep neural network to incorporate unknown linear PDEs. Next, we extend the architecture to model unknown nonlinear PDEs of form (
1) and finally to model coupled system of multiple nonlinear PDEs. We evaluate our method in the forecasting task for two dimensional wave equation and BurgersFisher equation, and demonstrate the capability of transferring the learned model in different geometries with different boundary conditions.Related work.
Deep learning methods for learning or solving PDEdriven system can be broadly categorized into two types. Fhe first kind of methods directly approximates the PDE solution with deep neural networks (DNNs) as a function of space and time [raissi2019physics, sirignano2018dgm, raissi2018deep]. These methods are meshfree, geometrically flexible and suitable for high dimensional PDEs. However, since these methods directly approximate the PDE solution as a function of space and time, the learned model is dependent on the specific initial and boundary condition used in training.
The second type of methods learns the differential operators using convolutional neural networks (CNNs)
[long2018pde, de2019deep, long2019pde, bar2019learning]. These methods are independent of initial and boundary conditions, but they are not suitable for highdimensional PDEs and can only be used when data is available in a dense regular grid. Furthermore, both types of methods often assume strong physics prior either in model structure [long2018hybridnet, de2019deep, saha2020phicnet, sirignano2020dpm]or in loss function
[raissi2019physics, sirignano2018dgm, geneva2020modeling] limiting their applicability in scenarios where the dynamics is unknown.One promising approach for solving or learning PDEs using irregular scattered data is to utilize graph neural networks (GNNs) [scarselli2008graph]. Seo and Liu [seo2019differentiable] proposed differentiable physicsinformed graph networks (DPGN) which incorporates differentiable physics equations in GNN to model complex dynamical systems. However, difference operators on graph can extract spatial variations only when the nodes are close to each other and contribute to large numerical error when data sites are sparsely distributed [seo2019physics].
2 Background
2.1 Radial Basis Functions (RBFs) Interpolation
In standard scattered data interpolation problem we are generally given with a set of
distinct data points and corresponding real values (measurements) . The goal is to find a (continuous) function that satisfies the interpolation conditions(2) 
A common approach to solve this scattered data interpolation problem is to consider the function as a linear combination of basis functions of certain class. Often these basis functions are formed using radial functions. Value of a radial function at each point depends only on the some arbitrary distance between that point and origin such that where denotes some norm in , usually the Euclidean norm. Effectively, the function becomes a univariate function .
In radial basis functions interpolation, the basis functions are formed using the given datapoints as centers or origins. In terms of these basis functions , the interpolant takes the following form:
(3) 
where, are the unknown coefficients to be determined. Plugging the interpolation conditions (2) into (3) leads to a system of linear equations
(4) 
where is the interpolating matrix whose entries are given by ,
is the coefficient vector
and .Solution to the system (4) exists and is unique provided the interpolating matrix is nonsingular. For a large class of radial functions including (inverse) multiquadrics, Gaussian, the matrix is nonsingular if the datapoints are all distinct [fasshauer2007meshfree]. Details regarding the invertibility of the interpolating matrix for various choices of the radial functions can be found in [buhmann2003radial, fasshauer2007meshfree].
2.2 Solving timedependent PDEs using RBFs
Radial basis functions are widely used in meshfree methods for solving PDE problems. Here we will describe how RBFs are used, particularly Kansa’s unsymmetric collocation method [kansa1990multiquadrics], to solve timedependent linear PDEs on which we will base our method.
Consider the following timedependent linear PDE:
(5) 
with initial and boundary conditions
(6) 
(7) 
is some linear (spatial) differential operator, denotes the boundary of .
In order to obtain an approximate solution of the above PDE, consider be a set of collocation nodes. Furthermore, let us assume that out of the these collocation nodes, the last nodes are boundary nodes, i.e. the subset . Using some radial basis functions centered at the collocation nodes, an approximate solution to the PDE can be defined as follows:
(8) 
where are the unknown timedependent coefficients to be determined at each time step. Plugging (8) into (4), (6), and (7
) we get the following ordinary differential equation (ODE)
(9) 
with initial and boundary conditions
(10) 
(11) 
In (9), the spatial differential operator is applied to the radial basis function. Approximating the timederivatives by first order finite difference, we get
(12) 
where is the stepsize in time. The coefficients can be determined by applying (11) and (12) on the boundary collocation nodes of and interior collocation nodes of . In that case, (10), (11) and (12) can be written as the following compact form:
(13) 
is the RBF matrix whose entries are given by , and is the coefficient vector at time . The elements of the matrix are given by
(14) 
and, is the boundary condition vector at time . is the vector of initial values at the collocation nodes. The coefficient vectors are obtained by iteratively solving (13) which are used to compute the numerical solution by (8).
3 Deep RBF Collocation
is modeled using a feedforward neural network
with parameter vector . (b): Proposed model for nonlinear PDE; is same as except it hasoutput neurons instead of just one;
is the neural network representation (with parameter vector ) of the unknown function . (c): Proposed model for a system of coupled PDEs.We will first describe our approach for unknown linear PDE and then extend that to nonlinear PDE of form (1). Finally, we will show how the proposed model can be used to learn an unknown systems of coupled nonlinear PDEs.
3.1 Modeling unknown linear PDE
Consider the linear timedependent PDE of (5) where the linear (spatial) differential operator is unknown. Learning the application of directly on requires finegrid measurement for convolution operations. Rather, we propose to learn the application of on a radial basis function that allows scattered sparse measurements independent of any specific geometry. Specifically, we propose to learn a feedforward neural network , with parameter vector , that approximates given any two spatial locations and corresponding value of the RBF . We train the neural network using scattered timeseries measurement of by minimizing an objective function derived from (13). Let be the set of measurement sites and be the measurement vector at time . Since our goal is to learn a model for spatial differential operator and boundary values are not effected by , we only consider internal measurement sites during training. For a radial basis function , the coefficient vector at time , , can be obtained by solving the linear system
(15) 
where is the RBF matrix whose entries are given by , . Replacing with , ignoring boundary nodes, and plugging (15) into (13) we get
(16) 
where
is the identity matrix of order
, denotes the approximated spatial derivative matrix, i.e. , and, is the trainable shape parameter of the RBF. So, for a timeseries of measurements , the neural network is trained with the following objective:(17) 
Note, the term in (16) basically multiplies the approximated spatial derivative matrix with coefficient vector to obtain the radial approximation of . Once the model is trained, it can be used to get the value of (Figure 1a) at any generic location and to finally predict . During prediction phase, the boundary condition is applied in similar fashion as in (13). Considering the last nodes in as boundary nodes and incorporating corresponding boundary condition of (11) in (16), we get
(18) 
where the rows of matrix are given by
(19) 
Once the model is trained, during prediction the matrix is solely determined from the configuration of measurement sites. Under a fixed Dirichlet boundary condition , the iterative scheme of (18) is basically a linear discretetime dynamical system and its convergence to a equilibrium point can be analyzed from the spectral properties of .
The linear discretetime dynamical system converges to a stable equilibrium point from any arbitrary initialization if and only if , where
are the eigenvalues of
.3.2 Modeling unknown nonlinear PDE
In order to model an unknown nonlinear PDE of form (1), we use the proposed model for linear PDE as the base module. The input arguments of the nonlinear function in (1) are basically various unknown linear spatial derivatives of the measurement variable . Therefore, we first use the linear model to generate a spatial derivative feature vector and then pass that through another feedforward neural network , with parameter vector , that approximates the function . The neural network is modified to generate a vector of length (instead of a scalar as in the case of linear PDE) whose entries basically represent spatial derivative features of the RBF . We denote this spatial derivative feature extractor neural network as . The output of is multiplied with the coefficient vector to get the spatial derivative features of , which is used by to obtain the final output. Figure 1b shows the overall model for the nonlinear PDE. In this case, for a timeseries of measurements (at internal nodes) , the neural networks and are trained jointly with the following objective:
(20) 
where
is a 3D tensor containing the
spatial derivative matrices of the RBF. Note, the objective (20) considers nonlinear PDE of form (1) where the temporal order is . In general, for , we use the finite difference approximation of and the objective is modified as(21) 
The training objective of (20) or (21) uses measurements from all sites to predict the values of at those sites, but does not incorporate any constraints to generalize at locations where measurement is not available. Accuracy of RBFbased methods at any generic location strongly depends on the shape parameter of the RBF. Generally the shape parameter is chosen using leaveoneout cross validation method [rippa1999algorithm, fasshauer2007choosing]. Note, as the spatial differential operators are functions of , the parameters of are dependent on . We adopt the leaveoneout strategy in our training algorithm. At each training step, we randomly leave one measurement site and the model uses the remaining sites for prediction. The left out measurement site is used only for computing the loss function. The overall training procedure is summarized in Algorithm 1.
Once the models and are trained, they can be used to get the approximated value of (Figure 1b) at any generic location and to finally predict .
3.3 Modeling unknown system of coupled nonlinear PDEs
Consider a coupled system of nonlinear PDEs associated with measurement variables . In order to learn such a system, we require radial approximation for each of these measurement variables. Therefore, given measurement sites, we consider a RBF coefficient matrix (at time ) where each column corresponds to the coefficient vector of one measurement variable. Since the neural network operates only on the spatial locations, its parameters are shared across all measurement variables. The output of , i.e. the spatial derivative features of the RBF , is multiplied with coefficient matrix to obtain the spatial derivative features of length for each of the measurement variables. These spatial derivative features are concatenated and finally passed through the neural network to get the final vector valued () output. Figure 1c shows the overall model for the coupled system of nonlinear PDEs.
4 Experimental Evaluation
We evaluate the proposed method for two dimensional wave equation and BurgersFisher equation. The first one is a linear PDE, whereas the latter is a coupled system of two nonlinear PDEs. Though we show results for two (spatial) dimensional system, the method is scalable to higher dimensional system due to RBF’s inherent dimension independent formulation.
4.1 Training and testing settings
For both experiments, we train the models with data from discrete measurement sites on a square geometry . The measurement sites are chosen using Kmeans clustering. For testing, we consider four different settings:

same square geometry with same number of measurement sites (though their locations are different)

same square geometry with different number of measurement sites

a circular geometry with different boundary condition

an annular geometry with different boundary condition
Different geometries with example measurement site configurations are shown in Figure 2.
4.2 Model and optimization hyperparameters
We use Gaussian function as the radial function, where is the shape parameter.
For and , we use fully connected networks of layer configurations
and
, respectively, with ReLU activations.
is the dimension of the domain; for both the experiments. denotes the length of spatial derivative feature vector; we use . is the number of PDEs present in the system; for wave equation and for BurgersFisher equation.We use sequences of length for training and validation; another sequences are used for testing. We train the models for epochs in minibatches of using Adam optimizer [kingmaB14] with an initial learning rate of . During prediction, at each timestep, the linear system is solved using regularized least square with regularization parameter
instead of direct inversion to avoid blowing up coefficients due to noisy estimates. Models are implemented, trained and tested in PyTorch
[paszke2019pytorch].4.3 Evaluation metrics
We use SignaltoNoise Ratio (SNR) as metric to evaluate the prediction quality. Depending on the scenario, we may seek prediction only on the measurement sites
or on the entire domain . Therefore, we define two SNR metrics as follows,(22)  
(23) 
where denotes the predicted value of at time at location .
4.4 Wave equation
We consider the twodimensional wave equation given by
(24) 
where is the (constant) wave propagation speed and denotes the differential operator Laplacian. Dataset is generated using finite element method in PDE toolbox of MATLAB. We use , and the following initial condition,
(25) 
where , , and are chosen randomly for each sequence.
denotes the uniform distribution. For training dataset and test settings (i) and (ii), we use the Dirichlet boundary condition
. For test settings (iii) and (iv), following boundary condition is used,(26) 
Figure 4 shows the quantitative comparison of proposed method in different settings in the task of forecasting. Increasing the number of measurement sites improves SNR (Figure 4(a,b)). Comparing Figures 4a and 4b, it can be seen that difference between prediction accuracy at measurement sites only (SNR) and prediction accuracy on the entire domain (SNR) decreases with time. Initially, the neural networks compensate for the spatial approximation error of RBFs to some extent and provide better accuracy at the measurement sites since they are trained only at those sites. However, as the prediction quality of the neural networks degrades over time, both the SNRs become very similar. Figure 4(d,e) show that the prediction quality is maintained even if we use different number of measurement sites compared to the training scenario. However, a larger change in number of measurement site would require retuning the RBF shape parameter. Figure 4(c,f) show that the learned model can be used in different settings (different geometries with different boundary conditions). The model is trained in just one geometry with one boundary condition resulting in accuracy drop when tested in other settings; generalizability of the learned model can be improved if trained in multiple settings. Qualitative comparison of predicted maps in different settings along with ground truth is depicted in Figure 4.
4.5 BurgersFisher equation
We consider the twodimensional coupled BurgersFisher equations given by
(27) 
where and are constant nonnegative parameters, and denotes the gradient operator. We use to denote a column vector, whereas represents a closed interval. Dataset is generated using finite element method in PDE toolbox of MATLAB. We use , and the following initial condition,
(28) 
where are chosen randomly for each sequence.
denotes the normal distribution, and
denotes dot product. For training dataset and test settings (i) and (ii), we use the Dirichlet boundary condition . For test settings (iii) and (iv), we use the same boundary condition as (26). Measurement in real environments are often noisy; we add Gaussian noise to the generated data, whereis the standard deviation of the clean training data.
Figure 6 shows the quantitative comparison of proposed method in different settings in the task of forecasting. In this case, improvement in prediction accuracy due to changes in number of measurement sites is relatively less (Figure 6(a,b)). Convergence of the system to a smooth state (as opposed to wave system) can be attributed to the saturation in RBF approximation with relatively less number of measurement sites. Qualitative comparison of predicted maps in different settings along with ground truth is depicted in Figure 6.
5 Conclusion
By integrating RBF collocation method for solving PDEs with deep learning, we developed a framework for modeling unknown PDEs from sparse observations. RBF framework enables meshfree computation and makes the method viable for learning highdimensional PDEs. Learning the application of differential operators in timestepping fashion allows the learned model to be transferable to different settings of initial and boundary conditions.
Although numerical experiments show promising results, the proposed method has few limitations that we would like to address in future work. First, even though the current model can deal with small changes in the number of measurement sites without retraining, a larger change would require very different value of the shape parameter. Since the parameters of the neural networks depends on the shape parameter of the RBF, those also require retuning to work in a setting where number of measurement sites is very different from training setting. Developing a neural network model that is independent of the RBF shape parameter would be an interesting problem to solve.
Secondly, accuracy of the current method drops if we test in a different setting involving different geometry and, different boundary and initial conditions. Pure datadriven models do not perform well in general when testing scenario is very different from training scenario. Incorporating knowledge about the system or constraining the learned operators to maintain physical properties can increase the generalizability of the models.
Another limitation of the current method is that it assumes the physical parameter (e.g. diffusion coefficient, wave speed) of the underlying system is uniform. However, in real scenarios, these physical parameters often vary across space and time. Adapting the model with changes in physical parameters of the system would be another interesting challenge to address.
Comments
There are no comments yet.