Partial differential equations (PDEs) are widely used in the mathematical formulation of physical phenomena in a variety of science and engineering applications, such as modeling fluid flow, mechanical stress or material temperature. Analytic solutions of PDEs are often not available, and therefore several numerical methods, such as finite difference methods (FDM) mohanty2005unconditionally, finite element methods thomee2001finite, splines abushama2008modified; kumar2010methods, finite volume methods shakeri2011finite and spectral methods taleei2014time, have been developed for approximating the solution of a given PDE.
In particular, in finite difference-based methods, the domain of the PDE is discretized. The solution is only provided at the predefined grid points, and additional interpolation is required to obtain the solution over the whole domain. Moreover, the method has low accuracy on irregular domains, which limits its applicability there. The finite element method relies on the discretization of the domain via meshing, which can be a challenging and time-consuming process, especially for complex geometries or higher-dimensional PDEs. In addition, similar to finite difference methods, the solution is approximated locally at each mesh point, and therefore additional interpolation is required to find the solution at an arbitrary point in the domain mehrkanoon2015learning.
Another class of methods that has been proposed in the literature for the simulation of dynamical systems is based on machine learning approaches, in particular kernel-based models as well as artificial neural networks. The use of neural network-based models for solving ordinary and partial differential equations goes back to the early '90s, see meade1994numerical; lee1990neural; van1995neural; ramuhalli2005finite. Hopfield neural networks are used in lee1990neural to solve first-order differential equations. The authors in lagaris1998artificial introduced a feed-forward neural network-based model to solve ordinary and partial differential equations. In their work, the model function is expressed as a sum of two terms, where the first term, which contains no adjustable parameters, satisfies the initial/boundary conditions, and the second term involves a trainable feed-forward neural network. In contrast to mesh-based approaches such as finite difference and finite element methods, neural network models (see meade1994numerical; choi2009comparison; shirvany2009multilayer) can generate a closed-form solution and do not require meshing.
Mehrkanoon et al. mehrkanoon2012approximate; mehrkanoon2012ls; mehrkanoon2013ls; mehrkanoon2015learning, for the first time, proposed a systematic machine learning approach based on the primal-dual LS-SVM formulation to learn the solution of dynamical systems governed by a range of differential equations, including ordinary differential equations (ODEs), partial differential equations (PDEs) and differential algebraic equations (DAEs). Unlike the neural network-based approaches described in lagaris1998artificial; lagaris2000neural, where the user had to define the form of a trial solution, which in some cases is not straightforward, in the LS-SVM-based approach the optimal model is derived by incorporating the initial/boundary conditions as constraints of an optimization problem.
In particular, in the LS-SVM-based model mehrkanoon2012approximate; mehrkanoon2015learning, the domain of the differential equation is first discretized to generate collocation points that are located inside the domain as well as on its initial and/or boundary. Next, one starts with an LS-SVM representation of the solution in the primal and formulates a constrained optimization problem to obtain the optimal values of the model parameters (i.e. weights and biases). More precisely, the initial/boundary conditions of the differential equation are incorporated as constraints of an optimization problem. The aim of the formulated constrained optimization problem is to enforce the LS-SVM representation of the solution to satisfy the given differential equation at the collocation points inside the domain (through the defined objective of the optimization problem) as well as on the initial/boundary of the domain (through the defined hard constraints). One should note that this is not a regression task, as the solution of the differential equation is not provided during the training. In fact, by solving the constrained optimization problem, the optimal representation of the solution is obtained in the dual. The LSSVM-PDE code is available at https://github.com/SMehrkanoon/LSSVM-PDE-Solver.
It should also be noted that in the systematic machine learning approach presented in mehrkanoon2012approximate; mehrkanoon2012ls; mehrkanoon2015learning, one can alternatively start with a different representation than the LS-SVM representation, for instance a neural network-based representation, see raissi2017physics. In addition, the hard constraints of the LS-SVM optimization formulation corresponding to the initial/boundary conditions of the PDE can also be relaxed and instead added as an additional term in the objective function of the optimization problem, see raissi2017physics. Therefore, motivated by the systematic LS-SVM approach mehrkanoon2012approximate; mehrkanoon2015learning, the authors in raissi2017physics started with a feed-forward neural network representation, introduced the physics-informed deep learning model and showed its effectiveness in solving differential equations. However, to the best of our knowledge, this existing link between the systematic LS-SVM approach mehrkanoon2012approximate; mehrkanoon2015learning for solving differential equations and the physics-informed deep learning model raissi2017physics has not been explicitly stated in the literature. Sirignano and Spiliopoulos sirignano2018dgm developed the Deep Galerkin Method (DGM), where the solution of a high-dimensional PDE is approximated by a neural network. Zhu et al. zhu2019physics developed a dense convolutional encoder-decoder network, and E and Yu yu2017deep proposed the Deep Ritz method, based on fully-connected layers and residual connections, for solving PDEs.
It is the purpose of this paper to introduce a novel two-stream deep model based on Graph Convolutional Networks (GCNs) and feed-forward neural networks (FFNN) to learn the solution of given differential equations. GCNs have been successfully applied in many application domains, such as natural language processing peng2018large; zhang2019long, image-related tasks johnson2018image; litany2018deformable and weather elements forecasting stanczyk2021deep; aykas2021multistream. The use of GCNs for learning the solution of PDEs has not been well explored. In the context of PDEs, the grid and graph input data are obtained by discretizing the PDE domain. The aim of the proposed model is to learn from both types of input representations, i.e. grid and graph data. In particular, the FFNN processes the grid data, while the GCN operates on the graph data and learns the relation between the features by incorporating the neighborhood information through the adjacency matrix of the graph. This paper is organized as follows. The proposed model is described in section 2. The numerical experiments and discussion of the obtained results are given in section 3. The conclusion is drawn in section 4.
2 Proposed Model
2.1 Graph Structure Data
The domain of the PDE is first discretized into nodes. For a two-dimensional domain, this yields a grid whose size is the product of the numbers of discretization points in the space and time dimensions, respectively. We next construct a graph in which each node located inside the domain is connected to its immediately adjacent nodes along each coordinate direction. The constructed graphs for equations with two- and three-dimensional domains are shown in Fig. 1 (a) and (b), respectively. In the case of two variables, each node in the graph has neighbors in two directions (space and time), while in the case of three variables, nodes have neighbors in three directions (two spatial and one temporal). In the constructed graph, one set collects all boundary-condition nodes and another collects all initial-condition nodes; together they form the set of initial/boundary nodes. In addition, a third set collects all remaining nodes, i.e. all nodes located inside the domain. The cardinalities of these sets appear in the loss terms defined below.
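The grid-to-graph construction above can be sketched as follows for a two-dimensional domain. The function name and the dense adjacency representation are illustrative choices of ours; the actual implementation works with a sparse adjacency matrix:

```python
import numpy as np

def grid_graph_adjacency(n_x, n_t):
    """Dense adjacency matrix of an n_x-by-n_t grid graph.

    Node (i, j) is connected to its immediate neighbours along the
    x- and t-directions, as in the graph construction of section 2.1.
    """
    n = n_x * n_t
    idx = lambda i, j: i * n_t + j  # row-major node index
    a = np.zeros((n, n), dtype=np.float32)
    for i in range(n_x):
        for j in range(n_t):
            if i + 1 < n_x:  # neighbour in the x-direction
                a[idx(i, j), idx(i + 1, j)] = a[idx(i + 1, j), idx(i, j)] = 1.0
            if j + 1 < n_t:  # neighbour in the t-direction
                a[idx(i, j), idx(i, j + 1)] = a[idx(i, j + 1), idx(i, j)] = 1.0
    return a
```

With this layout, corner nodes have degree two and interior nodes degree four, matching the neighborhood structure shown in Fig. 1 (a).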
2.2 Grid Structure Data
The same discretization steps that were previously used to create the graph data are also used here to generate the grid data points. However, as opposed to the previously introduced graph data, the grid data carry no edge information and consist of the grid points only.
2.3 GCN-FFNN model
Here we propose our two-stream deep neural network architecture, which consists of a Graph Convolutional Network (GCN) model and a feed-forward neural network (FFNN) model in its first and second streams, respectively. The GCN-based model receives the graph input data, while the grid data are fed to the FFNN model. The architecture of the proposed GCN-FFNN model is shown in Fig. 2. The proposed model is trained in two phases. In the first phase, the models in the two streams are trained separately. The same error function is used to adjust the parameters of both streams by enforcing the models to satisfy the given PDE as well as its initial and boundary conditions on the grid or graph training data. In the second phase, the learned parameters of the two stream layers are frozen, and their learned representations of the solution are fed to fully connected layers whose parameters are learned using the previously employed error function. In addition to evaluating the proposed GCN-FFNN model, we have also individually examined each stream, i.e. the FFNN- as well as the GCN-based models. It should be noted that the FFNN-based model has been previously proposed in the literature raissi2017physics, whereas the GCN-based model and its combination with the FFNN-based model are introduced in this paper. In what follows, the GCN-based model in the GCN-FFNN architecture is explained in more detail.
2.4 GCN-based model
A model based on the core Graph Convolutional Network (GCN) kipf2016semi is developed and used in the first stream of the proposed GCN-FFNN model to learn the solution of partial differential equations. GCN is an efficient variant of convolutional neural networks which operates directly on graphs. In particular, the following layer-wise propagation rule is utilized in a multi-layer Graph Convolutional Network kipf2016semi:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2} H^{(l)} W^{(l)}\right).$$

Here, $\tilde{A} = A + I$ is the sum of the adjacency matrix $A$ and the identity matrix $I$ to include self-connections, and $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$. $H^{(l)}$ is the matrix of activations in the $l$-th layer, with $H^{(0)}$ equal to the feature representation of the nodes, $W^{(l)}$ contains the convolution weights of the $l$-th layer, and $\sigma$ is the activation function; in our case, the hyperbolic tangent is used. The model receives an input whose shape is the number of nodes times the number of node attributes. In our case, the number of node attributes is two for a PDE with a two-dimensional domain and three for a PDE with a three-dimensional domain. Furthermore, the graph structure, i.e. the edge information, is also provided to the model through the adjacency matrix. The architecture of our proposed GCN-based model for learning the solutions of PDEs is shown in Fig. 3.
In particular, the model consists of one GCN layer followed by a residual connection and a hyperbolic tangent activation function. Next, three convolution layers, each with a hyperbolic tangent activation function, are applied. The output of the last convolution layer is followed by a fully-connected layer.
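As a minimal sketch of the propagation rule used in the GCN layer, the following applies one step of symmetrically normalized message passing with a hyperbolic tangent activation; the dense NumPy formulation and the function name are ours, for illustration only:

```python
import numpy as np

def gcn_layer(a, h, w):
    # One GCN propagation step: tanh(D~^{-1/2} (A + I) D~^{-1/2} H W),
    # i.e. the renormalized rule of kipf2016semi with a tanh activation.
    a_tilde = a + np.eye(a.shape[0])                 # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))  # D~^{-1/2} diagonal
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.tanh(a_hat @ h @ w)
```

In the edgeless limit ($A = 0$), the normalized operator reduces to the identity and the layer degenerates to a plain per-node transformation, which makes the role of the adjacency matrix in mixing neighbor information explicit.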
We use the systematic machine learning approach presented in mehrkanoon2015learning to learn the parameters of the GCN model. The proposed GCN-based model receives as input all the nodes, from both the interior and the initial/boundary sets, as well as the partial differential operator, which is obtained by moving all the involved terms of the given PDE to the left-hand side of the equation so that the right-hand side is zero. The model then outputs an approximate solution of the given PDE. The solution is learned by solving an optimization problem whose objective consists of two terms corresponding to the losses defined on the inside-domain nodes as well as on the initial/boundary-condition nodes. The input data to the model consist of the above-mentioned 2- or 3-dimensional nodes and their graph structure (adjacency matrix). It should be noted that here the adjacency matrix is sparse due to the way the edges of the graph are constructed, see section 2.1. We use the efficient implementation available at poli2019graph for dealing with the large-scale sparse adjacency matrix in our GCN layer.
2.5 Loss Function
Following the work of Mehrkanoon and Suykens mehrkanoon2015learning, here we use a loss function that enforces the representation of the solution obtained by our proposed GCN-based model architecture to satisfy the given differential equation and its initial/boundary conditions. Similar to mehrkanoon2015learning, we aim at minimizing a mean squared loss function to adjust the model parameters. The used loss function is composed of the two terms given in Eq. (2). The first term enforces the representation of the solution to satisfy the given differential equation inside its domain. The second term corresponds to making the difference between the true initial/boundary solutions and the model predictions as small as possible. More precisely, given the differential operator and the collocation nodes from both inside the domain and on the initial/boundary of the domain, the two losses are defined in equations (3) and (4), respectively.
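A minimal sketch of such a two-term loss is given below, using the 1D-Burgers operator as an example. The grid spacings, the viscosity parameter `nu` and the finite-difference approximation of the derivatives are illustrative assumptions of ours; the actual models obtain the required derivatives via automatic differentiation:

```python
import numpy as np

def pde_residual_burgers(u, dx, dt, nu):
    # Residual of u_t + u*u_x - nu*u_xx on a (n_x, n_t) solution grid,
    # approximated with finite differences for illustration.
    u_x = np.gradient(u, dx, axis=0)
    u_t = np.gradient(u, dt, axis=1)
    u_xx = np.gradient(u_x, dx, axis=0)
    return u_t + u * u_x - nu * u_xx

def two_term_loss(u, u_ib_true, interior_mask, ib_mask, dx, dt, nu=0.01):
    r = pde_residual_burgers(u, dx, dt, nu)
    loss_in = np.mean(r[interior_mask] ** 2)          # PDE residual inside the domain
    loss_ib = np.mean((u[ib_mask] - u_ib_true) ** 2)  # initial/boundary mismatch
    return loss_in + loss_ib
```

Note that no reference solution enters the interior term: only the residual of the differential operator is penalized there, which is what distinguishes this training objective from an ordinary regression loss.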
3 Numerical Results
In this section, four experiments are performed to demonstrate the efficiency of the proposed GCN-FFNN model for learning the solution of the 1D- and 2D-Burgers equations as well as the 1D- and 2D-Schrödinger equations. The model parameters are learned in a transductive fashion. The input dataset is divided into training and test sets. The training set contains nodes from inside the domain as well as from its initial and boundary parts. Two scenarios are examined for the test nodes, i.e. test nodes inside and outside the domain of the PDE, see Fig. 4. In the first case, see Fig. 4 (a), some grid points along the x-dimension are first randomly selected, and then all the (x,t)-nodes with those selected x-coordinate positions form the inside-domain test nodes. It should be noted that for PDEs with 3-dimensional domains, the random grid points are selected along the x- and y-dimensions, and subsequently all the (x,y,t)-nodes with those selected (x,y)-coordinate positions form the inside-domain test nodes. In the second case, see Fig. 4 (b), the test nodes are from outside the domain. In both test cases, the test set is 10% of the whole dataset.
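The column-wise selection of inside-domain test nodes can be sketched as follows; the grid resolution and random seed below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_t = 20, 30  # assumed grid resolution, for illustration only

# Randomly pick 10% of the x-positions; every (x, t)-node sharing one of
# those x-coordinates becomes an inside-domain test node.
n_test_cols = max(1, int(0.1 * n_x))
test_cols = rng.choice(n_x, size=n_test_cols, replace=False)

mask = np.zeros((n_x, n_t), dtype=bool)
mask[test_cols, :] = True
train_nodes = np.argwhere(~mask)  # (x, t) index pairs for training
test_nodes = np.argwhere(mask)    # (x, t) index pairs for testing
```

For 3-dimensional domains, the same idea applies with randomly selected (x, y)-positions and all their t-slices forming the test set.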
The accuracy of the obtained approximate solution is measured by means of the mean squared error and the infinity error norm, defined as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(u(z_i)-\hat{u}(z_i)\right)^2, \qquad \|e\|_{\infty} = \max_{1\le i\le N}\left|u(z_i)-\hat{u}(z_i)\right|.$$

Here, $z_i$ is the $i$-th test node, $u$ and $\hat{u}$ are the true and predicted solutions, and $N$ is the number of test nodes. The obtained results of the three models, i.e. FFNN, GCN and GCN-FFNN, on the above-mentioned four equations are compared. All the models are trained using L-BFGS liu1989limited with a learning rate of 1.0 for a maximum of 50000 epochs. In order to make a fair comparison, all the models receive the same training and test sets. The number of layers, hidden neurons and trainable parameters of each of the used modules of the GCN-FFNN architecture for each equation were found empirically and are tabulated in Table 1.
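The two evaluation metrics can be computed as follows (the function name is ours):

```python
import numpy as np

def error_metrics(u_true, u_pred):
    # Mean squared error and infinity norm between the true and
    # predicted solutions evaluated at the test nodes.
    diff = np.asarray(u_true, dtype=float) - np.asarray(u_pred, dtype=float)
    mse = np.mean(diff ** 2)
    inf_norm = np.max(np.abs(diff))
    return mse, inf_norm
```

The MSE summarizes the average fit over all test nodes, while the infinity norm captures the single worst-case pointwise deviation.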
| PDE | Module | # Layers | # Hidden Units | # Trainable Parameters |
| --- | --- | --- | --- | --- |
| 1D-Burgers | FFNN-based model raissi2017physics | 8 | 20 | 2601 |
| 1D-Schrödinger | FFNN-based model li2021deep | 6 | 100 | 40902 |
| 2D-Burgers | FFNN-based model li2021deep | 8 | 20 | 2621 |
| 2D-Schrödinger | FFNN-based model li2021deep | 5 | 50 | 7952 |
3.1 1D-Burgers Equation
The 1D-Burgers equation with boundary and initial conditions is given in Eq. (6) raissi2017physics:
Here, the differential operator is defined as $u_t + u\,u_x - \nu\,u_{xx}$, where $u$ (shown as $u(x,t)$ in the equation) is the solution we seek to approximate with our proposed model. $u_t$ and $u_x$ are the partial derivatives of $u$ with respect to $t$ and $x$, respectively, and $u_{xx}$ is the second partial derivative of $u$ with respect to $x$.
In the first scenario, the two-dimensional domain of the 1D-Burgers equation is divided into nodes that are evenly spaced in each dimension. We have randomly selected 10% of the nodes, following the column-wise procedure described above, to form the test nodes inside the domain. In the second scenario, the nodes in a restricted range are used for training, and the test nodes outside the domain are selected from beyond this range. The obtained MSEs and infinity norms of each model are tabulated in Table 2. Both metrics show that the proposed GCN-FFNN model achieved the best results for both inside and outside test nodes.
Fig. 5 corresponds to the first scenario, where the test nodes are from inside the domain. In particular, Fig. 5 (a), (b) and (c) show the true and approximate solutions obtained by the GCN-FFNN model for the 1D-Burgers equation at three time instances. Here, the prediction in (a) is from the training set, whereas the predictions in (b) and (c) are from the test set. The obtained residuals are shown in Fig. 5 (d), (e) and (f). Fig. 6 corresponds to the second scenario, where the test nodes are from outside the domain. Fig. 6 (a), (b) and (c) show the true and approximate solutions obtained by the GCN-FFNN model. Here, the predictions in (a) and (b) are from the training set, while the predictions in (c) are from the test set. Fig. 6 (d), (e) and (f) show the corresponding residuals.
3.2 1D-Schrödinger Equation
The 1D-Schrödinger equation with boundary and initial conditions is described in Eq. (7) li2021deep.
The differential operator is defined analogously by collecting all terms of Eq. (7) on the left-hand side. Letting the real and imaginary components of the solution be treated as separate unknowns, the 1D-Schrödinger equation can be rewritten as the following coupled system li2021deep:
In the first scenario, the domain of the 1D-Schrödinger equation is divided into nodes that are evenly spaced in each dimension. Following the previously described approach for creating the test nodes, 10% of the nodes are randomly selected to form the inside-domain test nodes. In the second scenario, the nodes in a restricted range are used for training, and the outside-domain test nodes are selected from beyond it. The obtained results for the 1D-Schrödinger equation on both inside- and outside-domain test nodes are shown in Table 2. The FFNN outperforms the other models for inside-domain test nodes. For the outside-domain test nodes, the GCN-FFNN model shows the lowest MSE compared to the other models, while its infinity norm error remains higher than that of the FFNN.
Fig. 7 corresponds to the first scenario, where the test nodes are from inside the domain. In Fig. 7 (a), (b) and (c), the true solution and the obtained results of the GCN-FFNN model at three time instances are shown, respectively. Here, the prediction in (a) is from the training set, whereas the predictions in (b) and (c) are from the test set. The obtained residuals are shown in Fig. 7 (d), (e) and (f). Fig. 8 corresponds to the second scenario, where the test nodes are from outside the domain. The true and approximate solutions are shown in Fig. 8 (a), (b) and (c), respectively. Here, the predictions in (a) and (b) are from the training set, while the predictions in (c) are from the outside-domain test nodes. The obtained residuals are shown in Fig. 8 (d), (e) and (f).
3.3 2D-Burgers Equation
The 2D-Burgers equation with boundary and initial conditions is given in Eq. (9) li2021deep.
Here, the differential operator is defined analogously by moving all terms of Eq. (9) to the left-hand side of the equation.
The domain of the 2D-Burgers equation is divided into evenly spaced nodes. In the first scenario, 10% of the nodes are randomly selected to form the inside-domain test nodes, and the remaining nodes are used for training the models. In the second scenario, the nodes in a restricted range are used for training, and the outside-domain test nodes are selected from beyond it. From Table 2, one can observe that for the 2D-Burgers equation, the GCN-FFNN model outperforms the other models on inside-domain test nodes with respect to both the MSE and infinity norms. For outside-domain test nodes, the GCN outperforms the other models in terms of the MSE metric, while the GCN-FFNN model achieved the lowest infinity norm compared to the other models.
3.4 2D-Schrödinger Equation
The 2D-Schrödinger equation with boundary and initial conditions is given in Eq. (10) li2021deep. Here, the differential operator is defined analogously by moving all terms of Eq. (10) to the left-hand side. Letting the real and imaginary components of the solution be treated as separate unknowns, the 2D-Schrödinger equation can be rewritten as follows li2021deep:
The domain of the 2D-Schrödinger equation is divided into evenly spaced nodes. In the first scenario, 10% of the nodes are randomly selected to form the inside-domain test nodes, and the remaining nodes are used for training the models. In the second scenario, the nodes in a restricted range are used for training, and the outside-domain test nodes are selected from beyond it. As can be seen from Table 2, for the 2D-Schrödinger equation, the FFNN and GCN-FFNN models achieved comparable results on inside-domain test nodes with respect to both the MSE and infinity norm metrics. For outside-domain test nodes, the GCN-FFNN model outperforms the other models in terms of the MSE metric, while it also achieved results comparable to the FFNN model in terms of the infinity norm metric. The true solution and the obtained residual at an outside-domain test time are shown in Fig. 10 (a) and (b), respectively.
4 Conclusion
In this paper, a new two-stream architecture based on graph convolutional networks (GCN) and feed-forward neural networks (FFNN) is developed for solving partial differential equations (PDEs). The model learns from both grid and graph input representations obtained by discretizing the domain of the given PDE. The proposed model is examined on four nonlinear PDEs, i.e. the 1D-Burgers, 1D-Schrödinger, 2D-Burgers and 2D-Schrödinger equations. The performance of the models is evaluated on test data located inside and outside the domain. Thanks to the incorporation of both types of input representations, the proposed GCN-FFNN model outperforms the other tested models on the studied PDEs in most cases. The implementation of our GCN-FFNN model is available at https://github.com/onurbil/pde-gcn.