## 1 Introduction

Physical phenomena are generally modeled through partial differential equations (PDEs) that govern the dynamic evolution or static solution of a physical system. Numerically solving partial differential equations is an important aspect of scientific and mathematical modeling in a broad range of fields including physics, biology, material science, and finance. There have been many efforts to develop efficient and accurate numerical solvers for PDEs using different techniques, including finite difference (leveque2007finite; shashkov2018conservative), finite volume (eymard2000finite; brenner2008mathematical), and finite element schemes (reddy2014introduction; wriggers2008nonlinear). While these methods produce accurate solutions, two major challenges remain: accelerating the solution and reducing its computational cost when the governing PDE is known, and determining the governing PDE when the physical system is unknown. Such problems arise, for example, in climate modeling, turbulent flow, contact problems, and plastic material deformation. With recent developments in deep learning, faster algorithms have been proposed that evaluate the response of a physical system using only observational data. While deep learning approaches such as multi-layer perceptrons (MLPs) or convolutional neural networks (CNNs) are powerful in learning PDE solutions, they are restricted to the specific discretization of the physical domain on which they are trained. As a result, the learned model is limited to that domain and cannot generalize to different domains or discretizations, even though the underlying physics remains the same; any change in the physical domain or discretization requires new training. Here we propose a discretization- and domain-invariant neural time-dependent PDE solver based on message passing graph neural networks (MPGNN), trained on a sample domain with different discretizations. The trained MPGNN can then solve on different discretizations, or even on other domains, as long as the underlying physics remains the same. We further show that a recurrent version of MPGNN can be used to find a temporal sequence of solutions to a PDE.

## 2 Related Works

One class of neural-net-based PDE solvers uses neural networks as proxies of PDEs and finds the solution by minimizing a loss that measures how well the candidate solution satisfies the governing equations and the boundary conditions (raissi2017physics; raissi2017physics2; lagaris1998artificial; weinan2018deep; sirignano2018dgm; khoo2021solving). Although such an approach finds the solution of a single PDE instance with one set of parameters, any modification to the PDE parameters, boundary conditions, or domain requires re-training of the network.

Another approach to solving PDEs is to use convolutional neural nets and snapshots of observations over the discretized input domain to learn the dynamic evolution of a PDE (long2018pde; shi2020finite). Further modifications, such as residual connections (ruthotto2020deep), autoregressive dense encoder-decoders (geneva2020modeling), or symbolic multi-layer neural networks (long2019pde), can be combined with the CNN to improve the results. While these models do not require prior knowledge of the PDE, they are tied to the domain discretization (and therefore cannot be generalized to arbitrary domains) and to a fixed time discretization (and therefore cannot handle temporally or spatially sparse or non-uniform observations).

Inspired by the discretization techniques of classical PDE solvers, a further class of methods uses observational data to learn the discretization approximations required for the updates in classical computational PDE solvers (bar2019learning; kochkov2021machine; zhuang2021learned; han2018solving). In this approach, a neural network provides better interpolation at coarse scales within the framework of traditional numerical discretization. These methods are used in conjunction with classical numerical methods and can improve the accuracy and accelerate the solutions of traditional numerical schemes (kochkov2021machine). Although these methods have been shown to generalize to new parameter regimes of a PDE, they are still bound to the initially trained discretization and cannot be used for arbitrary domains without re-training.

Lastly, a class of neural PDE solvers focuses on graph representations of the discretized mesh data structure to approximate the PDE solution (li2020neural; li2020multipole; iakovlev2020learning; belbute2020combining). The numerical solution of a PDE approximates the solution at discrete locations that form a mesh over the continuous space. Each node represents a region of the continuous space, and the approximate solution of the PDE in that region is assigned to its representative node. The discretized mesh thus forms a graph in which each node models the state of the system in its region and connects to the neighboring nodes. This method has successfully been used to solve time-independent PDEs with different mesh sizes on the same physical domain (li2020neural). The connectivity and the locations of the nodes can further be optimized to learn the solution at different levels of precision (alet2019graph). If the PDE includes long-range interactions, as is common for time-independent PDEs, a multi-level graph neural network framework that encapsulates long-range interactions can improve the results (li2020multipole). In contrast, for time-dependent PDEs it has been shown that a continuous-time model, similar to physics-informed neural nets but with a graph neural network, can recover a system's dynamics from sparse observational data recorded at irregular times (iakovlev2020learning; poli2019graph).

Recently, it has been shown that message passing graph neural networks can implement powerful physical simulation engines (pfaff2020learning; sanchez2020learning). The state of a physical system can be expressed with a particle-based method as a reduced-order model; the particles become nodes in a graph, and the message passing neural network learns to compute the dynamics of the particles (sanchez2020learning). Beyond particle-based methods, mesh-based methods have also been successful in physical simulations (pfaff2020learning). Such graph-based models first encode the input data into a latent space, process it there as a reduced model, and then decode the data back to physical space to obtain the physical results. Here, we first show why graph neural networks can generalize to learn fast PDE solvers, taking inspiration from finite difference schemes. We introduce domain-invariant features and boundary conditions inspired by classical PDE solvers to improve the generalization of the learned PDE solver operator. With the introduced features, we show that the message passing graph neural network architecture efficiently fits classical PDE solvers and can learn time-stepping/solver operators for linear and nonlinear PDEs with different boundary conditions. We further demonstrate that our trained graph neural network solver generalizes to physical domains different from the one it was trained on. This makes it possible to train an MPGNN on small sample domains, even with unknown dynamics, and then explore the dynamic behavior on different, larger physical domains. Lastly, we show that a recurrent version of our MPGNN can be used to predict the temporal sequence of solutions to a PDE.

## 3 Time-dependent PDEs

We consider a continuous dynamical system evolving over time $t$ and spatial coordinates $\mathbf{x} \in \Omega$, where $\Omega$ is a bounded $d$-dimensional domain. We assume the system is governed by a partial differential equation of the form

$$\frac{\partial u}{\partial t} = F_\theta\!\left(u, \nabla u, \nabla^2 u, \dots\right) \qquad (1)$$

where $F_\theta$ denotes the linear/nonlinear differential operator(s) parameterized by the vector $\theta$. In the above general form, the temporal evolution of the current state $u$ depends on the differential operator $F_\theta$, which may include various spatial derivatives of the state, such as $\nabla u$, $\nabla^2 u$, etc. Depending on the differential operator $F_\theta$, appropriate boundary conditions on $\partial\Omega$ are required for a well-posed PDE with a unique solution. Such PDE models are the cornerstone of mathematical modeling and are widely used across systems, from fluid dynamics and thermal sciences to acoustics and quantum mechanics. As an example, $F = \alpha \nabla^2 u - \mathbf{v}\cdot\nabla u$ constitutes a convection-diffusion equation for the variable of interest $u$, where $\alpha$ is the diffusivity and $\mathbf{v}$ is the velocity field with which the quantity moves. The state of the system at time $t$ can be obtained from its initial state by time integration. Numerous numerical techniques, such as finite elements, spectral methods, finite differences, or finite volumes, have been developed to efficiently approximate the differential operator and solve for the dynamics of a system over time. In all numerical schemes, the domain is first discretized with a mesh, the differential operator is approximated locally using neighboring points, and the solution is advanced over small time steps using a time integrator such as Euler's scheme, i.e.,

$$u^{n+1} = u^n + \Delta t\, F_\theta\!\left(u^n, \nabla u^n, \nabla^2 u^n, \dots\right) \qquad (2)$$

where the superscript $n$ denotes the solution at discretized time $t_n = n\,\Delta t$, and the differential operator $F_\theta$ contains information about local spatial derivatives. As an example, consider solving the heat equation, $\partial u/\partial t = \alpha \nabla^2 u$, where $\alpha$ is the diffusion constant, on a structured grid as shown in Fig. 1a. Let $u^n_{i,j}$ be the discretized solution at time $t_n$ and spatial location $(x_i, y_j)$, where $\Delta t$, $\Delta x$, and $\Delta y$ are the time, horizontal, and vertical spatial discretizations, respectively. The time and spatial derivatives in the heat equation can be expanded using Taylor series at each discretized point, e.g., $\partial u/\partial t \approx (u^{n+1}_{i,j} - u^n_{i,j})/\Delta t$ and $\partial^2 u/\partial x^2 \approx (u^n_{i+1,j} - 2u^n_{i,j} + u^n_{i-1,j})/\Delta x^2$. Rewriting the equation for an arbitrary discretized point, we find the update rule $u^{n+1}_{i,j} = u^n_{i,j} + \alpha\,\Delta t\left[(u^n_{i+1,j} - 2u^n_{i,j} + u^n_{i-1,j})/\Delta x^2 + (u^n_{i,j+1} - 2u^n_{i,j} + u^n_{i,j-1})/\Delta y^2\right]$. Solving this equation for all the points, together with the boundary conditions, yields the updates for the discretized points. Note that the update rule can be seen as a summation of updates that depend only on neighboring points. Although in this example, with a simple linear equation and a structured grid, the update rule was easy to derive, for an arbitrary domain that requires a triangular mesh for discretization (see Fig. 1b) and a nonlinear governing equation, the update rule is not straightforward to work out. Our objective here is to learn the approximation of the differential operator using a graph representation of the domain with message passing neural networks. Since in a general PDE the differential operator contains local spatial derivatives, only local neighboring values are relevant to approximate the differential operator at a point. As a result, a graph neural network is a promising framework to approximate the above right-hand side for next-value predictions.
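As a minimal illustration of this stencil, the explicit update rule above can be sketched in a few lines of NumPy. The grid size, coefficients, and fixed-boundary handling here are illustrative choices, not the paper's implementation.

```python
import numpy as np

def heat_step(u, alpha, dt, dx, dy):
    """One explicit Euler step of the 2-D heat equation on a structured grid.

    Interior points follow the five-point stencil derived above; boundary
    values are held fixed (Dirichlet). Stability requires
    dt <= 1 / (2 * alpha * (1/dx**2 + 1/dy**2)).
    """
    un = u.copy()
    un[1:-1, 1:-1] = u[1:-1, 1:-1] + alpha * dt * (
        (u[2:, 1:-1] - 2 * u[1:-1, 1:-1] + u[:-2, 1:-1]) / dx**2
        + (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dy**2
    )
    return un
```

Each interior node's update depends only on its four grid neighbors, which is exactly the locality property the graph formulation exploits.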

## 4 Contributions

In this paper, we propose a graph-based model to learn domain-invariant and free-form solvers for a PDE on arbitrary spatial discretizations using message passing neural networks. Our method is inspired by the finite difference method and the possibility of approximating a partial differential equation using discrete point stencils. Here, we use graph neural networks as nonlinear function approximators and use simulated data to learn the required stable stencils. To show that the graph neural network has learned a correct PDE solver, we test our learned graph network on a domain with a different geometry and mesh discretization. We find that a model trained on a sample domain can predict the PDE solution on other physical domains and mesh discretizations. This is made possible by creating relevant features, inspired by classical PDE solver techniques, to represent the differential operator. Our contributions are: (i) introducing a locally invariant feature representation, inspired by classical PDE solvers, to efficiently learn a differential-operator solver; (ii) showing that a graph representation with message passing neural networks can parametrize PDE solvers with a domain-invariant representation; (iii) obtaining robust, high-performance models for various linear/nonlinear PDEs with arbitrary spatial discretization and showing that they generalize to arbitrary physical domains; and (iv) proposing a recurrent message passing graph neural network approach to predict a temporal sequence of PDE solutions over time.

## 5 Graph Neural Networks for PDEs

Let $G = (V, E)$ be the graph representation of a mesh with nodes $V = \{v_i\}$, where $\mathbf{x}_i$ denotes the position of node $i$, and edges $E = \{e_{ij}\}$, where $e_{ij}$ represents the connection between neighboring points at $\mathbf{x}_i$ and $\mathbf{x}_j$. Given a physical domain, we draw the node positions from a uniform random distribution and use Delaunay triangulation to find the neighboring points. We denote the neighbors of node $i$ by $\mathcal{N}(i)$. We further assign node and edge attributes $\mathbf{a}_i$ and $\mathbf{e}_{ij}$, respectively (see Fig. 2).
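This graph construction can be sketched as follows, assuming SciPy's `Delaunay` triangulation; the node count and domain are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(50, 2))   # uniform random nodes in the unit square

tri = Delaunay(points)                         # triangulate to define connectivity
edges = set()
for simplex in tri.simplices:                  # each simplex is a triangle (i, j, k)
    for a in range(3):
        i, j = simplex[a], simplex[(a + 1) % 3]
        edges.add((min(i, j), max(i, j)))      # undirected edge between neighbors

# adjacency: neighbors[i] is the set N(i) used in the message passing updates
neighbors = {i: set() for i in range(len(points))}
for i, j in edges:
    neighbors[i].add(j)
    neighbors[j].add(i)
```

The resulting edge set plays the role of $E$ above; node and edge attributes are then attached to this structure.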

In a message-passing neural network gilmer2017neural, we propagate the latent state $\mathbf{h}_i$ of the nodes of $G$ for $L$ layers, where at each layer $l$ we have

$$\mathbf{h}_i^{l+1} = \phi_v\!\left(\mathbf{h}_i^l,\ \frac{1}{|\mathcal{N}(i)|}\sum_{j \in \mathcal{N}(i)} \phi_e\!\left(\mathbf{h}_i^l, \mathbf{h}_j^l, \mathbf{e}_{ij}\right)\right) \qquad (3)$$

where $\phi_e$ and $\phi_v$ are differentiable deep neural networks. Note that $|\mathcal{N}(i)|$ represents the number of neighbors of node $i$; furthermore, instead of the average used in equation (3), other permutation-invariant aggregation functions such as sum or max can also be used. Since equation (3) approximates equation (2), where $F_\theta$ includes various spatial differentials, we take our edge features to include the relative position $\mathbf{x}_j - \mathbf{x}_i$ and the PDE parameters $\theta$ evaluated at the midpoint of the edge $e_{ij}$. We can further include higher derivatives of the PDE parameters for a better approximation. At each point in time, we set the last snapshots of the solution as the initial latent state and use them to predict the desired next frames of the solution. The main assumption here is that the derivatives can be approximated using the graph nodes, which holds in most physical simulations where the solutions are smooth. Additionally, note that the predicted values include all interior and boundary nodes.
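One layer of the update in equation (3) can be sketched with NumPy. The random matrices `W_e` and `W_v` stand in for the trained networks $\phi_e$ and $\phi_v$, and the graph and feature sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 4-node path graph: node latents h, directed edge list, edge features e.
h = rng.normal(size=(4, 8))                        # latent state per node
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
e = rng.normal(size=(len(edges), 3))               # hypothetical edge attributes

# Stand-ins for the deep networks phi_e (messages) and phi_v (updates).
W_e = rng.normal(size=(8 + 8 + 3, 8))
W_v = rng.normal(size=(8 + 8, 8))

def mp_layer(h, edges, e):
    msgs = np.zeros_like(h)
    counts = np.zeros((h.shape[0], 1))
    for k, (i, j) in enumerate(edges):
        m = np.tanh(np.concatenate([h[i], h[j], e[k]]) @ W_e)  # phi_e
        msgs[i] += m
        counts[i] += 1
    agg = msgs / np.maximum(counts, 1)             # mean aggregation (permutation invariant)
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_v)     # phi_v

h_next = mp_layer(h, edges, e)
```

Because the aggregation is a mean, reordering the edge list leaves the result unchanged, which is the permutation invariance the text relies on.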

We use multilayer perceptrons with three hidden layers for $\phi_e$ and $\phi_v$. Note that three message passing layers allow each node to receive information from its third-order neighbors (i.e., neighbors of neighbors of neighbors), which permits at most a sixth-order derivative estimation. In general, the node features consist of the solution at previous time steps, and the edge features, motivated by the logic of solving PDEs, are made up of the distance between connected neighboring nodes along with the PDE parameters evaluated at the center of the edge. We also add an extra node attribute indicating whether the node lies on the boundary: this feature is 1 if the node lies on the boundary and 0 otherwise. Our choice of node and edge features may differ slightly between equations, and we point out any change in the feature set.

## 6 Test cases

In this section, we go through different linear/nonlinear PDEs for physical systems and evaluate the performance of our model. First, we start with the time-dependent heat equation and show how our model can learn to predict the future state(s) (see section 6.1); more importantly, the learned model can be used for predictions on new physical domains different from the initially trained domain (see figure 4). To predict a sequence of temporal data of a PDE, we use a recurrent message passing graph neural network approach and show that our model is able to predict the sequence of PDE temporal data (see figure 6). Next, to show that the model is able to learn nonlinearities, we focus on the Navier-Stokes equation to predict future solutions with an arbitrary discretization (see section 6.2). Lastly, to show that the model can incorporate PDE parameters, we focus on the advection-diffusion equation; by including the PDE parameters as edge attributes, we show that the model learns to predict for various PDE parameters (see section 6.3).

### 6.1 Heat Equation

The time-dependent heat equation in two dimensions can be written as

$$\frac{\partial u}{\partial t} = \alpha \nabla^2 u \qquad (4)$$

which can be solved given an initial condition and a Dirichlet boundary condition, i.e., $u|_{\partial\Omega} = g$, where $\partial\Omega$ denotes the domain's boundary. We choose a square domain and use Firedrake Rathgeber2016 with a triangular mesh of fixed characteristic length and a time step of 8e-4. Note that the meshes in the simulations are generated using the built-in mesh generator in Firedrake. To sample the data for training the MPGNN, we construct a Delaunay triangulation of uniformly distributed nodes (a Poisson point process) in the domain. We choose different Dirichlet boundary conditions, setting the top, left, right, and bottom boundaries to different constant values; the solution therefore remains in the same range across simulations. For each simulation, we use a newly generated mesh and set of boundary conditions. We record the data at fixed intervals and use four subsequent observations as the input (i.e., the frames $u^{t-3}, \dots, u^{t}$) to predict the next frame (i.e., $u^{t+1}$). Since for each simulation the initial input frame can be chosen from 20 different positions, we can generate 20 data inputs per simulation. We generate 1000 simulations with different meshes and boundary conditions, and with that create 20,000 input data points. We define an MSE loss between the network output and the true values as $\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}(\hat{u}_i - u_i)^2$, where $\hat{u}_i$ and $u_i$ are the network prediction and ground truth for node $i$, and $N$ is the number of nodes. We use three message passing layers and choose two three-layer neural networks for $\phi_e$ and $\phi_v$. We train with the ADAM optimizer and a step learning-rate scheduler. Note that the network architecture and hyperparameters remain the same across experiments unless mentioned otherwise. We find that the MSE loss decreases by several orders of magnitude over training, and the relative error on the test data after training is small. The predictions and ground truth for different boundary conditions on test simulations that the network has never seen are shown in figure 3. We further train on the same generated data but with more nodes in the graph (changing from 64 to 128) and find that the error of the MPGNN on the test data does not change appreciably.

Next, to show that the learned MPGNN can generalize to different physical domains, we create a distorted physical domain with a curved boundary, two inclined edges, and a vertical wall. The newly generated geometry differs from the initial square geometry, and the network has never seen such domains. We introduce a random mesh on this new physical domain with a similar characteristic mesh size and use the MPGNN (trained only on the square domain) to predict the results. We find that the MPGNN indeed predicts the output with high accuracy (low average MSE loss). Examples of simulations with different boundary conditions are shown in figure 4. This shows that the graph neural network learns to predict future values independent of the initial physical geometry and can be extended to predict PDE results on new, unseen physical domains. Here we have used four previous snapshots as the input to our network. To quantify the effect of the number of input frames on the prediction power of the network, we vary the number of input frames and compute the final average MSE on the test dataset for both similar and different geometries. In all cases we predict the frame 80 time steps after the final input frame, with the spacing of the input frames adjusted accordingly for two, three, four, five, and eight input frames. Interestingly, we find that two input frames are enough for prediction on similar geometries; however, to obtain a reliable MSE on a different geometry, at least three to four frames are needed.

Lastly, to show that the MPGNN is capable of predicting a sequence of temporal data, we use the MPGNN in a recurrent format (see figure 6a). We first train the MPGNN with a sequence of three past frames (i.e., $u^{t-2}, u^{t-1}, u^{t}$) to predict the next frame (i.e., $u^{t+1}$). As a result, given the initial three frames of a simulation, the trained network predicts the next time step; we then concatenate the last three available frames, including the new prediction, to predict the following frame, and so on. The results for a sample test case are shown in figure 6b. Note that we did not use a recurrent loss function to train the network, and as a result the average MSE loss grows with the number of predicted frames. The average MSE error for 1000 different simulations, along with the range of error, is shown in figure 6c, where the error slowly increases with the number of predicted frames. Training the MPGNN with a recurrent loss function would be expected to improve the overall result.

### 6.2 Navier-Stokes Equation

In order to assess MPGNN's performance on learning nonlinear PDEs, we use the two-dimensional incompressible Navier-Stokes equations,

$$\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u}, \qquad \nabla\cdot\mathbf{u} = 0 \qquad (5)$$

where $\mathbf{u}$ is the velocity field, $p$ is the pressure, $\rho$ is the fluid's density, and $\nu$ is the diffusion constant. The second equation is the incompressibility condition, which ensures a divergence-free velocity field (i.e., $\nabla\cdot\mathbf{u} = 0$). We use spectral methods to solve the above equations on a square domain with periodic boundary conditions (see the supplementary material for details). To generate the initial conditions, we sample random numbers uniformly for the nodes in the graph and then remove high-frequency patterns by taking a Fourier transform and discarding the top 1/3 of high-frequency wave numbers. In this setting, the PDE parameters are held fixed, with the diffusion constant set to 3e-4, so the solution remains in the same range across simulations. We run our numerical PDE solver and record frames at fixed intervals; the recording interval is chosen such that consecutive results become visually different. We input five frames (i.e., $u^{t-4}, \dots, u^{t}$) and predict the next frame (i.e., $u^{t+1}$). We generate data points from simulations in the same way as for the heat equation, and use the same network architecture for the message passing neural network. We find that the relative error on the test data drops substantially over training. The results for three different test cases are shown in figure 7. Note that in the plots we present the vorticity field rather than the velocity components separately. These results show that a nonlinear PDE structure can also be learned with the MPGNN architecture.

### 6.3 Advection-Diffusion Equation

In all the PDEs considered so far, the PDE parameters were held constant. In this section, we focus on the advection-diffusion equation, where the PDE includes free parameters that can be used as edge features in the MPGNN. The advection-diffusion equation is

$$\frac{\partial u}{\partial t} + \mathbf{v}\cdot\nabla u = \alpha \nabla^2 u \qquad (6)$$

where $\mathbf{v}$ and $\alpha$ are the advection velocity and diffusion constant, respectively. We choose a square domain with a fixed characteristic mesh size and time step. We record the data at fixed intervals and use four subsequent input frames (i.e., $u^{t-3}, \dots, u^{t}$) to predict the next frame (i.e., $u^{t+1}$). We use Firedrake Rathgeber2016 to simulate the PDE with random initial meshes and initial conditions. We use periodic boundary conditions on one pair of opposite boundaries and Neumann boundary conditions on the other. The initial condition is created as a random superposition of modes with coefficients drawn from a uniform distribution, so the solution remains in the same range across simulations. We select $\mathbf{v}$ and $\alpha$ randomly from uniform distributions. In total, we generate simulations with random initial conditions and random PDE parameters to build the input data for our network. We find that the average error on the test data (new simulations that the network has never seen) drops substantially after training. Three test results are shown in figure 8.

## 7 Benchmark tests

In this section we compare our approach with other existing relevant studies. We have identified two neural network approaches closely related to our work: (1) iakovlev2020learning and (2) li2020multipole. Since we are interested in the generalization of the network to other domains and mesh sizes, we perform two different tests on all of the networks and compare their performance by measuring the average MSE on the test dataset. We select the heat equation as our basis (see Sec. 6.1), training on a square geometry and testing on the unseen domain discussed in Sec. 6.1. We perform two sets of tests: training on a square domain with a small characteristic mesh length (which we call high res), then testing the performance on the same geometry and on a different geometry (shown in Fig. 4) at a similar resolution as well as a lower resolution (which we call low res). Next, we repeat the same experiment but train on low-resolution data. The results are summarized in Table 1. To reproduce the work of li2020multipole, we used the code provided in the corresponding git repository with their suggested hyper-parameters, using the solution history as node features and the positions of the nodes as edge features. As seen in Table 1, their network has the lowest MSE when applied to the same geometry; however, it generalizes poorly, which is due to the use of absolute node positions as features instead of the invariant edge distances. To follow iakovlev2020learning, we similarly used the structure proposed by the authors (i.e., three hidden layers with 60 neurons, and one message passing layer). The main difference is that their edge features contain relative positions instead of absolute positions, which is indeed the main reason for the better generalization of this approach compared to the other method. Our method, with a different activation function (ReLU), three layers of message passing, and an extra feature flagging the boundary nodes, shows an improved result.

| MSE on test data | Trained on high res: Our work | (1) | (2) | Trained on low res: Our work | (1) | (2) |
|---|---|---|---|---|---|---|
| same geometry (same res) | 4.01e-6 | 3.33e-5 | 1.49e-6 | 4.53e-6 | 6.27e-5 | 2.41e-6 |
| different geometry (low res) | 7.98e-5 | 1.57e-4 | 9.87e-3 | 1.57e-4 | 1.63e-4 | 1.47e1 |
| different geometry (high res) | 5.56e-5 | 9.86e-5 | 5.34e-3 | 8.06e-5 | 1.11e-4 | 5.26e0 |

## 8 Conclusion

In this paper, we showed how message passing neural networks can parametrize time-dependent partial differential equations and their discrete classical solvers. Inspired by numerical PDE solvers, we introduced domain-invariant features for an efficient representation of PDE data. Using graphs to represent an unstructured mesh, we trained message passing graph neural networks (MPGNN) to efficiently learn accurate solver schemes for linear/nonlinear PDEs independent of the initially trained geometry. We showed that an MPGNN can predict single or multiple future frame(s), and that a trained model can predict results on unseen geometries. We further proposed a recurrent version of MPGNN to march in time and find a temporal sequence of solutions to a PDE. In summary, the main features of message passing graph neural networks that make them suitable platforms for learning time-dependent PDEs are: (i) MPGNNs, like time-dependent PDEs, are spatially local, in that each point is affected only by its neighboring points over a small time step; (ii) the permutation-invariant combination of the messages reflects the rotational and translational symmetries of a physics-based PDE and helps the network learn more efficiently; (iii) the neural networks used to create the messages and updates are general learners that can capture nonlinear update rules; and lastly, (iv) running several passes of message passing lets the update of a node see features of neighbors of neighbors, so an MPGNN can find the update rule for larger time steps.
Possible future extensions of our current MPGNN solver are: 1) including random long-range connections in the initial mesh to enable long-time predictions with fewer message-passing layers; 2) training with a recurrent loss function, and addressing the significant memory required for backpropagation, in order to improve long-time predictions; 3) using more accurate temporal schemes than Euler discretization to improve predictions; and 4) applying symbolic regression analysis to uncover the discretized kernel learned by the MPGNN and exploring the possibility of uncovering new PDE solver schemes.

## References

## Appendix A Spectral methods solution to Navier-Stokes Equation

In this section, we review the spectral method used to solve the 2D incompressible Navier-Stokes (NS) equation (5). For a 2D fluid motion with velocity components $u$ and $v$, the NS equations in expanded form are

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} = -\frac{1}{\rho}\frac{\partial p}{\partial x} + \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) \qquad (7)$$

$$\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} = -\frac{1}{\rho}\frac{\partial p}{\partial y} + \nu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right) \qquad (8)$$

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0 \qquad (9)$$

We define a stream function $\psi$ with $u = \partial\psi/\partial y$ and $v = -\partial\psi/\partial x$. The incompressibility condition, equation (9), is then immediately satisfied. To remove the pressure from the equations, we take $\partial_x$ of (Eq. 8) minus $\partial_y$ of (Eq. 7) to find

$$\frac{\partial \omega}{\partial t} + u\frac{\partial \omega}{\partial x} + v\frac{\partial \omega}{\partial y} = \nu \nabla^2 \omega, \qquad \omega \equiv \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \qquad (10)$$

Note that $\omega = -\nabla^2\psi$. We can rewrite the vorticity equation, equation (10), in terms of $\psi$ to find

$$\frac{\partial}{\partial t}\nabla^2\psi + \frac{\partial\psi}{\partial y}\frac{\partial}{\partial x}\nabla^2\psi - \frac{\partial\psi}{\partial x}\frac{\partial}{\partial y}\nabla^2\psi = \nu\,\nabla^4\psi \qquad (11)$$

Next, we discretize the equation in time, writing the nonlinear advection term as

$$N(\omega) = u\frac{\partial \omega}{\partial x} + v\frac{\partial \omega}{\partial y}, \qquad (12)$$

$$\frac{\omega^{n+1} - \omega^n}{\Delta t} = -N(\omega^n) + \frac{\nu}{2}\nabla^2\!\left(\omega^{n+1} + \omega^n\right), \qquad (13)$$

where we used the implicit Crank-Nicolson scheme for the linear part and an explicit scheme for the nonlinear part. Assuming periodic boundary conditions in both the $x$ and $y$ directions, and taking the Fourier transform of the above equation, we find

$$\frac{\hat{\omega}^{n+1} - \hat{\omega}^n}{\Delta t} = -\hat{N}^n - \frac{\nu k^2}{2}\left(\hat{\omega}^{n+1} + \hat{\omega}^n\right), \qquad (14)$$

$$\left(1 + \frac{\nu\,\Delta t\,k^2}{2}\right)\hat{\omega}^{n+1} = \left(1 - \frac{\nu\,\Delta t\,k^2}{2}\right)\hat{\omega}^n - \Delta t\,\hat{N}^n, \qquad (15)$$

$$\hat{\omega}^{n+1} = \frac{\left(1 - \nu\,\Delta t\,k^2/2\right)\hat{\omega}^n - \Delta t\,\hat{N}^n}{1 + \nu\,\Delta t\,k^2/2}, \qquad (16)$$

where $\hat{\omega}$ is the Fourier transform of the vorticity field and $k^2 = k_x^2 + k_y^2$. We use equation (16) to propagate in time and solve for the vorticity field. Given the vorticity field, we can further calculate the horizontal and vertical velocities from $\hat{\psi} = \hat{\omega}/k^2$, $\hat{u} = i k_y \hat{\psi}$, and $\hat{v} = -i k_x \hat{\psi}$.
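A minimal sketch of this Fourier-space update, under the assumption that the nonlinear term is supplied already transformed, follows. Grid size, domain length, and parameters are illustrative; only the diffusion constant 3e-4 comes from the text.

```python
import numpy as np

# Fourier-space Crank-Nicolson update for the vorticity, with the explicitly
# treated nonlinear advection term passed in as nonlinear_hat.
n, L = 64, 2 * np.pi
nu, dt = 3e-4, 1e-2
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)      # angular wavenumbers (integers here)
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2

def spectral_step(w_hat, nonlinear_hat):
    """Advance the transformed vorticity field by one time step."""
    return ((1 - 0.5 * nu * dt * k2) * w_hat - dt * nonlinear_hat) / (
        1 + 0.5 * nu * dt * k2
    )
```

With the nonlinear term set to zero, each Fourier mode is damped by the factor $(1 - \nu\Delta t k^2/2)/(1 + \nu\Delta t k^2/2)$, which is unconditionally stable for pure diffusion.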

## Appendix B Periodic boundary condition

In order to impose the periodic boundary condition, we add connections to the boundary nodes that result in periodic connectivity. In particular, we consider shifted copies of the node set and, by repeating the Delaunay triangulation on the extended set, find how the boundary nodes connect to the other interior and boundary nodes (see Fig. 9).
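The replication step can be sketched as follows, assuming a square domain of side `L`; a triangulation (e.g., Delaunay) would then be applied to the tiled points, keeping the edges incident to the central copy.

```python
import numpy as np

def tile_periodic(points, L=1.0):
    """Replicate the node set into the 8 surrounding periodic images so that a
    subsequent triangulation connects boundary nodes across domain edges.
    Images are stacked in order of (dx, dy) shifts; the (0, 0) copy is the
    original domain."""
    images = []
    for dx in (-L, 0.0, L):
        for dy in (-L, 0.0, L):
            images.append(points + np.array([dx, dy]))
    return np.vstack(images)
```

After triangulating the tiled set, edges from the central copy to an image node are mapped back to the corresponding original node, yielding wrap-around connectivity.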
