Anisotropic, Sparse and Interpretable Physics-Informed Neural Networks for PDEs

07/01/2022
by   Amuthan A. Ramabathiran, et al.

There has been a growing interest in the use of Deep Neural Networks (DNNs) to solve Partial Differential Equations (PDEs). Despite the promise that such approaches hold, there are various aspects where they could be improved. Two such shortcomings are (i) their computational inefficiency relative to classical numerical methods, and (ii) the non-interpretability of a trained DNN model. In this work we present ASPINN, an anisotropic extension of our earlier work called SPINN–Sparse, Physics-informed, and Interpretable Neural Networks–to solve PDEs that addresses both these issues. ASPINNs generalize radial basis function networks. We demonstrate using a variety of examples involving elliptic and hyperbolic PDEs that the special architecture we propose is more efficient than generic DNNs, while at the same time being directly interpretable. Further, they improve upon the SPINN models we proposed earlier in that fewer nodes are required to capture the solution using ASPINN than using SPINN, thanks to the anisotropy of the local zones of influence of each node. The interpretability of ASPINN translates to a ready visualization of their weights and biases, thereby yielding more insight into the nature of the trained model. This in turn provides a systematic procedure to improve the architecture based on the quality of the computed solution. ASPINNs thus serve as an effective bridge between classical numerical algorithms and modern DNN based methods to solve PDEs. In the process, we also streamline the training of ASPINNs into a form that is closer to that of supervised learning algorithms.


1 Introduction

Learning solutions of Partial Differential Equations (PDEs) using Deep Neural Networks (DNNs) has attracted significant attention over the past few years; see for instance [1], Physics-Informed Neural Networks (PINNs) [18], and the Deep Ritz method [5]. The central idea in these methods is the direct use of either the PDE or its associated variational form to construct the loss function; the origin of this idea goes back to the seminal contributions in [10]. However, these methods remain computationally inefficient in comparison to traditional algorithms like the finite element method. Another drawback of this approach, discussed less often, is that the trained DNN model is not interpretable in the same sense as traditional numerical methods. For instance, the parameters of a finite element approximation can be directly interpreted as the values of the field variables at specified nodal locations in the domain; this interpretation then permits the systematic development of various refinement strategies that improve the approximation in critical regions where the local errors are high. A corresponding strategy does not exist for DNNs because their weights and biases have no simple interpretation in terms of the value of the field variable being approximated. It is therefore pertinent to develop interpretable DNN models that permit the development of more efficient algorithms. In this contribution, we build upon our earlier work [19] on SPINN (Sparse, Physics-informed, and Interpretable Neural Networks) and propose special DNN architectures that are sparse and interpretable. Crucially, we discuss how the specific notion of interpretability afforded by these specially crafted DNNs naturally leads to more efficient algorithms. Specifically, we develop an extension of SPINN where the local zone of influence of each node is anisotropic; this leads to a more efficient utilization of the nodes and a better resolution of various local features of the solution. This also provides a natural strategy for choosing the architecture of these special networks, in sharp contrast to generic DNN based methods where the choice of network architecture is largely ad hoc.

The larger idea underlying our approach is the construction of DNNs with special architectures that are interpretable. The use of DNNs with special architectures has been in vogue for a few decades in the approximation-theoretic literature on DNNs; examples include the pioneering contributions in [12] and more recent works like [24, 13]. A particularly noteworthy work in this regard is [6], where the authors illustrate the relation between piecewise linear finite element approximation and deep ReLU networks by giving an explicit construction of the ReLU DNN corresponding to a given piecewise linear finite element discretization. Though the authors of [6] do not view their work through the lens of interpretability, the sparse DNN that they construct is an example of an interpretable neural network whose weights and biases are directly related to the unknown quantities in the finite element discretization. In our earlier work on SPINN [19], we built upon this idea and introduced a class of sparse neural networks that generalize classical Radial Basis Function (RBF) networks [3] for solving PDEs. In particular, we demonstrated that certain kinds of classical meshless approximations are exactly representable as particular sparse DNNs. We present here an extension of our previous work by introducing a generalization of SPINN architectures–which we call ASPINN (Anisotropic SPINN)–that permits locally anisotropic approximations of the solution which are more efficient than regular RBFs.

We remark here that the reinterpretation of an RBF ansatz as a single layer neural network is quite old; see for instance [2] for an early discussion of this connection. Further, the class of RBF networks has good universal approximation properties [14]. We note in passing that the SPINN architecture we introduced in [19] generalizes RBF networks and also includes architectures that cannot be reduced to an RBF. The extension of SPINN that we present inherits all the properties of SPINN and includes more efficient architectures.

It bears emphasis that the development of the proposed architecture is a natural outcome of the interpretability of the SPINN architecture. As elaborated in [19], SPINN models can be viewed as a collection of particles, each associated with a spherical zone of influence. This interpretation naturally suggests the use of particles with anisotropic zones of influence that are optimally designed based on the local solution landscape. This simple idea, which underlies the ASPINN architecture, will be made precise in the next section.

The outline of the paper is as follows. We introduce the structure of the ASPINN architecture and details of implementation in the next section. We then present a variety of results involving elliptic and hyperbolic PDEs that illustrate the utility of ASPINN, and finally conclude with a discussion of the merits and shortcomings of ASPINN.

2 Methodology

We present the key ideas underlying ASPINN by focusing on a (linear/nonlinear) PDE of the form

(1)    $\mathcal{N}[u](x) = f(x) \quad \text{for } x \in \Omega, \qquad u(x) = g(x) \quad \text{for } x \in \partial\Omega.$

In (1), $\mathcal{N}$ is a differential operator, $u$ is the field variable of interest, and $f$ and $g$ are given data on the interior $\Omega$ and the boundary $\partial\Omega$, respectively. The presentation given here can be extended to time dependent PDEs in a straightforward manner by treating space and time together, as will be illustrated later.

Among the many classical methods that have been developed to solve PDEs like (1), meshless methods (see [11] for example) approximate the unknown solution using an ansatz of the form

(2)    $u_h(x) = \sum_{i=1}^{N} w_i \, \varphi\!\left(\frac{\|x - x_i\|}{h_i}\right),$

where $\varphi$ is a kernel function, $x_i$ are the node locations, $h_i$ measure the size of the (spherical) zone of influence of each node, and $w_i$ are the nodal weights. This ansatz has a very simple interpretation: the unknown field is obtained as a weighted linear combination of shifted and scaled versions of the kernel $\varphi$. As elaborated in our earlier work [19], the ansatz (2) can be exactly represented as a sparse DNN. In short, the ansatz (2) can equivalently be viewed as a radial basis function network where the input $x$ is first transformed by a mesh encoding layer to the feature vectors $(x - x_i)/h_i$, transformed subsequently by a kernel layer, and finally linearly combined to exactly represent (2).

The starting point of the current work is an anisotropic generalization of the ansatz (2) of the form

(3)    $u_h(x) = \sum_{i=1}^{N} w_i \, \varphi\!\left(\|A_i (x - x_i)\|\right).$

In (3), the matrices $A_i$ are symmetric and positive definite; they represent the local anisotropy of the zone of influence of each node. For the special case $A_i = (1/h_i)\, I$, we recover the isotropic ansatz (2). The anisotropic ansatz (3) can be expressed as a sparse DNN in a straightforward manner: the input $x$ is first transformed by a mesh encoding layer with weights $A_i$ and bias vectors $-A_i x_i$, and then passed into a kernel layer which generalizes the kernel $\varphi$. The output of the kernel layer is linearly combined to get $u_h$. It is evident that the weights and biases of this network directly correspond to terms in the ansatz (3). These networks can further be generalized to include non-RBF kernels too, as developed in detail in our earlier work [19]. We call this sparse architecture Anisotropic SPINN, or ASPINN for short.
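To make the architecture concrete, the following is a minimal PyTorch sketch of the ansatz (3) viewed as a sparse network; it is an illustration under stated assumptions rather than the implementation used in this work. The Gaussian kernel, the class name AspinnModel, and the initialization choices are placeholders, and the SPD construction anticipates the log-Cholesky parametrization (6) discussed below.

```python
import torch
import torch.nn as nn

class AspinnModel(nn.Module):
    """Sketch of the ASPINN ansatz (3): u_h(x) = sum_i w_i * phi(|A_i (x - x_i)|)."""

    def __init__(self, n_nodes: int, dim: int = 2):
        super().__init__()
        self.centers = nn.Parameter(torch.rand(n_nodes, dim))     # node locations x_i
        self.tril = nn.Parameter(torch.zeros(n_nodes, dim, dim))  # factors for the SPD matrices A_i
        self.weights = nn.Parameter(torch.zeros(n_nodes))         # nodal weights w_i

    def spd_matrices(self) -> torch.Tensor:
        # A_i = L_i L_i^T with an exponentiated diagonal, so each A_i is SPD;
        # with tril = 0 this gives A_i = I, i.e. an isotropic initial zone of influence.
        L = torch.tril(self.tril, diagonal=-1)
        L = L + torch.diag_embed(torch.exp(torch.diagonal(self.tril, dim1=-2, dim2=-1)))
        return L @ L.transpose(-1, -2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = self.spd_matrices()                     # (n_nodes, dim, dim)
        diff = x.unsqueeze(1) - self.centers        # (batch, n_nodes, dim)
        z = torch.einsum('nij,bnj->bni', A, diff)   # mesh encoding layer: A_i (x - x_i)
        phi = torch.exp(-(z ** 2).sum(dim=-1))      # kernel layer (Gaussian RBF assumed)
        return phi @ self.weights                   # linear combination -> u_h(x)
```

The isotropic SPINN ansatz (2) is recovered in this sketch whenever each $A_i$ is a scalar multiple of the identity.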

The training of the ASPINN model is carried out by minimizing a loss function that is chosen directly in terms of the PDE (1), as was originally proposed in [10]. For instance, the integral of the squared residue of the PDE, augmented with a penalty on the boundary conditions, is a suitable choice for the loss function:

(4)    $L = \int_{\Omega} \big(\mathcal{N}[u_h](x) - f(x)\big)^2 \, dx + \beta \int_{\partial\Omega} \big(u_h(x) - g(x)\big)^2 \, ds.$

The boundary conditions are thus introduced using a simple penalty approach directly in the loss function (4), with penalty parameter $\beta$, but other approaches are possible; see for instance [1] and [23] for more interesting alternatives for incorporating the boundary conditions.

In practice, a discretized version of the loss (4) is used:

(5)    $L_h = \frac{1}{N_{\Omega}} \sum_{j=1}^{N_{\Omega}} \big(\mathcal{N}[u_h](x_j) - f(x_j)\big)^2 + \frac{\beta}{N_{\partial\Omega}} \sum_{k=1}^{N_{\partial\Omega}} \big(u_h(y_k) - g(y_k)\big)^2.$

In (5), the points $x_j \in \Omega$ and $y_k \in \partial\Omega$ denote sampling points in the domain $\Omega$ and on the boundary $\partial\Omega$, respectively. These can be viewed as collocation points where the loss functional (4) is evaluated.
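As an illustration of how the discretized loss (5) can be assembled, the sketch below evaluates a Poisson residual with automatic differentiation and adds a boundary penalty. The data functions f and g are placeholders, the penalty weight beta is a hypothetical value, and the helper names are illustrative; this is not the exact loss used in our experiments.

```python
import torch

def f(x):                     # interior data (placeholder problem data)
    return torch.ones(x.shape[0])

def g(x):                     # boundary data (placeholder problem data)
    return torch.zeros(x.shape[0])

def poisson_residual(model, x):
    # Squared residual of -Laplace(u_h) - f at interior collocation points.
    x = x.clone().requires_grad_(True)
    u = model(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap_u = torch.zeros_like(u)
    for d in range(x.shape[1]):
        lap_u = lap_u + torch.autograd.grad(
            grad_u[:, d].sum(), x, create_graph=True)[0][:, d]
    return (-lap_u - f(x)) ** 2

def loss_fn(model, x_interior, x_boundary, beta=100.0):
    # Discretized loss (5): mean interior residual plus penalized boundary mismatch.
    interior = poisson_residual(model, x_interior).mean()
    boundary = ((model(x_boundary) - g(x_boundary)) ** 2).mean()
    return interior + beta * boundary
```

The same loss, evaluated on a held-out set of collocation points, serves as the test metric described in item 1 of the list further below.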

We remark that we could equivalently use other loss functions; for instance, the variational integral that is used in the Deep Ritz method [5] is also a suitable choice for the loss function.

The optimization of the loss function is carried out using a suitable variant of the Stochastic Gradient Descent (SGD) algorithm. In this work, we use the Adam optimization algorithm [9] for training ASPINN.

The training method presented above is quite standard, but we wish to emphasize the following features of the approach we use to train ASPINN:

  1. In typical applications of PINN to solve PDEs, the loss (5) is minimized to train the network, but there is no independent testing of the trained model. To make the training process more consistent with the standard training–testing procedure in supervised learning, we identify separate sets of interior and boundary testing points, in addition to the interior and boundary training points. We train the networks on the training samples and test them on the test samples to get a sense of the generalization capabilities of the trained model. We note that this approach is applicable for PINNs too.

  2. A naive implementation of the optimization process outlined above does not work for ASPINN since a gradient descent update does not guarantee the symmetry and positive definiteness of the matrices $A_i$. To enforce this (nonlinear) constraint, we use (a slight modification of) the log-Cholesky parametrization of symmetric and positive definite matrices [17]. Specifically, we choose as training parameters the lower triangular matrices $L_i$; the entries of $L_i$ are denoted as $\ell^i_{jk}$. The matrix $A_i$ is then constructed as

    (6)    $A_i = \tilde{L}_i \tilde{L}_i^{T}, \quad \text{where } (\tilde{L}_i)_{jk} = \ell^i_{jk} \text{ for } j > k \text{ and } (\tilde{L}_i)_{jj} = \exp(s\, \ell^i_{jj}).$

    In (6), $s$ is a scale parameter that can be used to control the growth of the diagonal entries of $A_i$ during optimization. The use of the log-Cholesky parametrization (6) yields an unconstrained optimization problem for the matrices $A_i$ and is compatible with gradient descent updates. We remark that even a simple Cholesky decomposition would work here, but it suffers from degeneracy due to the lack of a constraint keeping the diagonal entries of $L_i$ positive; we also found that the Cholesky decomposition does not perform as well as the log-Cholesky decomposition in practice. A small numerical illustration of this construction is given after this list.

  3. Once an ASPINN model is trained, the zone of influence of each node is obtained by computing the eigenvalues of the matrices $A_i$. For each node $i$, the zone of influence is an ellipsoid; the eigenvalues of the corresponding $A_i$ dictate the dimensions of the ellipsoid. For vanilla SPINN, the zone of influence is a sphere, which is obtained as a special case of this construction. This eigenvalue computation is also illustrated in the sketch after this list.
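The sketch below gives a small numerical illustration of items 2 and 3: an assumed form of the log-Cholesky map (6), with a hypothetical scale parameter s and illustrative entries, followed by the eigen-decomposition used to read off the ellipsoidal zone of influence.

```python
import numpy as np

def spd_from_log_cholesky(l_params, s=1.0):
    # Assumed form of (6): keep the strictly lower-triangular entries of L_i
    # as-is and exponentiate the (scaled) diagonal, so A_i = L L^T is SPD.
    L = np.tril(l_params, k=-1) + np.diag(np.exp(s * np.diag(l_params)))
    return L @ L.T

# Unconstrained parameters for one node (illustrative values).
l_params = np.array([[0.2, 0.0],
                     [0.7, -0.5]])
A = spd_from_log_cholesky(l_params)

# Item 3: the eigenvectors give the orientation of the elliptical zone of
# influence and the eigenvalues its stretch along each principal axis.
eigvals, eigvecs = np.linalg.eigh(A)
print("A is SPD:", bool(np.all(eigvals > 0)))
print("principal stretches:", eigvals)
```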

3 Results

The application of ASPINN to solve various PDEs is now presented to highlight its strengths in comparison to both SPINN and PINN. The ASPINN algorithm was implemented in PyTorch [15]. The simulations were automated using automan [20]. The visualizations were prepared using Mayavi [21] and Matplotlib [7]. For all the simulations, the Adam optimizer [9] was used, and the scale factor of the log-Cholesky parametrization (6) was kept fixed. The nodes were initially chosen to have an isotropic zone of influence and were placed uniformly over the domain. The penalty parameter $\beta$ in the loss function (5) was chosen to be large enough that the boundary conditions are satisfied to a reasonable degree of accuracy.

3.1 2D Poisson equation

We begin with the Poisson equation in two dimensions:

(7)

The PDE (7) admits an exact solution. We show a plot of the solution obtained using ASPINN in comparison with the exact solution in Figure 1. The location of the centers along with the zones of influence of each node–ellipses in the 2D case–are shown in Figure 2. The ASPINN model had 4 nodes along the X direction and 2 along the Y direction, comprising a total of 8 nodes. Interior sampling points were chosen to evaluate the loss. The choice of the number of nodes was dictated by the nature of the solution: as can be seen from Figure 1, the exact solution has 8 peaks, so a natural choice of the ASPINN architecture is one where an initially isotropic node is placed close to the location of each peak. As is apparent from Figure 2, the nodes move towards their respective peaks, and their zones of influence become appropriately anisotropic so as to best capture the solution locally. For instance, it can be seen that the exact solution has greater variation along the Y direction than the X direction; the zones of influence of the interior nodes are accordingly elongated along the Y direction and contracted along the X direction. The purpose of this simple example is to illustrate that the direct translation of the weights and biases of the ASPINN model into interpretable quantities, like the anisotropy of the zones of influence and the location of the nodes, permits us to reason both about the architecture and the quality of the solution. To understand the latter, we recall that SGD algorithms often find a local minimum in the loss landscape, and it is not always clear whether the local minimum that is obtained is a good one. In this case, however, the reasoning just outlined tells us that the weights and biases are in a good local minimum since the corresponding nodes and zones of influence behave as we expect their optimal values to. It bears emphasis that such an analysis of the weights and biases of generic DNNs is not feasible in general. The ability to directly interpret the trained parameters of the ASPINN model in this fashion thus permits us to reason about various architectural choices and the quality of the local minima of the trained solution. A short sketch of how zone-of-influence plots like Figure 2 can be produced from the trained parameters follows this paragraph.
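The hedged Matplotlib sketch below draws each node's zone of influence as an ellipse obtained from the eigen-decomposition of $A_i$, assuming semi-axes proportional to the inverse eigenvalues; the exact scaling used in our figures may differ, and the function name is illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def plot_zones(centers, A_matrices, ax=None):
    # centers: (n_nodes, 2) node locations; A_matrices: (n_nodes, 2, 2) SPD matrices.
    if ax is None:
        ax = plt.gca()
    for c, A in zip(centers, A_matrices):
        eigvals, eigvecs = np.linalg.eigh(A)
        semi_axes = 1.0 / eigvals                  # larger A_i entries shrink the zone
        angle = np.degrees(np.arctan2(eigvecs[1, 0], eigvecs[0, 0]))
        ax.add_patch(Ellipse(c, 2 * semi_axes[0], 2 * semi_axes[1],
                             angle=angle, fill=False))
    ax.scatter(centers[:, 0], centers[:, 1], marker='x')
    ax.set_aspect('equal')
    return ax
```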

A comparison of the errors of the computed solution using PINN, SPINN and ASPINN is shown in Figure 3. The PINN architecture was chosen to have a comparable number of neurons with respect to the A/SPINN models. The results clearly indicate the superior performance of ASPINN in comparison to both SPINN and PINN. Stated differently, the ASPINN model requires far fewer iterations to reach a desired level of accuracy compared to PINN. Since the accuracy of the ASPINN model increases with the number of nodes, this also implies that fewer nodes are required to learn a solution of comparable accuracy to a given SPINN or PINN model. This observation will be borne out by the various simulations presented in the sequel.

The effect of batch size on error obtained using ASPINN is shown in Figure 4. The numerical evidence indicates that smaller batch sizes are in general better for obtaining faster convergence. This observation holds in general when solving other PDEs too.

Figure 1: Solution of (7) using ASPINN. The exact solution is shown as a wireframe plot.
Figure 2: Location of nodes and their anisotropic zones of influence for the ASPINN solution of (7).
Figure 3: Error during SGD training for the PDE (7).
Figure 4: Effect of batch size on error for ASPINN. The values indicated in the label refer to the fraction of the sampling points that is used for each batch.

3.2 Static ripple in 2D

As our next example, we choose the PDE

(8)

The PDE (8) admits an exact solution that looks like a static ripple over the domain. This particular PDE was chosen to illustrate the performance of ASPINN in the presence of large gradients distributed spatially over the domain.

An example of a trained ASPINN model is shown in Figure 5. The figure clearly illustrates how the nodes adapt their location and orientation during the learning process to capture the various gradients in the exact solution. It is worth emphasizing that this kind of direct relation between the weights of the trained model and physically interpretable parameters is not available with generic DNNs, as in the case of PINNs.

As mentioned in the previous example, another advantage of having a visual representation of the weights, as shown in Figure 5, is that it helps us evaluate the quality of the local minimum that the algorithm lands in. The fact that the centers are not in symmetric positions indicates that the local optimum is not a global optimum; on the other hand, the alignment of the nodes along the solution gradients indicates that the local minimum to which the optimizer steers the model is indeed a good one.

The training loss for PINN, SPINN and ASPINN is shown in Figure 7. This example clearly indicates the advantage of using ASPINN–in addition to being interpretable, it is much more efficient in computing an approximate solution to the PDE.

The effect of batch size on the convergence of the ASPINN model is shown in Figure 6. As noted earlier, the numerical simulations suggest that smaller batch sizes work better.

Figure 5: Location of nodes and their anisotropic zones of influence for the ASPINN solution of (8).
Figure 6: Effect of batch size on error for ASPINN. The values indicated in the label refer to the fraction of the sampling points that is used for each batch.
Figure 7: Training loss of PINN, SPINN and ASPINN for the PDE (8). For SPINN and ASPINN, 64 nodes and 1600 sampling points were used. The PINN model was chosen as a 2 layer tanh-network with 100 neurons in each layer. The batch size was chosen as 200.

3.3 Square slit

We consider next a problem involving a non-convex domain, namely the Poisson equation on a square with a slit:

(9)

This equation does not admit a closed-form exact solution, but it can be solved using any standard numerical method, like the finite element method.

The solution of the PDE (9) using ASPINN is shown in Figure 8. The visualization of the zones of influence illustrates the ability of the ASPINN algorithm to adapt to the local solution landscape.

A comparison of the training loss for PINN, SPINN and ASPINN for this PDE is shown in Figure 9. It can be seen that ASPINN outperforms both SPINN and PINN in this case. It is to be remarked that this specific example was chosen because both PINN and SPINN exhibit poor solution convergence. Though the accuracy of the ASPINN model is not high–the error computed with reference to a finite element solution plateaus at a modest value–the speed of convergence of the ASPINN algorithm is much faster than that of PINN and SPINN.

Figure 8: Solution of the Poisson equation over a square domain with a slit (9) using ASPINN. The error of the converged solution is measured with respect to a reference finite element solution.
Figure 9: Training loss of PINN, SPINN and ASPINN for the PDE (9). For SPINN and ASPINN, 49 nodes and 1200 sampling points were used. The PINN model was chosen as a 2 layer tanh-network with 100 neurons in each layer. The batch size was chosen as 32.

3.4 1D linear advection equation

We now turn our attention to time dependent PDEs. We focus on hyperbolic PDEs since the presence of a propagating wave yields sharp gradients in the spacetime solution, thereby serving as good tests of the ASPINN models.

The linear advection equation takes the form

(10)    $\dfrac{\partial u}{\partial t} + c\, \dfrac{\partial u}{\partial x} = 0,$

where $c$ is the constant wave speed. The linear advection equation (10) admits traveling wave solutions of the form $u(x,t) = u_0(x - ct)$, where $u_0$ is the initial profile. We choose in particular a Gaussian initial profile.

For ease of computing the solution, we focus on a bounded spatial interval and simulate the solution over a time interval chosen such that the wave does not exit the interval during this period. The solution obtained using ASPINN is shown in Figure 10. The solution was computed by simultaneously discretizing space and time; we call this spacetime ASPINN. The location of the corresponding nodes is shown in Figure 11. The orientation of the nodes and their zones of influence along the propagating wave gives a good illustration of the effectiveness and interpretability of the ASPINN algorithm. This further suggests that for PDEs whose solutions have known structural properties, the weights and biases of ASPINN models can be chosen appropriately to accelerate convergence. For instance, a placement of initially isotropic nodes along the propagating wave would yield much faster convergence, as sketched below. Such a problem-specific initialization of the weights and biases is not possible in general for PINNs owing to their non-interpretability.
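As one hedged illustration of such problem-specific initialization (not used in the experiments reported above), node centers in the (x, t) spacetime domain could be seeded along the characteristic carrying the initial pulse; the wave speed c, pulse center x0, node count, and jitter below are hypothetical values.

```python
import numpy as np

def seed_nodes_along_wave(c=1.0, x0=0.0, t_final=1.0, n_nodes=20, spread=0.1):
    # The advection solution is constant along characteristics x = x0 + c*t,
    # so place the initial node centers along the path of the pulse, with a
    # small transverse jitter to cover its width.
    t = np.linspace(0.0, t_final, n_nodes)
    x = x0 + c * t + np.random.uniform(-spread, spread, size=n_nodes)
    return np.stack([x, t], axis=1)     # (n_nodes, 2) centers in the (x, t) plane

centers = seed_nodes_along_wave()
```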

Finally, various time snapshots of the spacetime ASPINN solution were extracted and explicitly compared with the exact solution, as shown in Figure 12. It can be seen that ASPINN captures the traveling wave profile without any numerical diffusion, thereby demonstrating the effectiveness of the ASPINN algorithm for studying hyperbolic PDEs.

Figure 10: Solution of advection equation (10) using spacetime ASPINN.
Figure 11: Location of nodes for spacetime ASPINN solution of the advection equation (10).
Figure 12: A comparison of the spacetime solution of the linear advection equation (10) computed using ASPINN and the exact travelling wave solution with an initial Gaussian profile.

3.5 1D Burgers equation

As a final example we discuss the inviscid Burgers equation:

(11)    $\dfrac{\partial u}{\partial t} + u\, \dfrac{\partial u}{\partial x} = 0.$

The reason for choosing the inviscid Burgers equation is that the solution develops a shock after a finite time. In traditional numerical treatments of the Burgers equation, special care has to be taken to capture the shock accurately.

We adopt a spacetime discretization to solve the Burgers equation, just as in the case of linear advection. The spacetime solution computed using ASPINN is shown in Figure 13. The location of the nodes and their zones of influence is shown in Figure 14. It can be seen clearly from Figure 14 that the zones of influence follow the shock and get more elongated along the time direction as they get closer to the shock, which is what would be expected from an adaptive algorithm. It bears emphasis that it is the interpretability of the ASPINN model that permits us to reason about the quality of the computed solution.

As in the case of linear advection, various time snapshots of the spacetime ASPINN solution and a reference solution computed using PyClaw [8] are shown in Figure 15. While the ASPINN solution does not capture the shock exactly, the numerical solution obtained agrees reasonably well with the expected results. In particular, the qualitative details of shock formation are captured by the ASPINN model. Our prior experience [19] in solving the Burgers equation using SPINN suggests that a hybrid finite difference and ASPINN approach would yield better results; this will be explored in a future work.

Figure 13: Solution of Burgers equation (11) using spacetime ASPINN. The ASPINN model has 80 interior nodes, 600 interior samples, and batch size 100.
Figure 14: Location of nodes for spacetime ASPINN solution of the Burgers equation (11).
Figure 15: A comparison of the spacetime solution of the Burgers equation (11) computed using ASPINN and a reference solution computed using PyClaw [8].

4 Discussion

The examples presented thus far illustrate the effectiveness of the anisotropic extension of SPINN that we propose in this work. The results clearly indicate that ASPINN provides a more efficient and interpretable alternative to PINN. We would now like to discuss a few related issues and point out potential limitations of ASPINN.

  1. The sense in which ASPINN is interpretable requires clarification. The notion of interpretability is understood differently by different researchers. In the context of ASPINN, we refer to interpretability in the sense of understanding the computational graph. This is quite distinct from the emphasis in fields like interpretable/explainable AI–see for instance Local Interpretable Model-Agnostic Explanations (LIME) [22]–where the goal is to find a local fit to the predictions of a generic DNN model using an interpretable model. In contrast, in ASPINN, we design a network with a special architecture such that its weights and biases are interpretable in a physically relevant sense.

  2. The interpretability of the weights and biases of an ASPINN model yields useful insights into the refinement of the architecture. For instance, a large error in a particular region immediately suggests adding more nodes and/or sampling points in that region–this translates to an easily implementable change in the architecture. In contrast, when using PINNs, the choice of architecture is largely ad-hoc. Furthermore, there is no systematic means to get a better architecture if the solution displays high local errors.

  3. The approach to creating interpretable models that we adopt here is that of designing special architectures. This, however, does not address the issue of the interpretability of generic DNN models. As remarked in [19], the best we can do is to view a DNN as a (global) Ritz approximation; the DNN model learns global basis functions which are then linearly combined. But such an interpretation lacks the sharpness of interpretations of special architectures like SPINN or the ReLU-FEM networks developed in [6]. The development of methods to interpret generic DNN models remains an open challenge.

  4. In addition to being interpretable, we emphasize that special architectures like the ones we propose here overcome some of the disadvantages of using a dense DNN to represent the solution. The sparsity of the architecture often translates to a corresponding increase in computational efficiency, though these too remain inefficient in comparison to traditional algorithms like the finite element method. The proposed method thus lies in the spectrum between traditional solvers and DNN based methods, thereby acting as a bridge between different modeling viewpoints.

  5. ASPINN can equivalently be viewed as a generalization of classical meshless methods in a manner that renders them differentiable end-to-end. This allows for the use of ASPINN models in conjunction with a larger PDE constrained optimization problem–we will be exploring this in a future work.

  6. A fundamental limitation associated with methods like ASPINN is the exact handling of boundary conditions. The present article adopts the simplest means of imposing boundary conditions, namely penalizing deviations from the prescribed boundary values. While this is certainly the simplest approach, it is not the best possible option. A variety of alternatives, like the ones discussed in [1, 23], can be adopted in conjunction with ASPINN.

  7. While the examples presented here are limited to simple domains, the method presented is general enough to handle complex domains too. A few such examples were presented in our earlier work [19]; since ASPINN follows the larger structure of SPINN, its extension to problems defined on complex boundaries is straightforward. In addition to this, hybrid methods like finite differencing in time in conjunction with ASPINN in space, which were explored in our earlier work with SPINN [19], are also easily implementable with ASPINN. We will be exploring this in a future work, especially in the context of the Burgers equation where capturing the shock is a non-trivial challenge for spacetime discretizations.

  8. The approach presented in this work streamlines the training and testing of PINN/SPINN/ASPINN models by choosing distinct sets of sampling points for training and testing. This is necessary to study the generalization capabilities of the trained models.

  9. An advantage that is often claimed for neural networks is their ability to overcome the curse of dimensionality. While this claim is not unconditionally true, a question that we have not discussed in this work is how well special architectures like SPINN/ASPINN fare when it comes to overcoming the curse of dimensionality. Using the arguments presented in works like [16, 14], SPINN/ASPINN are expected to be at least as good as RBFs in terms of their approximation capabilities. How well the more general ASPINN models overcome the curse of dimensionality remains to be explored.

  10. As a final remark, we note that a variety of extensions of ASPINN, like the use of a kernel network to model the kernel, the use of methods like Least Squares Gradient Descent [4] to accelerate the performance, etc. can be easily applied to SPINN too. These and similar extensions are planned for future works.

5 Conclusion

In this work we have presented ASPINN, a class of interpretable neural network models that are also efficient in comparison with both SPINN and PINN. ASPINNs generalize classical RBFs by performing a local coordinate transformation for each node in a manner that best captures the solution at the location of the node. This yields a visually interpretable model where the weights and biases of the network carry a straightforward interpretation. Such an interpretation helps in developing a systematic means to modify the architecture; this is not possible with generic DNN models on account of the non-interpretability of their weights and biases. We presented a variety of results illustrating the efficiency and interpretability of ASPINN in the context of both elliptic and hyperbolic PDEs. The results indicate that ASPINN is in general more efficient than PINNs. Furthermore, the consistent relocation of the nodes to physically interesting regions of the solution and the reorientation of the zones of influence to align with the local solution landscape allow us to reason about the quality of the trained ASPINN model. ASPINN thus provides an efficient, interpretable and differentiable generalization of classical RBF meshless methods, thereby providing a link between classical and modern numerical methods to solve PDEs.

References

  • [1] Jens Berg and Kaj Nyström. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing, 317:28–41, 2018.
  • [2] D.S. Broomhead and D. Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, pages 321–355, 1988.
  • [3] M.D. Buhmann. Radial basis functions. Acta Numerica, pages 1–38, 2000.
  • [4] Eric C. Cyr, Mamikon A. Gulian, Ravi G. Patel, Mauro Perego, and Nathaniel A. Trask. Robust training and initialization of deep neural networks: An adaptive basis viewpoint. In Jianfeng Lu and Rachel Ward, editors, Proceedings of The First Mathematical and Scientific Machine Learning Conference, volume 107 of Proceedings of Machine Learning Research, pages 512–536, Princeton University, Princeton, NJ, USA, 20–24 Jul 2020. PMLR.
  • [5] Weinan E and Bing Yu. The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat., 6:1–12, 2018.
  • [6] Juncai He, Lin Li, Jinchao Xu, and Chunyue Zheng. ReLU deep neural networks and linear finite elements. Journal of Computational Mathematics, 38(3):502–527, 2020.
  • [7] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
  • [8] David I. Ketcheson, Kyle T. Mandli, Aron J. Ahmadia, Amal Alghamdi, Manuel Quezada de Luna, Matteo Parsani, Matthew G. Knepley, and Matthew Emmett. PyClaw: Accessible, Extensible, Scalable Tools for Wave Propagation Problems. SIAM Journal on Scientific Computing, 34(4):C210–C231, November 2012.
  • [9] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014. Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015.
  • [10] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
  • [11] Shaofan Li and Wing Kam Liu. Meshfree and particle methods and their applications. Applied Mechanics Reviews, 55(1):1–34, January 2002. Publisher: American Society of Mechanical Engineers Digital Collection.
  • [12] H.N. Mhaskar. Approximation properties of a multilayered feedforward artificial neural network. Adv. Comput. Math., 1:61–80, 1993.
  • [13] J. A. A. Opschoor, P. C. Petersen, and Ch. Schwab. Deep ReLU networks and high-order finite element methods. Technical Report 2019-07, Seminar for Applied Mathematics, ETH Zürich, Switzerland, 2019.
  • [14] J. Park and I.W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3:246–257, 1991.
  • [15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
  • [16] P.C. Petersen. Neural network theory, 2022.
  • [17] J.C. Pinheiro and D.M. Bates. Unconstrained parametrizations for variance-covariance matrices. Statistics and Computing, 6:289–296, 1996.
  • [18] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [19] Amuthan A. Ramabathiran and Prabhu Ramachandran. SPINN: Sparse, physics-based, and partially interpretable neural networks for PDEs. Journal of Computational Physics, 110600, 2021.
  • [20] Prabhu Ramachandran. automan: A python-based automation framework for numerical computing. Computing in Science & Engineering, 20(5):81–97, Sep./Oct. 2018.
  • [21] Prabhu Ramachandran and Gaël Varoquaux. Mayavi: 3D visualization of scientific data. Computing in Science and Engineering, 13(2):40–51, 2011.
  • [22] M.T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
  • [23] N. Sukumar and A. Srivastava. Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks. Computer Methods in Applied Mechanics and Engineering, 389:114333, 2022.
  • [24] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.