A Shallow Ritz Method for elliptic problems with Singular Sources

07/26/2021 ∙ by Ming-Chih Lai, et al. ∙ 0

In this paper, a shallow Ritz-type neural network for solving elliptic problems with delta function singular sources on an interface is developed. There are three novel features in the present work; namely, (i) the delta function singularity is naturally removed, (ii) level set function is introduced as a feather input, (iii) it is completely shallow consisting of only one hidden layer. We first introduce the energy functional of the problem and then transform the contribution of singular sources to a regular surface integral along the interface. In such a way the delta function singularity can be naturally removed without the introduction of discrete delta function that is commonly used in traditional regularization methods such as the well-known immersed boundary method. The original problem is then reformulated as a minimization problem. We propose a shallow Ritz-type neural network with one hidden layer to approximate the global minimizer of the energy functional. As a result, the network is trained by minimizing the loss function that is a discrete version of the energy. In addition, we include the level set function of the interface as a feature input and find that it significantly improves the training efficiency and accuracy. We perform a series of numerical tests to demonstrate the accuracy of the present network as well as its capability for problems in irregular domains and in higher dimensions.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The fluid-structure interaction (FSI) problems have many applications in science and engineering. For example, the blood flow (fluid) in the heart valve leaflet (imbedded structure) peskin1977 is a typical FSI problem. This kind of problem often causes numerical simulation challenges due to the time-dependent and irregular shape of the embedded structure. In order to simulate such problems, Peskin peskin1977 proposed the so-called Immersed Boundary (IB) method which turns out to be popularly used due to its simplicity of implementation. In fact, the method does not require body-fitted discretizations for the structure which can save computational efforts significantly.

The immersed boundary method is in fact both a mathematical formulation and numerical method for fluid-structure interaction problems. In IB formulation the fluid variables are represented in Eulerian manner while the embedded structure is represented in Lagrangian one. The embedded structure is usually one-dimensional lower than the fluid dimensional space and it is regarded as a singular force generator. Thus, the governing equations consist of Navier-Stokes (or Stokes) equations with singular forces in the form of the Dirac delta function. In order to solve the problem numerically, a projection-type of Navier-Stokes solver (for instance, see an overview in GMS06 ) often involves solving elliptic equations with singular sources for the intermediate velocity. The IB numerical method then solves the equations by using finite difference discretization and a smooth version of the discrete delta function to regularize the singular sources. This method is easy to implement but leads to first-order accuracy. Another example is solving heat equations with singular sources. If time discretization is applied (for instance implicit Euler method), then at each time step, an elliptic equation with singular sources must be solved. So motivated by the above applications, we aim to solve elliptic problems with singular sources on the interface in this paper. Unlike the traditional numerical methods, our present method is based on a Ritz-type neural network with only one hidden layer. Since the neural network method is mesh-free, it can be applied to higher-dimensional problems in which the traditional methods are infeasible.

Until very recently, it begins to draw much attention to solve partial differential equations (PDEs) using deep neural network (DNN) in the scientific computing community. Part of the theoretical reason can be attributed to the various kinds of expressive power for function approximations using DNN such as those described in

Cybenko1989 ; HSW89 ; LPWHW17 ; HS18 , just to name a few. In terms of implementation, there are mainly two different approaches; namely, the physics-informed neural networks RPK19 , and the deep Ritz method EY18 . The main difference between the two approaches is how the loss is defined. The physics-informed neural networks is trained by minimizing the mean squared error loss of the equation residual, along with the initial and boundary condition errors. The deep Ritz method, however, begins with formulating the variational problem equivalent to the original PDE so the natural loss function in this framework is simply the energy. Both approaches share the same major mesh-free advantage and therefore can be practically applied to solve problems on complex geometry SY21 and in high-dimensional space HJE18 ; SS18 .

In this paper, we propose a new shallow Ritz method for solving elliptic problems with singular sources. The novelty of the proposed network is three-folds. Firstly, we remove the delta function singularity appearing in the original PDE by formulating to the variational problem. Secondly, the level set function, which is commonly used as an interface indicator, is included as a feature input that effectively improves the model efficiency and accuracy. Thirdly, we approximate the solution using a shallow neural network with only one hidden layer that significantly reduces the training cost in contrast to DNN. Furthermore, the mesh-free nature of neural networks makes it extremely easy to solve problems in more complex (or irregular) domains which the traditional methods, such as finite difference, are difficult to implement, especially in higher dimensions.

The rest of the paper is organized as follows. In Section 2, we show how to formulate a -dimensional elliptic problem with singular sources into an energy functional minimization problem. The shallow Ritz method to solve the problem is presented in Section 3, followed by a series of numerical accuracy tests and comparisons in Section 4. Some concluding remarks and future work are given in Section 5.

2 Elliptic problems with singular sources on the interface

We consider a -dimensional elliptic problem in a bounded domain divided by an embedded interface into two regions; namely, inside () and outside () of the interface, so that . The interface is represented by its parametric form as with some parametrization . The elliptic equation with singular sources on the interface is written as


where is a non-negative constant ( corresponds to the Poisson equation). Throughout this paper, we focus on the case of the Dirichlet boundary condition but other boundary conditions can be applied without changing the main ingredients of the proposed method. In above, is a given function, is the source density defined only on the interface, and is the -dimensional Dirac delta function. Since the second term of the righthand side of Eq. (1) has the integral over the dimensional interface while the delta function is dimensions, this leaves the term has a one-dimensional delta function singularity. As mentioned before, Eq. (1) appears in many realistic applications and in particular, the fluid velocity equations in the immersed boundary formulation for fluid-structure interaction problems Pes02 . The IB method uses finite difference discretization and a grid-based discrete delta function to approximate the integral term in Eq. (1). But this singular source regularization technique is only first-order accurate no matter what discrete delta functions are used due to the feature of delta function singularity. It was rigorously proved by Li Li15 that the IB method for Eqs. (1)-(2) in 2D is indeed first-order accurate.

It is known that the solution to the above elliptic equation with singular sources as shown in Eq. (1) is piecewise smooth across the interface. That is, the solution is continuous over the domain but has discontinuity in its normal derivative along the interface . In fact, Eq. (1) is equivalent to the following elliptic interface problem with the same boundary condition (2) as


where the notation represents the jump quantity (from the outside value to the inside value) across the interface. The above derivation can be found in some literatures such as the one in LI06 . One can immediately see that the delta function singularity no longer exists in Eq. (3) but instead of the form of normal derivative jump. So a natural way to solve Eq. (3

) is to take the derivative jump discontinuity into account and incorporate it into the finite difference discretization. A typically successful second-order accurate method is the immersed interface method (IIM) first proposed by LeVeque and Li 

LL94 in which they introduced a local coordinate system to derive the necessary derivative jump quantities. A simpler version of IIM using the normal jump directly was proposed in LT08

. However, the above immersed interface method becomes more difficult and tedious to implement when the space dimension increases. Recently, the authors of this paper propose a machine learning method called discontinuity capturing shallow neural network (DCSNN) 

HLL21 to solve more general elliptic interface problems than the one in Eq. (3) and successfully obtain the comparable results with IIM. Furthermore, the DCSNN method can be applied to higher dimensions where the traditional IIM is completely infeasible.

In this paper, we adopt different machine learning approach used in DCSNN where jump continuities must be incorporated into the loss function. The method is in the same spirit as the deep Ritz method EY18 by formulating the differential equation (1) in minimization problem so that the energy is the natural loss function of the neural network. In WZ20

a deep Ritz type approach to solve elliptic interface problem with high-contrast discontinuous coefficients is developed where a shallow neural network is used to approximate the boundary conditions and a deep neural network with ReLU (Rectified Linear Unit) activation function is employed. In 

GY21 a deep unfitted Nitsche method to solve elliptic interface problem with high contrasts in high dimensions is developed where two deep neural networks is formulated to represent two components of the solution. Another deep least squares method CCLL20 is proposed to minimize the first-order system least-squares functional which is rewritten from a second-order elliptic problem. Comparing with the previous work in WZ20 ; GY21 ; CCLL20 , the major difference is that our present network is completely shallow consisting of only one hidden layer. The details will be given in next section.

2.1 Variational problem

As mentioned earlier, the solution to the elliptic problem with singular sources on the interface, Eq. (1), is continuous but has discontinuous derivatives across the interface. Since the solution is not classically smooth, we are going to use the usual Sobolev spaces and to define the solution space. We assume the righthand side function and the source strength . In order to take the boundary condition (2) into account, followed the work in LS03 , we assume that the boundary data has a smooth extension to the bounded domain so that and . So the solution space can be defined as . We now formulate equation (1) into its weak form by simply multiplying the test function in Eq. (1). Using the Green’s first identity and the fact that test function vanishes at the boundary , we obtain the following weak formulation: Find such that


for all . Note that, the second term of the righthand side comes from the definition of Dirac delta function as


One can also derive the weak formulation by using the equivalent equation (3) with jump conditions in which the Green’s first identity is applied separately in and , and then summing up together to get Eq. (4). Note that, the boundary integral terms at the interface used in the above Green’s identities result the integral of interface normal derivative jump in Eq. (4) since . The existence and uniqueness of the above weak solution can be found in LS03 .

To adopt a Ritz-type neural network to solve the problem, we now rewrite the above weak formulation Eq. (4) to its equivalent minimization problem as follows. That is, find such that for all , where the energy functional reads


As a result, like the weak formulation, the delta function is completely removed from the energy and its contribution becomes a regular integral along the interface . Thus, we do not need to handle the singularity problem arising from the original equation (1).

2.2 Boundary condition enforcement

Although one hidden layer neural network can be a universal approximator (see a review in Pin99 ), the boundary condition requirement in still seems to be infeasible. Thus, we simply relax the boundary condition (2) by adding a penalty term to the energy (6) that reflects the penalty effect if the boundary condition is not satisfied exactly. The energy functional is therefore modified as


where is some positive penalty constant.

Now, it is interesting to see what the global minimizer will be for such a modified energy functional. To proceed, let us decompose a function as a sum of two functions by writing (the choice of and will be clear later). We have its energy

where the term can be rewritten, using the Green’s first identity under the smoothness assumption of in and (i.e. ), as

Therefore we obtain

That is, if we choose as the solution of Eq. (3) but with a modified Robin-type boundary condition


we should have


So it is clear that the solution of Eq. (3) (or equivalently Eq. (1)) with boundary condition (8) is the global minimizer of the energy functional . We thus formally write the solution to the variational problem as


where is the set of trial functions that does not require any constraint at the domain boundary. It is important to mention that as the penalty constant becomes larger, the above minimizer should approach to the solution of Eq. (3) with exact boundary condition (2). (In fact, as , Eq. (8) tends to be Eq. (2)).

2.3 Level set function augmentation

As can be seen from above, the minimizer of the energy functional (or the solution of Eq. (3)) is a continuous function but has a jump discontinuity at its normal derivative on the interface. Therefore, the appropriate trial functions must carry the same feature. We note that the construction of such a set of functions using neural net approximation requires additional efforts. Here, inspired by DCSNN HLL21 where an augmented variable is introduced to categorize precisely the variables into each subdomains, we also introduce an augmented variable and require the function to be continuous throughout the whole domain. More precisely, consider a level set function such that the position of the interface is given by the zero level set, i.e., . We define a function that satisfies


Here the level set function is considered as a feature input in the augmented variable. We require both the level set function and the extension function being continuous, so that is a continuous function. Although it looks like the derivative discontinuity on the interface is not taken into account, the augmented variable somehow gives a feature input to the function with additional information related to the interface. As we will see later in the numerical experiments, indeed the introduction of the augmented variable effectively improves the capability of neural network in function approximation.

2.4 Summary

To summarize, we solve elliptic problems with singular sources on the interface by looking for a continuous function of the form . The target function is found by minimizing the energy functional , where . In the following we shall develop a neural network architecture to represent and a loss function to be used for model training.

3 A shallow Ritz method

We propose a shallow neural network to approximate the -dimensional continuous solution for solving the minimization problem Eq. (10). Based on the universal approximation theory Cybenko1989 , we hereby design a shallow, feedforward, fully-connected neural network architecture, in which only one single hidden layer is employed. Let

be the number of neurons used in the hidden layer, the approximation function (or output layer) under this network structure is explicitly expressed by


where and being the weight matrices, and

being the bias vectors, and

is the activation function. One can easily see that the function is simply a linear combination of the activation functions and hence share exactly the same smoothness as

. By collecting all the training parameters (including all the weights and biases) in a vector

, the total number of parameters in the network (i.e., dimension of ) is counted by .

As shown in the previous section, the solution we are looking for is the global minimizer of the energy functional, it is therefore nature to consider the loss function in the training procedure using that energy. In this stage, we aim to learn the training parameters via minimizing the modified energy (7). The loss function in the training procedure is thus defined in the framework of deep Ritz method EY18 by replacing the integrals using discrete quadrature rules. That is, given training points in , along the embedding interface and on the domain boundary , denoted by , and , respectively, we hereby define the loss function as


where is the volume of in while and are volumes of and , respectively, in . We remind that the function and are linked through the relation . Here for brevity the energy is written in terms of , but it should be easy to rewrite it in terms of .

3.1 Selection of training points

As one can see, the quadrature rules used in the discrete loss error (13) is the Monte Carlo type, namely, the integrals are approximated by an average over the sampling. But in fact, as the energy is presented in the form of volume and surface integrals, one may expect a better performance if those integrations are evaluated using accurate numerical quadrature rules. So, if the considered problem is defined in two- or three-dimensions, the first intuitive way to select the training points might be to choose quadrature nodes based on, e.g., midpoint or Gaussian quadrature rules, just to name a few. (Note, if Gaussian nodes are chosen as the training points, the weights in the numerical quadratures for the loss function should be modified accordingly.) We have found that the training of the neural network indeed relies heavily on an “accurate” evaluation of the integrals. However, the question is not really which quadrature rule is chosen, but rather how accurate the numerical integration is.

In general if one is to evaluate an integral to a desired accuracy, there is no a priori knowledge on the required number of quadrature nodes. So a convergence test should be performed by, e.g., re-evaluating the integral with an increased number of nodes. On the other hand, as the network model changes dynamically during the training process, if the training points are fixed throughout the entire optimization process, it means that the energy is approximated using a fixed set of quadrature nodes, in fact, the accuracy of the numerical integration cannot be guaranteed.

To ensure an accurate evaluation of the energy, ideally we should check convergence of the numerical quadrature in each single step of training which is not feasible in practice. The convergence test might be relaxed a little by choosing just two set of nodes, training set and testing set, and monitor carefully during the training process the difference between training loss and testing loss. The training should be abandoned if the difference is greater than a prescribed threshold value, which shows a signal of insufficient nodes. But, one then has to increase the number of training points and restart the training procedure again.

In this paper we use the strategy as proposed in EY18

, the energy, and correspondingly the loss, is evaluated using Monte Carlo integration and the training points are re-selected in each iteration of the optimization process. The idea is in spirit similar to that of mini-batch in stochastic gradient descent method. Imagine that our training data set contains all the points in the domain

and at each iteration step we only train the model based on one mini-batch of the entire data set. The training process is then terminated if certain stopping criteria is satisfied. We have found that such a procedure, even at first glance seems less accurate in evaluating the energy, is more stable in the sense that the loss function will stay close to the true value.

3.2 Selection of Optimizer

Even though the energy admits a global minimizer as shown in the previous section, there might be local minimizers particularly in the space of the network parameter,

. So it is not appropriate to use traditional descent method such as the gradient descend method for minimization, as it might be trapped at a local minimum. On the other hand, if the training points are re-selected at each or every several iterations, the energy landscape will vary slightly at each time because of re-selection and thus one is more likely to escape from the local minimum. With such a strategy, ideally any descent method can be applied to find the global minimum. In this paper we exploit the Adaptive Moment Estimation (Adam) optimizer 

KB17 .

4 Numerical results

In this section, we use the developed shallow Ritz method to perform several numerical tests for elliptic problems with singular sources, in two, three, and six dimensions.

In the following examples, we choose sigmoid as the activation function. After the training process is finished, we use the cross-validation used in machine learning practice to check the accuracy. That is, we measure the accuracy of the solution using testing error instead of training error. To do this, we randomly choose testing points lying in to compute the relative and error as

respectively, where

Here is the exact solution of Eqs. (1)-(2) while is the solution obtained from the present model. In the following numerical experiments, we will just list the analytic form of the exact solution in and the level set function to represent the interface so that the resultant function and the source density in Eq. (1) can be easily computed via the equation itself and the derivative normal jump condition , respectively. In general, we set the testing number , where is the number of training points in . We terminate the training procedure after iterations with a fixed learning rate throughout all the experiments.

Example 1

In the first example, we choose a square domain with a circular interface that can be labelled by the zero level set of function . The exact solution is chosen as


We choose in this example.

Fixed training points

At first, we would like to demonstrate the over-fitting problem of using training points that are fixed throughout the whole optimization process.

Instead of using Monte Carlo method to approximate the energy, we use midpoint rule with fixed uniform grid points in domain , on interface , as well as on the domain boundary , which is supposed to be more accurate in approximating the integrals. Notice that, the midpoint quadrature rule leads to exactly the same formula for the loss function as in Eq. (13). More precisely, on the training point selection, we have uniform quadrature nodes inside the domain as


where is the grid size. The interface and domain boundary both have grids points that are also distributed uniformly. So overall we have , which gives training points in total. The shallow neural network to be trained consists of one hidden layer with neurons that gives totally parameters to be trained. The penalty constant is set by . We note that the number of training points is much greater than the number of parameters used in this case.

The training loss obtained during the optimization process is shown in Fig. 1(a) as solid (blue) line. It is observed that the loss function indeed decreases throughout the whole training process. Besides, in panels (b), (c) and (d), each term that contributes to the loss function corresponding to domain (), domain boundary () and interface (), respectively, are also shown as solid (blue) lines.

Figure 1: Evaluation of the loss. The solid (blue) line shows the loss obtained during training process while the dashed (red) line shows a re-evaluation of the loss using testing points. The dotted (black) line in (a) indicate energy values correspond to the exact solution.

In particular, in Fig. 1(a) we show a dotted (black) line indicating the energy corresponds to the exact solution that is the theoretical minimum energy. Surprisingly, one can see the training loss goes below the dotted line at about iterations, which in principle should never happen (since the exact solution provides the global minimum of the loss function). As we use a loss function that is, in some sense, a discrete representation of the energy, we expect to have the same global energy minimizer so the loss should not go beyond the same minimum value. It also turns out that the solution obtained at iterations is completely divergent from the exact solution although not shown here.

In addition, we carefully re-evaluate all terms in the loss function (13) using testing points throughout the whole training process. The number of testing points is much more than the training one, here we use , to ensure an accurate evaluation of its continuous counterpart in the energy. The results are shown as dashed (red) lines in Fig. 1 and we denote these values as actual energy during training since it is more truthful to represent the real energy. It can be seen in panel (a) that the actual energy during training is indeed always larger than the theoretical minimum value that is consistent with the analysis. Besides, at about iterations, the actual energy is increasing, unlike the training loss that continues to decrease. We also observe that the contributions of loss from the domain boundary and interface are accurately predicted while the one from the domain (shown in panel (b)) is far from the correct value at the later stage of training. Consequently the training loss is totally different from its actual energy during training (see Fig. 1(a) at later stage of iterations) and predicts values that are much less than the correct ones.

Such a scenario is known as over-fitting: The model learns too much detail of the training data and it deteriorates the performance of the model on new data. We therefore conclude that using fixed training points causes the problem of over-fitting. We should also point out that it happens even when the number of training points is much greater than the number of parameters (here in this experiment vs. ). To overcome such a difficulty, in the following we use the strategy similar to the mini-batch of stochastic gradient descent method, namely, we re-select randomly the training points in each iteration of the optimization process.


With the above re-selection strategy, we then compare the training performance with two different optimizers: Adam and stochastic gradient descent (SGD). The results are shown in Fig. 2 where the dashed (red) and solid (blue) lines indicate the training losses corresponding to Adam and SGD, respectively. The dotted (black) line indicates the theoretical minimum energy attained by the exact solution. It can be seen that the losses of the two optimizers decrease rapidly, while the Adam optimizer converges slightly faster.

We also show in the inset of the figure a zoom-in to the region close to the minimum energy. Due to the stochastic nature of the training points selection, the loss oscillates as expected. But in average Adam converges faster and is closer (more accurate) to the minimum loss energy than SGD. Therefore, in all the following examples, we use Adam optimizer.

Figure 2: Comparison between the performance of optimizers. The dotted (black) line indicate the theoretical minimum energy given by the exact solution. The inset shows a zoom-in to the region close to the minimum energy.
Comparison with immersed boundary method

Here we compare the solution of present shallow Ritz method with the numerical solution obtained by immersed boundary method, which is known to be first-order accurate finite difference method for elliptic problems with singular forces, Eqs. (1-2), on Cartesian grids BL92

due to the employment of the discrete (or regularized) version of the delta function. Note that, in IB method, the total number of degrees of freedom (number of unknowns), denoted by

, is equal to the sum of the number of Cartesian grid points . We also denote being the number of Lagrangian markers on the interface.

Table 1 shows the compared results where the IB method uses the grid resolutions , and so the number , and , respectively, while we use neurons in the hidden layer, which corresponds to the number of parameters used in the present shallow Ritz method are just , and . One can see that how significantly different those numbers of unknowns are. Using just a few number of neurons, and with training points , the results obtained by the present network are comparable with the ones obtained by the IB method.

We should point out that the majority of the results in this paper is implemented in Python with single precision, so the accuracy of the solution is limited by floating-point arithmetics. The best accuracy achieved in

norm is in the order of with neurons used in this example. This also explains why an increase of number of neurons to does not help improving the accuracy.

6400 1.933302 (10, 51) 1.817202 1.188302
25600 9.715103 (20, 101) 9.552103 6.740903
102400 4.835503 (30, 151) 7.502503 6.829203
Table 1: Comparison between immersed boundary method and shallow Ritz method. : Solution obtained by the IB method. : Solution obtained by the present model. : Exact solution. , .

Example 2

In the second example, we aim to demonstrate the effectiveness of the present level set function augmentation described in Subsection 2.3 by solving the problems with or without the level set function input. We choose a setup similar to the previous example, a square domain with a circular interface of radius centered at the origin. We choose and the exact solution is given as


If one does not require an augmented variable, i.e., not taking the third input of level set function, a shallow network can be developed with just neurons in the input layer (recall for the present shallow Ritz method, there are neurons in the input layer for two-dimensional problems). For this case we denote as the number of neurons used in the hidden layer, thus the total number of parameters in the network is . The network can be trained with exactly the same loss function defined in Eq. (13) and we denote the solution without augmentation as .

Table 2 shows a comparison between the accuracy performance of (without level set input) and (with level set function input) where the number of training points are given as . In fact it does not require that many training points in the domain, but we just want to avoid insufficient training points in all the following cases.

It turns out that the third augmented input carrying the information of level set function can effectively improve the training accuracy. For the one with only inputs, the network output fails to capture the exact solution as one can see the relative error is of the order even with neurons in the hidden layer ( parameters). However, for the present network with level set function augmentation, using neurons in the hidden layer is already capable to approximate the solution to the order both in relative and errors. This example demonstrates precisely the effectiveness of introducing the augmented variable that carries additional level set information to the network.

(10, 41) 4.014001 3.949101 (10, 51) 1.723202 8.348303
(30, 121) 3.607901 2.495901 (20, 101) 7.149203 3.750303
(500, 2001) 3.237001 2.167101 (30, 151) 5.573403 3.613703
Table 2: Comparison between the networks with or without the third level set function input. : Solution obtained by a shallow network with neurons in the input layer. : Solution obtained by the present model. : Exact solution. , .

Example 3

In the third example we highlight the mesh-free nature of neural network by considering an irregular domain that is given by a polar curve . The interface is chosen as an ellipse that can be labelled by the zero level set of function . We fix and the exact solution is chosen as


We randomly sample training points in the domain with . The results are shown in Table 3. We obtain solutions that are accurate to the order of in relative error with just neurons in the hidden layer. We should also point out that, by contrast, it can be difficult for traditional finite difference methods to solve the problem on such an irregular domain. Thus, we emphasize that the irregular domain case can be properly handled without any substantial difficulty.

(30, 151) 1.709502 8.814803
(40, 201) 1.227702 8.548903
Table 3: : Solution obtained by the present model. : Exact solution. , .

In Figure 3(a) we show the cross section of the solution at and the solution at in (b). The present network indeed accurately approximates the solution even though there exists cusps at the interface. Furthermore, in Fig. 4 the solution profile and the absolute error are shown. We observe that the maximum error occurs at regions close to the interface but is not significantly larger than other testing points.

Figure 3: Cross sections at (a) and (b) for the solution in Example 3. The model solution is shown as solid (blue) lines and the exact solution is shown as dashed (red) lines.
Figure 4: Left: The solution profile obtained by the present shallow Ritz method. Right: The profile of absolute error.

Example 4

In this example we show the capability of present method for solving the problem in three dimensions. We choose the domain as a cube with a spherical interface that is labelled by the zero level set of function . The exact solution is chosen as


In this test, we choose and . The results are shown in Table 4. Again, the present network shows a good performance. The relative error is less than with just neurons in the hidden layer.

(20, 121) 1.996002 1.434302
(30, 181) 1.627402 9.976903
(40, 241) 1.272702 8.766503
Table 4: : Solution obtained by the present model. : Exact solution. , .

Example 5

As the last example, we demonstrate the capability of the present method for solving high-dimensional problem by taking the dimension size . For the problem setup, we consider a -sphere of radius as the domain enclosing another smaller -sphere of radius as the interface . The interface can be labelled by the zero level set of function . We fix and the exact solution is chosen as


where .

We choose the number of training points based on the following strategy: With given radius , the volume of the -sphere is and the surface area is . So the ratio between the two is . We choose the number of training points in the sphere and on the surface based on this ratio, namely, if we have points in the domain that corresponds to effective radius , we choose points on the boundary and on the interface.

The results are shown in Table 5. With just neurons in the hidden layer, we obtain the relative error is in the order of . In addition, we compare the performance between different number of training points, shown in Table 6. It is somewhat surprising to see that, the solution accuracy is already good enough by choosing in six dimensions, while doubling or even quintupling the training points does not improve the accuracy. Also we show in Fig. 5 the evolution of the training loss and the relative error during the training process. Again, the results between three different choices of training points are almost indistinguishable. This also demonstrates that the required number of training points does not have to increase with the problem dimension. Here even in dimensions, we only need as few as total training points to achieve the desired accuracy.

(10, 91) 2.830902 6.407303
(20, 181) 2.661202 7.011403
(30, 271) 2.029202 6.973503
Table 5: : Solution obtained by the present model. : Exact solution. , .
(10, 91) 2.487702 7.437903
(10, 91) 2.507302 7.297703
(10, 91) 2.830902 6.407303
Table 6: Comparison between the number of training points. : Solution obtained by the present model. : Exact solution.
Figure 5: The evolution plots of (a) training loss, and (b) relative error. : solid (blue) line; : dashed (red) line; : dotted (yellow) line.

5 Conclusion

In this paper, a novel shallow Ritz method is developed to solve elliptic problems with delta function singular sources on the interface. By introducing the energy functional, the governing equation is reformulated as a variational problem. The crucial observation is that the contribution of singular sources in the equation becomes a regular surface integral term in the energy. Therefore one does not require introducing a discrete delta function that is commonly used in the immersed boundary method. To enforce the boundary condition, we simply introduce a penalty term in the energy. It was found that such a treatment modifies slightly the global minimizer to the one satisfies Robin-type boundary condition.

We propose a shallow neural network to approximate the global minimizer of the energy functional and, as a consequence, the network is trained by minimizing the loss function that presents a discrete version of the energy. In addition, we include the level set function as a feature input and find that it significantly improves the training accuracy. We perform a series of numerical tests to demonstrate the accuracy and efficiency of the present network as well as its capability for problems in irregular domains or in high dimensions. As shown in numerical experiments in this paper, most of the testing problems can be solved with acceptable accuracy by the present network with a small number (no larger than ) of neurons.

Although the present network is similar in spirit to deep Ritz method EY18 , here we consider a completely shallow neural network (only one hidden layer) with sufficiently small number of neurons so it reduces the computational complexity and learning workload significantly without sacrificing the accuracy.

In the present work, we only consider the stationary elliptic problems with constant coefficient. There should be no difficulty to consider problems with contrast coefficients or even variable coefficients. The same framework can be applied straightforwardly by writing down the corresponding energy functional. As a forthcoming extension, we shall consider time-dependent problems, and particularly the moving interface problems which will be left as future work.


T.-S. Lin and W.-F. Hu acknowledge supports by Ministry of Science and Technology (MOST), Taiwan, under research grant 109-2115-M-009-006-MY2 and 109-2115-M-008-014-MY2, respectively. T.-S. Lin and W.-F. Hu also acknowledge support by NCTS of Taiwan. M.-C. Lai acknowledge supports by MOST, Taiwan, under research grants 107-2115-M-009-016-MY3. M.-C. Lai, C.-C. Chang and W.-S. Lin acknowledge supports by MOST, Taiwan, under research grants 108-2119-M-009-012-MY2.


  • (1) C. S. Peskin, Numerical analysis of blood flow in the heart, J. Comput. Phys. 25 (1977) 220–252.
  • (2) J. L. Guermond, P. Minev, J. Shen, An overview of projection methods for incompressible flows, Comput. Methods Appl. Mech. Engrg 195 (44) (2006) 6011–6045.
  • (3)

    G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signal Syst. 2 (1989) 303–314.

  • (4) K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (1989) 359–366.
  • (5) Z. Lu, H. Pu, F. Wang, Z. Hu, L. Wang, The expressive power of neural networks: A view from the width, NIPS’17 (2017) 6232–6240.
  • (6) B. Hanin, M. Sellke, Approximating continuous functions by relu nets of minimal widths (2018). arXiv:1710.11278.
  • (7)

    M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. 378 (2019) 686–707.

  • (8) W. E, B. Yu, The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat. 6 (2018) 1–12.
  • (9) H. Sheng, C. Yang, PFNN: A penalty-free neural network method for solving a class of second-order boundary-value problems on complex geometries, J. Comput. Phys. 428 (2021) 110085.
  • (10) J. Han, A. Jentzen, W. E, Solving high-dimensional partial differential equations using deep learning, PNAS 115 (2018) 8505–8510.
  • (11) J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, J. Comput. Phys. 375 (2018) 1339–1364.
  • (12) C. S. Peskin, The immersed boundary method, Acta Numerica 11 (2002) 479–517.
  • (13) Z. Li, On convergence of the immersed boundary method for elliptic interface problems, Math. Comput. 84 (2015) 1169–1188.
  • (14) Z. Li, K. Ito, The immersed interface method, SIAM, 2006.
  • (15) R. J. LeVeque, Z. Li, The immersed interface method for elliptic equations with discontinuous coefficients and singular sources, SIAM J. Numer. Anal. 31 (4) (1994) 1019–1044.
  • (16) M.-C. Lai, H.-C. Tseng, A simple implementation of the immersed interface methods for stokes flows with singular forces, Computers & Fluids 37 (2) (2008) 99–106.
  • (17) W.-F. Hu, T.-S. Lin, M.-C. Lai, A discontinuity capturing shallow neural network for elliptic interface problems (2021). arXiv:2106.05587.
  • (18) Z. Wang, Z. Zhang, A mesh-free method for interface problems using the deep learning approach, J. Comput. Phys. 400 (2020) 108963.
  • (19) H. Guo, X. Yang, Deep unfitted nitsche method for elliptic interface problems (2021). arXiv:2107.05325.
  • (20)

    Z. Cai, J. Chen, M. Liu, X. Liu, Deep least-squares methods: An unsupervised learning-based numerical method for solving elliptic PDEs, J. Comput. Phys. 420 (2020) 109707.

  • (21) X.-D. Liu, T. Sideris, Convergence of the ghost fluid method for elliptic equations with interfaces, Math. Comput. 72 (2003) 1731–1746.
  • (22) A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica 8 (1999) 143–195.
  • (23) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization (2017). arXiv:1412.6980.
  • (24) R. P. Beyer, R. J. LeVeque, Analysis of a one-dimensional model for the immersed boundary method, SIAM J. Numer. Anal. 29 (2) (1992) 332–364.