# BINet: Learning to Solve Partial Differential Equations with Boundary Integral Networks

We propose a method combining boundary integral equations and neural networks (BINet) to solve partial differential equations (PDEs) in both bounded and unbounded domains. Unlike existing solutions that operate directly on the original PDEs, BINet learns to solve, as a proxy, the associated boundary integral equations using neural networks. The benefits are three-fold. First, only the boundary conditions need to be fitted, since the PDE is automatically satisfied by the single or double layer representations from potential theory. Second, the dimension of the boundary integral equations is typically smaller, so the sample complexity can be reduced significantly. Last, all differential operators of the original PDEs are removed in the proposed method, which improves numerical efficiency and stability. Adopting neural tangent kernel (NTK) techniques, we prove the convergence of BINet in the limit that the width of the neural network goes to infinity. Extensive numerical experiments show that, without calculating high-order derivatives, BINet is much easier to train and usually gives more accurate solutions, especially in cases where the boundary conditions are not smooth enough. Further, BINet outperforms strong baselines for both a single PDE and parameterized PDEs in bounded and unbounded domains.


## 1 Introduction

Partial differential equations (PDEs) are widely used in scientific fields and engineering applications, such as Maxwell's equations in optics and electromagnetism [1], the Navier–Stokes equations in fluid dynamics [2], the Schrödinger equation in quantum physics [3], and the Black–Scholes equation for call option pricing in finance [4]. Therefore, finding the solution of PDEs has been a critical research topic over the years. However, in most cases the analytical solution of a PDE is infeasible to obtain, so numerical methods become the major bridge between PDE models and practical applications.

In the past decade, deep learning has achieved great success in computer vision, natural language processing, and many other topics [5]. It is found that deep neural networks (DNNs) have an attractive capability in approximating functions, especially in high-dimensional spaces. Therefore, DNNs hold great potential in solving PDEs with the promise of providing a good ansatz to represent the solution, where the parameters can be obtained by training DNNs with proper loss functions.

In the literature, many efforts have been devoted to developing DNN-based methods for solving different kinds of PDEs, such as DGM [6], Deep Ritz [7], and PINN [8]. The main idea of these methods is to use a neural network to approximate the solution of the PDE directly. The loss function is designed either from the PDE residual together with the boundary or initial conditions, or from the energy functional derived from the variational form of the PDE.

However, two important issues are not fully considered in most existing works. First, PDEs are merely utilized to construct the loss function; the structure behind PDEs could be further exploited to design new network architectures that cater to the needs of solving differential equations. Second, when it comes to complex problems, such as PDEs with oscillatory or even singular solutions, failure of the aforementioned methods is frequently reported [9] due to high-order differentiation of the neural networks with respect to the inputs. The appearance of high-order derivatives may lead to instability in training [10] (for example, amplified oscillations or singularities) such that the network cannot find the exact solution.

To address the two issues above, we propose in this paper a novel method, named BINet, combining boundary integral equations and deep learning to solve PDEs. Utilizing the fundamental solution of the PDE and Green's formula [11], the solution of the PDE can be expressed in the form of a boundary integral, where the explicit fundamental solution of the PDE serves as the integral kernel. A new network structure is then designed based on this integral expression of the solution such that the output of our network satisfies the PDE automatically. Since the PDE is already satisfied, we only need to take the boundary condition as the supervisory signal in the loss. In BINet, the prior information provided by the PDE is fully integrated into the network. Moreover, the differential operator is substituted by an integral operator, which avoids extra differentiation of the neural network. The main advantages of BINet are summarized below:

First, BINet adopts an explicit integral representation of the solution such that the output of BINet satisfies the original PDE automatically. This means that the training of BINet is naturally confined to the solution space of the PDE. Since BINet is defined in a much smaller space, i.e., the solution function space, the training of BINet is faster and more stable than that of a general neural network. Another advantage of the integral representation is that all differential operators are removed in BINet. The regularity requirement on BINet is thus relaxed significantly, which enables BINet to approximate solutions with poor regularity. Moreover, BINet has good theoretical properties: using neural tangent kernel (NTK) techniques [12], BINet can be proved to converge as the width of the neural network goes to infinity.

Second, since the PDE is satisfied automatically by the integral representation in BINet, the residual of the boundary condition is the only component of the loss function. There is no need to balance the residual of the PDE against the boundary condition; BINet thus fits the boundary condition better with less parameter tuning.

Third, BINet can solve PDEs in the unbounded domain since the integral representation holds for both bounded and unbounded domains. For some problems such as electromagnetic wave propagation, solving PDEs in an unbounded domain is critical and complicated using traditional methods. Moreover, existing deep-learning-based models also suffer from the difficulty of sampling in unbounded domains. Therefore, BINet provides a good choice to solve this kind of problem.

Fourth, BINet is also capable of learning a solution operator mapping a parameterized PDE to its solution by feeding the parameters to the network as input. Note that in the integral representation of the solution, the integral kernel, i.e., the fundamental solution of the original PDE, has an explicit form depending on the differential operator of the PDE. Moreover, the integral is conducted exactly on the boundary of the domain on which the PDE is solved. Therefore, BINet has great advantages in learning the solution operator mapping a differential operator or computational domain to the corresponding solution.

At last, the boundary integral is defined on the boundary, whose dimension is one less than that of the original computational domain. The lower dimension leads to fewer sample points, which reduces the computational cost.

The rest of this paper is organized as follows. An overview of related work on solving PDEs with deep learning is given in Section 2. The boundary integral method and BINet are introduced in Section 3. In Section 4, we analyze the convergence of BINet using NTK techniques. Extensive numerical experiments are shown in Section 5. At last, concluding remarks are made in Section 6.

## 2 Related Work

Solving PDEs with neural networks can be traced back to the 1990s [13, 14, 15]. Together with the deep learning revolution, solving PDEs with neural networks has also entered a period of prosperity. In a neural-network-based PDE solver, the loss function and the network structure are the two key ingredients.

Regarding the loss function, one natural choice is the residual of the PDE. In [8, 6], the norm of the residual is used as the loss function. For elliptic equations, the variational form provides another choice of the loss function. Yu and E proposed to use the Ritz variational form as the loss function in [7], and the Galerkin variational form was formulated as an adversarial problem in [16]. In [17, 18], to avoid high-order derivatives in the loss function, high-order PDEs are first transformed into first-order systems by introducing auxiliary variables, so that only first-order derivatives appear in the loss function. To solve PDEs, the boundary condition has to be imposed properly. One simple way to enforce the boundary condition is to add it to the loss function as a penalty term. In this approach, a weight must be tuned to balance the PDE residual and the boundary condition; this weight is usually crucial and subtle to get good results. The other way is to impose the boundary condition explicitly by introducing a distance function of the boundary [19].

Regarding the network structure, there are also many recent works. The fully connected neural network (FCN) is one of the most frequently used networks. In [7], it is found that the residual neural network (ResNet) gives better results. For PDEs with multiscale structure, a multiscale neural network was designed specifically by introducing multiscale structure into the network [20]. The activation function is another important part of neural networks. The choice of activation function is closely related to the smoothness of the neural network. To compute high-order derivatives, smooth activation functions, such as sigmoid and tanh, are often used in PDE solvers. The ReLU activation, which is used most often in machine learning, is hardly used due to its poor regularity. For special PDEs, other activation functions are also used, such as sReLU [20] and the sine function [21]. By contrast, our BINet adopts an explicit integral representation of the solution, so the output satisfies the original PDE automatically.

Another related line of research is to learn the solution operator, i.e., the map from the parameter space to the solution space. Both the parameter space and the solution space may be infinite-dimensional; therefore, learning a solution operator is more challenging than solving a single PDE. Solution operators may also be complicated, so the network architecture becomes more important. In [22, 23], Green's function and the Fourier transform are used respectively to design good network architectures, but the purpose is not solving PDEs and is thus different from ours. Networks solving a single PDE can also be generalized to learn solution operators [24, 25, 26]. In [27, 28, 29], neural networks are used to solve PDEs with uncertainty.

## 3 Boundary Integral Network (BINet)

Let $\Omega \subset \mathbb{R}^n$ be a bounded domain, $\bar{\Omega}$ be the closure of $\Omega$, and $\Omega^c = \mathbb{R}^n \setminus \bar{\Omega}$. We consider the PDE in the following form,

$$\mathcal{L}u(x) = 0. \tag{3.1}$$

In this paper, $\mathcal{L}$ is chosen to be the Laplace operator $-\Delta$ or the Helmholtz operator $-\Delta - k^2$. But in general, BINet can be applied as long as the fundamental solution of $\mathcal{L}$ in $\mathbb{R}^n$ can be obtained. We list more options of $\mathcal{L}$ in the Appendix. We consider both interior problems and exterior problems, in which the PDE is defined in $\Omega$ and $\Omega^c$ respectively.

In this paper, we consider the Dirichlet type of boundary condition, $u(x) = g(x)$ for $x \in \partial\Omega$. Other types of boundary conditions can be easily handled in BINet with a small modification of the boundary integral equation.

### 3.1 Potential Theory

In this subsection, we briefly introduce the basics of the potential theory which provides the theoretical foundation of BINet. We recall an important theorem in potential theory [11].

###### Theorem

For any continuous function $h$ defined on $\partial\Omega$, the single layer potential is defined as

$$\mathcal{S}[h](x) := -\int_{\partial\Omega} G(x,y)\, h(y)\, ds_y, \tag{3.2}$$

and the double layer potential is defined as

$$\mathcal{D}[h](x) := -\int_{\partial\Omega} \frac{\partial G(x,y)}{\partial n_y}\, h(y)\, ds_y, \tag{3.3}$$

where $n_y$ denotes the outward normal of $\partial\Omega$ at $y$ and $G(x,y)$ is the fundamental solution of equation (3.1). Then both the single layer potential and the double layer potential satisfy (3.1). And for all $x_0 \in \partial\Omega$, we have

$$\lim_{x \to x_0} \mathcal{S}[h](x) = \mathcal{S}[h](x_0), \qquad \lim_{x \to x_0^{\pm}} \mathcal{D}[h](x) = \mathcal{D}[h](x_0) \mp \frac{1}{2} h(x_0), \tag{3.4}$$

where $x \to x_0^{+}$ and $x \to x_0^{-}$ mean converging to $x_0$ in $\Omega^c$ and $\Omega$ respectively.

For many important PDEs, fundamental solutions can be written explicitly. For the Laplace equation in $\mathbb{R}^2$, the fundamental solution is $G(x,y) = -\frac{1}{2\pi}\ln|x-y|$, while the fundamental solution of the Helmholtz equation in $\mathbb{R}^2$ is $G(x,y) = \frac{i}{4}H_0^{(1)}(k|x-y|)$, where $H_0^{(1)}$ is the Hankel function of the first kind. For the Laplace and Helmholtz equations in higher dimensions, and for more equations, please refer to the Appendix.

Based on Theorem 3.1, the single/double layer potentials (3.2) and (3.3) give explicit integral representations of solutions of the PDE. Using these integral representations, we can construct a network such that the output of the network solves the PDE automatically, even with random initialization. This is the main observation behind BINet.
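This "automatic PDE satisfaction" is easy to verify numerically. The following NumPy sketch (our own illustration, not the paper's code; the circle, the trapezoidal quadrature, and the random density are all arbitrary choices) evaluates the single layer potential for the 2D Laplace kernel on the unit circle and checks by a finite-difference stencil that the result is harmonic, even for a random, untrained density:

```python
import numpy as np

def single_layer(x, nodes, h, w):
    """Single layer potential S[h](x) = -∫ G(x,y) h(y) ds_y off the boundary,
    for the 2D Laplace kernel G(x,y) = -ln|x-y| / (2*pi)."""
    r = np.linalg.norm(x - nodes, axis=1)           # |x - y_j|
    return np.sum(np.log(r) / (2 * np.pi) * h * w)  # -G = ln(r) / (2*pi)

# Unit circle discretized with equispaced nodes (periodic trapezoidal rule).
N = 400
t = 2 * np.pi * np.arange(N) / N
nodes = np.stack([np.cos(t), np.sin(t)], axis=1)
w = np.full(N, 2 * np.pi / N)

# Any density gives a harmonic potential; take a random one.
rng = np.random.default_rng(0)
h = rng.standard_normal(N)

# Check harmonicity of S[h] at an interior point with a 5-point stencil.
x0, eps = np.array([0.3, 0.2]), 1e-3
shifts = [np.array(d) for d in ([eps, 0], [-eps, 0], [0, eps], [0, -eps])]
lap = (sum(single_layer(x0 + d, nodes, h, w) for d in shifts)
       - 4 * single_layer(x0, nodes, h, w)) / eps**2
print(abs(lap))  # ~0: the PDE holds regardless of the (untrained) density
```

Training only selects the particular density whose boundary trace matches the data; the representation itself is harmonic for every density.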

### 3.2 The Structure of BINet

In this subsection, we explain how to use the boundary integral forms (3.2) and (3.3) to construct the structure of BINet. As shown in Fig. 1, BINet consists of three components: input, approximation, and integration.

• From the integral formula of the single/double layer potential, it is clear that BINet has three inputs: a point $x$ in the computational domain, the differential operator $\mathcal{L}$, and the domain boundary $\partial\Omega$. The differential operator determines the fundamental solution, and the domain boundary gives the domain of integration.

• In the single/double layer potential, only the density function is unknown. In BINet, the density function is approximated using a multilayer perceptron (MLP) (or a residual network, a.k.a. ResNet), denoted as $h(\,\cdot\,;\theta)$ with learnable parameters $\theta$. Note that the density function is defined on the boundary $\partial\Omega$ only.

• Compute the single or double layer potential in Theorem 3.1 by kernel integration of the density function on the boundary, where the kernel is given by the explicit fundamental solution $G$. The integration can be done numerically by the methods shown in [30, 31].

To train BINet, the loss function is constructed from the jump relations (3.4) in Theorem 3.1:

$$\mathcal{L}(\theta) = \begin{cases} \left\| \mathcal{S}[h(\,\cdot\,;\theta)](x) - g(x) \right\|^2_{\partial\Omega}, & \text{single layer potential} \\[1mm] \left\| \left(\tfrac{1}{2}I + \mathcal{D}\right)[h(\,\cdot\,;\theta)](x) - g(x) \right\|^2_{\partial\Omega}, & \text{double layer potential (interior problem)} \\[1mm] \left\| \left(-\tfrac{1}{2}I + \mathcal{D}\right)[h(\,\cdot\,;\theta)](x) - g(x) \right\|^2_{\partial\Omega}, & \text{double layer potential (exterior problem)} \end{cases} \tag{3.5}$$

where $\mathcal{S}$ and $\mathcal{D}$ are the potential operators defined in Theorem 3.1 and $I$ is the identity operator.

In BINet, the differential operator and the computational domain boundary are naturally incorporated, which means that BINet has the capability to learn the map from the differential operator and computational domain to solutions.

## 4 Convergence Analysis of BINet

In recent years, many efforts have been devoted to developing convergence theory for over-parameterized neural networks. In [12], the neural tangent kernel (NTK) is proposed to prove convergence, and this tangent kernel is also implicit in [32, 33, 34]. Later, a non-asymptotic proof using the NTK was given in [35], showing that a sufficiently wide, fully trained network is equivalent to a kernel regression predictor. In this work, we give a non-asymptotic proof of the convergence of BINet.

In BINet, the density function in the boundary integral form is approximated by a neural network $h(\,\cdot\,;\theta)$, and a boundary integral operator $\mathcal{A}$ is applied to the density function, giving the output of BINet on the boundary as $\mathcal{A}[h(\,\cdot\,;\theta)]$. Here $\mathcal{A} = \mathcal{S}$ for the single layer potential, and $\mathcal{A} = \pm\frac{1}{2}I + \mathcal{D}$ for the double layer potential of the interior or exterior problem. For simplicity, we denote by $v(x,\theta)$ the output of BINet restricted to the boundary. The loss is given by the difference between this output and the boundary values; see Section 3 for details. Due to the operator $\mathcal{A}$, the convergence analysis of this structure is non-trivial.

In the learning process, the evolution of the difference between the output and the boundary value obeys the following ordinary differential equation,

$$\frac{d}{dt}\left( v(x, \theta(t)) - \tilde{v}(x) \right) = -\int_{\partial\Omega} \left( v(x', \theta(t)) - \tilde{v}(x') \right) N_t(x, x')\, dx', \tag{4.1}$$

where $v(x,\theta(t))$ is the output of BINet restricted to the boundary and $\tilde{v}(x)$ is the boundary value, i.e., the label function. For a detailed derivation of (4.1), see the Appendix. Here $N_t$ is the kernel at training time $t$, defined through an admissible operator $\mathcal{A}$; see the Appendix for details.

In the following two theorems, we show that the kernel $N_t$ in (4.1) converges to a constant kernel independent of $t$ when the width of the layers goes to infinity. The proof sketch, based on the work in [35], is given in the Appendix.

###### Theorem

(Convergence result of the kernel at initialization) Fix $\epsilon > 0$ and $\delta \in (0,1)$. Suppose the activation function is ReLU, the minimum width of the hidden layers is sufficiently large, and the operator $\mathcal{A}$ is bounded with $\|\mathcal{A}\| \leq A$. Then for normalized data, with probability at least $1 - \delta$ we have

$$\left| N(x, x') - \left[\mathcal{A}\,\Theta^{(L)}\mathcal{A}\right](x, x') \right| \leq (L+1)\, A^2 \epsilon.$$

Here $[\mathcal{A}\,\Theta^{(L)}\mathcal{A}]$ is the constant kernel of BINet given by the neural tangent kernel $\Theta^{(L)}$ of the $L$-layer network; the front and back operators mean that $\mathcal{A}$ is applied with respect to the first and the second variable of the neural tangent kernel, respectively.

###### Theorem

(Convergence result of the kernel during training) Fix $\omega > 0$ and $\delta \in (0,1)$. Suppose the width of the hidden layers is sufficiently large and the operator $\mathcal{A}$ is bounded with $\|\mathcal{A}\| \leq A$. Then with probability at least $1 - \delta$ over the Gaussian random initialization, we have for all $t \geq 0$,

$$\left| N_t(x, x') - N_0(x, x') \right| \leq A^2 \omega,$$

where $N_t$ is the kernel at training time $t$ and $N_0$ is the kernel at the Gaussian random initialization considered in the previous theorem.

Further, we have the following lemma for the positive definiteness of the new constant kernel.

###### Lemma

The constant kernel is positive definite for the double layer potential in BINet. For the single layer potential, the positive definiteness depends on the compactness of the boundary $\partial\Omega$.

The proof of Lemma 4 is given in the Appendix; the invertibility of the operator $\mathcal{A}$ [36, 37] is utilized to complete the proof.

By Lemma 4, equation (4.1), and the two theorems above, the error of BINet with the double layer potential thus vanishes after full training ($t \to \infty$) under the assumption that the width of the neural network goes to infinity. For the single layer potential, the convergence result depends on the boundary, i.e., on whether $\partial\Omega$ is compact. The proof of the convergence results is carried out in real space; however, for the complex-valued kernels arising from the Helmholtz equation, the results still hold with the inner product defined in complex space.

## 5 Experiments

We use BINet to compute a series of examples, including solving a single PDE, where the differential operator and the domain geometry are fixed, and learning solution operators. PDEs defined on both bounded and unbounded domains are considered. To estimate the accuracy of a numerical solution $u$, the relative $L^2$ error $\|u - u^*\|_2 / \|u^*\|_2$ is used, where $u^*$ is the exact solution. We compare our method with two state-of-the-art methods, the Deep Ritz method and PINN, only for interior problems since, as claimed before, other deep-learning-based PDE solvers are not able to handle exterior problems.

In BINet, a fully connected neural network (MLP) or a residual neural network (ResNet) is used to approximate the density function. Since there is no regularity requirement on the density function, we can use any activation function, including ReLU. For the Laplace equation, the network has only one output, i.e., the approximation of the density $h$, while for the Helmholtz equation, because its solution is complex, the network has two outputs, i.e., the real part and the imaginary part of the density. In the experiments, we choose the Adam optimizer to minimize the loss function, and all experiments are run on a single GeForce RTX 2080 Ti GPU.

### 5.1 Experimental Results on Solving One Single PDE

Laplace Equation with Smooth Boundary Condition. First, we consider a Laplace equation in a bounded domain,

$$-\Delta u(x,y) = 0, \quad (x,y) \in \Omega, \qquad u(x,y) = e^{ax}\sin(ay), \quad (x,y) \in \partial\Omega, \tag{5.1}$$

where $a$ is a fixed constant. We compare the results of PINN, the Deep Ritz method, and BINet for different values of $a$. In this example, we use the residual neural network introduced in [7]. Following [7], we choose the activation function of the Deep Ritz method and PINN accordingly, while in BINet we use ReLU as the activation function since BINet has less stringent regularity requirements.

For the smaller value of $a$, all three methods use 800 equidistant sample points on $\partial\Omega$, and for PINN and the Deep Ritz method we additionally randomly select 1600 sample points in $\Omega$. All methods use residual neural networks with 40 neurons per layer and six blocks.

For the larger value of $a$, BINet uses 2000 equidistant sample points on the boundary, while for PINN and the Deep Ritz method we randomly select 4000 sample points in $\Omega$ and 800 sample points on $\partial\Omega$; all methods again use residual neural networks, now with 100 neurons per layer and six blocks. The results of the three methods for the larger $a$ are shown in Figure 2. Looking at the solutions on $\partial\Omega$, we find that the solutions of PINN and the Deep Ritz method differ substantially from the exact solution, while BINet still captures the subtle structure of the exact solution.

After training for 20000 epochs, the relative errors of these methods are shown in Table 1. In this example, with the same number of layers and neurons, BINet is always better than the other two methods regardless of the value of $a$. When $a$ increases, unlike for the other methods, the result of BINet does not deteriorate.

Laplace Equation with Non-smooth Boundary Condition. Next, we consider a Laplace equation with a non-smooth boundary condition. We keep the same domain $\Omega$, and the boundary value problem is

$$-\Delta u(x,y) = 0, \quad (x,y) \in \Omega, \qquad u(x,y) = 2 - |x| - |y|, \quad (x,y) \in \partial\Omega. \tag{5.2}$$

In problem (5.2), the boundary condition is not smooth. In this example, we again use a ResNet with six blocks and 40 neurons per layer for all three methods. We select 800 equidistant sample points on $\partial\Omega$ for all three methods, and for PINN and the Deep Ritz method we randomly select 1000 sample points in $\Omega$. Figure 3 shows the results of the different methods. In this example, we take the result of the finite difference method on a high-precision mesh as the exact solution.

From Figure 3, we find that for PINN and the Deep Ritz method, the solutions on the boundary are smooth, which differs from the boundary condition. In contrast, the boundary condition is well approximated by the solution of BINet. The reason is that, to satisfy the interior smoothness of the solution, the neural networks of PINN and the Deep Ritz method have to be smooth functions, so their outputs remain smooth even arbitrarily close to the non-smooth boundary points.

Helmholtz Equation with Different Wavenumbers. In this experiment, we consider an interior Helmholtz equation,

$$-\Delta u(x,y) - k^2 u(x,y) = 0, \quad (x,y) \in \Omega, \qquad u(x,y) = e^{i(k_1 x + k_2 y)}, \quad (x,y) \in \partial\Omega, \tag{5.3}$$

where $k_1^2 + k_2^2 = k^2$. The Deep Ritz method cannot solve the Helmholtz equation; hence, we compare BINet and PINN for different $k$. We choose a fully connected neural network with 4 hidden layers, the sigmoid activation function, and 40 neurons per layer, and we choose 800 points on the boundary for BINet and PINN. In addition, we randomly select 2400 sample points in $\Omega$. For $k = 1$ and $4$, we use the PINN-type method and BINet to solve the equation respectively. The loss curves and results are shown in Figure 4. The loss of BINet descends faster, and for the larger wavenumber the loss of PINN does not converge, while the loss of BINet converges regardless of the value of $k$. The second and the third figures also show that the result of BINet is much better than that of PINN.

### 5.2 Experimental Results on Solution Operators

The Operator from Equation Parameters to Solutions. In this example, we consider exterior Helmholtz equations with variable wavenumber $k$,

$$-\Delta u(x,y) - k^2 u(x,y) = 0, \quad (x,y) \in \Omega^c, \qquad u(x,y) = H_0^{(1)}\!\left(k\sqrt{x^2+y^2}\right), \quad (x,y) \in \partial\Omega. \tag{5.4}$$

In the training phase, $k$ is sampled from a fixed training interval. We use the double layer potential to construct the loss function, and after 5000 training epochs we show the relative error versus the wavenumber in Figure 5. From the first figure, the relative error remains small and is comparable to that obtained when solving a single equation. The relative error increases slightly as $k$ grows, because the Helmholtz equation becomes more difficult to solve for larger $k$. This means that we have successfully learned the operator mapping for exterior parametric PDE problems on an unbounded domain. Most importantly, although not all test values of $k$ were selected from the training interval, the relative error is still small on the test interval, which shows that our method has good generalization ability.

The Operator from Boundary Geometry to Solutions. In this example, we consider a Laplace equation with parametric boundaries. The problem is

$$-\Delta u(x,y) = 0, \quad (x,y) \in \Omega_\beta, \qquad u(x,y) = g(x,y;\beta), \quad (x,y) \in \partial\Omega_\beta, \tag{5.5}$$

where the boundary condition $g(x,y;\beta)$ is defined with respect to the barycenter of $\Omega_\beta$. We assume that $\Omega_\beta$ can be any triangle in a domain. For simplicity, by translation and rotation we fix one vertex at the origin and one edge on the positive half of the x-axis, so that the third vertex lies in the first quadrant; the parameter $\beta$ then encodes the position of the third vertex, whose coordinates are restricted to a fixed interval. In this example, we choose a ResNet with eight blocks and 100 neurons per layer. The single layer potential is used to compute the boundary integral. We randomly select 80 triangles to compute the loss function, and after every 500 epochs the triangles are randomly selected again. After training for 5000 epochs, we randomly choose two triangles, and the BINet solutions on each triangle are shown in Figure 6. The relative errors remain small. From this, we can see that BINet has successfully learned the operator from boundary geometry to solution.

## 6 Conclusion

We have developed a new neural network method called BINet to solve PDEs. In BINet, the solution of the PDE is represented by a boundary integral composed of an explicit kernel and an unknown density, which is approximated by a neural network. The PDE is then solved by fitting the boundary integral representation to the boundary condition. Since the loss function measures only the misfit between the integral representation and the boundary condition, BINet has fewer hyper-parameters and lower sampling dimensions than many other neural-network-based PDE solvers. Because the boundary integral satisfies the PDE automatically in both the interior and the exterior of the boundary, BINet can solve PDEs in bounded and unbounded domains. Furthermore, BINet can learn operators from PDE parameters, including coefficients and boundary geometry, to solutions. Besides, using the NTK technique, we prove that BINet converges as the width of the network goes to infinity. We test BINet on the Laplace equation and the Helmholtz equation in extensive settings. The numerical experiments show that BINet works effectively in many cases, such as interior problems, exterior problems, and high-wavenumber problems; they also illustrate the capability of BINet in learning solution operators. All experiments verify the advantages of BINet numerically. Although our method exhibits competitive performance against PINN and the Deep Ritz method in many situations, the requirement of high-precision boundary integration limits further applications to higher-dimensional problems. This will be the direction for improving BINet in the future.

## Appendix A A review of the PINN and Deep Ritz method

### A.1 PINN method

To solve the linear PDE

$$\mathcal{L}u(x) = 0, \quad x \in \Omega, \qquad u(x) = g(x), \quad x \in \partial\Omega, \tag{A.1}$$

the main idea of the PINN [8] method is to use a neural network as an ansatz $u(x;\theta)$ to approximate the solution, where $\theta$ represents the trainable parameters of the neural network. Other works follow a similar idea, such as [19, 15, 6]. We can use automatic differentiation to calculate the derivatives and define the loss function

$$L_1(\theta) = \left\| \mathcal{L}u(x;\theta) \right\|^2_{\Omega}.$$

For the boundary condition, we can define the loss function

$$L_2(\theta) = \left\| u(x;\theta) - g(x) \right\|^2_{\partial\Omega}.$$

Finally, we combine the loss functions $L_1$ and $L_2$ with a hyper-parameter $\beta$ to get the total loss,

$$L(\theta) = L_1 + \beta L_2 = \left\| \mathcal{L}u(x;\theta) \right\|^2_{\Omega} + \beta \left\| u(x;\theta) - g(x) \right\|^2_{\partial\Omega}.$$

By minimizing the loss function $L(\theta)$, PINN obtains an approximate solution of the PDE (A.1).
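A small self-contained sketch of the PINN loss may be helpful. Real PINNs compute $\mathcal{L}u(x;\theta)$ by automatic differentiation of a network; here, purely as an illustration of how the two loss terms combine, a finite-difference stencil applied to a given trial function stands in for both the network and its derivatives (the sample sizes and $\beta$ are arbitrary choices of ours):

```python
import numpy as np

def pinn_loss(u, X_in, X_bd, g, beta=100.0, eps=1e-4):
    """Discrete PINN loss for -Δu = 0: mean squared PDE residual on interior
    samples plus beta times the mean squared boundary mismatch. A 5-point
    finite-difference stencil stands in for automatic differentiation."""
    lap = (u(X_in[:, 0] + eps, X_in[:, 1]) + u(X_in[:, 0] - eps, X_in[:, 1])
           + u(X_in[:, 0], X_in[:, 1] + eps) + u(X_in[:, 0], X_in[:, 1] - eps)
           - 4 * u(X_in[:, 0], X_in[:, 1])) / eps**2
    bd = u(X_bd[:, 0], X_bd[:, 1]) - g(X_bd[:, 0], X_bd[:, 1])
    return np.mean(lap**2) + beta * np.mean(bd**2)

# Random samples in the unit disk and on its boundary.
rng = np.random.default_rng(0)
r, t = np.sqrt(rng.uniform(size=500)), 2 * np.pi * rng.uniform(size=500)
X_in = np.stack([r * np.cos(t), r * np.sin(t)], axis=1) * 0.99
tb = 2 * np.pi * rng.uniform(size=200)
X_bd = np.stack([np.cos(tb), np.sin(tb)], axis=1)

g = lambda x, y: np.exp(x) * np.sin(y)  # boundary data
exact = g                               # e^x sin(y) is harmonic
wrong = lambda x, y: x**2 + y**2        # Δ(x²+y²) = 4, not harmonic

print(pinn_loss(exact, X_in, X_bd, g))  # ≈ 0 for the true solution
print(pinn_loss(wrong, X_in, X_bd, g))  # large for a non-solution
```

Minimizing over $\theta$ in a real PINN corresponds to searching for a trial function that drives both terms toward zero simultaneously.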

### A.2 Deep Ritz method

For the PDE problem in equation (A.1), we can rewrite the equation in a Ritz variational form; this is the main idea of the Deep Ritz method [7]. For instance, if we consider a Laplace equation,

$$-\Delta u(x) = 0, \quad x \in \Omega, \qquad u(x) = g(x), \quad x \in \partial\Omega, \tag{A.2}$$

we can solve the equation equivalently by minimizing the following Ritz variational functional,

$$\int_\Omega \frac{1}{2} \left| \nabla u(x) \right|^2 dx. \tag{A.3}$$

We again use a neural network $u(x;\theta)$ to approximate the solution of the PDE and use automatic differentiation to calculate its gradient. The variational functional (A.3) can then naturally be used as a loss function, defined as

$$L_1(\theta) = \int_\Omega \frac{1}{2} \left| \nabla u(x;\theta) \right|^2 dx.$$

For the boundary condition, the loss function can likewise be defined as

$$L_2(\theta) = \left\| u(x;\theta) - g(x) \right\|^2_{\partial\Omega}.$$

Finally, the total loss function can be defined as

$$L(\theta) = L_1 + \beta L_2 = \int_\Omega \frac{1}{2} \left| \nabla u(x;\theta) \right|^2 dx + \beta \left\| u(x;\theta) - g(x) \right\|^2_{\partial\Omega}, \tag{A.4}$$

where $\beta$ is again the hyper-parameter. By minimizing the loss function $L(\theta)$, the Deep Ritz method obtains the solution of the PDE (A.2).
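In practice the Ritz energy is estimated by Monte Carlo sampling. The following sketch (our own illustration; the sample sizes and penalty weight $\beta$ are arbitrary) estimates the energy term on the unit disk and checks it with the trial function $u(x,y) = x$, whose Ritz energy over the disk is exactly $\pi/2$:

```python
import numpy as np

def ritz_loss(u, grad_u, X_in, area, X_bd, g, beta=500.0):
    """Monte Carlo Deep Ritz loss: area * mean of 0.5|∇u|² over interior
    samples, plus beta times the mean squared boundary mismatch."""
    gx, gy = grad_u(X_in[:, 0], X_in[:, 1])
    energy = area * np.mean(0.5 * (gx**2 + gy**2))
    bd = u(X_bd[:, 0], X_bd[:, 1]) - g(X_bd[:, 0], X_bd[:, 1])
    return energy + beta * np.mean(bd**2)

rng = np.random.default_rng(1)
r, t = np.sqrt(rng.uniform(size=4000)), 2 * np.pi * rng.uniform(size=4000)
X_in = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)  # uniform in unit disk
tb = 2 * np.pi * rng.uniform(size=400)
X_bd = np.stack([np.cos(tb), np.sin(tb)], axis=1)

# Check with u(x, y) = x: |∇u|² = 1, so the energy term is exactly
# 0.5 * area(Ω) = π/2, and u matches the boundary data g(x, y) = x.
u = lambda x, y: x
grad_u = lambda x, y: (np.ones_like(x), np.zeros_like(x))
g = lambda x, y: x
print(ritz_loss(u, grad_u, X_in, np.pi, X_bd, g))  # ≈ π/2
```

In the Deep Ritz method, `grad_u` would come from automatic differentiation of the network rather than being supplied by hand.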

The Deep Ritz paper [7] introduces a residual network as the ansatz to approximate the solution. The residual network is also used in our work to approximate the density function in the boundary integral form. The architecture of the residual network is shown in Figure 7.

## Appendix B The fundamental solution of different equations

In this section, we give some supplementary introduction to the fundamental solution. Before defining the fundamental solution, we first introduce the $\delta$-function.

###### Definition

A function $\delta$ is called the $n$-dimensional $\delta$-function if $\delta(x) = 0$ for $x \neq 0$, $\int_{\mathbb{R}^n} \delta(x)\, dx = 1$, and for all functions $f$ that are continuous at $x_0$ we have $\int_{\mathbb{R}^n} \delta(x - x_0) f(x)\, dx = f(x_0)$.

Then, for the PDE

$$\mathcal{L}u(x) = 0, \quad x \in \Omega, \tag{B.1}$$

we can define the corresponding fundamental solution of equation (B.1).

###### Definition

A function $G(x,y)$ is called the fundamental solution corresponding to equation (B.1) if $G$ is symmetric in $x$ and $y$ and satisfies

$$\mathcal{L}_y G(x,y) = \delta(|y - x|),$$

where $\delta$ is the $\delta$-function and $\mathcal{L}_y$ is the differential operator acting on the $y$ component.

Limited by the length of the article, we only introduce the fundamental solutions of the Laplace and Helmholtz equations in the two-dimensional case in detail; in general, however, BINet can be applied as long as the fundamental solution of $\mathcal{L}$ in $\mathbb{R}^n$ can be obtained. We give a few more examples below; more details can be found in [38, 11, 39].

### B.1 The Laplace Equations

If we consider a Laplace equation

 −Δu(x)=0, x∈Ω, (B.2)

the fundamental solution has the following form

 G(x,y) = −(1/(2π)) ln|x−y|,                n = 2,
 G(x,y) = (1/((n−2)w_n)) · 1/|x−y|^(n−2),   n ≥ 3,

where n is the dimension of the equation and w_n is the surface area of the unit sphere in ℝⁿ. Then the fundamental solution satisfies

 −ΔyG(x,y)=δ(|x−y|).
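As a quick numerical sanity check (ours, not from the paper), one can verify with finite differences that the 2D kernel above is harmonic away from the singularity y = x:

```python
import numpy as np

def G(x, y):
    """2D fundamental solution of -Laplace: G(x,y) = -(1/(2 pi)) ln|x-y|."""
    return -np.log(np.linalg.norm(x - y)) / (2 * np.pi)

def laplacian_y(f, x, y, h=1e-4):
    """5-point finite-difference Laplacian of f(x, .) at y."""
    val = -4.0 * f(x, y)
    for e in ([h, 0.0], [-h, 0.0], [0.0, h], [0.0, -h]):
        val += f(x, y + np.array(e))
    return val / h ** 2

x = np.array([0.0, 0.0])
y = np.array([0.7, 0.4])          # away from the singularity y = x
print(laplacian_y(G, x, y))       # close to 0: G is harmonic for y != x
```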

### b.2 The Helmholtz Equations

The Helmholtz equation has the following form

 −Δu(x)−k2u(x)=0, x∈Ω, (B.3)

where k is a real number (the wavenumber). The fundamental solution of the Helmholtz equation has the following form

 G(x,y) = (i/4) H₀⁽¹⁾(k|x−y|),                        n = 2,
 G(x,y) = (1/((n−2)w_n)) · e^(ik|x−y|)/|x−y|^(n−2),   n ≥ 3,

where H₀⁽¹⁾ is the zeroth-order Hankel function of the first kind, n is the dimension of the equation, and w_n is again the surface area of the unit sphere in ℝⁿ. Then the fundamental solution satisfies

 −ΔyG(x,y)−k2G(x,y)=δ(|x−y|).
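For n = 3 the formula above reduces to G(x,y) = e^(ik|x−y|)/(4π|x−y|), and a finite-difference check (an illustrative sketch with an arbitrary wavenumber k) confirms that (−Δ_y − k²)G vanishes away from the singularity:

```python
import numpy as np

k = 2.0  # wavenumber (an arbitrary illustrative choice)

def G(x, y):
    """3D Helmholtz fundamental solution e^{ik r} / (4 pi r)."""
    r = np.linalg.norm(x - y)
    return np.exp(1j * k * r) / (4 * np.pi * r)

def helmholtz_y(f, x, y, h=1e-4):
    """(-Delta_y - k^2) f(x, .) at y via a 7-point finite-difference stencil."""
    lap = -6.0 * f(x, y)
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        lap += f(x, y + e) + f(x, y - e)
    lap /= h ** 2
    return -lap - k ** 2 * f(x, y)

x = np.zeros(3)
y = np.array([0.6, 0.5, 0.4])     # away from the singularity y = x
print(abs(helmholtz_y(G, x, y)))  # close to 0 for y != x
```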

### b.3 The Navier’s Equations

We consider Navier's equations (also called the Lamé system). These are well-known equations in linear elasticity for isotropic materials, and the governing equations are

 −μΔu(x)−(λ+μ)∇(∇⋅u(x))=0, (B.4)

where λ and μ are the Lamé constants of the elastic material, and u is the displacement vector. The fundamental solution of equation (B.4) is

 G(x,y) = ((λ+3μ)/(8πμ(λ+2μ))) [ (1/|x−y|) I₃ + ((λ+μ)/(λ+3μ)) (x−y)(x−y)^T/|x−y|³ ],   n = 3,
 G(x,y) = ((λ+3μ)/(4πμ(λ+2μ))) [ ln(1/|x−y|) I₂ + ((λ+μ)/(λ+3μ)) (x−y)(x−y)^T/|x−y|² ],   n = 2.   (B.5)

This means that G(x,y) defined by (B.5) satisfies the following equation,

 −μΔyG(x,y)−(λ+μ)∇y(∇y⋅G(x,y))=δ(|x−y|)In, (B.6)

where I_n is the n×n identity matrix.
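A small numpy sketch of the 2D Kelvin matrix in (B.5) (illustrative Lamé constants; `kelvin_2d` is our own helper name, not from the paper):

```python
import numpy as np

lam, mu = 1.0, 1.0  # Lame constants (illustrative values)

def kelvin_2d(x, y):
    """2D fundamental solution (Kelvin matrix) of Navier's equations (B.4)."""
    d = x - y
    r = np.linalg.norm(d)
    c = (lam + 3 * mu) / (4 * np.pi * mu * (lam + 2 * mu))
    a = (lam + mu) / (lam + 3 * mu)
    return c * (np.log(1.0 / r) * np.eye(2) + a * np.outer(d, d) / r ** 2)

G = kelvin_2d(np.array([0.3, 0.1]), np.array([-0.2, 0.4]))
print(G)  # a symmetric 2x2 matrix, as required of a fundamental solution
```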

### b.4 The Stokes Equations

Stokes equations are well known in the incompressible viscous fluid model. The general form of the Stokes equations is

 −μΔu(x) + ∇p(x) = f(x),          (B.7)
 ∇·u(x) = 0,   x ∈ Ω ⊂ ℝⁿ,

where u and p are the velocity and pressure of the fluid flow, respectively, and μ and f are the given dynamic viscosity of the fluid and the forcing term, respectively.

For n = 2, the fundamental solutions of the Stokes equations (B.7) are

 v_k(x,y) = (1/(4πμ)) { log(1/|x−y|) e_k + Σ_{j=1}^{2} (x_k−y_k)(x_j−y_j) e_j / |x−y|² },   (B.8)
 q_k(x,y) = ∂/∂x_k { −(1/(2π)) log(1/|x−y|) },

and for n = 3 the fundamental solutions are

 v_k(x,y) = (1/(8πμ)) { (1/|x−y|) e_k + Σ_{j=1}^{3} (x_k−y_k)(x_j−y_j) e_j / |x−y|³ },   (B.9)
 q_k(x,y) = ∂/∂x_k { −(1/(4π)) · 1/|x−y| },

where e_k denotes the unit vector along the x_k-axis. The pair (v_k, q_k) satisfies

 −μΔ_x v_k(x,y) + ∇_x q_k(x,y) = δ(|x−y|) e_k,   (B.10)
 ∇_x · v_k(x,y) = 0,

where k = 1, …, n.
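A finite-difference sketch (our own helper names, illustrative points) confirming the divergence-free condition in (B.10) for the 2D Stokeslet (B.8):

```python
import numpy as np

mu = 1.0  # dynamic viscosity (illustrative)

def v(k, x, y):
    """2D Stokeslet v_k(x, y) from (B.8)."""
    d = x - y
    r = np.linalg.norm(d)
    e_k = np.zeros(2)
    e_k[k] = 1.0
    return (np.log(1.0 / r) * e_k + d[k] * d / r ** 2) / (4 * np.pi * mu)

def div_x(k, x, y, h=1e-5):
    """Central finite-difference divergence of v_k(., y) at x."""
    s = 0.0
    for i in range(2):
        e = np.zeros(2)
        e[i] = h
        s += (v(k, x + e, y)[i] - v(k, x - e, y)[i]) / (2 * h)
    return s

x, y = np.array([0.8, -0.3]), np.array([0.1, 0.2])
print(div_x(0, x, y), div_x(1, x, y))  # both close to 0: divergence-free
```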

### b.5 The Biharmonic Equation

The biharmonic equation is a single scalar fourth-order equation, which can be reduced from plane elasticity and plane Stokes flow. We consider the two-dimensional biharmonic equation,

 Δ2u(x)=0, x∈Ω⊂R2. (B.11)

The fundamental solution of the equation (B.11) is

 G(x,y) = (1/(8π)) |x−y|² log|x−y|,   x,y ∈ ℝ²,   (B.12)

where G(x,y) satisfies

 Δ²_y G(x,y) = δ(|x−y|).
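Again as an illustrative check (ours, not the paper's code), applying a discrete Laplacian twice shows that G in (B.12) is biharmonic away from the singularity:

```python
import numpy as np

def G(x, y):
    """Fundamental solution (B.12) of the 2D biharmonic equation."""
    r = np.linalg.norm(x - y)
    return r ** 2 * np.log(r) / (8 * np.pi)

def laplacian_y(f, x, y, h):
    """5-point finite-difference Laplacian of f(x, .) at y."""
    val = -4.0 * f(x, y)
    for e in ([h, 0.0], [-h, 0.0], [0.0, h], [0.0, -h]):
        val += f(x, y + np.array(e))
    return val / h ** 2

def biharmonic_y(f, x, y, h=1e-2):
    """Delta^2 f(x, .) at y, by applying the discrete Laplacian twice."""
    inner = lambda xx, yy: laplacian_y(f, xx, yy, h)
    return laplacian_y(inner, x, y, h)

x = np.zeros(2)
y = np.array([1.0, 0.8])          # away from the singularity y = x
print(biharmonic_y(G, x, y))      # close to 0 for y != x
```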

## Appendix C The convergence analysis of BINet

### c.1 The structure for solving PDEs using neural networks

BINet consists of a neural network, such as an MLP, and an integral operator performed on the output of the neural network. Thus, the output of BINet reads

 v(x,θ)=A[h](x,θ),

where h is the neural network approximating the density function in the boundary integral form, and x is the d-dimensional spatial variable. The operator A performed on the output of the neural network completes the whole architecture. The loss function is

 L = ∥v(x,θ) − ṽ(x)∥²₂,

with the label function ṽ(x).
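For instance, for the Laplace equation with a single-layer representation, A[h](x) = ∫_{∂Ω} G(x,y) h(y) ds_y. A minimal numpy sketch using trapezoidal quadrature on the unit circle, with a constant stand-in density h ≡ 1 in place of the network output (helper names are ours):

```python
import numpy as np

m = 400                                        # quadrature nodes on the circle
t = 2 * np.pi * np.arange(m) / m
ys = np.stack([np.cos(t), np.sin(t)], axis=1)  # boundary of the unit disk
ds = 2 * np.pi / m                             # arc-length weight

def G(x, y):
    """2D Laplace fundamental solution."""
    return -np.log(np.linalg.norm(x - y)) / (2 * np.pi)

def h(y):
    """Stand-in density (in BINet this would be the network output)."""
    return 1.0

def single_layer(x):
    """A[h](x): trapezoidal quadrature of the single-layer potential."""
    return sum(G(x, y) * h(y) * ds for y in ys)

print(single_layer(np.array([0.3, 0.2])))
```

For the constant density h ≡ 1 on the unit circle the potential vanishes identically inside the disk; more generally, whatever density the network produces, v = A[h] is harmonic in Ω by construction, which is why only the boundary condition needs to be fitted.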

For a more general setup, the operator A takes different forms. For the PINN/DGM method, the operator is directly the partial differential operator, implying

 A[u](x,θ)=−Δu(x,θ),

where u is the approximation of the solution. The Deep Ritz method for solving the Laplace equation minimizes an optimization problem in which part of the loss reads

 L_1(θ) = ∫_Ω (1/2)|∇u(x,θ)|² dx.

It follows that the corresponding operator has the following form

 A[u](x) = (1/√2) ∇u(x,θ).

Therefore, viewed through the operator applied to the neural network, PINN and the Deep Ritz method carry extra differential operators, in contrast to the integral-type operator of BINet, although the Deep Ritz method decreases the order from second to first.

###### Definition

The operator A is admissible if the following conditions hold:

1. A[αu + βv] = αA[u] + βA[v] for scalars α, β (linear property);

2. ∂A[h]/∂θ = A[∂h/∂θ], i.e., A commutes with differentiation in the parameters (commutative property);

3. A acts only on the spatial variable and depends on θ only through its argument h (parameter variant).

It is easy to check that the operators of PINN, Deep Ritz, and our BINet all satisfy these admissibility conditions, which are crucial in the following proof.

The different designs of the neural network and the operator lead to different networks. Here, we adopt the typical setting with an MLP as the neural network. Since the integral operator is bounded, convergence results can be obtained for our BINet; the proof is shown in Appendix C.3. The structure of the neural network is introduced first for the derivation of the NTK form.

The L-hidden-layer MLP is defined as

 input layer:  g^(0) = y,   (C.1)
 hidden layers: g^(l)(y) ≜ √(c_σ/d_l) σ(f^(l)(y)),   f^(l)(y) = W^(l) g^(l−1)(y),   l ∈ [L],   (C.2)
 output layer: h(y,θ) = f^(L+1) = W^(L+1) g^(L)(y),   (C.3)

where g^(l) is the l-th hidden layer, θ = {W^(l)}_{l=1}^{L+1} collects the trainable parameters in the standard representation of the weights, d_l is the width of the l-th layer, and σ is the activation function with scaling constant c_σ.
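A minimal numpy sketch of (C.1)-(C.3) with σ = ReLU and c_σ = 2, a common normalization (widths, seed, and helper names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(dims):
    """dims = [d_0, ..., d_L, d_out]; each W^(l) has i.i.d. N(0,1) entries."""
    return [rng.standard_normal((dims[l + 1], dims[l]))
            for l in range(len(dims) - 1)]

def mlp(theta, y, c_sigma=2.0):
    """h(y, theta) following (C.1)-(C.3), with sigma = ReLU and c_sigma = 2."""
    g = y                                        # input layer g^(0) = y
    for W in theta[:-1]:                         # hidden layers
        f = W @ g                                # f^(l) = W^(l) g^(l-1)
        g = np.sqrt(c_sigma / W.shape[0]) * np.maximum(f, 0.0)
    return theta[-1] @ g                         # output layer W^(L+1) g^(L)

theta = init_mlp([2, 256, 256, 1])               # an L = 2 hidden-layer MLP
out = mlp(theta, np.array([0.6, 0.8]))
print(out)
```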

### c.2 The dynamic neural tangent kernel

We have chosen the MLP as the neural network in the analysis for simplicity. A similar analysis can also be done for other structures, as the convergence results of such neural networks are reported in the literature [40, 41]. Applying the integral operator to these neural networks should also give similar convergence results. Thus, different schemes here imply different forms of the operator A; see Appendix C.1 for details.

The training process of the neural network essentially minimizes the loss by gradient-based methods. One typical scheme is gradient descent, which has the form

 θn+1=θn−α∂L∂θ. (C.4)

When the learning rate α → 0, we have the limiting gradient flow

 dθdt=−∂L∂θ, (C.5)

where t is the continuous version of the index of the learning steps in the training process. More precisely, for each weight matrix W^(l), we have the evolution dW^(l)/dt = −∂L/∂W^(l).
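As a toy illustration of how (C.4) approximates (C.5): for the quadratic loss L(θ) = θ²/2 the flow gives θ(t) = θ(0)e^(−t), and gradient descent with a small learning rate tracks this trajectory (all values here are arbitrary):

```python
import numpy as np

# L(theta) = theta^2 / 2, so dL/dtheta = theta and the flow gives e^{-t}
alpha, n_steps = 1e-3, 5000      # small learning rate approximates the flow
theta = 1.0                      # theta(0) = 1
for _ in range(n_steps):
    theta -= alpha * theta       # gradient descent step (C.4)

t = alpha * n_steps              # elapsed continuous time, t = 5
print(theta, np.exp(-t))         # the two trajectories nearly coincide
```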

Hence the evolution of the prediction satisfies the following form

 dv(x,θ)/dt = Σ_{θ_p} (∂v/∂θ_p)(dθ_p/dt)
            = Σ_{θ_p} −A[∂h/∂θ_p](x) ∂L/∂θ_p
            = Σ_{θ_p} −A[∂h/∂θ_p](x) ⟨∂L/∂v(x′), ∂v/∂θ_p(x′)⟩
            = Σ_{θ_p} −A[∂h/∂θ_p](x) ⟨ζ(x′), A[∂h/∂θ_p](x′)⟩
            = −⟨ζ(x′), Σ_{θ_p} A[∂h/∂θ_p](x) A[∂h/∂θ_p](x′)⟩,   (C.6)

where ζ(x′) = ∂L/∂v(x′) is the gradient of the loss with respect to the prediction, and p denotes the index of the learning parameters. We denote the dynamic Neural Tangent Kernel (DNTK) for the PDE-based neural network as

 N(x,x′) = Σ_{θ_p} A[∂h/∂θ_p](x) A[∂h/∂θ_p](x′) = Σ_l ⟨A[∂h/∂W^(l)](x), A[∂h/∂W^(l)](x′)⟩_W,   (C.7)

where ⟨·,·⟩_W is defined as the summation over each component index of W^(l).
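The kernel (C.7) can also be probed empirically. The sketch below computes the Gram matrix of parameter gradients for a tiny MLP by finite differences, taking A to be the identity for brevity (so this is the plain empirical NTK rather than BINet's integral-operator version; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(w, y):
    """Tiny 1-hidden-layer scalar MLP; w packs both weight matrices."""
    W1, W2 = w[:6].reshape(3, 2), w[6:].reshape(1, 3)
    return (W2 @ np.tanh(W1 @ y))[0]

def param_grad(w, y, h=1e-6):
    """Finite-difference gradient of h(y, theta) with respect to theta."""
    g = np.zeros_like(w)
    for p in range(len(w)):
        e = np.zeros_like(w)
        e[p] = h
        g[p] = (mlp(w + e, y) - mlp(w - e, y)) / (2 * h)
    return g

w = rng.standard_normal(9)
xs = [np.array([0.1, 0.2]), np.array([-0.5, 0.3]), np.array([0.7, -0.4])]
# N(x, x') = sum_p dh/dtheta_p (x) * dh/dtheta_p (x')
N = np.array([[param_grad(w, a) @ param_grad(w, b) for b in xs] for a in xs])
print(N)  # the kernel matrix is symmetric and positive semi-definite
```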

Next, we give the explicit form of the DNTK for further analysis. Recall that the output of the MLP in the PDE-based neural network has the following form

 h(x,θ) = f^(L+1)(x) = W^(L+1) g^(L)(x) = W^(L+1) √(c_σ/d_L) σ(⋯ W^(l+1) √(c_σ/d_l) σ(W^(l) √(c_σ/d_{l−1}) σ(⋯)) ⋯),   (C.8)

where we have omitted the explicit dependence on x in the formula for simplicity. Thus,

 h(x,θ) = W^(L+1) √(c_σ/d_L) σ(⋯ f^(l+1) ⋯),   (C.9)

where

 f^(l+1) = W^(l+1) g^(l) = W^(l+1) √(c_σ/d_l) σ(f^(l)).   (C.10)

To give the form of the DNTK, the key is the form of ∂h/∂W^(l). From the above forms, we obtain

 ∂h/∂W^(l) = (∂h/∂f^(l)) (∂f^(l)/∂W^(l))
           = (∂h/∂f^(l+1)) (∂f^(l+1)/∂g^(l)) (∂g^(l)/∂f^(l)) (∂f^(l)/∂W^(l))
           = ((∂h/∂f^(l+1)) W^(l+1) √(c_σ/d_l) S^(l))^T (g^(l−1))^T
           = √(c_σ/d_l) S^(l) (W^(l+1))^T (∂h/∂f^(l+1))^T (g^(l−1))^T,   (C.11)

where we have used the notation

 [S^(l)]_{ij} = [σ̇(f^(l))]_i δ_{ij}.   (C.12)

By defining b^(l) = √(c_σ/d_l) S^(l) (W^(l+1))^T (∂h/∂f^(l+1))^T, we obtain

 ∂h/∂W^(l) = b^(l) (g^(l−1))^T,   l ∈ [L],   (C.13)

where b^(l) satisfies the induction relation b^(l) = √(c_σ/d_l) S^(l) (W^(l+1))^T b^(l+1).

With the admissible property of A in the sense of Definition C.1, we have the DNTK as

 N(x,x′) = Σ_l ⟨A[b^(l) (g^(l−1))^T](x), A[b^(l) (g^(l−1))^T](x′)⟩_W = A∘A′[K](x,x′),   (C.14)

where K(x,x′) is the dynamic neural tangent kernel of the MLP with the following form [42]

 K(x,x′) = Σ_{l=1}^{L+1} ⟨g^(l−1)(x), g^(l−1)(x′)⟩ ⟨b^(l)(x), b^(l)(x′)⟩,   (C.15)

and A∘A′[K] is the function given by applying the operator A to the kernel K on its head and tail, i.e., with respect to the former variable x and the latter variable x′, respectively.

Denote the constant neural tangent kernel in [35] as

 Θ^(L)(y,y′) = Σ_{l=1}^{L+1} ( Σ^(l−1)(y,y′) ∏_{l′=l}^{L+1} Σ̇^(l′)(y,y′) ),   (C.16)

where Σ^(l) is given by the recursive form

 Σ^(0)(y,y′) = y^T y′,   (C.17)
 Λ^(l)(y,y′) = [ Σ^(l−1)(y,y)   Σ^(l−1)(y,y′)
                 Σ^(l−1)(y′,y)  Σ^(l−1)(y′,y′) ],   (C.18)
 Σ^(l)(y,