1 Introduction
Partial differential equations (PDEs) have been widely used in scientific fields and engineering applications, such as Maxwell's equations in optics and electromagnetism [1], the Navier–Stokes equations in fluid dynamics [2], the Schrödinger equation in quantum physics [3], and the Black–Scholes equation for call option pricing in finance [4]. Therefore, finding the solution to PDEs has been a critical research topic over the years. However, in most cases, the analytical solution of a PDE is infeasible to obtain, so numerical methods become the major bridge between PDE models and practical applications.
In the past decade, deep learning has achieved great success in computer vision, natural language processing, and many other fields [5]. Deep neural networks (DNNs) are found to have an attractive capability for approximating functions, especially in high-dimensional spaces. Therefore, DNNs hold great potential for solving PDEs, with the promise of providing a good ansatz to represent the solution, whose parameters can be obtained by training DNNs with proper loss functions.
In the literature, many efforts have been devoted to developing DNN-based methods for solving different kinds of PDEs, such as DGM [6], Deep Ritz [7], and PINN [8]. The main idea of these methods is to use a neural network to approximate the solution of the PDE directly. The loss function is designed by incorporating either the PDE residual together with the boundary or initial conditions, or the energy functional derived from the variational form of the PDE.
However, two important issues are not fully considered in most existing works. First, PDEs are merely utilized to construct the loss function; the essence behind PDEs may be further exploited to design new network structures that cater to the needs of solving differential equations. Second, when it comes to complex problems, such as PDEs with oscillatory or even singular solutions, failure of the aforementioned methods is frequently reported [9] due to the high-order differentiation of the neural networks with respect to the inputs. The appearance of high-order derivatives may lead to instability in training [10] (for example, amplified oscillations or singularities) such that the network cannot find the exact solution.
To address the two issues above, in this paper we propose a novel method, named BINet, combining boundary integral equations and deep learning to solve PDEs. Utilizing the fundamental solution of the PDE and Green's formula [11], the solution to the PDE can be expressed in the form of a boundary integral, where the explicit fundamental solution of the PDE serves as the integral kernel. A new network structure is then designed based on this integral expression of the solution such that the output of our network satisfies the PDE automatically. Since the PDE is already satisfied, we only need to take the boundary condition as the supervisory signal for the loss. In BINet, the prior information provided by the PDE is fully integrated into the network. Moreover, the differential operator is substituted by an integral operator, which avoids extra differential operations on the neural networks. The main advantages of BINet are summarized below:
First, BINet adopts an explicit integral representation of the solution such that the output of BINet satisfies the original PDE automatically. This means that the training of BINet is naturally confined to the solution space of the PDE. Since BINet is defined in a much smaller space, i.e., the solution function space, its training is faster and more stable than that of a general neural network. Another advantage of the integral representation is that all differential operators are removed in BINet. The regularity requirement of BINet is thus relaxed significantly, which enables BINet to approximate solutions with poor regularity. Moreover, BINet has good theoretical properties. Using neural tangent kernel (NTK) techniques [12], BINet can be proved to converge as the width of the neural network goes to infinity.
Second, since the PDE is satisfied automatically by the integral representation in BINet, the residual of the boundary condition is the only component of the loss function. Since there is no need to balance the residual of the PDE against the boundary condition, BINet fits the boundary condition better with less parameter tuning.
Third, BINet can solve PDEs in unbounded domains, since the integral representation holds for both bounded and unbounded domains. For some problems, such as electromagnetic wave propagation, solving PDEs in an unbounded domain is critical and complicated for traditional methods. Moreover, existing deep-learning-based models also suffer from the difficulty of sampling in unbounded domains. Therefore, BINet provides a good choice for solving this kind of problem.
Fourth, BINet is also capable of learning a solution operator mapping a parameterized PDE to its solution by feeding the parameters to the network as input. Note that in the integral representation of the solution, the integral kernel, i.e., the fundamental solution of the original PDE, has an explicit form depending on the differential operator of the PDE. Moreover, the integral is conducted exactly on the boundary of the domain on which the PDE is solved. Therefore, BINet has great advantages in learning the solution operator mapping the differential operator or the computational domain to the corresponding solution.
At last, the boundary integral is defined on the boundary, whose dimension is one less than that of the original computational domain. The lower dimension leads to fewer sample points, which reduces the computational cost.
The rest of this paper is organized as follows. An overview of related work on solving PDEs using deep learning approaches is given in Section 2. The boundary integral method and BINet are introduced in Section 3. In Section 4, we analyze the convergence of BINet using the NTK techniques. Extensive numerical experiments are shown in Section 5. At last, concluding remarks are made in Section 6.
2 Related Work
Solving PDEs with neural networks can be traced back to the 1990s [13, 14, 15]. Together with the deep learning revolution, solving PDEs with neural networks has also entered a period of prosperity. In a neural-network-based PDE solver, the loss function and the network structure are two key ingredients.
Regarding the loss function, one natural choice is the residual of the PDE. In [8, 6], the norm of the residual is used as the loss function. For elliptic equations, the variational form provides another choice of loss function. Yu and E proposed to use the Ritz variational form as the loss function in [7], and the Galerkin variational form was formulated as an adversarial problem in [16]. In [17, 18], to avoid high-order derivatives in the loss function, high-order PDEs are first transformed into first-order PDE systems by introducing auxiliary variables; for the first-order system, only first-order derivatives appear in the loss function. To solve PDEs, the boundary condition has to be imposed properly. One simple way to enforce the boundary condition is to add it to the loss function as a penalty term. In this approach, one must tune a weight to balance the PDE residual and the boundary condition; this weight is usually crucial and subtle to tune for good results. The other way is to impose the boundary condition explicitly by introducing a distance function of the boundary [19].
Regarding the network structure, there have also been many recent works. A fully connected neural network (FCN) is one of the most frequently used networks. In [7], it is found that the residual neural network (ResNet) gives better results. For PDEs with a multiscale structure, a multiscale neural network was designed specifically by introducing the multiscale structure into the network [20].
The activation function is another important part of neural networks. The choice of activation function is closely related to the smoothness of the neural network. To compute high-order derivatives, smooth activation functions, such as sigmoid and tanh, are often used in PDE solvers. The ReLU activation, which is used most often in machine learning, is hardly used due to its poor regularity. For special PDEs, other activation functions are also used, such as sReLU [20] and the sine function [21]. By contrast, our BINet adopts an explicit integral representation of the solution, so the output satisfies the original PDE automatically.
Another related line of research is to learn the solution operator, i.e., the map from the parameter space to the solution space. Both the parameter space and the solution space may be infinite-dimensional, so learning the solution operator is more challenging than solving a single PDE. Solution operators may also be complicated, and the network architecture becomes more important. In [22, 23], Green's function and the Fourier transform are used respectively to design good network architectures, but with a purpose different from ours. The networks solving a single PDE can also be generalized to learn solution operators
[24, 25, 26]. In [27, 28, 29], a neural network is used to solve PDEs with uncertainty.

3 Boundary Integral Network (BINet)
Let $\Omega \subset \mathbb{R}^d$ be a bounded domain, $\bar{\Omega}$ be the closure of $\Omega$, and $\Omega^c = \mathbb{R}^d \backslash \bar{\Omega}$. We consider the PDE in the following form,
(3.1)  $\mathcal{L}u(x) = 0, \quad x \in \Omega \ (\text{or } \Omega^c)$.
In this paper, $\mathcal{L}$ is chosen to be the Laplace operator or the Helmholtz operator. But in general, BINet can be applied as long as the fundamental solution of $\mathcal{L}$ in $\mathbb{R}^d$ can be obtained. We list more options of $\mathcal{L}$ in the Appendix. We consider both interior problems and exterior problems, in which the PDE is defined in $\Omega$ and $\Omega^c$ respectively.
In this paper, we consider the Dirichlet type of boundary condition, $u = g$ on $\partial\Omega$. Other types of boundary conditions can be easily handled in BINet with a small modification of the boundary integral equation.
3.1 Potential Theory
In this subsection, we briefly introduce the basics of potential theory, which provides the theoretical foundation of BINet. We recall an important theorem in potential theory [11].
Theorem 3.1
For any continuous function $h$ defined on $\partial\Omega$, the single layer potential is defined as
(3.2)  $\mathcal{S}[h](x) = \int_{\partial\Omega} G(x, y)\, h(y)\, \mathrm{d}s_y$,
and the double layer potential is defined as
(3.3)  $\mathcal{D}[h](x) = \int_{\partial\Omega} \frac{\partial G(x, y)}{\partial n_y}\, h(y)\, \mathrm{d}s_y$,
where $n_y$ denotes the outward normal of $\partial\Omega$ at $y$, and $G(x, y)$ is the fundamental solution of equation (3.1). Then, both the single layer potential and the double layer potential satisfy (3.1). And for all $x_0 \in \partial\Omega$, we have
(3.4)  $\lim_{x \to x_0^{\mp}} \mathcal{S}[h](x) = \mathcal{S}[h](x_0), \qquad \lim_{x \to x_0^{\mp}} \mathcal{D}[h](x) = \mp\frac{1}{2}h(x_0) + \mathcal{D}[h](x_0)$,
where $x \to x_0^{-}$ and $x \to x_0^{+}$ mean $x$ converging to $x_0$ from within $\Omega$ and $\Omega^c$ respectively.
For many important PDEs, the fundamental solution can be written explicitly. For the Laplace equation in $\mathbb{R}^2$, the fundamental solution is $G(x, y) = -\frac{1}{2\pi}\ln|x - y|$, while the fundamental solution for the Helmholtz equation in $\mathbb{R}^2$ is $G(x, y) = \frac{i}{4}H_0^{(1)}(k|x - y|)$, where $H_0^{(1)}$ is the Hankel function of the first kind of order zero. For the Laplace equation and the Helmholtz equation in higher dimensions and for more equations, please refer to the Appendix.
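To make these two kernels concrete, they can be evaluated in a few lines (a sketch in Python with NumPy/SciPy; the function names are ours, not the paper's):

```python
import numpy as np
from scipy.special import hankel1

def laplace_green_2d(x, y):
    """Fundamental solution of the 2-D Laplace equation: G(x, y) = -ln|x - y| / (2*pi)."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    return -np.log(r) / (2.0 * np.pi)

def helmholtz_green_2d(x, y, k):
    """Fundamental solution of the 2-D Helmholtz equation: G(x, y) = (i/4) H_0^(1)(k|x - y|)."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    return 0.25j * hankel1(0, k * r)
```

Both kernels depend on $x$ and $y$ only through $|x - y|$ and are singular as $x \to y$ (logarithmically for Laplace), which is why quadrature on the boundary requires care on the diagonal.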
Based on Theorem 3.1, the single/double layer potentials (3.2) and (3.3) give explicit integral representations of the solution of the PDE. Using these integral representations, we can construct a network such that the output of the network solves the PDE automatically, even with random initialization. This is also the main observation behind BINet.
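For intuition, on the unit circle this representation can be made fully concrete with the classical Nyström approach: solve the boundary integral equation for the density, then evaluate the double-layer integral anywhere inside. The sketch below shows the classical solver that BINet emulates, with the linear solve for the density later replaced by a trained network; the unit-circle geometry and Laplace kernel are our illustrative choices:

```python
import numpy as np

def solve_dirichlet_disk(g, n=256):
    """Interior Dirichlet problem for the Laplace equation on the unit disk via the
    double-layer representation u = D[h]: solve (-1/2 I + D) h = g on the boundary,
    then evaluate the potential at interior points."""
    t = 2.0 * np.pi * np.arange(n) / n
    ys = np.stack([np.cos(t), np.sin(t)], axis=1)   # quadrature nodes on the circle
    ds = 2.0 * np.pi / n                            # trapezoidal arc-length weight
    # On the unit circle the Laplace double-layer kernel dG/dn_y equals -1/(4*pi),
    # including its smooth diagonal limit.
    D = np.full((n, n), -1.0 / (4.0 * np.pi)) * ds
    h = np.linalg.solve(-0.5 * np.eye(n) + D, g(t))

    def u(x):
        diff = np.asarray(x, float) - ys            # x - y_j for all nodes y_j
        r2 = np.sum(diff ** 2, axis=1)
        K = np.sum(diff * ys, axis=1) / (2.0 * np.pi * r2)  # dG/dn_y with n_y = y
        return np.sum(K * h) * ds
    return u
```

For boundary data $g(t) = \cos t$, the exact harmonic solution is $u(x) = x_1$, and the periodic trapezoidal rule is spectrally accurate away from the boundary. Note that no differentiation of anything is needed, which mirrors how BINet's output satisfies the PDE by construction.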
3.2 The Structure of BINet
In this subsection, we explain how to use the boundary integral form (3.2) or (3.3) to construct the structure of BINet. As shown in Fig. 1, BINet consists of three components: input, approximation, and integration.

From the integral formula of the single/double layer potential, it is clear that BINet has three inputs: a point $x$ in the computational domain, the differential operator $\mathcal{L}$, and the domain boundary $\partial\Omega$. The differential operator determines the fundamental solution, and the domain boundary gives the domain of integration.

In the single/double layer potential, only the density function $h$ is unknown. In BINet, the density function is approximated using a multilayer perceptron (MLP) or a residual network (ResNet), denoted as $h_\theta$, with learnable parameters $\theta$. Note that $h_\theta$ is defined on the boundary only.
To train BINet, the loss function is given by (3.4) in Theorem 3.1:
(3.5)  $\mathrm{Loss}(\theta) = \big\| \mathcal{S}[h_\theta] - g \big\|_{L^2(\partial\Omega)}^2 \quad \text{or} \quad \mathrm{Loss}(\theta) = \big\| \big({\mp}\tfrac{1}{2}I + \mathcal{D}\big)[h_\theta] - g \big\|_{L^2(\partial\Omega)}^2$,
where $\mathcal{S}$ and $\mathcal{D}$ are the potential operators defined in Theorem 3.1, and $I$ is the identity operator.
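A minimal sketch of this boundary-only loss for the interior Laplace problem on the unit circle (NumPy; the tiny MLP, its initialization, and the quadrature are our simplifications, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(widths=(2, 32, 32, 1)):
    """Random initialization for a small fully connected network."""
    return [(rng.normal(0.0, widths[i] ** -0.5, (widths[i], widths[i + 1])),
             np.zeros(widths[i + 1])) for i in range(len(widths) - 1)]

def density_net(t, params):
    """MLP approximation h_theta of the density, fed the boundary point (cos t, sin t)."""
    z = np.stack([np.cos(t), np.sin(t)], axis=1)
    for W, b in params[:-1]:
        z = np.tanh(z @ W + b)
    W, b = params[-1]
    return (z @ W + b).ravel()

def binet_loss(params, g, n=200):
    """Boundary-only loss ||(-1/2 I + D)[h_theta] - g||^2 on the unit circle."""
    t = 2.0 * np.pi * np.arange(n) / n
    ds = 2.0 * np.pi / n
    # On the unit circle the Laplace double-layer kernel is the constant -1/(4*pi).
    D = np.full((n, n), -1.0 / (4.0 * np.pi)) * ds
    h = density_net(t, params)
    trace = -0.5 * h + D @ h          # interior boundary trace of the potential
    return np.mean((trace - g(t)) ** 2)
```

Minimizing `binet_loss` over `params` (e.g., with Adam) is the entire training problem: no derivative of the network with respect to its input appears, and any density yields a valid harmonic function through the double-layer integral.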
In BINet, the differential operator and the computational domain boundary are naturally incorporated, which means that BINet has the capability of learning the map from the differential operator and the computational domain to solutions.
4 Convergence Analysis of BINet
In recent years, many efforts have been devoted to developing the convergence theory for over-parameterized neural networks. In [12], the neural tangent kernel (NTK) is proposed to prove convergence, and this tangent kernel is also implicit in [32, 33, 34]. Later, a non-asymptotic proof using the NTK was given in [35]. It is shown that a sufficiently wide, fully trained network is indeed equivalent to a kernel regression predictor. In this work, we give a non-asymptotic proof of the convergence of our BINet.
In BINet, the density function in the boundary integral form is approximated by a neural network $h_\theta$. A boundary integral operator $\mathcal{P}$ is then applied to the density function, giving the output of BINet on the boundary as $\mathcal{P}[h_\theta]$. Here $\mathcal{P} = \mathcal{S}$ for the single layer potential and $\mathcal{P} = \mp\frac{1}{2}I + \mathcal{D}$ for the double layer potential of the interior or exterior problem. For simplicity, we denote by $f_\theta = \mathcal{P}[h_\theta]$ the output of BINet restricted to the boundary, and the loss is given by the difference between this output and the boundary values; see Section 3 for details. Due to the operator $\mathcal{P}$, the convergence analysis of this structure is nontrivial.
In the learning process, the evolution of the difference between the output and the boundary value obeys the following ordinary differential equation,
(4.1)  $\frac{\mathrm{d}}{\mathrm{d}t}\big(f_{\theta(t)} - g\big) = -K_t\big(f_{\theta(t)} - g\big)$,
where $f_{\theta(t)}$ is the output of BINet restricted to the boundary at training time $t$ and $g$ is the boundary value, i.e., the label function. For a detailed derivation of (4.1), see the Appendix. Here $K_t$ is the kernel at training time $t$, composed of the neural-network kernel and an admissible operator $\mathcal{P}$; see the Appendix for details.
In the following two theorems, we show that the kernel in (4.1) converges to a constant kernel, independent of the training time, as the width of the layers goes to infinity. A proof sketch based on the work in [35] is given in the Appendix.
Theorem 4.1
(Convergence of the kernel at initialization.) Fix $\epsilon > 0$ and $\delta \in (0, 1)$. Suppose the nonlinear activation function is ReLU, the minimum width of the hidden layers is sufficiently large, and the operator $\mathcal{P}$ is bounded. Then, for normalized data, with probability at least $1 - \delta$, the kernel at initialization is within $\epsilon$ of a constant kernel $K$. Here the constant kernel of BINet is given by composing the operator $\mathcal{P}$ with the neural-network kernel from the front and the back, i.e., the operator acts on the first and the second variable of the neural-network kernel, respectively.
Theorem 4.2
(Convergence of the kernel during training.) Fix $\epsilon > 0$ and $\delta \in (0, 1)$. Suppose that the width is sufficiently large and the operator $\mathcal{P}$ is bounded. Then, with probability at least $1 - \delta$ over the Gaussian random initialization, the kernel $K_t$ remains within $\epsilon$ of $K_0$ for all training times $t$, where $K_t$ is the kernel along the training process and $K_0$ is the kernel at the random Gaussian initialization from Theorem 4.1.
Further, we have the following lemma for the positive definiteness of the new constant kernel.
Lemma 4.3
The constant kernel $K$ is positive definite for the double layer potential in BINet. For the single layer potential, the positive definiteness depends on the compactness of the boundary $\partial\Omega$.
The proof of Lemma 4.3 is given in the Appendix; the invertibility of the boundary integral operator [36, 37] is utilized to complete the proof.
By Lemma 4.3, equation (4.1), and Theorems 4.1 and 4.2, the error of BINet for the double layer potential thus vanishes after full training ($t \to \infty$) under the assumption that the width of the neural network goes to infinity. For the single layer potential, the convergence result depends on the boundary, i.e., on whether $\partial\Omega$ is compact. The proof of the convergence results is carried out in real space; however, for the complex-valued kernel arising from the Helmholtz equation, the results still hold with the inner product defined in complex space.
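The structure of the composed kernel can be illustrated numerically: for a finite network, the empirical NTK Gram matrix $\Theta$ is a Gram matrix of parameter gradients, and composing it on both sides with a (discretized) bounded operator $A$, as $A \Theta A^\top$, preserves symmetry and positive semi-definiteness. A small sketch in our own toy setting (two-layer network, random matrix $A$ standing in for the discretized boundary integral operator):

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_ntk(xs, m=512):
    """Empirical NTK Gram matrix Theta_ij = <grad_theta f(x_i), grad_theta f(x_j)>
    for a two-layer network f(x) = v . tanh(W x) / sqrt(m)."""
    d = xs.shape[1]
    W = rng.normal(size=(m, d))
    v = rng.normal(size=m)
    grads = []
    for x in xs:
        a = np.tanh(W @ x)
        dW = np.outer(v * (1.0 - a ** 2), x) / np.sqrt(m)   # gradient w.r.t. W
        dv = a / np.sqrt(m)                                  # gradient w.r.t. v
        grads.append(np.concatenate([dW.ravel(), dv]))
    G = np.stack(grads)
    return G @ G.T

xs = rng.normal(size=(8, 2))
Theta = empirical_ntk(xs)
A = rng.normal(size=(5, 8))          # discretized bounded operator (toy stand-in)
K_B = A @ Theta @ A.T                # kernel of the composed model, as in BINet
```

Since $\Theta = G G^\top$ is a Gram matrix, both $\Theta$ and $A \Theta A^\top = (AG)(AG)^\top$ are symmetric positive semi-definite by construction; the strict positive definiteness claimed in the lemma is the part that requires the invertibility of the boundary integral operator.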
5 Experiments
We use BINet to compute a series of examples, including solving a single PDE, where the differential operator and the domain geometry are fixed, and learning solution operators. PDEs defined on both bounded and unbounded domains are considered. To estimate the accuracy of a numerical solution $u_\theta$, the relative $L^2$ error $\|u_\theta - u^*\| / \|u^*\|$ is used, where $u^*$ is the exact solution. We compare our method with two state-of-the-art methods, the Deep Ritz method and PINN, only for interior problems, since, as claimed before, other deep-learning-based PDE solvers are not able to handle exterior problems.
In BINet, a fully connected neural network (MLP) or a residual neural network (ResNet) is used to approximate the density function. Since there is no regularity requirement on the density function, we can use any activation function, including ReLU. For the Laplace equation, the network has only one output, i.e., the approximation of the density, while for the Helmholtz equation, because its solution is complex-valued, the network has two outputs, i.e., the real part and the imaginary part of the density. In the experiments, we choose the Adam optimizer to minimize the loss function, and all experiments are run on a single GeForce RTX 2080 Ti GPU.
5.1 Experimental Results on Solving One Single PDE
Laplace Equation with Smooth Boundary Condition. First, we consider a Laplace equation in a bounded domain,
(5.1)  
where the boundary condition contains a fixed constant. We compare the results of PINN, the Deep Ritz method, and BINet for different values of this constant. In this example, we use the residual neural network introduced in [7], and we follow [7] in the choice of activation function for the Deep Ritz method and PINN. In BINet, we use ReLU as the activation function, since BINet has less stringent regularity requirements.
In the first case, for all three methods we selected 800 equidistant sample points on $\partial\Omega$, and for PINN and the Deep Ritz method we additionally selected 1600 random sample points in $\Omega$. All three methods use residual neural networks with 40 neurons per layer and six blocks.
In the second case, for BINet we selected 2000 equidistant sample points on the boundary. For PINN and the Deep Ritz method, we randomly selected 4000 sample points in $\Omega$ and 800 sample points on $\partial\Omega$. Here we use residual neural networks with 100 neurons per layer and six blocks. Looking at the solutions on the boundary, we find that the solutions of PINN and the Deep Ritz method differ considerably from the exact solution, while BINet still captures the subtle structure of the exact solution. The results of the different methods, including PINN, the Deep Ritz method, and BINet, are shown in Figure 2.
Table 1: Relative errors of the three methods for the two cases.
         PINN     Deep Ritz   BINet
case 1   0.0140   0.0952      0.0031
case 2   0.0262   0.2194      0.0002
After training for 20000 epochs, the relative errors of these methods are shown in Table 1. In this example, with the same number of layers and neurons, BINet is always better than the other two methods regardless of the value of the constant. Moreover, unlike the other methods, the result of BINet does not get worse when the constant increases.
Laplace Equation with Nonsmooth Boundary Condition. Next, let us consider a Laplace equation with a nonsmooth boundary condition. On the same domain, the boundary value problem is
(5.2)  
In problem (5.2), the boundary condition is not smooth. In this example, we again use a ResNet with six blocks and 40 neurons per layer for all three methods. We selected 800 equidistant sample points on $\partial\Omega$ for the three methods, and for PINN and the Deep Ritz method we randomly selected 1000 sample points in $\Omega$. Figure 3 shows the results of the different methods. In this example, we take the result of a finite difference method on a high-precision mesh as the exact solution.
From Figure 3, we find that for the PINN and Deep Ritz methods, the solutions on the boundary are smooth and thus differ from the boundary condition. In contrast, the boundary condition is well approximated by the solution of BINet. The reason is that, to satisfy the interior smoothness of the solution, the neural networks of PINN and the Deep Ritz method have to be smooth functions, so their solutions remain smooth even arbitrarily close to the nonsmooth boundary points.
Helmholtz Equation with Different Wavenumbers. In this experiment, we consider an interior Helmholtz equation
(5.3)  
where , and . The Deep Ritz method cannot solve the Helmholtz equation, hence we compare BINet and PINN for different wavenumbers $k$. We choose a fully connected neural network with four hidden layers, sigmoid activation, and 40 neurons per layer, and we choose 800 points on the boundary for BINet and PINN. In addition, we randomly selected 2400 sample points in $\Omega$. For $k = 1$ and $k = 4$, we solve the equation with the PINN-type method and with BINet. The loss curves and results are shown in Figure 4. The loss of BINet descends faster, and for $k = 4$ the loss of PINN does not converge, while the loss of BINet converges regardless of the value of $k$. The second and third figures also show that the result of BINet is much better than that of PINN.
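The claim that the layer-potential representation satisfies the Helmholtz equation by construction is easy to spot-check numerically: evaluate a single layer potential with the Hankel kernel and verify $\Delta u + k^2 u \approx 0$ at an interior point by finite differences (a sketch; the circle geometry and test density are our choices):

```python
import numpy as np
from scipy.special import hankel1

def helmholtz_single_layer(x, k, density, n=400):
    """Single-layer Helmholtz potential S[h](x) on the unit circle, trapezoidal rule."""
    t = 2.0 * np.pi * np.arange(n) / n
    ys = np.stack([np.cos(t), np.sin(t)], axis=1)
    r = np.linalg.norm(np.asarray(x, float) - ys, axis=1)
    G = 0.25j * hankel1(0, k * r)                    # fundamental solution
    return np.sum(G * density(t)) * (2.0 * np.pi / n)

def helmholtz_residual(x, k, density, h=1e-3):
    """|Delta u + k^2 u| at an interior point, via central finite differences."""
    u = lambda p: helmholtz_single_layer(p, k, density)
    x = np.asarray(x, float)
    lap = -4.0 * u(x)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        lap += u(x + e) + u(x - e)
    lap /= h ** 2
    return abs(lap + k ** 2 * u(x))
```

The residual is limited only by quadrature and finite-difference error; nothing about the density was used, which is exactly why BINet's loss can ignore the PDE and fit only the boundary data.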
5.2 Experimental Results on Solution Operators
The Operator from Equation Parameters to Solutions. In this example, we consider the Helmholtz equation with variable wavenumber $k$.
(5.4)  
In the training phase, the wavenumber is sampled from a training interval. We use the double layer potential to construct the loss function, and after 5000 training epochs we show the relative error versus the wavenumber in Figure 5. From the first figure, the relative error remains small; compared with solving a single equation, the accuracy is still good. The relative error increases slightly as $k$ increases, because the Helmholtz equation becomes more difficult to solve for larger $k$. This means that we have successfully learned the solution operator of an exterior parametric PDE problem on an unbounded domain. Most importantly, although some test values of $k$ were not seen during training, the relative error remains small on the test interval. This shows that our method has good generalization ability.
The Operator from Boundary Geometry to Solutions. In this example, we consider a Laplace equation with parametric boundaries. The problem is
(5.5)  
where the boundary condition is prescribed relative to the barycenter of the triangle. We assume that the domain can be any triangle. For simplicity, by translation and rotation we can fix one vertex at the origin and one edge on the positive half x-axis, so that the third vertex lies in the first quadrant; we assume this third vertex can take any value in a fixed region. We choose a ResNet with eight blocks and 100 neurons per layer, and the single layer potential is used to compute the boundary integral. We randomly select 80 triangles to calculate the loss function, and after every 500 epochs the triangles are randomly selected again. After training for 5000 epochs, we randomly choose two triangles; the solutions on each triangle obtained by BINet are shown in Figure 6. The relative error is small, from which we can see that BINet has successfully learned the operator from the boundary geometry to the solution.
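Feeding the geometry to the network requires sampling nodes, outward normals, and quadrature weights on each candidate triangle boundary. A sketch of that preprocessing step (midpoint rule per edge; vertices assumed in counterclockwise order; this is our illustration, not the paper's exact sampler):

```python
import numpy as np

def triangle_boundary_quadrature(verts, m=100):
    """Nodes, outward unit normals, and arc-length weights on the boundary of a
    triangle with counterclockwise vertices, m midpoint-rule nodes per edge."""
    verts = np.asarray(verts, float)
    pts, normals, weights = [], [], []
    for i in range(3):
        a, b = verts[i], verts[(i + 1) % 3]
        s = (np.arange(m) + 0.5) / m                 # midpoints of m sub-segments
        pts.append(a + s[:, None] * (b - a))
        length = np.linalg.norm(b - a)
        tang = (b - a) / length
        normals.append(np.tile([tang[1], -tang[0]], (m, 1)))  # rotate tangent by -90 deg
        weights.append(np.full(m, length / m))
    return np.concatenate(pts), np.concatenate(normals), np.concatenate(weights)
```

With these nodes and weights, the layer potential on a triangle becomes a plain weighted sum, so changing the geometry only changes the quadrature data fed to the network, not the network itself.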
6 Conclusion
We have developed a new neural network method called BINet to solve PDEs. In BINet, the solution of the PDE is represented by a boundary integral composed of an explicit kernel and an unknown density, which is approximated by a neural network. The PDE is then solved by training the boundary integral representation to fit the boundary condition. Since the loss function measures only the misfit between the integral representation and the boundary condition, BINet has fewer hyperparameters and a lower sampling dimension than many other neural-network-based PDE solvers. Because the boundary integral satisfies the PDE automatically in the interior and exterior of the boundary, BINet can solve PDEs on both bounded and unbounded domains. Furthermore, BINet can learn operators from PDE parameters, including coefficients and boundary geometry, to solutions. Besides, using the NTK technique, we prove that BINet converges as the width of the network goes to infinity. We test BINet on the Laplace equation and the Helmholtz equation in extensive settings. The numerical experiments show that BINet works effectively in many cases, such as interior problems, exterior problems, and high-wavenumber problems. The experiments also illustrate the capability of BINet in learning solution operators. All the experiments verify the advantages of BINet numerically. Although our method exhibits competitive performance against the PINN and Deep Ritz methods in many situations, the requirement of high-precision boundary integration limits further applications to higher-dimensional problems. This will be a direction for improving BINet in the future.
References
 [1] David J Griffiths. Introduction to electrodynamics, 2005.
 [2] Roger Temam. Navier-Stokes equations: theory and numerical analysis, volume 343. American Mathematical Soc., 2001.
 [3] Erwin Schrödinger. An undulatory theory of the mechanics of atoms and molecules. Physical review, 28(6):1049, 1926.
 [4] James D MacBeth and Larry J Merville. An empirical examination of the Black-Scholes call option pricing model. The journal of finance, 34(5):1173–1186, 1979.
 [5] Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. Deep learning, volume 1. MIT press Cambridge, 2016.
 [6] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339–1364, 2018.
 [7] E Weinan and Bing Yu. The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
 [8] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
 [9] Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why pinns fail to train: A neural tangent kernel perspective. arXiv preprint arXiv:2007.14527, 2020.
 [10] Quanhui Zhu and Jiang Yang. A local deep learning method for solving high order partial differential equations. arXiv preprint arXiv:2103.08915, 2021.
 [11] Oliver Dimon Kellogg. Foundations of potential theory, volume 31. Courier Corporation, 1953.
 [12] Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. arXiv preprint arXiv:1806.07572, 2018.
 [13] MWMG Dissanayake and Nhan Phan-Thien. Neural-network-based approximations for solving partial differential equations. Communications in Numerical Methods in Engineering, 10(3):195–201, 1994.
 [14] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
 [15] Isaac E Lagaris, Aristidis C Likas, and Dimitris G Papageorgiou. Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5):1041–1049, 2000.
 [16] Yaohua Zang, Gang Bao, Xiaojing Ye, and Haomin Zhou. Weak adversarial networks for high-dimensional partial differential equations. Journal of Computational Physics, 411:109409, 2020.

 [17] Zhiqiang Cai, Jingshuang Chen, Min Liu, and Xinyu Liu. Deep least-squares methods: An unsupervised learning-based numerical method for solving elliptic pdes. Journal of Computational Physics, 420:109707, 2020.
 [18] Liyao Lyu, Zhen Zhang, Minxin Chen, and Jingrun Chen. MIM: A deep mixed residual method for solving high-order partial differential equations. arXiv preprint arXiv:2006.04146, 2020.
 [19] Jens Berg and Kaj Nyström. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing, 317:28–41, 2018.
 [20] Wei Cai and Zhi-Qin John Xu. Multi-scale deep neural networks for solving high dimensional pdes. arXiv preprint arXiv:1910.11710, 2019.
 [21] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, 33, 2020.
 [22] Craig R Gin, Daniel E Shea, Steven L Brunton, and J Nathan Kutz. Deepgreen: Deep learning of green’s functions for nonlinear boundary value problems. arXiv preprint arXiv:2101.07206, 2020.
 [23] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.
 [24] Shengze Cai, Zhicheng Wang, Lu Lu, Tamer A Zaki, and George Em Karniadakis. Deepm&mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks. Journal of Computational Physics, 436:110296, 2021.
 [25] Han Gao, Luning Sun, and Jian-Xun Wang. PhyGeoNet: Physics-informed geometry-adaptive convolutional neural networks for solving parametric pdes on irregular domain. arXiv preprint arXiv:2004.13145, 2020.
 [26] Lu Lu, Pengzhan Jin, and George Em Karniadakis. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
 [27] Yuehaw Khoo, Jianfeng Lu, and Lexing Ying. Solving parametric pde problems with artificial neural networks. arXiv preprint arXiv:1707.03351, 2017.
 [28] Yibo Yang and Paris Perdikaris. Physics-informed deep generative models. arXiv preprint arXiv:1812.03511, 2018.
 [29] Yibo Yang and Paris Perdikaris. Adversarial uncertainty quantification in physicsinformed neural networks. Journal of Computational Physics, 394:136–152, 2019.
 [30] Bradley K Alpert. Hybrid Gauss-trapezoidal quadrature rules. SIAM Journal on Scientific Computing, 20(5):1551–1584, 1999.
 [31] Sharad Kapur and Vladimir Rokhlin. High-order corrected trapezoidal quadrature rules for singular functions. SIAM Journal on Numerical Analysis, 34(4):1331–1356, 1997.
 [32] Simon Du, Jason Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning, pages 1675–1685. PMLR, 2019.
 [33] Simon S Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Gradient descent provably optimizes overparameterized neural networks. arXiv preprint arXiv:1810.02054, 2018.
 [34] Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. arXiv preprint arXiv:1808.01204, 2018.
 [35] Sanjeev Arora, Simon S Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, and Ruosong Wang. On exact computation with an infinitely wide neural net. arXiv preprint arXiv:1904.11955, 2019.
 [36] Wenjie Gao. Layer potentials and boundary value problems for elliptic systems in lipschitz domains. Journal of Functional Analysis, 95(2):377–399, 1991.
 [37] Gregory Verchota. Layer potentials and regularity for the dirichlet problem for laplace’s equation in lipschitz domains. Journal of functional analysis, 59(3):572–611, 1984.
 [38] George C Hsiao and Wolfgang L Wendland. Boundary integral equations. Springer, 2008.
 [39] Lexing Ying. Fast algorithms for boundary integral equations. Multiscale Modeling and Simulation in Science, pages 139–193, 2009.
 [40] Zeyuan AllenZhu, Yuanzhi Li, and Yingyu Liang. Learning and generalization in overparameterized neural networks, going beyond two layers. arXiv preprint arXiv:1811.04918, 2018.
 [41] Jiaoyang Huang and HorngTzer Yau. Dynamics of deep neural networks and neural tangent hierarchy. In International Conference on Machine Learning, pages 4542–4551. PMLR, 2020.
 [42] Zeyuan AllenZhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via overparameterization. In International Conference on Machine Learning, pages 242–252. PMLR, 2019.
 [43] Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, pages 9–48. Springer, 2012.

 [44] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
Appendix
Appendix A A review of the PINN and Deep Ritz method
A.1 PINN method
To solve the linear PDE

(A.1)  $\mathcal{L}u(x) = f(x),\ x\in\Omega, \qquad u(x) = g(x),\ x\in\partial\Omega,$

the main idea of the PINN [8] method is to use a neural network $u(x;\theta)$ as an ansatz to approximate the solution, where $\theta$ represents the trainable parameters in the neural network. Other works follow a similar idea, such as [19, 15, 6]. We can then use an automatic differentiation tool to calculate the derivatives and define the loss function

$L_{\mathrm{PDE}}(\theta) = \frac{1}{N_1}\sum_{i=1}^{N_1}\big|\mathcal{L}u(x_i;\theta) - f(x_i)\big|^2,\qquad x_i\in\Omega.$

For the boundary conditions, we can define the loss function

$L_{\mathrm{BD}}(\theta) = \frac{1}{N_2}\sum_{j=1}^{N_2}\big|u(y_j;\theta) - g(y_j)\big|^2,\qquad y_j\in\partial\Omega.$

Finally, we can combine the loss functions $L_{\mathrm{PDE}}$ and $L_{\mathrm{BD}}$ with a hyperparameter $\lambda > 0$ to get the total loss function

$L(\theta) = L_{\mathrm{PDE}}(\theta) + \lambda L_{\mathrm{BD}}(\theta).$

By minimizing the loss function $L(\theta)$, PINN obtains an approximate solution of the PDE (A.1).
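As a concrete illustration of this loss, the following sketch assembles the PINN residual and boundary terms for a toy 1D Poisson problem with a hand-written shallow tanh network, so the second derivative with respect to the input is available in closed form. The problem, network size, weighting, and sampling are our illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (our choice): u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0;
# for f = -pi^2 sin(pi x) the exact solution is u(x) = sin(pi x).
f = lambda x: -np.pi**2 * np.sin(np.pi * x)
g = lambda x: 0.0 * x                      # homogeneous Dirichlet data

# Shallow tanh network u(x; theta) = sum_k a_k tanh(w_k x + b_k); for a
# single hidden layer the input derivatives have a closed form, mimicking
# what automatic differentiation would compute.
m = 16
w, b, a = rng.normal(size=m), rng.normal(size=m), rng.normal(size=m) / m

def u(x):
    return np.tanh(np.outer(x, w) + b) @ a

def u_xx(x):
    t = np.tanh(np.outer(x, w) + b)
    # d^2/dx^2 tanh(wx + b) = -2 tanh (1 - tanh^2) w^2
    return (-2.0 * t * (1.0 - t**2) * w**2) @ a

# PINN loss: interior residual term plus boundary term, weighted by lambda.
x_in = rng.uniform(0.0, 1.0, size=100)     # collocation points in Omega
x_bd = np.array([0.0, 1.0])                # boundary points
lam = 10.0
L_pde = np.mean((u_xx(x_in) - f(x_in))**2)
L_bd = np.mean((u(x_bd) - g(x_bd))**2)
loss = L_pde + lam * L_bd
print(loss)
```

Minimizing `loss` over `(w, b, a)` with any gradient-based optimizer then yields the PINN approximation.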
A.2 Deep Ritz method
For the specific PDE problem in equation (A.1), we can transform the equation into a Ritz variational form; this is the main idea of the Deep Ritz method [7]. For instance, if we consider a Laplace equation,

(A.2)  $\Delta u(x) = 0,\ x\in\Omega, \qquad u(x) = g(x),\ x\in\partial\Omega,$

we can solve the equation equivalently by minimizing the following Ritz variational problem

(A.3)  $\min_{u}\ J(u) = \frac{1}{2}\int_{\Omega} \big|\nabla u(x)\big|^2\,\mathrm{d}x,\qquad u(x) = g(x)\ \text{on}\ \partial\Omega.$

We again use a neural network $u(x;\theta)$ to approximate the solution of the PDE, and we can use an automatic differentiation tool to calculate the gradient of the neural network. The variational energy (A.3) can then naturally be used as a loss function, estimated by Monte Carlo sampling as

$L_{\mathrm{Ritz}}(\theta) = \frac{|\Omega|}{N_1}\sum_{i=1}^{N_1}\frac{1}{2}\big|\nabla u(x_i;\theta)\big|^2,\qquad x_i\in\Omega.$

For the boundary condition, the loss function can likewise be defined as

$L_{\mathrm{BD}}(\theta) = \frac{1}{N_2}\sum_{j=1}^{N_2}\big|u(y_j;\theta) - g(y_j)\big|^2,\qquad y_j\in\partial\Omega.$

Finally, the loss function can be defined as

(A.4)  $L(\theta) = L_{\mathrm{Ritz}}(\theta) + \lambda\,L_{\mathrm{BD}}(\theta),$

where $\lambda > 0$ is again a hyperparameter. By minimizing the loss function $L(\theta)$, the Deep Ritz method obtains the solution of the PDE (A.2).
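A Monte Carlo estimate of this Ritz energy plus boundary penalty can be sketched as follows for a shallow tanh ansatz on the unit square; the domain, network size, sample counts, and penalty weight are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo estimate of the Ritz energy 0.5 * int_Omega |grad u|^2 dx for
# a shallow tanh ansatz on Omega = (0, 1)^2 (so |Omega| = 1), plus a
# boundary penalty toward g = 0.
m = 16
W = rng.normal(size=(m, 2))
b = rng.normal(size=m)
a = rng.normal(size=m) / m

def u(x):                 # x: (N, 2)
    return np.tanh(x @ W.T + b) @ a

def grad_u(x):            # closed-form gradient with respect to the input
    s = 1.0 - np.tanh(x @ W.T + b) ** 2          # tanh'(z) = 1 - tanh(z)^2
    return (s * a) @ W                           # shape (N, 2)

x_in = rng.uniform(0.0, 1.0, size=(512, 2))      # interior samples
ritz = 0.5 * np.mean(np.sum(grad_u(x_in) ** 2, axis=1))

# boundary samples on the four edges of the unit square
t = rng.uniform(0.0, 1.0, size=64)
x_bd = np.concatenate([
    np.stack([t, 0 * t], 1), np.stack([t, 0 * t + 1], 1),
    np.stack([0 * t, t], 1), np.stack([0 * t + 1, t], 1)])
lam = 100.0
loss = ritz + lam * np.mean(u(x_bd) ** 2)
print(loss)
```

Note that only first-order input derivatives appear here, one order lower than the PINN residual for the same equation.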
The Deep Ritz paper introduces a residual network as the ansatz to approximate the solution. A residual network is also used in our work to approximate the density function in the boundary integral form; its architecture is shown in Figure 7.
Appendix B The fundamental solution of different equations
In this section, we give some supplementary introductions to the fundamental solution. Before defining the fundamental solution, we first introduce the $\delta$ function.
Definition
A function $\delta(x)$ is called the $n$-dimensional $\delta$ function if $\delta(x) = 0$ for all $x\neq 0$ and, for all functions $f$ that are continuous at $x_0$, we have
$\int_{\mathbb{R}^n} f(x)\,\delta(x - x_0)\,\mathrm{d}x = f(x_0).$
Then, for the PDE

(B.1)  $\mathcal{L}u(x) = 0,\qquad x\in\Omega\subset\mathbb{R}^n,$

we can define the corresponding fundamental solution of equation (B.1).
Definition
A function $G(x,y)$ is called the fundamental solution corresponding to equation (B.1) if $G(x,y)$ is symmetric about $x$ and $y$ and satisfies
$\mathcal{L}_x G(x,y) = \delta(x-y),$
where $x,y\in\mathbb{R}^n$ and $\mathcal{L}_x$ is the differential operator $\mathcal{L}$ acting on the component $x$.
Although, limited by the length of the article, we only introduce the fundamental solutions of the Laplace and Helmholtz equations in the two-dimensional case in detail, BINet can in general be applied as long as the fundamental solution of $\mathcal{L}$ in $\mathbb{R}^n$ can be obtained. Let us give a few more examples. More details can be found in [38, 11, 39].
B.1 The Laplace Equations
If we consider a Laplace equation

(B.2)  $\Delta u(x) = 0,\qquad x\in\Omega\subset\mathbb{R}^n,$

the fundamental solution has the following form

$G(x,y) = \begin{cases} \dfrac{1}{2\pi}\ln|x-y|, & n = 2,\\[6pt] \dfrac{1}{(2-n)\,n\,\alpha(n)}\,|x-y|^{2-n}, & n \ge 3,\end{cases}$

where $n$ is the dimension of the equation and $\alpha(n)$ is the volume of the $n$-dimensional unit ball. Then the fundamental solution satisfies
$\Delta_x G(x,y) = \delta(x-y).$
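The harmonicity of this kernel away from the singularity $x = y$ can be checked numerically; the following sketch applies a five-point finite-difference Laplacian to the two-dimensional kernel (the test point and step size are arbitrary choices).

```python
import numpy as np

# Finite-difference check that G(x, y) = (1 / 2 pi) ln|x - y| is harmonic
# in x away from the singularity x = y.
def G(x, y=np.zeros(2)):
    return np.log(np.linalg.norm(x - y)) / (2.0 * np.pi)

def laplacian(fun, x, h=1e-4):
    ex, ey = np.array([h, 0.0]), np.array([0.0, h])
    return (fun(x + ex) + fun(x - ex) + fun(x + ey) + fun(x - ey)
            - 4.0 * fun(x)) / h**2

x = np.array([0.7, -0.3])       # any point with x != y
res = laplacian(G, x)
print(res)                      # close to zero
```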
B.2 The Helmholtz Equations
The Helmholtz equation has the following form

(B.3)  $\Delta u(x) + k^2 u(x) = 0,\qquad x\in\Omega,$

where $k$ is a real number (the wavenumber). The fundamental solution of the Helmholtz equation has the following form

$G(x,y) = \begin{cases} -\dfrac{i}{4}\,H_0^{(1)}\big(k|x-y|\big), & n = 2,\\[6pt] -\dfrac{e^{ik|x-y|}}{4\pi|x-y|}, & n = 3,\end{cases}$

where $n$ is the dimension of the equation and $H_0^{(1)}$ is the zeroth-order Hankel function of the first kind. Then the fundamental solution satisfies
$\Delta_x G(x,y) + k^2 G(x,y) = \delta(x-y).$
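A quick numerical sanity check of the three-dimensional kernel with finite differences follows (the two-dimensional kernel would additionally require the Hankel function, e.g. from `scipy.special`); the wavenumber and test point are arbitrary choices.

```python
import numpy as np

# Finite-difference check that G(x, y) = -exp(ik|x - y|) / (4 pi |x - y|)
# satisfies (Delta + k^2) G = 0 away from x = y in 3D.
k = 2.0

def G(x, y=np.zeros(3)):
    r = np.linalg.norm(x - y)
    return -np.exp(1j * k * r) / (4.0 * np.pi * r)

def helmholtz(fun, x, h=1e-4):
    # seven-point Laplacian plus the zeroth-order term k^2 G
    val = -6.0 * fun(x)
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        val += fun(x + e) + fun(x - e)
    return val / h**2 + k**2 * fun(x)

x = np.array([0.5, 0.4, -0.3])
res = helmholtz(G, x)
print(abs(res))                 # close to zero
```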
B.3 The Navier's Equations
We consider Navier's equations (also called the Lamé system). These are famous equations in linear elasticity for isotropic materials, and the governing equations are

(B.4)  $\mu\Delta u(x) + (\lambda+\mu)\,\nabla\big(\nabla\cdot u(x)\big) = 0,\qquad x\in\Omega\subset\mathbb{R}^n,$

where $\lambda$ and $\mu$ are the Lamé constants of the elastic material, and $u = (u_1,\dots,u_n)^{T}$ is the displacement vector. The fundamental solution of equation (B.4) in two dimensions is

(B.5)  $G(x,y) = \frac{1}{4\pi\mu(\lambda+2\mu)}\left[(\lambda+3\mu)\ln|x-y|\;I_2 \;-\; (\lambda+\mu)\,\frac{(x-y)(x-y)^{T}}{|x-y|^{2}}\right].$

It means that the $G(x,y)$ defined by (B.5) satisfies the following equation,

(B.6)  $\mu\Delta_x G(x,y) + (\lambda+\mu)\,\nabla_x\big(\nabla_x\cdot G(x,y)\big) = \delta(x-y)\,I_n,$

where $I_n$ is the $n$-order identity matrix.
B.4 The Stokes Equations
Stokes equations are well known in the incompressible viscous fluid model. The general form of the Stokes equations is

(B.7)  $-\mu\Delta u(x) + \nabla p(x) = f(x),\qquad \nabla\cdot u(x) = 0,\qquad x\in\Omega,$

where $u$ and $p$ are the velocity and pressure of the fluid flow, respectively, and $\mu$ and $f$ are the given dynamic viscosity of the fluid and the forcing term, respectively.
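For reference, the two-dimensional fundamental solution of the Stokes system (the Stokeslet velocity tensor together with its pressure vector) has the following standard form from the boundary integral literature, stated here for completeness:

```latex
G_{ij}(x,y) = \frac{1}{4\pi\mu}\left(\delta_{ij}\ln\frac{1}{r}
            + \frac{(x_i-y_i)(x_j-y_j)}{r^{2}}\right),\qquad
\Pi_j(x,y) = \frac{x_j-y_j}{2\pi r^{2}},\qquad r = |x-y|,
```

whose columns satisfy $-\mu\Delta_x G_{\cdot j}(x,y) + \nabla_x \Pi_j(x,y) = \delta(x-y)\,e_j$ and $\nabla_x\cdot G_{\cdot j}(x,y) = 0$ for each $j$.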
B.5 The Biharmonic Equation
The biharmonic equation is a single scalar fourth-order equation, which can be reduced from plane elasticity and plane Stokes flow. We consider a two-dimensional biharmonic equation,

(B.11)  $\Delta^2 u(x) = 0,\qquad x\in\Omega\subset\mathbb{R}^2.$

The fundamental solution of equation (B.11) is

(B.12)  $G(x,y) = \frac{1}{8\pi}\,|x-y|^2\ln|x-y|,$

which satisfies
$\Delta_x^2\, G(x,y) = \delta(x-y).$
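This can again be checked numerically by applying a five-point finite-difference Laplacian twice to $G$ away from the singularity; the test point and step size are arbitrary choices of this sketch.

```python
import numpy as np

# Finite-difference check that G(x) = |x|^2 ln|x| / (8 pi) is biharmonic
# (Delta^2 G = 0) away from the origin in 2D.
def G(x):
    r = np.linalg.norm(x)
    return r**2 * np.log(r) / (8.0 * np.pi)

def laplacian(fun, x, h):
    ex, ey = np.array([h, 0.0]), np.array([0.0, h])
    return (fun(x + ex) + fun(x - ex) + fun(x + ey) + fun(x - ey)
            - 4.0 * fun(x)) / h**2

def bilaplacian(fun, x, h=1e-2):
    # composing two 5-point Laplacians gives a 13-point biharmonic stencil
    return laplacian(lambda z: laplacian(fun, z, h), x, h)

x = np.array([0.8, 0.6])        # |x| = 1, away from the singularity
res = bilaplacian(G, x)
print(res)                      # close to zero
```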
Appendix C The convergence analysis of BINet
C.1 The structure for solving PDEs using neural networks
BINet consists of a neural network, such as an MLP, and an integral operator performed on the output of the neural network. Thus, the output of BINet reads

$u(x;\theta) = \mathcal{A}\big[h(\cdot\,;\theta)\big](x),$

where $h(\cdot\,;\theta)$ is the neural network approximating the density function in the boundary integral form and $x$ is the $d$-dimensional input variable. The operator $\mathcal{A}$ is performed on the output of the neural network, which completes the whole architecture. The loss function is

$L(\theta) = \frac{1}{2}\sum_{i}\big|u(x_i;\theta) - f(x_i)\big|^2,$

with label function $f(x)$.
For a more general setup, the operator $\mathcal{A}$ takes different forms. For the PINN/DGM method, the operator is directly the partial differential operator, implying

$\mathcal{A}[u](x;\theta) = \mathcal{L}u(x;\theta),$

where $u(x;\theta)$ is the approximation of the solution. The Deep Ritz method for solving the Laplace equation minimizes an optimization problem where part of the loss reads

$\int_{\Omega}\frac{1}{2}\big|\nabla u(x;\theta)\big|^2\,\mathrm{d}x.$

It follows that the corresponding operator has the form $\mathcal{A} = \nabla$. Therefore, viewed through the operator applied to the neural network, PINN and Deep Ritz, unlike the integral-type operator of BINet, involve extra differential operators, although the Deep Ritz method decreases the order from second to first.
Definition
The operator $\mathcal{A}$ is admissible if the following conditions hold:

$\mathcal{A}[a u + b v] = a\,\mathcal{A}[u] + b\,\mathcal{A}[v]$ (linear property);

$\dfrac{\partial}{\partial\theta}\,\mathcal{A}\big[u(\cdot\,;\theta)\big] = \mathcal{A}\Big[\dfrac{\partial}{\partial\theta}\,u(\cdot\,;\theta)\Big]$ (commutative property);

$\mathcal{A}$ does not depend on the trainable parameters $\theta$ (parameter invariance).

It is easy to check that the operators of PINN, Deep Ritz, and our BINet all satisfy the admissible property, and this admissibility is crucial in the following proof.
Different designs of the neural network and of the operator yield different networks. Here, we adopt the typical setting of the neural network as an MLP. As the integral operator is bounded, the convergence results can be obtained for our BINet, and the proof is shown in Appendix C.3. The structure of the neural network is introduced first for the derivation of the NTK form.
The $L$-hidden-layer MLP is defined as

input layer:  (C.1)  $h^{(0)} = x,$

hidden layer:  (C.2)  $h^{(l)} = \sqrt{\dfrac{2}{m_l}}\,\sigma\big(W^{(l)} h^{(l-1)}\big),\qquad l = 1,\dots,L,$

output layer:  (C.3)  $u(x;\theta) = W^{(L+1)} h^{(L)},$

where $h^{(l)}$ is the $l$-th hidden layer, $\theta = \{W^{(1)},\dots,W^{(L+1)}\}$ are the trainable parameters in the standard representation of the weights $W^{(l)}\in\mathbb{R}^{m_l\times m_{l-1}}$, $m_l$ is the width of the $l$-th layer, and $\sigma$ is the activation function.
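A forward pass under this parameterization can be sketched as follows; the widths, depth, ReLU activation, and standard normal initialization are illustrative choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Forward pass of an L-hidden-layer MLP with the sqrt(2/m) NTK-style
# scaling of (C.2).
def init_mlp(dims):
    # dims = [d_in, m_1, ..., m_L, d_out]; weights are i.i.d. N(0, 1)
    return [rng.normal(size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]

def mlp(weights, x):
    h = x                                               # input layer (C.1)
    for W in weights[:-1]:                              # hidden layers (C.2)
        m = W.shape[0]
        h = np.sqrt(2.0 / m) * np.maximum(W @ h, 0.0)   # ReLU activation
    return weights[-1] @ h                              # output layer (C.3)

theta = init_mlp([2, 64, 64, 1])
y = mlp(theta, np.array([0.3, -0.7]))
print(y.shape)
```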
C.2 The dynamic neural tangent kernel
We have chosen the MLP as the neural network in the analysis for simplicity. A similar analysis can also be done for other structures, as the convergence results of such neural networks are reported in the literature [40, 41]; applying the integral operator to these neural networks should give similar convergence results. Different schemes thus imply different forms of the operator $\mathcal{A}$; see Appendix C.1 for details.
The training process of the neural network is basically to minimize the loss by gradient-based methods. One typical scheme is the gradient descent method, which has the form

(C.4)  $\theta(k+1) = \theta(k) - \eta\,\nabla_\theta L\big(\theta(k)\big).$

When the learning rate $\eta\to 0$, we have the limiting gradient flow

(C.5)  $\frac{\mathrm{d}\theta(t)}{\mathrm{d}t} = -\nabla_\theta L\big(\theta(t)\big),$

where $t$ is the continuous version of the index $k$ of the learning steps in the training process. More precisely, for the weight matrix $W^{(l)}$, we have the evolution
$\frac{\mathrm{d}W^{(l)}(t)}{\mathrm{d}t} = -\frac{\partial L\big(\theta(t)\big)}{\partial W^{(l)}}.$
Hence the evolution of the prediction satisfies the following form

(C.6)  $\frac{\mathrm{d}\,u\big(x;\theta(t)\big)}{\mathrm{d}t} = -\sum_{p}\frac{\partial u\big(x;\theta(t)\big)}{\partial\theta_p}\,\frac{\partial L\big(\theta(t)\big)}{\partial\theta_p} = -\sum_{i} K_t(x,x_i)\,e_i(t),$

where $e(t)$, with components $e_i(t) = u\big(x_i;\theta(t)\big) - f(x_i)$, is the vector of loss residuals, and $p$ denotes the index of the learning parameters. We denote the dynamic neural tangent kernel (DNTK) for the PDE-based neural network as

(C.7)  $K_t(x,x') = \sum_{p}\frac{\partial u\big(x;\theta(t)\big)}{\partial\theta_p}\cdot\frac{\partial u\big(x';\theta(t)\big)}{\partial\theta_p},$

where $\cdot$ is defined as the summation over each component index of the parameter $\theta_p$.
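The kernel (C.7) can be probed empirically for a small network by forming the Gram matrix of parameter gradients; the sketch below approximates the gradients with central finite differences (the network shape and inputs are arbitrary choices) and confirms that the resulting kernel is a symmetric positive semi-definite matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

# Empirical tangent kernel K_t(x_i, x_j) = sum_p du(x_i)/dtheta_p *
# du(x_j)/dtheta_p for a tiny two-layer network, with parameter gradients
# approximated by central finite differences.
W1 = rng.normal(size=(8, 2))
w2 = rng.normal(size=8)
theta = np.concatenate([W1.ravel(), w2])

def u(theta, X):                 # scalar network output for each row of X
    W1 = theta[:16].reshape(8, 2)
    w2 = theta[16:]
    return np.tanh(X @ W1.T) @ w2

def jacobian(theta, X, h=1e-5):
    J = np.zeros((X.shape[0], theta.size))
    for p in range(theta.size):
        e = np.zeros(theta.size)
        e[p] = h
        J[:, p] = (u(theta + e, X) - u(theta - e, X)) / (2.0 * h)
    return J

X = rng.normal(size=(5, 2))
J = jacobian(theta, X)
K = J @ J.T                      # (5, 5) empirical tangent kernel
print(K.shape)
```

As a Gram matrix, `K` is symmetric and positive semi-definite by construction, which is what makes the gradient-flow dynamics (C.6) contractive on the residuals.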
Next, we give the explicit form of the DNTK for further analysis. Recall that the output of the MLP in the PDE-based neural network has the following form

(C.8)  $u(x;\theta) = \mathcal{A}\big[W^{(L+1)} h^{(L)}\big](x),$

where we have omitted the explicit dependence on $\theta$ in $h^{(L)}$ for simplicity. Thus

(C.9)  $\frac{\partial u(x;\theta)}{\partial W^{(l)}} = \mathcal{A}\Big[\frac{\partial f_\theta}{\partial W^{(l)}}\Big](x),$

where

(C.10)  $f_\theta(x) = W^{(L+1)} h^{(L)}(x)$

is the output of the MLP before the operator $\mathcal{A}$ is applied. To give the form of the DNTK, the key is to give the form of $\partial f_\theta/\partial W^{(l)}$. From the above forms, we can obtain

(C.11)  $\frac{\partial f_\theta(x)}{\partial W^{(l)}} = b^{(l)}(x)\,\big(h^{(l-1)}(x)\big)^{T},\qquad l = 1,\dots,L+1,$

where we have used the denotation

(C.12)  $D^{(l)}(x) = \mathrm{diag}\Big(\sigma'\big(W^{(l)} h^{(l-1)}(x)\big)\Big),\qquad l = 1,\dots,L.$

By defining $b^{(L+1)}(x) = 1$ and $b^{(l)}(x) = \sqrt{2/m_l}\,D^{(l)}(x)\big(W^{(l+1)}\big)^{T} b^{(l+1)}(x)$, we obtain

(C.13)  $\frac{\partial f_\theta(x)}{\partial W^{(l)}}\cdot\frac{\partial f_\theta(x')}{\partial W^{(l)}} = \big(h^{(l-1)}(x)\cdot h^{(l-1)}(x')\big)\,\big(b^{(l)}(x)\cdot b^{(l)}(x')\big),$

where $b^{(l)}$ satisfies the backward induction relation $b^{(l)}(x) = \sqrt{2/m_l}\,D^{(l)}(x)\big(W^{(l+1)}\big)^{T} b^{(l+1)}(x)$.
With the admissible property of $\mathcal{A}$ in the sense of Definition C.1, we have the DNTK as

(C.14)  $K_t(x,x') = \mathcal{A}_x\,\mathcal{A}_{x'}\,\Theta_t(x,x'),$

where $\Theta_t(x,x')$ is the dynamic neural tangent kernel of the MLP with the following form [42]

(C.15)  $\Theta_t(x,x') = \sum_{l=1}^{L+1}\big(h^{(l-1)}(x)\cdot h^{(l-1)}(x')\big)\,\big(b^{(l)}(x)\cdot b^{(l)}(x')\big),$

and $\mathcal{A}_x\mathcal{A}_{x'}\Theta_t(x,x')$ is the function given by the kernel $\Theta_t$ operated on by $\mathcal{A}$ at its head and at its tail, i.e., $\mathcal{A}$ performed with respect to the former variable $x$ and the latter variable $x'$.
Denote the constant neural tangent kernel in [35] as

(C.16)  $\Theta(x,x') = \sum_{l=1}^{L+1}\Big(\Sigma^{(l-1)}(x,x')\prod_{l'=l}^{L}\dot\Sigma^{(l')}(x,x')\Big),$

where $\Sigma^{(l)}$ is given by the reduction form

(C.17)  $\Sigma^{(0)}(x,x') = x^{T}x',\qquad \Lambda^{(l)}(x,x') = \begin{pmatrix}\Sigma^{(l-1)}(x,x) & \Sigma^{(l-1)}(x,x')\\ \Sigma^{(l-1)}(x',x) & \Sigma^{(l-1)}(x',x')\end{pmatrix},$

(C.18)  $\Sigma^{(l)}(x,x') = 2\,\mathbb{E}_{(u,v)\sim\mathcal{N}(0,\Lambda^{(l)})}\big[\sigma(u)\,\sigma(v)\big],\qquad \dot\Sigma^{(l)}(x,x') = 2\,\mathbb{E}_{(u,v)\sim\mathcal{N}(0,\Lambda^{(l)})}\big[\sigma'(u)\,\sigma'(v)\big].$