1. Introduction
Recently there has been a surge of interest in solving partial differential equations by deep learning-based numerical methods [5, 11, 6, 9, 15, 2]. However, there are few attempts to deal with the boundary conditions, particularly the essential boundary conditions, which is the main objective of the present work. These methods allow for the compositional construction of new approximation spaces from various neural networks. Such constructions are usually free of a mesh, so that they are in essence meshless methods. The shape functions in the approximation spaces are in general non-interpolatory, which makes the implementation of the essential boundary conditions a nontrivial task. Many approaches for dealing with the essential boundary conditions have been proposed over the past fifty years; we refer to [1, §7.2] for a review.
In fact, an efficient method for imposing the essential boundary conditions was proposed by Nitsche in the early 1970s [14]. This method was revived to deal with elliptic interface problems and unfitted mesh problems; we refer to [4] for a review of the progress in this direction. In the context of meshless methods, Nitsche's idea has proved to be an efficient approach to dealing with boundary conditions in the framework of the partition of unity method [8] as well as the generalized finite element method [1].
In this work, we incorporate Nitsche's idea to deal with the essential boundary conditions in the framework of the Deep Ritz Method [6]. We call this new algorithm the Deep Nitsche Method. In the next part, we introduce the energy formulation of the method; then we present numerical results for solving some mixed boundary value problems with regular and singular solutions in two dimensions, and also for problems in high dimensions up to 100. In particular, we compare the Deep Nitsche Method with the Deep Ritz Method and another method based on a least-squares variational formulation in all the examples. Finally, we conclude with some remarks.
2. Deep Nitsche Method
We consider the mixed boundary value problem posed on a bounded domain $\Omega\subset\mathbb{R}^d$:

(2.1)
$$
\begin{cases}
-\nabla\cdot(A\nabla u)=f &\text{in }\Omega,\\
u=g &\text{on }\Gamma_D,\\
(A\nabla u)\cdot n=t &\text{on }\Gamma_N,
\end{cases}
$$

where $A$ is a symmetric matrix satisfying, for constants $0<\lambda\le\Lambda$,
$$
\lambda|\xi|^2\le\xi^{T}A\xi\le\Lambda|\xi|^2\quad\text{for all }\xi\in\mathbb{R}^d.
$$
The boundary $\partial\Omega=\overline{\Gamma}_D\cup\overline{\Gamma}_N$ and $\Gamma_D\cap\Gamma_N=\emptyset$.
The energy functional associated with the above boundary value problem in the sense of Nitsche [14] is

(2.2)
$$
J(v)=\int_\Omega\Big(\frac12 A\nabla v\cdot\nabla v-fv\Big)\,dx
-\int_{\Gamma_N}tv\,ds
-\int_{\Gamma_D}\frac{\partial v}{\partial n_A}(v-g)\,ds
+\frac{\beta}{2}\int_{\Gamma_D}(v-g)^2\,ds,
$$

where $\partial v/\partial n_A:=(A\nabla v)\cdot n$ and $\beta$ is a parameter to be determined later on. We minimize $J$ over a certain trial space $V$ that will be defined later on:

(2.3)
$$
u_\theta=\operatorname*{argmin}_{v\in V}J(v).
$$

The resulting optimization problem is solved by the Stochastic Gradient Descent (SGD) method [7, §8].
The associated variational problem is: find $u\in V$ such that

(2.4)
$$
a_\beta(u,v)=F(v)\quad\text{for all }v\in V,
$$

where the bilinear form $a_\beta$ and the linear functional $F$ are defined by
$$
a_\beta(u,v)=\int_\Omega A\nabla u\cdot\nabla v\,dx
-\int_{\Gamma_D}\Big(\frac{\partial u}{\partial n_A}v+\frac{\partial v}{\partial n_A}u\Big)\,ds
+\beta\int_{\Gamma_D}uv\,ds
$$
for all $u,v\in V$, and
$$
F(v)=\int_\Omega fv\,dx+\int_{\Gamma_N}tv\,ds
+\int_{\Gamma_D}g\Big(\beta v-\frac{\partial v}{\partial n_A}\Big)\,ds,
$$
respectively.
To prove the well-posedness of (2.4), we need to prove the coercivity of $a_\beta$ and the boundedness of $a_\beta$ and $F$. We make the following assumption: there exists a constant $\Lambda_V$ such that for all $v\in V$,

(2.5)
$$
\Big\|\frac{\partial v}{\partial n_A}\Big\|_{L^2(\Gamma_D)}^2\le\Lambda_V\|\nabla v\|_{L^2(\Omega)}^2.
$$

Define a norm
$$
\|v\|_V^2:=\|\nabla v\|_{L^2(\Omega)}^2+\beta\|v\|_{L^2(\Gamma_D)}^2.
$$
By Friedrichs' inequality, $\|\cdot\|_V$ is equivalent to the standard $H^1$-norm.
Lemma 2.1. Assume that (2.5) holds and $\beta\ge4\Lambda_V/\lambda$. Then $a_\beta$ is coercive on $V$:
$$
a_\beta(v,v)\ge\frac12\min(\lambda,1)\,\|v\|_V^2\quad\text{for all }v\in V.
$$
Proof. By the Cauchy–Schwarz inequality, Young's inequality, and (2.5),
$$
2\Big|\int_{\Gamma_D}\frac{\partial v}{\partial n_A}v\,ds\Big|
\le\frac{2}{\beta}\Big\|\frac{\partial v}{\partial n_A}\Big\|_{L^2(\Gamma_D)}^2
+\frac{\beta}{2}\|v\|_{L^2(\Gamma_D)}^2
\le\frac{2\Lambda_V}{\beta}\|\nabla v\|_{L^2(\Omega)}^2
+\frac{\beta}{2}\|v\|_{L^2(\Gamma_D)}^2.
$$
Hence
$$
a_\beta(v,v)\ge\Big(\lambda-\frac{2\Lambda_V}{\beta}\Big)\|\nabla v\|_{L^2(\Omega)}^2
+\frac{\beta}{2}\|v\|_{L^2(\Gamma_D)}^2
\ge\frac{\lambda}{2}\|\nabla v\|_{L^2(\Omega)}^2+\frac{\beta}{2}\|v\|_{L^2(\Gamma_D)}^2,
$$
which gives the claim since $\beta\ge4\Lambda_V/\lambda$. ∎
The boundedness of $a_\beta$ and $F$ may be proved in a similar way. The well-posedness of the variational problem thus boils down to the inverse assumption (2.5), which is natural when $V$ consists of piecewise polynomials, i.e.,
$$
\Big\|\frac{\partial v}{\partial n_A}\Big\|_{L^2(\Gamma_D)}^2\le Ch^{-1}\|\nabla v\|_{L^2(\Omega)}^2,
$$
where $h$ is the mesh size of the triangulation. This means that we need to take $\beta=\sigma/h$ for a certain large $\sigma$; such a choice is standard in the finite element literature. However, we do not know whether such an inverse inequality holds for functions in $V$ that are constructed from various neural networks in a compositional manner. The only exception is the function class constructed from Gaussian networks [13], which does not seem to apply to the present case because it employs the distance between the centers as a measure of discretization.
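To illustrate how the constant in such inverse inequalities degenerates for polynomial spaces, consider a one-dimensional analogue: over polynomials $w$ of degree at most $q$ on $(0,1)$, the sharp constant in $w(0)^2\le C_q\int_0^1 w^2\,dx$ equals $(q+1)^2$, and rescaling to an interval of length $h$ multiplies it by $h^{-1}$, which is the origin of the $\beta=\sigma/h$ scaling. A minimal numpy check (the helper name is ours):

```python
import numpy as np

# Sharp constant in the 1D trace inequality  w(0)^2 <= C_q * int_0^1 w(x)^2 dx
# over polynomials w of degree at most q.  In the monomial basis the supremum
# equals b^T M^{-1} b, where M_ij = 1/(i+j+1) is the Hilbert Gram matrix and
# b_i = x^i evaluated at x = 0.  The exact value is (q+1)^2.
def sharp_trace_constant(q: int) -> float:
    n = q + 1
    M = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    b = np.zeros(n)
    b[0] = 1.0
    return float(b @ np.linalg.solve(M, b))

for q in range(5):
    print(q, sharp_trace_constant(q))   # grows like (q+1)^2
```

The quadratic growth in the degree mirrors the $h^{-1}$ growth under mesh refinement; no analogue of this bound is known for compositional neural network spaces.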
The trial function space $V$ is modeled by a ResNet [10]. The components of the ResNet are shown in Figure 1. The input layer is a fully connected layer with $m$ hidden nodes, which maps from $\mathbb{R}^d$ to $\mathbb{R}^m$. Assume that $\sigma:\mathbb{R}\to\mathbb{R}$ is a scalar activation function and let it act componentwise on vectors; then the input layer reads
$$
y_0=\sigma(W_0x+b_0),\quad W_0\in\mathbb{R}^{m\times d},\; b_0\in\mathbb{R}^m.
$$
The hidden layers are constructed from $n$ residual blocks. Each block contains two fully connected layers and one residual connection. The $i$-th block takes the form
$$
y_i=\sigma\big(W_{i,2}\,\sigma(W_{i,1}y_{i-1}+b_{i,1})+b_{i,2}\big)+y_{i-1},\quad
W_{i,1},W_{i,2}\in\mathbb{R}^{m\times m},\; b_{i,1},b_{i,2}\in\mathbb{R}^m.
$$
The output layer is a fully connected layer with one hidden node. The approximate solution may be expressed as
$$
u_\theta(x)=W_{n+1}y_n+b_{n+1},\quad W_{n+1}\in\mathbb{R}^{1\times m},\; b_{n+1}\in\mathbb{R},
$$
where the parameter set $\theta$ is defined as
$$
\theta=\big\{W_0,b_0,\{W_{i,1},b_{i,1},W_{i,2},b_{i,2}\}_{i=1}^{n},W_{n+1},b_{n+1}\big\}.
$$
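The forward pass just described can be sketched in plain numpy (a sketch only: the tanh activation, the widths, and the random initialization are placeholder choices of ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def resnet_forward(x, params, act=np.tanh):
    """Forward pass of the ResNet trial function described above: an input
    layer R^d -> R^m, n residual blocks (two fully connected layers plus a
    skip connection each), and a scalar output layer."""
    W0, b0, blocks, Wout, bout = params
    y = act(W0 @ x + b0)                        # input layer
    for (W1, b1, W2, b2) in blocks:             # residual blocks
        y = act(W2 @ act(W1 @ y + b1) + b2) + y
    return (Wout @ y + bout).item()             # scalar output

def init_params(d, m=10, n_blocks=5):
    """Random initialization; the sizes follow the architecture in the text."""
    blocks = [(rng.normal(0, 0.3, (m, m)), np.zeros(m),
               rng.normal(0, 0.3, (m, m)), np.zeros(m))
              for _ in range(n_blocks)]
    return (rng.normal(0, 0.3, (m, d)), np.zeros(m),
            blocks, rng.normal(0, 0.3, (1, m)), np.zeros(1))

params = init_params(d=2)
print(resnet_forward(np.array([0.5, 0.5]), params))
```

With $d=2$, $m=10$, and five blocks, counting biases as above gives $10\cdot2+10+5\cdot2\cdot(10\cdot10+10)+(10+1)=1141$ trainable parameters.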
In each step of the SGD iteration, we randomly sample $N$ points $\{x_k\}_{k=1}^{N}\subset\Omega$, $N_D$ points $\{y_k\}_{k=1}^{N_D}\subset\Gamma_D$, and $N_N$ points $\{z_k\}_{k=1}^{N_N}\subset\Gamma_N$. The loss function is the Monte Carlo discretization of (2.2):
$$
L(\theta)=\frac{|\Omega|}{N}\sum_{k=1}^{N}\Big(\frac12 A\nabla u_\theta\cdot\nabla u_\theta-fu_\theta\Big)(x_k)
-\frac{|\Gamma_N|}{N_N}\sum_{k=1}^{N_N}(tu_\theta)(z_k)
-\frac{|\Gamma_D|}{N_D}\sum_{k=1}^{N_D}\Big(\frac{\partial u_\theta}{\partial n_A}(u_\theta-g)-\frac{\beta}{2}(u_\theta-g)^2\Big)(y_k).
$$
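The loss above can be sketched in numpy for a closed-form trial function whose gradient is known analytically; this is a minimal Monte Carlo check on the unit square with $A=I$, a pure Dirichlet boundary, and a manufactured solution (an actual network would require automatic differentiation):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 500.0

# Manufactured problem on (0,1)^2:  -Laplace(u) = f,  u = g = 0 on the boundary.
def u(x, y):      return np.sin(np.pi*x) * np.sin(np.pi*y)
def grad_u(x, y):
    return (np.pi*np.cos(np.pi*x)*np.sin(np.pi*y),
            np.pi*np.sin(np.pi*x)*np.cos(np.pi*y))
def f(x, y):      return 2*np.pi**2 * u(x, y)

def nitsche_loss(v, grad_v, n_in=20000, n_bd=2000):
    # interior term: |Omega| * mean(0.5*|grad v|^2 - f*v), with |Omega| = 1
    xi, yi = rng.random(n_in), rng.random(n_in)
    gx, gy = grad_v(xi, yi)
    loss = np.mean(0.5*(gx**2 + gy**2) - f(xi, yi)*v(xi, yi))
    # Dirichlet terms, edge by edge (g = 0 and |edge| = 1 here):
    #   -mean(dv/dn * v) + (beta/2) * mean(v^2)
    s = rng.random(n_bd)
    zero, one = 0*s, 0*s + 1
    for (xb, yb), (nx, ny) in [((s, zero), (0, -1)), ((s, one), (0, 1)),
                               ((zero, s), (-1, 0)), ((one, s), (1, 0))]:
        gx, gy = grad_v(xb, yb)
        vb = v(xb, yb)
        loss += -np.mean((gx*nx + gy*ny) * vb) + 0.5*beta*np.mean(vb**2)
    return loss

# A perturbed candidate has a strictly larger energy than the exact solution.
def v_pert(x, y):
    return u(x, y) + 0.5*np.sin(2*np.pi*x)*np.sin(2*np.pi*y)
def grad_v_pert(x, y):
    gx, gy = grad_u(x, y)
    return (gx + np.pi*np.cos(2*np.pi*x)*np.sin(2*np.pi*y),
            gy + np.pi*np.sin(2*np.pi*x)*np.cos(2*np.pi*y))

print(nitsche_loss(u, grad_u))            # close to -pi^2/4, the exact energy
print(nitsche_loss(v_pert, grad_v_pert))  # larger, by about 0.5*||grad w||^2
```

Minimizing this quantity over the network parameters by SGD is exactly the training loop of the method.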
3. Numerical Experiments
We apply the Deep Nitsche Method to solve the mixed boundary value problem (2.1), and compare it with two known methods: the Deep Ritz Method [6] and a deep learning-based least-squares method [15], which dates back at least to [3] in the finite element method. The energy functional associated with the boundary value problem (2.1) for the Deep Ritz Method reads as
$$
J_{\mathrm{Ritz}}(v)=\int_\Omega\Big(\frac12 A\nabla v\cdot\nabla v-fv\Big)\,dx
-\int_{\Gamma_N}tv\,ds+\beta\int_{\Gamma_D}(v-g)^2\,ds,
$$
and the energy functional associated with the least-squares method is
$$
J_{\mathrm{LS}}(v)=\int_\Omega\big(\nabla\cdot(A\nabla v)+f\big)^2\,dx
+\beta\int_{\Gamma_D}(v-g)^2\,ds
+\beta\int_{\Gamma_N}\Big(\frac{\partial v}{\partial n_A}-t\Big)^2\,ds.
$$
In all the examples, we let $\Omega$ be the unit hypercube $(0,1)^d$ unless otherwise stated, and we use the same scalar activation function $\sigma$ throughout.
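For comparison, the least-squares loss can be sketched the same way; with exact data from a manufactured solution, every residual vanishes. Again a numpy sketch with $A=I$, pure Dirichlet data, and closed-form derivatives (a network would use automatic differentiation):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 500.0

# Manufactured problem on (0,1)^2:  -Laplace(u) = f,  u = g = 0 on the boundary.
def u(x, y):     return np.sin(np.pi*x) * np.sin(np.pi*y)
def lap_u(x, y): return -2*np.pi**2 * u(x, y)
def f(x, y):     return 2*np.pi**2 * u(x, y)

def ls_loss(v, lap_v, n_in=4096, n_bd=1024):
    # PDE residual in the interior: mean((Laplace(v) + f)^2)
    xi, yi = rng.random(n_in), rng.random(n_in)
    loss = np.mean((lap_v(xi, yi) + f(xi, yi))**2)
    # Dirichlet residual on the four edges (g = 0): beta * mean(v^2)
    s = rng.random(n_bd)
    for xb, yb in [(s, 0*s), (s, 0*s + 1), (0*s, s), (0*s + 1, s)]:
        loss += beta * np.mean(v(xb, yb)**2)
    return loss

print(ls_loss(u, lap_u))   # essentially zero: the exact solution
                           # annihilates every residual
```

Note that this formulation requires second derivatives of the trial function, whereas the Nitsche and Ritz energies only need first derivatives.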
3.1. Two dimensional examples
The solution is approximated by a neural network with five residual blocks and ten hidden nodes per fully connected layer. Noticing that one residual block contains two fully connected layers and one residual connection, the number of trainable parameters is $10\cdot2+10+5\cdot2\cdot(10\cdot10+10)+(10+1)=1141$. An Adam optimizer [12] is employed to train the network with learning rate 0.001. We train the model for 50000 epochs. In each epoch, we randomly sample a batch of points inside the domain and a batch of points on each edge of $\partial\Omega$.
In the first example, we solve the Laplace equation with a smooth exact solution $u$. The boundary is split into a Dirichlet part $\Gamma_D$ and a Neumann part $\Gamma_N$, and we compute $f$, $g$, and $t$ by (2.1). We report the relative errors in the $L^2$, $H^1$, and $W^{1,\infty}$ norms in Table 1 for the three methods with different penalty parameters $\beta$.
Table 1. Relative errors for the first example.

Method                  β        L²         H¹         W^{1,∞}
Deep Nitsche Method     50       3.948e-2   6.668e-2   2.111e-1
                        500      6.492e-2   6.782e-2   1.585e-1
                        5000     2.029e-2   4.135e-2   1.569e-1
                        50000    1.270e-1   2.041e-1   4.592e-1
Deep Ritz Method        50       2.732e-2   4.822e-2   1.136e-1
                        500      1.068e-2   3.069e-2   1.143e-1
                        5000     5.154e-2   1.008e-1   2.933e-1
                        50000    9.642e-1   9.663e-1   9.489e-1
Least-Squares Method    50       5.086e-3   2.995e-3   4.570e-3
                        500      4.160e-3   2.564e-3   6.857e-3
                        5000     5.787e-3   9.322e-3   2.965e-2
                        50000    4.946e-2   5.909e-2   1.174e-1
It seems that all three methods give comparable accuracy, and the differences between the errors with the parameter $\beta$ varying from 50 to 5000 are negligible, while a very big value of $\beta$ seems a bad choice. It is worthwhile to mention that $\beta=5000$ seems the best for the Deep Nitsche Method, while $\beta=500$ seems the best for the Deep Ritz Method and the least-squares method.
In the second example, we consider the Laplace equation on an L-shaped domain $\Omega$. This problem admits the analytical solution $u=r^{2/3}\sin(2\theta/3)$ in polar coordinates centered at the reentrant corner, which belongs to $H^{1+s}(\Omega)$ for any $s<2/3$. In fact, such a solution usually stands for the singular part of the general situation [16]. We report the errors in Table 2. We have not computed the error in the $W^{1,\infty}$ norm because $\nabla u$ is obviously unbounded.
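A quick numerical check of the two claims just made, for the canonical corner singularity assumed above (function names are ours): the function is harmonic away from the corner, while its gradient, $|\nabla u|=(2/3)r^{-1/3}$ in closed form, blows up as $r\to0$, so the $W^{1,\infty}$ norm is infinite although the $H^1$ norm is finite.

```python
import numpy as np

# The corner function u = r^(2/3) * sin(2*theta/3).
def u(x, y):
    r, th = np.hypot(x, y), np.arctan2(y, x)
    return r**(2.0/3.0) * np.sin(2.0*th/3.0)

def grad_norm(r):                  # |grad u| = (2/3) r^(-1/3) in closed form
    return (2.0/3.0) * r**(-1.0/3.0)

# Five-point Laplacian at an interior point: vanishes up to O(h^2),
# confirming that u is harmonic away from the corner.
x0, y0, h = 0.5, 0.4, 1e-4
lap = (u(x0+h, y0) + u(x0-h, y0) + u(x0, y0+h) + u(x0, y0-h) - 4*u(x0, y0)) / h**2
print(lap)

# Gradient blow-up near the corner: |grad u| grows like r^(-1/3).
print(grad_norm(1e-2), grad_norm(1e-6))
```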
Table 2. Relative errors for the L-shaped domain example.

Method                  β        L²         H¹
Deep Nitsche Method     50       7.070e-3   8.211e-2
                        500      7.570e-3   9.107e-2
                        5000     1.730e-2   1.885e-1
                        50000    4.959e-2   2.993e-1
Deep Ritz Method        50       1.055e-2   7.565e-2
                        500      9.487e-3   1.124e-1
                        5000     1.982e-2   2.011e-1
                        50000    6.542e-2   4.783e-1
Least-Squares Method    50       1.530e-2   1.170e-1
                        500      6.755e-3   1.015e-1
                        5000     2.145e-2   1.241e-1
                        50000    7.434e-2   1.951e-1
In view of Table 2, it seems that all three methods are robust with respect to the parameter $\beta$, and the Deep Nitsche Method seems the most accurate for approximating the singular solution.
In the last example in two dimensions, we test a problem with a nonconstant coefficient matrix $A$. The setup is the same as in the previous examples. We take a given nonconstant symmetric positive definite coefficient matrix $A$ and a given exact solution $u$, and we calculate $f$, $g$, and $t$ according to (2.1). The relative errors in the $L^2$, $H^1$, and $W^{1,\infty}$ norms are reported in Table 3.
Table 3. Relative errors for the example with a nonconstant coefficient matrix.

Method                  β        L²         H¹         W^{1,∞}
Deep Nitsche Method     50       3.940e-1   7.923e-1   1.156e0
                        500      6.121e-2   7.416e-2   2.011e-1
                        5000     2.441e-2   6.358e-2   1.768e-1
                        50000    6.902e-2   1.463e-1   3.587e-1
Deep Ritz Method        50       8.901e-2   1.063e-1   2.068e-1
                        500      3.945e-2   6.079e-2   1.737e-1
                        5000     4.272e-2   1.106e-1   4.919e-1
                        50000    1.951e-1   3.235e-1   5.667e-1
Least-Squares Method    50       2.593e-2   5.821e-2   7.888e-2
                        500      1.345e-2   3.282e-2   5.881e-2
                        5000     3.658e-2   3.564e-2   5.888e-2
                        50000    8.185e-2   5.413e-2   7.555e-2
In view of Table 3, it seems that all methods are quite robust with respect to the penalty parameter, and the least-squares method gives the best results even in the case of large $\beta$.
3.2. High dimensional examples
We turn to problems in high dimensions in this part. We still employ an Adam optimizer with learning rate 0.001 and train the model for 50000 epochs. In each epoch, we randomly sample 512 points in $\Omega$ and 16 points on each face of $\partial\Omega$.
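The sampling step can be sketched as follows (a sketch only: beyond the point counts, the paper does not specify the sampler, and the uniform distributions used here are an assumption of ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-epoch sampling on the unit hypercube (0,1)^d: n_in points uniform in
# the interior, plus n_face points on each of the 2*d faces, where a face
# fixes one coordinate at 0 or 1 and draws the remaining ones uniformly.
def sample_epoch(d, n_in=512, n_face=16):
    interior = rng.random((n_in, d))
    faces = []
    for k in range(d):
        for val in (0.0, 1.0):
            pts = rng.random((n_face, d))
            pts[:, k] = val
            faces.append(pts)
    return interior, np.concatenate(faces)   # boundary: 2*d*n_face points

interior, boundary = sample_epoch(d=10)
print(interior.shape, boundary.shape)
```

Note that the number of boundary samples grows only linearly in the dimension, which keeps the cost of the boundary terms mild even for $d=100$.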
In the first example, we consider a less smooth exact solution in moderately high dimension with a pure Dirichlet boundary condition. We calculate $f$ and $g$ by (2.1). This example has been tested in [8] with a particle–partition of unity method. We approximate the solution by a neural network with five residual blocks and 50 hidden nodes per fully connected layer; thus the number of trainable parameters is 26601. We report the relative errors in Table 4.
Table 4. Relative errors for the first high dimensional example.

Method                  β        L²         H¹         W^{1,∞}
Deep Nitsche Method     50       1.962e-2   6.887e-2   2.186e-1
                        500      2.517e-2   8.051e-2   5.411e-1
                        5000     1.584e-2   6.112e-2   4.659e-1
                        50000    2.098e-2   7.653e-2   3.837e-1
Deep Ritz Method        50       2.102e-2   5.902e-2   3.962e-1
                        500      1.215e-2   4.788e-2   3.831e-1
                        5000     1.351e-2   5.612e-2   1.553e-1
                        50000    1.102e-2   4.650e-2   4.424e-1
Least-Squares Method    50       4.486e-2   7.156e-2   1.861e-1
                        500      1.434e-2   5.442e-2   1.921e-1
                        5000     2.049e-2   5.484e-2   2.103e-1
                        50000    1.242e-2   4.704e-2   2.333e-1
In view of Table 4, we observe that all three methods give comparable results, and all three methods are even more robust with respect to the penalty parameter $\beta$ than in two dimensions. The same conclusion is valid for the next example in higher dimensions.
Table 5. Relative errors for the example in 100 dimensions.

Method                  β        L²         H¹         W^{1,∞}
Deep Nitsche Method     50       8.611e-4   5.014e-3   2.075e-2
                        500      7.795e-4   4.466e-3   1.826e-2
                        5000     3.224e-3   5.977e-3   2.098e-2
                        50000    2.291e-3   5.368e-3   2.023e-2
Deep Ritz Method        50       1.590e-3   4.800e-3   1.869e-2
                        500      7.028e-3   8.984e-3   2.355e-2
                        5000     7.992e-4   4.609e-3   2.126e-2
                        50000    2.291e-3   5.368e-3   2.023e-2
Least-Squares Method    50       5.689e-3   7.716e-3   2.374e-2
                        500      8.759e-4   4.845e-3   2.235e-2
                        5000     1.118e-3   3.915e-3   1.662e-2
                        50000    1.121e-3   6.444e-3   2.740e-2
In the second example, we consider a smooth exact solution in 100 dimensions with a pure Dirichlet boundary condition. We compute $f$ and $g$ by (2.1). The exact solution is approximated by a neural network with five residual blocks. We report the relative errors in Table 5, which shows that our method has the potential to work in rather high dimensions.
4. Conclusion
Based on Nitsche's idea and representing the trial functions by deep neural networks, we propose a new method to deal with complicated boundary conditions. The test examples show that the method has the following advantages:

1. It deals with the mixed boundary conditions in a variational way without significant extra cost, and it fits well with the stochastic gradient descent method.

2. It works on problems in low dimensions as well as in high dimensions, and it has the potential to work for problems in rather high dimensions.

3. The method is less sensitive to the penalty parameter, in contrast to methods based on traditional trial spaces [8]. This is more pronounced in high dimensions.
We have also systematically compared the Deep Nitsche Method with the Deep Ritz Method and the least-squares method for regular as well as singular solutions, and for low dimensional as well as high dimensional problems. It seems that the new method is comparable to these two methods, while it is slightly more accurate for singular solutions. There are still several issues we have not addressed, such as the influence of the network structure and a systematic method to improve the accuracy, which will be pursued in our future work.
References
 [1] I. Babuška, U. Banerjee, and J.E. Osborn, Surveys of meshless and generalized finite element method: a unified approach, Acta Numer. 12 (2003), 1–125.
 [2] J. Berg and K. Nyström, A unified deep artificial neural network approach for partial differential equations in complex geometries, Neurocomputing 317 (2018), 28–41.
 [3] J.H. Bramble and A. Schatz, Rayleigh–Ritz–Galerkin methods for Dirichlet's problem using subspaces without boundary conditions, Comm. Pure Appl. Math. 23 (1970), 653–675.
 [4] E. Burman and P. Zunino, Numerical approximation of large contrast problems with the unfitted Nitsche method, Frontiers in Numerical Analysis – Durham 2010, J. Blowey and M. Jensen eds., Springer-Verlag Berlin Heidelberg, 2012, pp. 227–281.
 [5] W. E, J.Q. Han, and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat. 5 (2017), 349–380.
 [6] W. E and B. Yu, The Deep Ritz Method: a deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat. 6 (2018), 1–12.
 [7] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, 2016.
 [8] M. Griebel and M.A. Schweitzer, A particle–partition of unity method Part V: boundary conditions, Geometric Analysis and Nonlinear Partial Differential Equations, S. Hildebrandt and H. Karcher eds., Springer-Verlag Berlin Heidelberg, 2003, pp. 519–542.
 [9] J.Q. Han, A. Jentzen, and W. E, Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. 115 (2018), no. 34, 8505–8510.
 [10] K.M. He, X.Y. Zhang, S.Q. Ren, and J. Sun, Deep residual learning for image recognition, In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), 770–778.
 [11] Y. Khoo, J. Lu, and L. Ying, Solving parametric PDE problems with artificial neural networks, 2017, arXiv:1707.03351.
 [12] D.P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980; Published as a conference paper at ICLR 2015.
 [13] H.N. Mhaskar, When is approximation by Gaussian networks necessarily a linear process?, Neural Networks 17 (2004), 989–1001.
 [14] J. Nitsche, Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind, Abh. Math. Sem. Univ. Hamburg 36 (1971), 9–15.
 [15] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, J. Comput. Phys. 375 (2018), 1339–1364.
 [16] G. Strang and G.J. Fix, An Analysis of the Finite Element Method, PrenticeHall, Inc., Englewood Cliffs, N. J., 1973.