Deep Nitsche Method: Deep Ritz Method with Essential Boundary Conditions

by Yulei Liao, et al.
Chinese Academy of Sciences

We propose a method due to Nitsche from the early 1970s (the Deep Nitsche Method) to deal with the essential boundary conditions encountered in deep learning-based numerical methods without significant extra computational cost. The method inherits several advantages of the Deep Ritz Method <cit.> while successfully overcoming the difficulty of treating essential boundary conditions. We illustrate the method on several representative problems, posed in up to 100 dimensions, with complicated boundary conditions. The numerical results clearly show that the Deep Nitsche Method is naturally nonlinear, naturally adaptive, and has the potential to work in rather high dimensions.




1. Introduction

Recently there has been a surge of interest in solving partial differential equations with deep learning-based numerical methods [5, 11, 6, 9, 15, 2]. However, there have been few attempts to deal with boundary conditions, particularly the essential boundary conditions, which are the main objective of the present work. These methods allow for the compositional construction of new approximation spaces from various neural networks. Such constructions are usually free of a mesh, so they are in essence meshless methods. The shape functions in the approximation spaces are in general non-interpolatory, which makes the implementation of essential boundary conditions a nontrivial task. Many approaches for dealing with essential boundary conditions have been proposed over the past fifty years; we refer to [1, §7.2] for a review.

In fact, an efficient method for imposing essential boundary conditions was proposed by Nitsche in the early 1970s [14]. This method was later revived to deal with elliptic interface problems and unfitted mesh problems; we refer to [4] for a review of the progress in this direction. In the context of meshless methods, Nitsche's idea has proved to be an efficient approach to boundary conditions in the framework of the partition of unity method [8] as well as the generalized finite element method [1]. In this work, we incorporate Nitsche's idea to deal with the essential boundary conditions in the framework of the Deep Ritz Method [6]. We call this new algorithm the Deep Nitsche Method. In the next section, we introduce the energy formulation of the method; then we present numerical results for some mixed boundary value problems with regular and singular solutions in two dimensions, and for problems in up to 100 dimensions. In particular, we compare the Deep Nitsche Method with the Deep Ritz Method and another method based on a least-squares variational formulation in all the examples. Finally we conclude with some remarks.

2. Deep Nitsche Method

We consider the mixed boundary value problem posed on a bounded domain $\Omega\subset\mathbb{R}^d$:

\[
-\nabla\cdot(A\nabla u)=f\ \text{in }\Omega,\qquad u=g\ \text{on }\Gamma_D,\qquad (A\nabla u)\cdot n=h\ \text{on }\Gamma_N,
\tag{2.1}
\]

where $A$ is a symmetric matrix satisfying the uniform ellipticity condition

\[
\lambda|\xi|^2\le \xi^{T}A(x)\xi\le\Lambda|\xi|^2\quad\text{for all }\xi\in\mathbb{R}^d\ \text{and}\ x\in\Omega.
\]

The boundary $\partial\Omega=\overline{\Gamma}_D\cup\overline{\Gamma}_N$ and $\Gamma_D\cap\Gamma_N=\emptyset$.

The energy functional associated with the above boundary value problem in the sense of Nitsche [14] is

\[
J(v)=\int_\Omega\Big(\tfrac12\nabla v\cdot A\nabla v-fv\Big)\,dx-\int_{\Gamma_D}(A\nabla v\cdot n)(v-g)\,ds+\frac{\beta}{2}\int_{\Gamma_D}(v-g)^2\,ds-\int_{\Gamma_N}hv\,ds,
\]

where $\beta$ is a penalty parameter to be determined later on. We minimize $J$ over a certain trial space $V$ that will be defined later on. The resulting optimization problem is solved by the Stochastic Gradient Descent (SGD) method [7, §8].
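To make the structure of this functional concrete, here is a one-dimensional sketch for $-u''=f$ on $(0,1)$ with Dirichlet data at both endpoints, evaluated by midpoint quadrature. The function names, the quadrature rule, and the test functions are ours; only the form of the functional follows Nitsche's formulation. For the exact solution the energy is minimal, and a perturbation vanishing on the boundary raises it by half the bilinear form of the perturbation.

```python
import math

def nitsche_energy_1d(v, dv, f, g0, g1, beta, n=2000):
    """Nitsche energy for -u'' = f on (0, 1) with Dirichlet data u(0) = g0,
    u(1) = g1, computed by midpoint quadrature (a 1D sketch)."""
    h = 1.0 / n
    # interior part: \int_0^1 (1/2 v'^2 - f v) dx
    J = sum((0.5 * dv(h * (k + 0.5)) ** 2 - f(h * (k + 0.5)) * v(h * (k + 0.5))) * h
            for k in range(n))
    # consistency term: -\int_{Gamma_D} (normal derivative of v)(v - g) ds;
    # the outward normal derivative is -v'(0) at x = 0 and +v'(1) at x = 1
    J -= (-dv(0.0)) * (v(0.0) - g0) + dv(1.0) * (v(1.0) - g1)
    # penalty term: (beta/2) \int_{Gamma_D} (v - g)^2 ds
    J += 0.5 * beta * ((v(0.0) - g0) ** 2 + (v(1.0) - g1) ** 2)
    return J

# exact solution u = sin(pi x) of -u'' = pi^2 sin(pi x) with u(0) = u(1) = 0
u = lambda x: math.sin(math.pi * x)
du = lambda x: math.pi * math.cos(math.pi * x)
f = lambda x: math.pi ** 2 * math.sin(math.pi * x)
J_exact = nitsche_energy_1d(u, du, f, 0.0, 0.0, beta=100.0)

# a perturbation vanishing on the boundary strictly increases the energy
w = lambda x: u(x) + 0.1 * x * (1 - x)
dw = lambda x: du(x) + 0.1 * (1 - 2 * x)
J_pert = nitsche_energy_1d(w, dw, f, 0.0, 0.0, beta=100.0)
```

Since the perturbation $0.1\,x(1-x)$ vanishes on the boundary, the energy increase is exactly $\tfrac12\int_0^1 (0.1(1-2x))^2\,dx = 0.01/6$, which the quadrature reproduces.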


The associated variational problem is: find $u\in V$ such that

\[
a(u,v)=F(v)\quad\text{for all }v\in V,
\tag{2.4}
\]

where the bilinear form $a$ and the linear functional $F$ are defined by

\[
a(u,v)=\int_\Omega\nabla u\cdot A\nabla v\,dx-\int_{\Gamma_D}\big[(A\nabla u\cdot n)v+(A\nabla v\cdot n)u\big]\,ds+\beta\int_{\Gamma_D}uv\,ds
\]

for all $u,v\in V$, and

\[
F(v)=\int_\Omega fv\,dx-\int_{\Gamma_D}g\,(A\nabla v\cdot n)\,ds+\beta\int_{\Gamma_D}gv\,ds+\int_{\Gamma_N}hv\,ds.
\]
To prove the well-posedness of (2.4), we need to prove the coercivity of $a$ and the boundedness of $a$ and $F$. We make the following assumption: there exists a constant $C_I$ such that for all $v\in V$,

\[
\|A\nabla v\cdot n\|_{L^2(\Gamma_D)}\le C_I\,\|\nabla v\|_{L^2(\Omega)}.
\tag{2.5}
\]

Define a norm

\[
\|v\|_\beta^2:=\|\nabla v\|_{L^2(\Omega)}^2+\beta\|v\|_{L^2(\Gamma_D)}^2.
\]

By Friedrichs' inequality, $\|\cdot\|_\beta$ is equivalent to the standard $H^1$-norm.

Lemma 2.1.
Let Assumption (2.5) be valid. If $\beta\ge 4C_I^2/\lambda$, then

\[
a(v,v)\ge c\,\|v\|_\beta^2\quad\text{for all }v\in V,
\tag{2.6}
\]

where $c=\min(\lambda,1)/2$.

Proof. For any $v\in V$,

\[
a(v,v)=\int_\Omega\nabla v\cdot A\nabla v\,dx-2\int_{\Gamma_D}(A\nabla v\cdot n)v\,ds+\beta\int_{\Gamma_D}v^2\,ds.
\]

Using the Cauchy–Schwarz inequality, Young's inequality, and the assumption (2.5), we obtain

\[
2\int_{\Gamma_D}(A\nabla v\cdot n)v\,ds\le 2C_I\|\nabla v\|_{L^2(\Omega)}\|v\|_{L^2(\Gamma_D)}\le\frac{\lambda}{2}\|\nabla v\|_{L^2(\Omega)}^2+\frac{2C_I^2}{\lambda}\|v\|_{L^2(\Gamma_D)}^2.
\]

Combining the above two inequalities, we obtain (2.6). ∎

The boundedness of $a$ and $F$ may be proved in a similar way. The well-posedness of the variational problem thus boils down to the inverse assumption (2.5), which is natural when $V$ consists of piecewise polynomials, i.e.,

\[
\|A\nabla v\cdot n\|_{L^2(\Gamma_D)}\le C\,h^{-1/2}\,\|\nabla v\|_{L^2(\Omega)},
\]

where $h$ is the mesh size of the triangulation. This means that we need to take $\beta=\gamma/h$ for a certain large $\gamma$; such a choice is standard in the finite element literature. However, we do not know whether such an inverse inequality holds for functions in $V$ that are constructed from various neural networks in a compositional manner. The only exception is functions constructed from Gaussian networks [13], which does not seem to apply to the present case because it employs the distance between the centers as a measure of discretization.
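To spell out the scaling implicit in this choice (writing $C_I$ for the constant in the inverse assumption and $\lambda$ for the ellipticity lower bound, names chosen here), the finite element inverse inequality gives

```latex
\|A\nabla v\cdot n\|_{L^2(\Gamma_D)}\le C\,h^{-1/2}\,\|\nabla v\|_{L^2(\Omega)}
\quad\Longrightarrow\quad C_I^2=O(h^{-1}),
\qquad\text{so coercivity requires}\quad
\beta\ \gtrsim\ \frac{4C_I^2}{\lambda}=O(h^{-1}),
```

which recovers the familiar Nitsche penalty scaling $\beta=\gamma/h$ with $\gamma$ sufficiently large.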

Figure 1. The component of the ResNet: fully connected layers of size m with activation, grouped into residual blocks, followed by a final fully connected output layer without activation.

The trial function space $V$ is modeled by a ResNet [10]. The component of the ResNet is shown in Figure 1. The input layer is a fully connected layer with $m$ hidden nodes, which maps from $\mathbb{R}^d$ to $\mathbb{R}^m$. Assume that $\sigma:\mathbb{R}\to\mathbb{R}$ is a scalar activation function and let it act componentwise on vectors; then the input layer reads

\[
y_0=\sigma(W_0x+b_0),
\]

where $W_0\in\mathbb{R}^{m\times d}$ and $b_0\in\mathbb{R}^m$. The hidden layers are constructed from $n$ residual blocks. Each block contains two fully connected layers and one residual connection. The $i$-th block takes the form

\[
y_i=y_{i-1}+\sigma\big(W_{i,2}\,\sigma(W_{i,1}y_{i-1}+b_{i,1})+b_{i,2}\big),
\]

where $W_{i,1},W_{i,2}\in\mathbb{R}^{m\times m}$ and $b_{i,1},b_{i,2}\in\mathbb{R}^m$.

The output layer is a fully connected layer with one hidden node. The approximate solution may be expressed as

\[
u_\theta(x)=a\cdot y_n+c,
\]

where $a\in\mathbb{R}^m$ and $c\in\mathbb{R}$, and the parameter set is defined as

\[
\theta=\{W_0,b_0,\ W_{i,1},b_{i,1},W_{i,2},b_{i,2}\ (1\le i\le n),\ a,c\}.
\]
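A minimal pure-Python sketch of this forward pass may be helpful. The initialization scheme, the widths, and the tanh activation below are our own illustrative choices, not the paper's:

```python
import math
import random

def make_params(d, m, blocks, seed=0):
    """Random parameters for a ResNet as described above: input layer
    d -> m, `blocks` residual blocks of width m, scalar output layer."""
    rng = random.Random(seed)
    mat = lambda r, c: [[rng.gauss(0, 1 / math.sqrt(c)) for _ in range(c)]
                        for _ in range(r)]
    vec = lambda r: [0.0] * r
    return {
        "W0": mat(m, d), "b0": vec(m),
        "blocks": [(mat(m, m), vec(m), mat(m, m), vec(m))
                   for _ in range(blocks)],
        "a": [rng.gauss(0, 1 / math.sqrt(m)) for _ in range(m)],
        "c": 0.0,
    }

def affine(W, b, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def u_theta(p, x, act=math.tanh):
    """Forward pass: input layer, residual blocks, scalar output."""
    y = [act(t) for t in affine(p["W0"], p["b0"], x)]
    for W1, b1, W2, b2 in p["blocks"]:
        z = [act(t) for t in affine(W1, b1, y)]
        z = [act(t) for t in affine(W2, b2, z)]
        y = [yi + zi for yi, zi in zip(y, z)]   # residual connection
    return sum(ai * yi for ai, yi in zip(p["a"], y)) + p["c"]

p = make_params(2, 4, 3)
value = u_theta(p, [0.25, 0.5])
```

In practice the forward pass would be written in an automatic-differentiation framework so that $\nabla u_\theta$ and the gradient of the loss with respect to $\theta$ come for free; the sketch only fixes the data flow.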

In each step of the SGD iteration, we randomly sample $N$ points $\{x_k\}\subset\Omega$, $N_D$ points $\{y_k\}\subset\Gamma_D$ and $N_N$ points $\{z_k\}\subset\Gamma_N$. The loss function is defined as the Monte Carlo approximation of the energy functional:

\[
L(\theta)=\frac{|\Omega|}{N}\sum_{k=1}^{N}\Big(\tfrac12\nabla u_\theta\cdot A\nabla u_\theta-fu_\theta\Big)(x_k)
-\frac{|\Gamma_D|}{N_D}\sum_{k=1}^{N_D}\Big[(A\nabla u_\theta\cdot n)(u_\theta-g)-\tfrac{\beta}{2}(u_\theta-g)^2\Big](y_k)
-\frac{|\Gamma_N|}{N_N}\sum_{k=1}^{N_N}(hu_\theta)(z_k).
\]
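As a sketch of such a sampled loss (the helper names, the toy trial function, and the data below are ours; on the unit square each edge has length one, so the Monte Carlo weights are trivial):

```python
import random

def mc_interior(grad_sq, fv, n=8192, seed=1):
    """Monte Carlo estimate of \int_Omega (1/2 |grad v|^2 - f v) dx on the
    unit square: the interior part of the loss. grad_sq and fv are
    callables returning |grad v|^2 and f*v at a point."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        acc += 0.5 * grad_sq(x, y) - fv(x, y)
    return acc / n  # |Omega| = 1 for the unit square

def mc_dirichlet_penalty(v, g, beta, n=8192, seed=2):
    """Monte Carlo estimate of (beta/2) \int_{Gamma_D} (v - g)^2 ds,
    sampled here on the bottom edge y = 0 of the unit square."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.random()
        acc += (v(x, 0.0) - g(x, 0.0)) ** 2
    return 0.5 * beta * acc / n  # the edge has length 1

# toy trial function v(x, y) = x + y: |grad v|^2 = 2; take f = 0 and g = 0
interior = mc_interior(lambda x, y: 2.0, lambda x, y: 0.0)
penalty = mc_dirichlet_penalty(lambda x, y: x + y, lambda x, y: 0.0, beta=1.0)
```

The consistency and Neumann terms are sampled on the boundary in exactly the same way; in training, these sums are re-drawn at every SGD step.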

3. Numerical Experiments

We apply the Deep Nitsche Method to solve the mixed boundary value problem (2.1), and compare it with two known methods: the Deep Ritz Method [6] and a deep learning-based least-squares method [15], which in the finite element context dates back at least to [3]. The energy functional associated with the boundary value problem (2.1) for the Deep Ritz Method reads as

\[
J_{\mathrm{Ritz}}(v)=\int_\Omega\Big(\tfrac12\nabla v\cdot A\nabla v-fv\Big)\,dx+\beta\int_{\Gamma_D}(v-g)^2\,ds-\int_{\Gamma_N}hv\,ds,
\]

and the functional associated with the least-squares method is

\[
J_{\mathrm{LS}}(v)=\int_\Omega\big(\nabla\cdot(A\nabla v)+f\big)^2\,dx+\beta\int_{\Gamma_D}(v-g)^2\,ds+\beta\int_{\Gamma_N}\big(A\nabla v\cdot n-h\big)^2\,ds.
\]

In all the examples, we let $\Omega$ be the unit hypercube $(0,1)^d$ and use a fixed scalar activation function $\sigma$.

3.1. Two dimensional examples

The solution is approximated by a neural network with five residual blocks and ten hidden nodes per fully connected layer. Noting that one residual block contains two fully connected layers and one residual connection, the number of trainable parameters can be counted directly from these sizes. An Adam optimizer with the learning rate 0.001 is employed to train the model [12]. We train for 50000 epochs. In each epoch, we randomly sample a number of points inside the domain and on each edge of the boundary.
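The parameter count can be tallied as follows, assuming biases in every fully connected layer and no trainable weights on the residual connections (our reading of the architecture; the exact count reported in the paper may differ):

```python
def num_params(d, m, blocks):
    """Trainable-parameter count for the ResNet sketched above, assuming
    biases everywhere and non-trainable skip connections."""
    input_layer = m * (d + 1)      # W0 in R^{m x d} plus the bias b0
    per_block = 2 * (m * m + m)    # two m x m layers, each with a bias
    output_layer = m + 1           # a in R^m plus the scalar c
    return input_layer + blocks * per_block + output_layer
```

Under these assumptions the two-dimensional setup (d = 2, m = 10, five blocks) gives 1141 trainable parameters.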

In the first example, we solve the Laplace equation with a smooth solution. The Dirichlet part $\Gamma_D$ and the Neumann part $\Gamma_N$ of the boundary are prescribed, and we compute $f$, $g$ and $h$ from (2.1). We report the relative errors in Table 1 for the three methods with different penalty parameters $\beta$.

Deep Nitsche Method
  β = 50      3.948e-2   6.668e-2   2.111e-1
  β = 500     6.492e-2   6.782e-2   1.585e-1
  β = 5000    2.029e-2   4.135e-2   1.569e-1
  β = 50000   1.270e-1   2.041e-1   4.592e-1
Deep Ritz Method
  β = 50      2.732e-2   4.822e-2   1.136e-1
  β = 500     1.068e-2   3.069e-2   1.143e-1
  β = 5000    5.154e-2   1.008e-1   2.933e-1
  β = 50000   9.642e-1   9.663e-1   9.489e-1
Least-Squares Method
  β = 50      5.086e-3   2.995e-3   4.570e-3
  β = 500     4.160e-3   2.564e-3   6.857e-3
  β = 5000    5.787e-3   9.322e-3   2.965e-2
  β = 50000   4.946e-2   5.909e-2   1.174e-1
Table 1. The smooth solution in two dimensions.

It seems that all three methods give comparable accuracy, and the differences between the errors as $\beta$ varies from 50 to 5000 are negligible, while the very large value $\beta=50000$ seems a bad choice. It is worthwhile to mention that $\beta=5000$ seems best for the Deep Nitsche Method, while $\beta=500$ seems best for the Deep Ritz Method and the least-squares method.
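The relative errors reported in these tables can themselves be estimated by Monte Carlo sampling; a sketch for the relative $L^2$ error (the function names and the sanity-check solution are ours):

```python
import math
import random

def relative_l2_error(u_approx, u_exact, d, n=20000, seed=0):
    """Monte Carlo estimate of ||u_approx - u_exact||_{L^2} / ||u_exact||_{L^2}
    over the unit hypercube (0, 1)^d."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        e = u_approx(x) - u_exact(x)
        num += e * e
        den += u_exact(x) ** 2
    return math.sqrt(num / den)

# sanity check: a 1% multiplicative perturbation has relative error 0.01,
# independent of the sample, since numerator and denominator share samples
exact = lambda x: math.sin(math.pi * x[0]) * math.sin(math.pi * x[1])
approx = lambda x: 1.01 * exact(x)
err = relative_l2_error(approx, exact, d=2)
```

Relative errors in stronger norms are computed analogously by sampling the gradients.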

In the second example, we consider the same equation posed on an L-shaped domain $\Omega$. This problem admits an analytical solution, given in polar coordinates around the reentrant corner, which belongs to $H^{1+s}(\Omega)$ only for a limited range of $s$. In fact, such a solution usually represents the singular part of the general situation [16]. We report the errors in Table 2. We have not computed the error in the strongest norm because the corresponding seminorm of $u$ is obviously unbounded.

Deep Nitsche Method
  β = 50      7.070e-3   8.211e-2
  β = 500     7.570e-3   9.107e-2
  β = 5000    1.730e-2   1.885e-1
  β = 50000   4.959e-2   2.993e-1
Deep Ritz Method
  β = 50      1.055e-2   7.565e-2
  β = 500     9.487e-3   1.124e-1
  β = 5000    1.982e-2   2.011e-1
  β = 50000   6.542e-2   4.783e-1
Least-Squares Method
  β = 50      1.530e-2   1.170e-1
  β = 500     6.755e-3   1.015e-1
  β = 5000    2.145e-2   1.241e-1
  β = 50000   7.434e-2   1.951e-1
Table 2. The singular solution in the L-shaped domain.

In view of Table 2, it seems that all three methods are robust with respect to the parameter $\beta$, and the Deep Nitsche Method seems the most accurate for approximating the singular solution.

In the last example in two dimensions, we test a problem with a nonconstant coefficient matrix $A$. The setup is the same as in the previous example. The coefficient matrix $A$ and the exact solution are prescribed analytically, and we calculate $f$, $g$ and $h$ according to (2.1). The relative errors are reported in Table 3.

Deep Nitsche Method
  β = 50      3.940e-1   7.923e-1   1.156e0
  β = 500     6.121e-2   7.416e-2   2.011e-1
  β = 5000    2.441e-2   6.358e-2   1.768e-1
  β = 50000   6.902e-2   1.463e-1   3.587e-1
Deep Ritz Method
  β = 50      8.901e-2   1.063e-1   2.068e-1
  β = 500     3.945e-2   6.079e-2   1.737e-1
  β = 5000    4.272e-2   1.106e-1   4.919e-1
  β = 50000   1.951e-1   3.235e-1   5.667e-1
Least-Squares Method
  β = 50      2.593e-2   5.821e-2   7.888e-2
  β = 500     1.345e-2   3.282e-2   5.881e-2
  β = 5000    3.658e-2   3.564e-2   5.888e-2
  β = 50000   8.185e-2   5.413e-2   7.555e-2
Table 3. Results for the problem with nonconstant coefficient in two dimensions.

In view of Table 3, it seems all methods are quite robust with respect to the penalty parameter, and the least-squares method gives the best results, even for large $\beta$.

3.2. High dimensional examples

We turn to problems in high dimensions in this part. We still employ an Adam optimizer with the learning rate 0.001 and train the model for 50000 epochs. In each epoch, we randomly sample 512 points in $\Omega$ and 16 points on each face of $\partial\Omega$.
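Sampling uniformly on the boundary of the unit hypercube can be sketched as follows (a helper of our own, not the authors' code); since all $2d$ faces have equal measure, picking a face uniformly and then sampling the remaining coordinates gives a uniform boundary distribution:

```python
import random

def sample_on_boundary(d, n, seed=0):
    """Uniformly sample n points on the boundary of the unit hypercube
    (0, 1)^d: pick one of the 2d faces, then clamp that coordinate."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        face = rng.randrange(2 * d)       # all 2d faces have equal measure
        x[face // 2] = float(face % 2)    # clamp one coordinate to 0 or 1
        pts.append(x)
    return pts
```

Sampling a fixed number of points on each face, as in the experiments, amounts to iterating this with `face` fixed.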

In the first example, we consider a less smooth solution with a pure Dirichlet boundary condition. We calculate $f$ and $g$ by (2.1). This example has been tested in [8] with a particle-partition of unity method. We approximate the solution by a neural network with five residual blocks and 50 hidden nodes per fully connected layer. Thus the number of trainable parameters is 26601. We report the relative errors in Table 4.

Deep Nitsche Method
  β = 50      1.962e-2   6.887e-2   2.186e-1
  β = 500     2.517e-2   8.051e-2   5.411e-1
  β = 5000    1.584e-2   6.112e-2   4.659e-1
  β = 50000   2.098e-2   7.653e-2   3.837e-1
Deep Ritz Method
  β = 50      2.102e-2   5.902e-2   3.962e-1
  β = 500     1.215e-2   4.788e-2   3.831e-1
  β = 5000    1.351e-2   5.612e-2   1.553e-1
  β = 50000   1.102e-2   4.650e-2   4.424e-1
Least-Squares Method
  β = 50      4.486e-2   7.156e-2   1.861e-1
  β = 500     1.434e-2   5.442e-2   1.921e-1
  β = 5000    2.049e-2   5.484e-2   2.103e-1
  β = 50000   1.242e-2   4.704e-2   2.333e-1
Table 4. The less smooth solution in high dimensions.

In view of Table 4, we observe that all three methods give comparable results, and in this setting all three are even more robust with respect to the penalty parameter $\beta$. The same conclusion holds for the next example in higher dimensions.

Deep Nitsche Method
  β = 50      8.611e-4   5.014e-3   2.075e-2
  β = 500     7.795e-4   4.466e-3   1.826e-2
  β = 5000    3.224e-3   5.977e-3   2.098e-2
  β = 50000   2.291e-3   5.368e-3   2.023e-2
Deep Ritz Method
  β = 50      1.590e-3   4.800e-3   1.869e-2
  β = 500     7.028e-3   8.984e-3   2.355e-2
  β = 5000    7.992e-4   4.609e-3   2.126e-2
  β = 50000   2.291e-3   5.368e-3   2.023e-2
Least-Squares Method
  β = 50      5.689e-3   7.716e-3   2.374e-2
  β = 500     8.759e-4   4.845e-3   2.235e-2
  β = 5000    1.118e-3   3.915e-3   1.662e-2
  β = 50000   1.121e-3   6.444e-3   2.740e-2
Table 5. The smooth solution in 100 dimensions.

In the second example, we consider a smooth solution in 100 dimensions with a pure Dirichlet boundary condition. We compute $f$ and $g$ by (2.1). The exact solution is approximated by a neural network with five residual blocks and a fixed number of hidden nodes per fully connected layer. We report the relative errors in Table 5. The results show that our method has the potential to work in rather high dimensions.

4. Conclusion

Based on Nitsche's idea, and representing the trial functions by deep neural networks, we propose a new method to deal with complicated boundary conditions. The test examples show that the method has the following advantages:

  1. It deals with mixed boundary conditions in a variational way without significant extra cost, and it fits well with the stochastic gradient descent method.

  2. It works for problems in low dimensions as well as high dimensions, and it has the potential to work for problems in rather high dimensions.

  3. The method is less sensitive to the penalty parameter than methods based on traditional trial spaces [8]. This is more pronounced in high dimensions.

We have also systematically compared the Deep Nitsche Method with the Deep Ritz Method and the least-squares method, for regular as well as singular solutions and for low-dimensional as well as high-dimensional problems. The new method is comparable to these two methods, while being slightly more accurate for singular solutions. Several issues remain unaddressed, such as the influence of the network structure and a systematic way to improve the accuracy; these will be pursued in our future work.


  • [1] I. Babuška, U. Banerjee, and J.E. Osborn, Surveys of meshless and generalized finite element method: a unified approach, Acta Numer. 12 (2003), 1–125.
  • [2] J. Berg and K. Nyström, A unified deep artificial neural network approach for partial differential equations in complex geometries, Neurocomputing 317 (2018), 28–41.
  • [3] J.H. Bramble and A. Schatz, Rayleigh–Ritz–Galerkin methods for Dirichlet's problem using subspaces without boundary conditions, Comm. Pure Appl. Math. XXIII (1970), 653–675.
  • [4] E. Burman and P. Zunino, Numerical approximation of large contrast problems with the unfitted Nitsche method, Frontiers in Numerical Analysis-Durham 2010, J. Blowey and M. Jensen Eds., Springer-Verlag Berlin Heidelberg, 2012, pp. 227–281.
  • [5] W. E, J.Q. Han, and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat. 5 (2017), 349–380.
  • [6] W. E and B. Yu, The Deep Ritz Method: a deep-learning based numerical algorithm for solving variational problems, Commun. Math. Stat. 6 (2018), 1–12.
  • [7] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, 2016.
  • [8] M. Griebel and M.A. Schweitzer, A particle-partition of unity method part V: boundary conditions, Geometric Analysis and Nonlinear Partial Differential Equations, S. Hildebrandt eds., Springer-Verlag Berlin Heidelberg, 2003, pp. 519–542.
  • [9] J.Q Han, A. Jentzen, and W. E, Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. 115 (2018), no. 34, 8505–8510.
  • [10] K.M. He, X.Y. Zhang, S.Q. Ren, and J. Sun, Deep residual learning for image recognition, In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), 770–778.

  • [11] Y. Khoo, J. Lu, and L. Ying, Solving parametric PDE problems with artificial neural networks, 2017, arXiv:1707.03351.
  • [12] D.P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980; Published as a conference paper at ICLR 2015.
  • [13] H.N. Mhaskar, When is approximation by Gaussian networks necessarily a linear process?, Neural Networks 17 (2004), 989–1001.
  • [14] J. Nitsche, Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind, Abh. Math. Sem. Univ. Hamburg 36 (1971), 9–15.
  • [15] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, J. Comput. Phys. 375 (2018), 1339–1364.
  • [16] G. Strang and G.J. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Inc., Englewood Cliffs, N. J., 1973.