## 1 Introduction

In this paper, we continue our previous studies on elliptic interface problems [guo2017gradient, guo2018gradient, GuYa2018], arising in many applications such as fluid dynamics and materials science, where the background consists of rather different materials on the subdomains separated by smooth curves called interfaces. We aim to address the high-dimensional challenge, which is well known as the curse of dimensionality leading to unaffordable computational time in traditional numerical methods (e.g., finite difference and finite element methods).

Deep neural networks have been shown as a powerful tool to overcome the curse of dimensionality [dudley1969speed, fournier2015rate, weed2019sharp, dos2018simulation]

, and have been applied to solve partial differential equations (PDEs),

e.g., the deep BSDE method [HJE2018, EHJ2017], the deep Galerkin method (DGM)[SiSp2018], the physics-informed neural networks (PINNs) [raissi2019physics], the deep Ritz method (DRM) [EYu2018], and the weak adversarial networks (WAN) [ZBYZ2020]. The deep BSDE reformulates the time-dependent equations into stochastic optimization problems. DGM and PINNs train neural networks by minimizing the mean squared error loss of the equation residual, while DRM trains networks by minimizing the energy functional of the variational problem equivalent to the PDEs. WAN uses the weak formulation and trains the primary and adversarial network alternatively using the min-max weak formulation. Moreover, the convergence of DRM was studied by [lu2021priori, duan2021convergence], and the deep Nitsche method was proposed in [LiMi2021], which enhanced the deep Ritz method with natural treatment of essential boundary conditions. In a recent work[ShYa2021], Sheng and Yang trained an additional neural network to impose the Dirichlet boundary conditions. However, these neural networks-based methods, in general, require the smoothness of the solutions to the PDEs, and thus can not be directly used to solve the elliptic interface problems, where the solutions are only piecewise smooth.In literature, there are some recent works of solving elliptic interface problems using neural networks. For example, [WaZh2020] proposed a network architecture similar to the deep Ritz method [EYu2018], and solved the equivalent variational problem with the boundary conditions approximated by a shallow neural network. [he2020mesh]

used different neural networks to approximate the solutions in disjoint subdomains. They reformulated the interface problem as a least-squares problem and solved it by stochastic gradient descent.

[hu2021discontinuity] proposed the discontinuity capturing shallow neural network (DCSNN) to approximate piecewise continuous functions and solved elliptic interface problems by minimizing the mean squared error loss consistent with the residual of the equation, boundary and interface jump conditions.In this paper, we propose a deep learning method for interface problems based on the minimization of the unfitted Nitsche’s energy, inspired by our previous studies

[GuYa2018, GYZ2021, GuYZ2021] on the unfitted Nitsche’s method. One of the most significant differences between the unfitted Nitsche’s method[BCHLM2015, HaHa2002, GYZ2021, GuYZ2021] and other methods for interface problems (e.g., the immersed finite element method [LiIt2006]) is that the unfitted Nitsche’s finite element functions can be discontinuous inside elements. This is possible by adopting two different sets of basis functions on the interface elements (i.e., the elements cut by the interface) which are weakly coupled using Nitsche’s methods. Based on the unfitted Nitsche’s formulation, we can define a so-call unfitted Nitsche energy functional (see equation (2.11) ). It turns out that the weak formulation of unfitted Nitsche’s method is just the Euler-Lagrange equation of unfitted Nitsche energy functional. To address the challenges of the curse of dimensionality, we naturally use deep neural networks to present functions in high dimensions. Following the idea of classical unfitted Nitsche’s method [BCHLM2015, HaHa2002, GYZ2021, GuYZ2021], we use two deep neural networks: one for the part inside the interface and the other one for the part outside the interface. These two parts are weakly connected using Nitsche’s method. The deep unfitted Nitsche method trains the two neural network functions independently using the same unfitted Nitsche energy functional.The rest of the paper is organized as follows. In Section 2, we introduce the model setup of the elliptic interface problems and its unfitted Nitsche’s weak form; In Section 3, we describe the formulation of deep unfitted Nitsche method with details; We present numerical examples in Section 4, and make conclusive remarks in Section 5.

## 2 Model equation

Denote by a bounded Lipschitz domain in , and assume there is a smooth closed curve separating into two parts: and . In general, can be described as a zero level set of some level set function . Then we have and .

In this paper, we shall consider the following elliptic interface problem equationparentequation

(2.1a) | ||||

(2.1b) | ||||

(2.1c) | ||||

(2.1d) |

where with

being the unit outward normal vector of

and the jump on is defined as(2.2) |

with being the restriction of on . The diffusion coefficient is a piecewise constant, i.e.,

(2.3) |

which has a finite jump of function value at the interface . An illustration of the domain is given in Figure 1.

To prepare the presentation of Nitsche’s weak formulation, we introduce two weights

(2.4) |

which satisfies that . Then, we define the weighted averaging of a function on the interface as

(2.5) |

and also its dual weighted averaging as

(2.6) |

Let be the function space consisting of piecewise Sobolev functions such that and , whose norm is defined as

(2.7) |

where is the -norm of a function in . Similar notations are used for piecewise space and its corresponding norm. In addition, let be the subspace of such that . In particular, is the subspace of with homogeneous Dirichlet boundary conditions.

The unfitted Nitsche’s weak formulation [HaHa2002, BCHLM2015, GuYa2018] of the interface problem (2.1a)-(2.1d) is to find such that

(2.8) |

where the bilinear form is defined as

(2.9) |

and the linear functional is defined as

(2.10) |

with the stability parameter .

To adopt the deep neural network, we reformulate the variational problem (2.8) as an energy minimization problem. For such a purpose, we define the unfitted Nitsche’s energy functional as

(2.11) |

Then, we can show the variational problem is equivalent to the following energy minimization problem:

(2.12) |

In other words, equation (2.1a)-(2.1d) is the Euler-Lagrangian equation of (2.12).

###### Remark

To simplify the presentation of this paper, we only consider inhomogeneous Dirichlet boundary conditions. For Neumann and Robin boundary conditions, they can be absorbed into the variation formulation. Other types of boundary conditions like mixed Dirichlet and Neumann boundary conditions can be handled similarly.

###### Remark

We choose to impose the Dirichlet boundary condition strongly by building it into the solution space. Alternatively, we can impose the Dirichlet boundary condition weakly as in [LiMi2021].

## 3 Deep unfitted Nitsche method

In this section, we use the universal approximation property of deep neural network and flexibility of unfitted Nitsche weak form to develop the deep unfitted Nitsche method for solving elliptic interface problem (2.1a)-(2.1d). In the first subsection, we briefly introduce the deep neural networks used in this paper. In the second subsection, we describe the details of the deep unfitted Nitsche formulation.

### 3.1 Deep neural network

One of the key ingredients for using deep learning to solve partial differential equations is to select a deep neural network as the ansatz function for trial functions. The commonly used deep neural networks to approximate the solutions to PDEs include feedforward neural network[LMMK, GoBC2016] and residual neural network (ResNet)[EYu2018, HZRS2016]. Similar to the deep Ritz method in [EYu2018], we take the ansatz function to be the residual neural network.

For each integer and some positive integer , let be the matrix of weights and be vector of biases for . Then, the th block of ResNet with neurons can be written as

(3.1) |

where

is the activation function

[GoBC2016, HiHi2019]. To avoid the problem of vanish gradient, smooth functions like sigmoid and hyperbolic tangent can be adopted.Similarly, let (or ) be the matrix of weights in the first (or last) layer and (or be the vector of biases in the first (or last) layer. The deep neural network with block can viewed as the composition of ’s

(3.2) |

where denote the sets of parameters, i.e.,

In Figure 2, we plot the architecture of the ResNet with 3 block.

### 3.2 Deep unfitted Nitsche formulation

The main advantage of the unfitted Nitsche’s method lies in the ability to use meshes independent of the location of the interface. This is possible by employing two different ansatz functions on the interface elements (elements cut through by the interface): one is for the interior domain and the other one is for the exterior domain. Those two different ansatz functions are discontinuous across the interface and patched together by Nitsche’s method. In the traditional unfitted Nitsche’s methods, the ansatz functions are piecewise polynomials. Following this line, we adopt two different deep neural network functions as the two ansatz functions to minimize the unfitted Nitsche energy (2.11).

Let be the ansatz function in () and denote . where . Before we proceed, we should make sure satisfies the Dirichlet boundary condition 2.1b. For such purpose, we introduce an additional boundary penalty term as

(3.3) |

Ideally, we expect close to zero. We define the extended unfitted Nitsche’s functional as

(3.4) |

where is the boundary penalty parameter.

Then, our deep unfitted Nitsche method is equivalent the following optimization problem

(3.5) |

###### Remark

In the literature, there are several alternative methods to impose the Dirichlet boundary conditions by building the Dirichlet boundary condition into the loss function

[LiMi2021] and training another deep neural network [ShYa2021]. For the sake of simplicity, we choose to impose the Dirichlet boundary condition for deep neural network functions using the penalty method as in [EYu2018]. Interesting readers are referred to [ChDW2020] for a comparison study of different boundary conditions handling methods.To solve the optimization problem using stochastic gradient descent type algorithms (e.g., SGD[GoBC2016] or ADMM[KiBa2015adam]), we approximate the integrals by using the Monte-Carlo method, where the number of integral point is independent of the dimensional of the underlying domain. Suppose are the uniformly sampled point in the domain for . Similarly, let and be the randomly sampled point on the interface and the domain boundary , respectively. The loss function is defined as

(3.6) |

where is the measure of in () and (or ) is the measure of (or ) in . Then, the discrete counterpart of the optimization problem (3.5) reads as

(3.7) |

The discrete optimization problem (3.7) actually involves the training of two deep neural network function and . Those two neural networks can be trained independently using the same loss function . The gradient of deep neural network function can be efficiently calculated using automatic differentiation functionality [pytorchad2019]

in the modern machine learning platform.

## 4 Numerical Experiments

In this section, we present several numerical examples to illustrate the performance of the proposed deep unfitted Nitsche method. The proposed algorithm is implemented using the machine learning platform Pytorch

[NEURIPS2019_9015]. In the following numerical experiments, we choose and have the same neural network architectures and select the activation function . Both deep neural networks are initialized by Xavier initialization to prevent from exploding or vanishing and are trained independently using ADAM[KiBa2015adam]. In all the following numerical experiments, we choose the learning rateand generate new mini-batches every 10 epochs. To provide a qualitative description of training results, we use the following relative

-error(4.1) |

The relative -error is computed using Monte-Carlo methods with uniform sampled points. The relative error are computed and recorded in every 100 epochs. Similarly, we record the loss in every 100 epochs.

### 4.1 Flower shape interface problem in 2D

In our first example, we consider the flower interface problem in the domain [GuYa2018, guo2017gradient]. The flower shape interface is given by the following polar coordinate

(4.2) |

The piecewise diffusion coefficient are and . We choose the right hand side function to fit the exact solution

The nonhomogeneous jump conditions (2.1d) and (2.1c) can be calculated from the above exact solution.

In this case, one can compute and

. To general uniformly distributed random points in

and , we firstly generate uniformly distributed random points in the whole domain . Then, we count the random points inside the interface as the randomly sampled points in and the rest random points are in . The randomly sampled points is produced by generating the uniformed sampled points on the interval and then are mapped onto the interface using (4.2).In this numerical test, we choose the ResNet with 3 blocks with for each ansatz function, as illustrated in the diagram 2. Each ansatz function has 701 parameters. We uniformly sample 1024 points in the domain . It turns out that there are points inside the interface . In other words, and . In addition, we just choose and . We choose and . The corresponding decay of errors and loss functions during the training process is plotted in Figure 4. We can see the error decays very fast at the initial several thousand epochs and then fluctuate. After 50000 epochs, the relative -error is reduced to . In Figure 3, we present a comparison between the deep unfitted Nitsche method(DUNM) solution and the exact solution. We can see the DUNM solution match well with the exact solution even though there is an inhomogeneous jump in the function values.

### 4.2 High-contrast interface problem in 2D

In this example, we consider the high contrast interface problem with homogeneous jump conditions in the domain [GuYa2018, guo2018gradient, guo2017gradient]. The interface is the circle of radius centered at the original. The exact solution is

where .

In this example, we consider the radius circle with . We also use the ResNet with 3 blocks with output in each block to represent each function. The uniformly random sampled points are generated by the same method as in previous example. Again, we generate random sampled points in . In that case, points are inside and points are outside . Similarly, and . The penalty parameters and are the exactly the same as previous case.

Error() | 2.29 | 0.62 | 2.36 | 0.63 |
---|

In the following numerical tests, we focus on the training of high contrast interface problem by considering the following four typical different jump ratios: (large jump), (large jump), (huge jump), and (huge jump). The training processes are described in Figures 5-7 for each cases, respectively. In these figures, we notice that the errors quickly decay to some specific errors and then oscillate around them. The relative errors after 200000 epochs are summarized in the Table 1, where one can observe that the errors are not influenced by the jump ratios.

### 4.3 High-dimensional interface problems

In this example, we consider the -dimensional interface problem in the unit cube . The unit cube is divided into two parts and by the d-dimensional sphere with radius . The diffusion coefficients are and . The exact solution is

where . Therefore, we have homogeneous jump conditions. The right hand side function and boundary condition can be obtained from .

Error() | 3.90 | 2.14 | 4.43 | 7.04 |
---|

In this case, the measure of and the measure of . The measure of the interface is .
We approximate each component of the solution by ResNet with 3 blocks of 20 neurons. In total, there are 2621 parameters for each deep neural network. To generate uniformly random points on dimensional spheres, we use the `sample_hypersphere`

function in BoTorch [BKJD2020]. To guarantee there are enough random sampled point in . We first generate (about ) uniformly random point inside the -dimensional ball with using the dropped coordinates method [HaLa2010]. In specific, we firstly generate points on sphere with radius using the `sample_hypersphere`

function in BoTorch [BKJD2020] and drop the last two coordinates to get the uniform random sampled points. Then, we generate random points in which may also contain points in . So we have and . We take and . In the following test, we take and .

We consider four different high dimensional cases: . Figure 11 displays the decay of the relative error during the training process. Similar to the previous two examples, we can see that the errors decay quickly at the first few epochs and then fluctuate around some specific levels. In Table 2, we reports the relative error after epochs. We can see that we get almost the same level of relative errors independent of the dimensionality of the space even though we use the same number of points.

## 5 Conclusion

As a continuous study of our previous works on elliptic interface problems [guo2017gradient, guo2018gradient, GuYa2018], we propose a deep unfitted Nitsche method to address the high-dimensional challenge with high-contrasts. The challenge is also known as the curse of dimensionality. The classical numerical methods like finite difference and finite methods require unaffordable computational time. The proposed method deploys the deep neural network to solve the equivalent high-dimensional optimization problem. To address the high contrasts of the solution, we introduced a so-called unfitted Nitsche energy functional, which utilize different deep neural networks to present different components of the solution in the high dimensional case. Different deep networks are patched together weakly by Nitsche’s method and can be trained independently using the unfitted Nitsche functional. The unfitted Nitsche’s energy function is approximated by the Monte-Carlo methods. An additional penalty term is added to the discrete energy functional to handle the Dirichlet boundary conditions. The proposed method is easy to be implemented and mesh-free, which is illustrated by several numerical examples including high contrasts and high dimensional cases.

## Acknowledgment

H.G. is grateful to Prof. Zuoqiang Shi from Tsinghua University and Dr Guozhi Dong from the Humboldt University of Berlin for stimulating and helpful discussions. H.G. was partially supported by Andrew Sisson Fund of the University of Melbourne, X.Y. was partially supported by the NSF grant DMS-1818592.