Multi-scale Deep Neural Network (MscaleDNN) for Solving Poisson-Boltzmann Equation in Complex Domains

July 22, 2020 · Ziqi Liu et al. · Shanghai Jiao Tong University and Southern Methodist University

In this paper, we propose novel multi-scale DNNs (MscaleDNN) using the ideas of radial scaling in the frequency domain and of activation functions with compact support. The radial scaling converts the problem of approximating the high frequency contents of PDE solutions to one of lower frequency, and the compact-support activation functions facilitate the separation of scales to be approximated by the corresponding DNNs. As a result, the MscaleDNNs achieve fast uniform convergence over multiple scales. The proposed MscaleDNNs are shown to be superior to traditional fully connected DNNs and to be an effective mesh-less numerical method for Poisson-Boltzmann equations with ample frequency content over complex and singular domains.


1 Introduction

Deep neural networks (DNNs) have found many applications beyond their traditional domains, such as image classification and speech recognition, reaching into the arena of scientific computing [9, 10, 12, 13, 14, 16, 20, 11, 21, 22]. However, to apply commonly-used DNNs to these computational science and engineering problems, we face several challenges. The most prominent issue is that a DNN normally only handles data with low frequency content well: it has been shown, through a Frequency Principle (F-Principle), that many DNNs learn the low frequency content of the data quickly and with good generalization error, but are inadequate when high frequency data are involved [24, 19, 23]. The fast convergence behavior at low frequencies has recently been studied rigorously in theory in [18, 26, 2, 6]. This behavior of DNNs is just the opposite of that of the popular multi-grid methods (MGM) for numerically solving PDEs, such as the Poisson-Boltzmann (PB) equation, where convergence is achieved first in the high frequency spectrum of the solution due to the smoothing operations in the MGM. Due to the potential of DNNs for handling higher dimensional solutions and for approximating functions without the need of a structured mesh, as in traditional finite element or finite difference methods, it is of great value to extend the capability of DNNs as mesh-less PDE solvers. Therefore, it is imperative to improve the convergence of DNNs for fine structures in the solution, such as those encountered in the electrostatic potentials of complex molecules.

The electrostatic interaction of bio-molecules with ionic solvents, governed by the Poisson-Boltzmann (PB) equation within the Debye-Hückel theory [5], plays an important role in many applications, including drug design and the study of diseases. However, due to the complex surface structure of bio-molecules, usually represented by a bead model, it has been a long-standing challenge to design efficient numerical methods that can handle the singular molecular surface, which is either the van der Waals (vdW) surface, formed by the union of overlapping vdW spheres, or the solvent accessible surface (SAS), generated by rolling a small ball over the vdW surface [17], as well as the complex distribution of the electrostatic potential over the molecular surfaces. Traditional finite element [1] and finite difference methods [25] face difficulties in the costly mesh generation and the expensive solution of the discretized linear system. Therefore, in this paper, we will propose and investigate multi-scale DNNs, termed MscaleDNN, with the goal of approximating both low and high frequency information of a function uniformly and of developing a mesh-less solver for PDEs such as the PB equation in domains with complex and singular geometries.

The main idea of the MscaleDNN is to find a way to convert the learning or approximation of high frequency data to that of low frequency data. A similar idea was attempted in a previous work on the phase shift DNN (PhaseDNN) [3], where the high frequency component of the data is phase-shifted downward to a low frequency spectrum. The learning of the shifted data can be achieved quickly with a small-sized DNN, and the result is then shifted back (upward) to give an approximation of the original high frequency data. The PhaseDNN has been shown to be very effective in handling highly oscillatory data from solutions of high frequency Helmholtz equations and from functions of low dimensions. However, due to the number of phase shifts employed along each coordinate direction independently, the PhaseDNN results in many small DNNs and a considerable computational cost even for three-dimensional problems. In this paper, we will consider a different approach to achieve the conversion of high frequency to lower frequency: with a radial partition of the Fourier space, a scaling-down operation is used to convert a higher frequency spectrum to a low frequency one before the learning is carried out with a small-sized DNN. As the scaling operation only needs to be done along the radial direction in the Fourier space, this approach is easy to implement and gives an overall small number of DNNs, thus reducing the computational cost. In addition, borrowing the multi-resolution concept of wavelet approximation theory, which uses compactly supported mother scaling and wavelet functions, we will replace the traditional global activation functions by ones with compact support. Compactly supported activation functions with sufficient smoothness give a localization in the frequency domain, where the scaling operation effectively produces DNNs approximating different frequency contents of a PDE solution.

Two types of MscaleDNN architectures are proposed, investigated, and compared for their performance. Through various experiments, we demonstrate that MscaleDNNs solve elliptic PDEs much faster and achieve a much smaller generalization error, compared with normal fully connected networks of similar size. We apply MscaleDNNs to solve variable coefficient elliptic equations, including those whose solutions have a broad range of frequencies, over different types of domains, such as a ring-shaped domain and a cubic domain with multiple holes. Also, to test the potential of MscaleDNN for computing the Poisson-Boltzmann electrostatic solvation energy of bio-molecules, we apply MscaleDNN to solve elliptic equations with geometric singularities, such as cusps and self-intersecting surfaces in a molecular surface. These extensive experiments clearly demonstrate that the MscaleDNN is an efficient and easy-to-implement mesh-less PDE solver in complex domains.

The rest of the paper is organized as follows. In Section 2, we introduce the frequency scaling used to generate a MscaleDNN representation. Section 3 presents MscaleDNN structures with compactly supported activation functions. Section 4 presents a minimization approach through the Ritz energy for finding the solution of elliptic PDEs and a minimization approach through a least squared error for fitting functions. In Section 5, we use two test problems to show the effectiveness of the proposed MscaleDNN over a normal fully connected DNN of the same size. Next, numerical results for the solution of elliptic PDEs in complex domains by the proposed MscaleDNN are given in Section 6. Finally, Section 7 gives conclusions and a discussion of future work.

2 Frequency scaled DNN and compact activation function

In this section, we will first present a naive idea of how to use frequency scaling in the Fourier wave number space to reduce a high frequency learning problem for a function to a low frequency one for the DNN, and we will also point out the difficulties it may encounter as a practical algorithm.

Consider a band-limited function $f(x)$, $x \in \mathbb{R}^d$, whose Fourier transform $\hat{f}(k)$ has a compact support, i.e.,

(1) $\operatorname{supp} \hat{f}(k) \subset B(K_{\max}) = \{k \in \mathbb{R}^d : |k| \le K_{\max}\}.$

We will first partition the ball $B(K_{\max})$ as a union of $M$ concentric annuli with uniform or non-uniform width, e.g.,

(2) $A_i = \{k \in \mathbb{R}^d : (i-1)K_0 \le |k| \le i K_0\}, \quad K_0 = K_{\max}/M, \quad 1 \le i \le M,$

so that

(3) $B(K_{\max}) = \bigcup_{i=1}^{M} A_i.$

Now, we can decompose the function $\hat{f}(k)$ as follows

(4) $\hat{f}(k) = \sum_{i=1}^{M} \chi_{A_i}(k)\,\hat{f}(k) \triangleq \sum_{i=1}^{M} \hat{f}_i(k),$

where $\chi_{A_i}$ is the indicator function of the annulus $A_i$, so that

(5) $\operatorname{supp} \hat{f}_i(k) \subset A_i.$

The decomposition in the Fourier space gives a corresponding one in the physical space

(6) $f(x) = \sum_{i=1}^{M} f_i(x),$

where

(7) $f_i(x) = \mathcal{F}^{-1}[\hat{f}_i](x) = (\chi^{\vee}_{A_i} * f)(x),$

and the inverse Fourier transform of the indicator function,

(8) $\chi^{\vee}_{A_i}(x) = \mathcal{F}^{-1}[\chi_{A_i}](x),$

is called the frequency selection kernel [3] and can be computed analytically using Bessel functions.

From (5), we can apply a simple down-scaling to convert the high frequency region $A_i$ to a low frequency one. Namely, we define a scaled version of $\hat{f}_i$ as

(9) $\hat{f}_i^{(\mathrm{scale})}(k) = \hat{f}_i(\alpha_i k), \quad \alpha_i > 1,$

and, correspondingly in the physical space,

(10) $f_i^{(\mathrm{scale})}(x) = \alpha_i^{-d}\, f_i(x/\alpha_i),$

or

(11) $f_i(x) = \alpha_i^{d}\, f_i^{(\mathrm{scale})}(\alpha_i x),$

noting the low frequency spectrum of the scaled function if $\alpha_i$ is chosen large enough, i.e.,

(12) $\operatorname{supp} \hat{f}_i^{(\mathrm{scale})}(k) \subset \{k : |k| \le i K_0/\alpha_i\}.$

Using the F-Principle of common DNNs [23], with $i K_0/\alpha_i$ being small, we can train a DNN $f_{\theta^{n_i}}(x)$ to learn $f_i^{(\mathrm{scale})}(x)$ quickly,

(13) $f_i^{(\mathrm{scale})}(x) \sim f_{\theta^{n_i}}(x),$

which gives an approximation to $f_i(x)$ immediately,

(14) $f_i(x) \sim \alpha_i^{d}\, f_{\theta^{n_i}}(\alpha_i x),$

and to $f(x)$ as well,

(15) $f(x) \sim \sum_{i=1}^{M} \alpha_i^{d}\, f_{\theta^{n_i}}(\alpha_i x).$

The difficulty of the above procedure for approximating a function, and even more for finding a PDE solution, is the need to compute the convolution in (7), which is computationally expensive for scattered data in physical space, especially for higher dimensional problems.
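As a quick sanity check of the down-scaling idea in (9)-(15), the following minimal sketch (in Python/NumPy, with an illustrative scale $\alpha$ and frequency that are not taken from the paper) shows that a single high-frequency component becomes a low-frequency target after the input scaling $x \mapsto x/\alpha$ and is recovered exactly by evaluating the scaled function at $\alpha x$:

```python
import numpy as np

# Minimal 1-D sketch of the down-scaling idea behind Eqs. (9)-(15):
# a high-frequency component becomes a low-frequency target after the
# input scaling x -> x/alpha, and is recovered by evaluating at alpha*x.
# alpha and the frequency 50 are illustrative values only.
alpha = 25.0
f_i = lambda x: np.sin(50.0 * x)        # high-frequency component f_i(x)
f_i_scale = lambda x: f_i(x / alpha)    # scaled version, here sin(2x): low frequency

x = np.linspace(0.0, np.pi, 1001)
# A small DNN would be trained to fit f_i_scale (fast, by the F-Principle);
# composing with the scale factor recovers the original component exactly:
assert np.allclose(f_i(x), f_i_scale(alpha * x))
```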

3 MscaleDNN structure

3.1 Activation function with compact support

In order to produce the scale separation and identification capability of a MscaleDNN, we borrow the idea of the compactly supported mother scaling function from wavelet theory [8], and consider activation functions with compact support as well. Compared with the commonly used activation function $\mathrm{ReLU}(x) = \max(x, 0)$, we found that activation functions with compact support are more effective in MscaleDNNs. Two possible activation functions are defined as follows

(16) $\mathrm{sReLU}(x) = \mathrm{ReLU}(x)\,\mathrm{ReLU}(1-x) = (x)_{+}(1-x)_{+},$

and the quadratic B-spline with continuous first derivative

(17) $\phi(x) = (x)_{+}^{2} - 3(x-1)_{+}^{2} + 3(x-2)_{+}^{2} - (x-3)_{+}^{2},$

where $(x)_{+} = \max(x, 0) = \mathrm{ReLU}(x)$. All three activation functions are illustrated in the spatial domain in Fig. 1, and the Fourier transforms of both $\mathrm{sReLU}(x)$ and $\phi(x)$ are illustrated in Fig. 2.
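A minimal PyTorch sketch of these two compactly supported activations, assuming the reconstructed forms of Eqs. (16)-(17) above:

```python
import torch

def srelu(x):
    # sReLU(x) = ReLU(x) * ReLU(1 - x), supported on [0, 1]  (Eq. (16))
    return torch.relu(x) * torch.relu(1.0 - x)

def quad_bspline(x):
    # phi(x) = (x)_+^2 - 3(x-1)_+^2 + 3(x-2)_+^2 - (x-3)_+^2,
    # a quadratic B-spline supported on [0, 3] with continuous first derivative  (Eq. (17))
    return (torch.relu(x) ** 2
            - 3.0 * torch.relu(x - 1.0) ** 2
            + 3.0 * torch.relu(x - 2.0) ** 2
            - torch.relu(x - 3.0) ** 2)
```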

(a) ReLU
(b) sReLU
(c) $\phi$
Figure 1: Activation functions in the spatial domain.
(a) sReLU
(b) $\phi$
Figure 2: Activation functions in the frequency domain, normalized by the maximum of each case.
(a) MscaleDNN-1
(b) MscaleDNN-2
Figure 3: Illustration of two MscaleDNN structures.

3.2 MscaleDNN structure

While the procedure leading to (15) is not practical for numerical approximation in high dimensions, it does suggest a plausible form of function space for finding the solution more quickly with DNN functions. We can use a series of scale factors, ranging from 1 to a large number, to produce a MscaleDNN structure that speeds up the convergence for solutions with a wide range of frequencies, with accuracy uniform across frequencies. For this purpose, we propose the following two multi-scale structures.

MscaleDNN-1

For the first kind, we separate the neurons in the first hidden layer into $N$ parts. The neurons in the $i$-th part receive the scaled input $a_i x$, that is, their output is $\sigma(a_i w \cdot x + b)$, where $w$, $x$, $b$ are the weight, input, and bias parameters, respectively. A complete MscaleDNN takes the following form

(18) $f_{\theta}(x) = W^{[L-1]} \sigma\circ(\cdots (W^{[1]} \sigma\circ(K \odot (W^{[0]} x) + b^{[0]}) + b^{[1]}) \cdots ) + b^{[L-1]},$

where $x \in \mathbb{R}^d$, $W^{[l]} \in \mathbb{R}^{m_{l+1} \times m_l}$ with $m_l$ being the number of neurons in the $l$-th hidden layer and $m_0 = d$, $b^{[l]} \in \mathbb{R}^{m_{l+1}}$, $\sigma$ is a scalar activation function, "$\circ$" means entry-wise operation, $\odot$ is the Hadamard product, and

(19) $K = (\underbrace{a_1, \dots, a_1}_{\text{1st part}}, \underbrace{a_2, \dots, a_2}_{\text{2nd part}}, \dots, \underbrace{a_N, \dots, a_N}_{N\text{-th part}})^{T},$

where the scale factors are chosen as $a_i = i$ or $a_i = 2^{i-1}$.

We refer to this structure as Multi-scale DNN-1 (MscaleDNN-1) of the form in Eq. (18), as depicted in Fig. 3(a).
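A minimal PyTorch sketch of the MscaleDNN-1 structure of Eq. (18) is given below; the widths, depth, and scale set are illustrative placeholders, and the sReLU of Eq. (16) is used as the default activation:

```python
import torch
import torch.nn as nn

def srelu(x):
    # compact-support activation sReLU(x) = ReLU(x) * ReLU(1 - x) of Eq. (16)
    return torch.relu(x) * torch.relu(1.0 - x)

class MscaleDNN1(nn.Module):
    """First hidden layer split into groups; the i-th group sees the input scaled by a_i,
    implemented as the Hadamard product K ⊙ (W^[0] x) + b^[0] of Eqs. (18)-(19)."""
    def __init__(self, dim_in=1, dim_out=1, width=900, depth=3,
                 scales=(1, 2, 4, 8, 16, 32), activation=srelu):
        super().__init__()
        self.act = activation
        reps = width // len(scales)                      # neurons per scale group
        k = torch.tensor(scales, dtype=torch.float32).repeat_interleave(reps)
        self.register_buffer("K", k)                     # scale vector K of Eq. (19)
        self.first = nn.Linear(dim_in, len(k), bias=False)
        self.b0 = nn.Parameter(torch.zeros(len(k)))
        self.hidden = nn.ModuleList(nn.Linear(len(k), len(k)) for _ in range(depth - 1))
        self.last = nn.Linear(len(k), dim_out)

    def forward(self, x):
        h = self.act(self.K * self.first(x) + self.b0)   # σ(K ⊙ (W^[0] x) + b^[0])
        for layer in self.hidden:
            h = self.act(layer(h))
        return self.last(h)
```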

MscaleDNN-2 A second kind of multi-scale DNN is given in Fig. 3(b), as a sum of $N$ subnetworks, in which each scaled input $a_i x$ goes through its own subnetwork. In MscaleDNN-2, the weight matrices $W^{[1]}$ to $W^{[L-1]}$ are block diagonal. Again, we could select the scale factors as $a_i = i$ or $a_i = 2^{i-1}$.
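A corresponding PyTorch sketch of MscaleDNN-2, again with illustrative subnetwork sizes and scales: each subnetwork is an independent fully connected network fed the scaled input $a_i x$, and the outputs are summed.

```python
import torch
import torch.nn as nn

class MscaleDNN2(nn.Module):
    """Sum of independent subnetworks, one per scale a_i, each fed the scaled input a_i*x."""
    def __init__(self, dim_in=1, dim_out=1, sub_width=150, depth=3,
                 scales=(1, 2, 4, 8, 16, 32), activation=None):
        super().__init__()
        # default to the compact-support sReLU of Eq. (16)
        self.act = activation or (lambda x: torch.relu(x) * torch.relu(1.0 - x))
        self.scales = scales
        def subnet():
            layers = [nn.Linear(dim_in, sub_width)]
            layers += [nn.Linear(sub_width, sub_width) for _ in range(depth - 1)]
            layers += [nn.Linear(sub_width, dim_out)]
            return nn.ModuleList(layers)
        self.subnets = nn.ModuleList(subnet() for _ in scales)

    def forward(self, x):
        out = 0.0
        for a, layers in zip(self.scales, self.subnets):
            h = a * x                      # scaled input a_i * x
            for layer in layers[:-1]:
                h = self.act(layer(h))
            out = out + layers[-1](h)      # sum of the subnetwork outputs
        return out
```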

For comparison studies, we will define a "normal" network as a fully connected DNN with no multi-scale features. We perform extensive numerical experiments to examine the effectiveness of different settings and then use an efficient one to solve complex problems. All models are trained by the Adam optimizer [15] with a fixed learning rate.

4 MscaleDNN for approximation and elliptic PDE’s solution

In this section, we set up the two tasks used in the following sections to show the effectiveness of MscaleDNNs, namely, solving PDEs such as the PB equation and fitting functions.

4.1 Mean squared error training for fitting functions

A DNN, denoted by $f_{\theta}(x)$, will be trained with the mean squared error (MSE) loss to fit a target function $f(x)$ defined on a domain $\Omega \subset \mathbb{R}^d$. The loss function is defined as

(20) $L(\theta) = \int_{\Omega} \left(f(x) - f_{\theta}(x)\right)^2 \mathrm{d}x,$

where $f_{\theta}$ is a neural network with parameter set $\theta$.

In our training process, the training data are sampled uniformly from $\Omega$ at each training epoch; the empirical loss at each epoch is

(21) $L_{S}(\theta) = \frac{1}{n} \sum_{x \in S} \left(f(x) - f_{\theta}(x)\right)^2,$

where $S$ is the sample set and $n$ is the sample size of $S$.

The above training process requires full knowledge of the target function, so it is not of much practical use by itself; we conduct this study to examine the ability of a DNN to fit high-frequency functions when sufficient information about the target is available.
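A minimal training-loop sketch for this MSE fitting task (Eqs. (20)-(21)); the 1-D domain, sample size, epoch count, and learning rate are illustrative, and `model` can be any of the networks sketched in Section 3:

```python
import torch

def fit_function(model, target_fn, domain=(0.0, 1.0), n_sample=1000,
                 epochs=5000, lr=1e-3):
    """Fit model to target_fn with the empirical MSE loss of Eq. (21),
    resampling the training points uniformly from the domain at every epoch."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    lo, hi = domain
    for _ in range(epochs):
        x = lo + (hi - lo) * torch.rand(n_sample, 1)   # fresh 1-D samples each epoch
        loss = torch.mean((target_fn(x) - model(x)) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```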

4.2 A Ritz variational method for Poisson-Boltzmann equations

Let us consider the following elliptic Poisson-Boltzmann equation,

(22) $-\nabla\cdot\left(\epsilon(x)\nabla u(x)\right) + \kappa(x)^2\, u(x) = f(x), \quad x \in \Omega \subset \mathbb{R}^d,$

where $\epsilon(x)$ is the dielectric constant and $\kappa(x)$ the inverse Debye-Hückel length of an ionic solvent. For a typical solvation problem of a solute, such as a biomolecule, in an ionic solvent, the dielectric constant $\epsilon(x)$ will be a discontinuous function across the solute-solvent interface $\Gamma$ and the following transmission conditions will be included,

(23) $[u]_{\Gamma} = 0,$
(24) $\left[\epsilon\, \partial u/\partial n\right]_{\Gamma} = 0,$

where $[\cdot]$ denotes the jump of the quantity inside the square bracket across the interface $\Gamma$.

We will apply the deep Ritz method as proposed in [10], which produces a variational solution of equations (22)-(24) through the following minimization problem

(25) $u^{*} = \arg\min_{u} J(u),$

where the energy functional is defined as

(26) $J(u) = \int_{\Omega} \left( \frac{\epsilon(x)}{2} |\nabla u(x)|^2 + \frac{\kappa(x)^2}{2} u(x)^2 - f(x)\, u(x) \right) \mathrm{d}x.$

We use the MscaleDNN $u_{\theta}(x)$ to represent trial functions in the above variational problem, where $\theta$ is the DNN parameter set. Then, the MscaleDNN solution is

(27) $\theta^{*} = \arg\min_{\theta} J(u_{\theta}).$

The minimizer can be found by a stochastic gradient descent (SGD) method,

(28) $\theta^{n+1} = \theta^{n} - \eta\, \nabla_{\theta} J(u_{\theta^{n}}),$

where $\eta$ is the learning rate. The integral in Eq. (26) will only be sampled at some random points $S_{\Omega} \subset \Omega$ at each training step (see (2.11) in [10]), namely,

(29) $J(u_{\theta}) \approx \frac{1}{n} \sum_{x \in S_{\Omega}} \left( \frac{\epsilon(x)}{2} |\nabla u_{\theta}(x)|^2 + \frac{\kappa(x)^2}{2} u_{\theta}(x)^2 - f(x)\, u_{\theta}(x) \right).$

At convergence $\theta^{n} \to \theta^{*}$, we obtain a MscaleDNN solution $u_{\theta^{*}}(x)$.

In our numerical tests, the Ritz loss function is modified to account for the boundary condition $u = g$ on $\partial\Omega$ through a penalty term,

(30) $L(\theta) = \frac{1}{n} \sum_{x \in S_{\Omega}} \left( \frac{\epsilon(x)}{2} |\nabla u_{\theta}(x)|^2 + \frac{\kappa(x)^2}{2} u_{\theta}(x)^2 - f(x)\, u_{\theta}(x) \right) + \beta\, \frac{1}{\tilde{n}} \sum_{x \in S_{\partial\Omega}} \left( u_{\theta}(x) - g(x) \right)^2,$

where $u_{\theta}$ is the DNN output, $S_{\Omega}$ is the sample set from $\Omega$ with sample size $n$, and $S_{\partial\Omega}$ is the sample set from $\partial\Omega$ with sample size $\tilde{n}$. The second (penalty) term enforces the boundary condition, with the penalty coefficient $\beta$ kept fixed across all experiments.
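A PyTorch sketch of this penalized Ritz loss (Eq. (30)), using autograd for the gradient term; the coefficient functions, boundary data $g$, and the value of $\beta$ are illustrative placeholders:

```python
import torch

def ritz_loss(model, x_in, x_bd, eps_fn, kappa_fn, f_fn, g_fn, beta=1000.0):
    """Penalized Ritz loss of Eq. (30): volume term on interior samples x_in plus
    a boundary penalty on samples x_bd; beta is an illustrative penalty value."""
    x_in = x_in.requires_grad_(True)
    u = model(x_in)                                       # u_theta on interior points
    grad_u = torch.autograd.grad(u.sum(), x_in, create_graph=True)[0]   # ∇u via autograd
    interior = (0.5 * eps_fn(x_in) * (grad_u ** 2).sum(dim=1, keepdim=True)
                + 0.5 * kappa_fn(x_in) ** 2 * u ** 2
                - f_fn(x_in) * u).mean()
    boundary = ((model(x_bd) - g_fn(x_bd)) ** 2).mean()   # (u_theta - g)^2 on the boundary
    return interior + beta * boundary
```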

To assess the learning accuracy, we also compute the error between $u_{\theta}$ and the exact solution $u$ on test data points inside the domain,

(31) $e = \frac{1}{n_{\mathrm{test}}} \sum_{x \in S_{\mathrm{test}}} \left( u_{\theta}(x) - u(x) \right)^2,$

where $S_{\mathrm{test}}$ is the test sample set and $n_{\mathrm{test}}$ its size.

5 Effectiveness of various MscaleDNN settings

In this section, we will show that MscaleDNNs outperform normal fully-connected DNNs (indicated by "normal" in the numerical results) in various settings, that is, the loss function of a MscaleDNN decays faster and to smaller values than that of a normal fully-connected DNN. First, we describe two test problems. Second, we demonstrate that the compactly supported activation functions sReLU and $\phi$ are much better than the commonly used ReLU. Third, we show that the MscaleDNN structures are better than the normal fully connected one when using the same activation function. Finally, we examine the effect of different scale selections.

5.1 Two test problems

To understand the performance of different MscaleDNNs and their parameters, here we consider one- and two-dimensional problems in fitting functions and solving PDEs; 3-D problems in complex domains will be considered in the next section.

Test problem 1: Fitting problem

The target function for the fitting problem is

(32)

where the parameter values differ between the one-dimensional and two-dimensional cases. The target functions are shown in Fig. 4. Training data at each epoch and test data are randomly sampled from the domain. All DNNs are trained by the Adam optimizer.

Figure 4: Test problem 1: target functions for the fitting problems in 1-D and 2-D.

Test problem 2: Solving PB equations

We will solve the elliptic equation (22) with constant coefficients $\epsilon$ and $\kappa$ in a domain $\Omega$, with the right-hand side chosen so that the PB equation has a prescribed exact solution, and with the corresponding boundary condition given by that exact solution.

The parameter values differ between the one- and two-dimensional cases. The exact solutions are shown in Fig. 5. DNNs are trained by the Adam optimizer. Training data at each epoch are randomly sampled from $\Omega$, boundary data are randomly sampled from $\partial\Omega$, and a penalty coefficient is used to enforce the boundary condition.

Figure 5: Test problem 2: exact solutions of the 1-D and 2-D PB equations.

5.2 Different activation functions

We use the following three network structures to examine the effectiveness of different activation functions, by solving the one-dimensional fitting and PDE problems described above:

  1. fully-connected DNN with size 1-900-900-900-1 (normal).

  2. MscaleDNN-1 with size 1-900-900-900-1 and scale coefficients of (MscaleDNN-1(32)).

  3. MscaleDNN-2 with six subnetworks with size 1-150-150-150-1 and scale coefficients of . (MscaleDNN-2(32)).

We use three different activation functions, i.e., ReLU, sReLU, and $\phi$, for the above structures. For the normal network structure on the fitting problem, as shown in Fig. 6(a), one of the activation functions performs much better than the other two; however, for the normal network structure on the PDE problem, as shown in Fig. 7(a), that same activation performs much worse than the other two. This indicates that none of the three activation functions behaves consistently in the normal fully connected structure. On the other hand, as shown in Figs. 6(b, c) and 7(b, c), for both MscaleDNN structures, the compactly supported activation functions sReLU and $\phi$ both perform much better than ReLU in both test problems.

(a) normal
(b) MscaleDNN-1
(c) MscaleDNN-2
Figure 6: Different activation functions in 1-D fitting problems.
(a) normal
(b) MscaleDNN-1
(c) MscaleDNN-2
Figure 7: Different activation functions in a 1-D PB equation.

5.3 Different network structures

In this subsection, we examine the effectiveness of the following different network structures, all using the same compactly supported activation function:

  1. fully-connected DNN with size 1-900-900-900-1 (normal).

  2. MscaleDNN-1 with size 1-900-900-900-1 and scale coefficients of (MscaleDNN-1(32)).

  3. MscaleDNN-2 with six subnetworks with size 1-150-150-150-1 and scale coefficients of . (MscaleDNN-2(32)).

As shown in Figs. 8 and 9, both MscaleDNN structures are better than the normal structure in both problems, and the two MscaleDNN structures have broadly similar performance. As MscaleDNN-2 performs slightly better than MscaleDNN-1 and also has far fewer connections, in the following we use MscaleDNN-2 for further numerical experiments.

(a) 1d function
(b) 2d function
Figure 8: Different network structures in fitting problems.
(a) 1d equation
(b) 2d equation
Figure 9: Different network structures in PDE problems.

5.4 Different scale selections in MscaleDNNs

In this subsection, we will test different scales for the activation function in MscaleDNNs:

  1. fully-connected DNN with size 1-900-900-900-1 (normal).

  2. MscaleDNN-2 with six subnetworks with size 1-150-150-150-1 and scale coefficients of (MscaleDNN-2(1)).

  3. MscaleDNN-2 with three subnetworks with size 1-300-300-300-1 and scale coefficients of . (MscaleDNN-2(3)).

  4. MscaleDNN-2 with three subnetworks with size 1-300-300-300-1 and scale coefficients of . (MscaleDNN-2(4)).

  5. MscaleDNN-2 with six subnetworks with size 1-150-150-150-1 and scale coefficients of (MscaleDNN-2(6)).

  6. MscaleDNN-2 with six subnetworks with size 1-150-150-150-1 and scale coefficients of (MscaleDNN-2(32)).

As shown in Fig. 10, MscaleDNNs perform almost consistently better than normal DNNs. Note that with a larger range of scales, the MscaleDNN solves the problem faster. With all scales set to 1, the performance of the structure (MscaleDNN-2(1)) is much worse than those with multiple scales in solving elliptic PDEs. Therefore, with subnetwork structures using different scales, the MscaleDNN achieves faster convergence. These experiments show that MscaleDNNs with proper scales are more efficient in solving PDE problems, and that the results are not overly sensitive to the selection of scales.

(a) 1d equation
(b) 2d equation
Figure 10: Different scale options in 1-D and 2-D PB equations.

With these numerical experiments, we have demonstrated that the MscaleDNN is much more efficient in solving elliptic PDEs, and that the preferred network is MscaleDNN-2 with a compactly supported activation function, which will be used in the rest of the paper for solving Poisson and PB equations in complex and/or singular domains.
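To make the workflow concrete, the sketch below combines the MscaleDNN2 and ritz_loss sketches from Sections 3 and 4 into a training loop for a PB-type problem on the square $[-1,1]^2$; the coefficient functions, source, sample sizes, scales, epoch count, and learning rate are all illustrative placeholders, not the paper's settings:

```python
import torch

def sample_square_boundary(n):
    # uniform samples on the boundary of [-1, 1]^2 (pick one of the four sides at random)
    t = 2.0 * torch.rand(n) - 1.0
    side = torch.randint(0, 4, (n,))
    x = torch.where(side == 0, -torch.ones(n), torch.where(side == 1, torch.ones(n), t))
    y = torch.where(side == 2, -torch.ones(n), torch.where(side == 3, torch.ones(n), t))
    return torch.stack([x, y], dim=1)

model = MscaleDNN2(dim_in=2, dim_out=1, sub_width=150, depth=3,
                   scales=(1, 2, 4, 8, 16, 32))          # sketch from Section 3.2
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(10000):
    x_in = 2.0 * torch.rand(5000, 2) - 1.0               # fresh interior samples
    x_bd = sample_square_boundary(1000)                  # fresh boundary samples
    loss = ritz_loss(model, x_in, x_bd,                  # sketch from Section 4.2
                     eps_fn=lambda x: torch.ones_like(x[:, :1]),
                     kappa_fn=lambda x: torch.ones_like(x[:, :1]),
                     f_fn=lambda x: torch.sin(torch.pi * x[:, :1]),
                     g_fn=lambda x: torch.zeros_like(x[:, :1]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```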

6 MscaleDNNs for Poisson and Poisson-Boltzmann equations in complex domains

In this section, we apply MscaleDNNs with a compactly supported activation function to solve complex elliptic equations, including cases with a broad range of frequencies, variable coefficients, a ring-shaped domain, and a cubic domain with multiple holes. Finally, we apply the MscaleDNN to solve PB equations with geometric singularities, such as cusps and self-intersecting surfaces, arising from a typical bead model of a bio-molecule. Through these experiments, we convincingly demonstrate that MscaleDNNs are an efficient and easy-to-implement mesh-less method for solving complex elliptic PDEs.

6.1 Broad range of frequencies

Consider the Poisson equation in a domain $\Omega$,

(33) $-\Delta u(x) = f(x), \quad x \in \Omega,$

where the source term $f(x)$ is chosen such that the equation has an exact solution containing a broad range of frequencies, which also provides the boundary condition in problem (33).

In each training epoch, we sample points inside the domain and points from the boundary. We examine the following two structures:

  1. a fully-connected DNN with size 1-1000-1000-1000-1 (normal).

  2. a MscaleDNN-2 with five subnetworks with size 1-200-200-200-1, and scale coefficients . (Mscale).

This problem does not have a single fixed frequency but a broad range of frequencies, and a commonly used fully connected DNN is not able to solve it. The exact solution of the two-dimensional case of problem (33) is shown in Fig. 11(a) as a highly oscillatory function. The solution obtained by the normal DNN, shown in Fig. 11(b), fails to capture the oscillatory structure, while the solution obtained by the MscaleDNN, shown in Fig. 11(c), captures the different-scale oscillations well. For example, in the area marked by the red circle, the expected oscillation almost disappears in the normal network's solution, while the MscaleDNN solution resolves it well. Similar differences occur for the oscillations at the four corners.

(a) exact
(b) normal
(c) Mscale
Figure 11: Two-dimensional case for problem (33). As an example, the MscaleDNN well captures the oscillation in the red marked circle while the normal fully connected network fails.

The errors for the two-dimensional and three-dimensional problems are shown in Fig. 12 (a) and (b), respectively. In both cases, MscaleDNNs solve the problems much faster and reach lower errors.

(a) 2d Poisson equation
(b) 3d Poisson equation
Figure 12: Error vs. epoch for problems with broad range of frequencies.
Figure 13: Error vs. epoch for variable coefficient PB equation.

6.2 PB equations with Variable coefficients

Consider the PB equation (22) in a domain $\Omega$ with variable coefficients, which has a prescribed exact solution $u(x)$. The boundary condition is also given by the exact solution $u(x)$.

In each training epoch, we sample points inside the domain and points from the boundary. We compare the following two DNN structures:

  1. a fully-connected DNN with size 1-900-900-900-1 (normal).

  2. a MscaleDNN-2 with six subnetworks with size 1-150-150-150-1 and scale coefficients . (Mscale).

As shown in Fig. 13, during the training process the error of the MscaleDNN decays significantly, while the error of the normal DNN remains almost unchanged. Therefore, the MscaleDNN solves the problem much faster and with much better accuracy.

6.3 A ring-shaped domain

Consider the Poisson equation (33) in a ring-shaped domain with given center, inner radius, and outer radius, with a source term

(34)

involving a Bessel function. The exact solution is given by

(35)

Again, the boundary condition is given by the exact solution $u(x)$. We solve the equation for two parameter choices.

In each training epoch, we sample points inside the domain and points from the boundary. We examine the following two structures:

  1. a fully-connected DNN with size 1-500-500-500-1 (normal).

  2. a MscaleDNN-2 with five subnetworks with size 1-100-100-100-1 and scale coefficients . (Mscale).

The exact solutions and the numerical solutions obtained by the normal DNN and the MscaleDNN for the two cases are shown in Fig. 14 and Fig. 15. To highlight the superior performance of the MscaleDNNs, the areas marked by the black circles show the region of the solution with the largest amplitude: the normal networks completely fail to capture the oscillations there, while the MscaleDNNs faithfully capture them in both cases. Again, as shown in Fig. 16, the MscaleDNN solves both problems with much better accuracy.

(a) exact
(b) normal
(c) Mscale
Figure 14: Exact and numerical solutions for the equation in the ring-shaped domain (first case). The black circle is for illustration purposes only.
(a) exact
(b) normal
(c) Mscale
Figure 15: Exact and numerical solutions for the equation in the ring-shaped domain (second case). The black circle is for illustration purposes only.
Figure 16: Error vs. epoch for the Poisson equation in the ring-shaped domain (two cases).

6.4 A square domain with a few holes

We consider next the following two square domains with three and four holes, respectively:

Domain one

The three circular holes have prescribed centers and radii, two larger holes and one small hole. In each epoch, we randomly sample points on the outer boundary, on the boundary of each big hole, and on the boundary of the small hole.

Domain two

The second domain has three circular holes with prescribed centers and radii and one elliptic hole. At each epoch, points are randomly sampled on the outer boundary, on the boundary of the big circular hole, on the boundary of each small circular hole, and on the boundary of the elliptic hole.

We solve the Poisson equation (33) with the source term as

(36)

The exact solution is

(37)

which also provides the boundary condition.

In each training epoch, we sample points inside the domain. We examine the following two DNN structures:

  1. a fully-connected DNN with size 1-1000-1000-1000-1 (normal).

  2. a MscaleDNN-2 with five subnetworks with size 1-200-200-200-1, and scale coefficients of . (Mscale).

Compared with the exact solutions in Fig. 17(a) and Fig. 18(a), the normal DNN fails to resolve the magnitudes of many oscillations, as shown in Fig. 17(b) and Fig. 18(b), while the MscaleDNNs capture each oscillation of the true solutions accurately, as shown in Fig. 17(c) and Fig. 18(c).

(a) exact
(b) normal
(c) Mscale
Figure 17: Exact and numerical solution for the Poisson equation in domain 1.
(a) exact
(b) normal
(c) Mscale
Figure 18: Exact and numerical solution for the Poisson equation in domain 2.

As shown in Fig. 19, MscaleDNNs solve both problems much faster and reach lower errors.

(a) domain 1
(b) domain 2
Figure 19: Error vs. epoch for the Poisson equation in square domains with few holes.

6.5 A square domain with many holes

To verify the capability of the MscaleDNN for complex domains, we consider a three-dimensional cube from which 125 spherical holes are removed. The holes are centered at the nodes of a uniform 5×5×5 mesh, with radii randomly sampled from a uniform distribution. The sample sizes for training DNNs at each training epoch are fixed for the outer boundary and for the inner holes (the same number of points for each hole).
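A sketch of mesh-less interior sampling for such a cube-with-holes domain, drawing uniform points in the cube and rejecting those that fall inside any hole; the cube bounds, hole centers, and radii passed in are illustrative inputs (e.g., the 5×5×5 mesh nodes mentioned above):

```python
import torch

def sample_cube_with_holes(n, centers, radii, low=-1.0, high=1.0):
    """Draw n uniform interior samples from the cube [low, high]^3 minus the union
    of balls given by centers (m, 3) and radii (m,), via rejection sampling."""
    accepted = []
    total = 0
    while total < n:
        x = low + (high - low) * torch.rand(n, 3)
        dist = torch.cdist(x, centers)                    # (n, m) point-to-center distances
        outside_all = (dist > radii.unsqueeze(0)).all(dim=1)
        accepted.append(x[outside_all])
        total += int(outside_all.sum())
    return torch.cat(accepted, dim=0)[:n]
```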

Again, consider the Poisson equation (33) with the Dirichlet boundary condition given by the exact solution, for the following three cases of exact solutions:

  1. Example 1: .

  2. Example 2: .

  3. Example 3: .

The difficulty of this problem lies in the combination of the complex holes and the oscillatory exact solutions.

To visualize the complexity of the problem, we show the holes of the domain in Fig. 20.

Figure 20: Holes of the domain.

In each training epoch, we sample points inside the domain, and compare the following two structures:

  1. a fully-connected DNN with size 1-1000-1000-1000-1 (normal).

  2. a MscaleDNN-2 with five subnetworks with size 1-200-200-200-1, and scale coefficients of . (Mscale).

As shown in Fig. 21, for all three cases the normal fully-connected structure does not converge at all for such complex problems, while the MscaleDNN solves them with much smaller errors.

(a) Example 1
(b) Example 2
(c) Example 3
Figure 21: Error vs. epoch for the PDEs in square domain with many holes.

6.6 Geometric singularities

In this subsection, we consider the PB equation (22) in a domain with geometric singularities and jump conditions on interior interfaces, as arises in the simulation of the solvation of bio-molecules. Consider an open bounded domain $\Omega_1 \subset \mathbb{R}^3$, which divides $\mathbb{R}^3$ into two disjoint open subdomains separated by the surface $\Gamma = \partial\Omega_1$: $\Omega_1$ is identified as the bio-molecular domain, and the exterior $\Omega_2$ is called the solvent domain. The exact solution is also divided into two parts, $u_1$ defined in $\Omega_1$ and $u_2$ defined in $\Omega_2$. The solution will also satisfy the transmission conditions (23)-(24) along the interface $\Gamma$ and a decaying condition at infinity, i.e.,

(38) $u(x) \to 0 \quad \text{as } |x| \to \infty.$

To deal with the unbounded domain, we truncate the solution domain to a large ball or cube, denoted by $\Omega$ and containing $\Omega_1$, and an approximate boundary condition is posed on the truncation boundary (Fig. 22). Such a crude boundary condition will surely introduce error into the PDE solution, and higher-order boundary conditions have been studied extensively; as we are more interested in the performance of the DNNs near the interior interface, we will not dwell on this issue here.

(a) before truncation
(b) after truncation
Figure 22: Spherical truncation of the physical domain.

The domain with geometric singularities is constructed as follows. We choose a big ball with a given center and radius; a number of points are randomly selected on the surface of the big ball as the centers of small balls, whose radii are randomly sampled from a uniform distribution. $\Omega_1$ is the union of these small balls and the big ball. The shape of $\Omega_1$ is illustrated in Fig. 23. The intersections among the balls cause geometric singularities, such as kinks, which pose major challenges for mesh generation and accurate solution in traditional finite element and boundary element methods.

Figure 23: Illustration of the domain with geometric singularities.
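A small sketch of this union-of-balls construction and the resulting membership test for $\Omega_1$; the number of small balls, the big-ball radius, and the radius range below are hypothetical values, not those used in the paper:

```python
import torch

def make_singular_domain(num_small=20, big_radius=1.0, r_range=(0.2, 0.4), seed=0):
    """Random small-ball centers on the surface of a big ball, random small radii,
    and a membership test for the union Omega_1 (all sizes/counts illustrative)."""
    g = torch.Generator().manual_seed(seed)
    c = torch.randn(num_small, 3, generator=g)
    centers = big_radius * c / c.norm(dim=1, keepdim=True)   # points on the big sphere
    radii = r_range[0] + (r_range[1] - r_range[0]) * torch.rand(num_small, generator=g)

    def inside_omega1(x):
        # x: (n, 3); a point is in Omega_1 if it lies in the big ball OR any small ball
        in_big = x.norm(dim=1) <= big_radius
        in_small = (torch.cdist(x, centers) <= radii.unsqueeze(0)).any(dim=1)
        return in_big | in_small

    return centers, radii, inside_omega1
```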

The following two examples are considered. In both examples, the coefficients $\epsilon$ and $\kappa$ are chosen as piecewise constants. We do not consider singular sources for the PB equations, which can arise from the point charges inside bio-molecules or from ions in the solvent. These point-charge sources, modeled by Dirac delta functions, create point singularities in the solution, which can easily be removed by subtracting a singular solution [7]; the remaining smooth part can then be solved for by the MscaleDNN as follows.

Example 1

The exact solution is

(39)

with coefficients for the PB equation as

(40)

The whole domain is truncated by a ball with a given center and radius, with a zero boundary condition imposed on the sphere.

Example 2

We choose

(41)

with coefficients

(42)

In this case, the computational domain is obtained by truncation with a cube, and the reference solution is calculated by a finite difference method (FDM) with a sufficiently fine mesh to ensure adequate accuracy.

In Example 1, in each training epoch we sample points inside the domain $\Omega$ and points on the boundary $\partial\Omega$; in Example 2, we likewise sample points inside $\Omega$ and on $\partial\Omega$. We train MscaleDNNs with the Ritz loss function in (30). Note that the continuity condition (23) is automatically satisfied since we use a single network to represent the solution over the whole domain $\Omega$; the natural condition (24) is also automatically satisfied due to the Ritz formulation.

We examine the following two structures:

  1. a fully-connected DNN with size 1-1000-1000-1000-1 (normal).

  2. a MscaleDNN-2 with five subnetworks with size 1-200-200-200-1, and scale coefficients of . (Mscale).

Since the value of the exact solution is small, we show the relative error for both cases. As the exact solution is unknown in practice, we also show the training loss for both examples, which could be used as a possible criterion to terminate the training. For Example 1, as shown in Fig. 24, the training loss in Fig. 24(a) and the error in Fig. 24(b) have similar trends, that is, the training loss and the error of the MscaleDNN converge faster to smaller values compared with the normal DNN. For Example 2, shown in Fig. 25, the MscaleDNN shows a similar advantage over the normal DNN. These examples indicate that, by just monitoring the training loss, the MscaleDNN solves PB equations with non-smooth solutions over singular domains much faster and with better accuracy.

(a) loss
(b) relative error
Figure 24: Loss and relative error vs. epoch for the PB equation in a domain with geometric singularities. (Example 1)
(a) loss
(b) relative error
Figure 25: Loss and relative error vs. epoch for the PB equations in a domain with geometric singularities. (Example 2)
Figure 26: Numerical solutions of Example 2 along a line through the domain.

For illustration, we show a cross section of the solution in the second example. The reference solution is obtained by the FDM. Numerical solutions along the line obtained by the FDM, the normal DNN, and the MscaleDNN are shown in Fig. 26. The output of the normal fully connected network gives a wrong solution in the interior of the singular domain, while the MscaleDNN gives a satisfactory approximation to the reference solution.

7 Conclusion and future work

In this paper, we have introduced a new kind of multiscale DNN, MscaleDNN, using a frequency domain scaling technique and compactly supported activation functions, to generate a multi-scale capability for finding the solutions of elliptic PDEs with rich frequency contents. By using a radial scaling in the Fourier domain of the solutions, the MscaleDNN is shown to be an efficient meshless and easy-to-implement method for PDEs on complex and singular domains.

For future work, we will also explore the idea of activation functions with mother-wavelet properties as proposed in our previous work [4], which should give further frequency localization and separation capability to MscaleDNNs. More importantly, an area to be explored is the application of the MscaleDNN to high dimensional PDEs such as the Schrödinger equation for many-body quantum systems, where issues of high dimensional sampling and the low dimensional structure of solutions will be studied.

Acknowledgments

ZX is supported by the National Key R&D Program of China (2019YFA0709503) and the Shanghai Sailing Program.

References

  • [1] N. A. Baker, D. Sept, S. Joseph, M. J. Holst, and J. A. McCammon (2001) Electrostatics of nanosystems: application to microtubules and the ribosome. Proceedings of the National Academy of Sciences 98 (18), pp. 10037–10041.
  • [2] R. Basri, D. Jacobs, Y. Kasten, and S. Kritchman (2019) The convergence rate of neural networks for learned functions of different frequencies. arXiv preprint arXiv:1906.00425.
  • [3] W. Cai, X. Li, and L. Liu (2019) A phase shift deep neural network for high frequency approximation and wave problems. Accepted by SISC, arXiv:1909.11759.
  • [4] W. Cai and Z. J. Xu (2019) Multi-scale deep neural networks for solving high dimensional PDEs. arXiv preprint arXiv:1910.11710.
  • [5] W. Cai (2013) Computational Methods for Electromagnetic Phenomena: Electrostatics in Solvation, Scatterings, and Electron Transport. Cambridge University Press.
  • [6] Y. Cao, Z. Fang, Y. Wu, D. Zhou, and Q. Gu (2020) Towards understanding the spectral bias of deep learning. arXiv preprint arXiv:1912.01198.
  • [7] I.-L. Chern, J.-G. Liu, and W.-C. Wang (2003) Accurate evaluation of electrostatics for macromolecules in solution. Meth. Appl. Anal. 10, pp. 309–328.
  • [8] I. Daubechies (1992) Ten Lectures on Wavelets. Vol. 61, SIAM.
  • [9] W. E, J. Han, and A. Jentzen (2017) Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5 (4), pp. 349–380.
  • [10] W. E and B. Yu (2018) The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 6 (1), pp. 1–12.
  • [11] A. Hamilton, T. Tran, M. McKay, B. Quiring, and P. Vassilevski (2019) DNN approximation of nonlinear finite element equations. Technical report, Lawrence Livermore National Laboratory (LLNL), Livermore, CA.
  • [12] J. Han, A. Jentzen, and W. E (2018) Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115 (34), pp. 8505–8510.
  • [13] J. Han, L. Zhang, R. Car, et al. (2018) Deep potential: a general representation of a many-body potential energy surface. Communications in Computational Physics 23 (3).
  • [14] J. He, L. Li, J. Xu, and C. Zheng (2018) ReLU deep neural networks and linear finite elements. arXiv preprint arXiv:1807.03973.
  • [15] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [16] Y. Liao and P. Ming (2019) Deep Nitsche method: deep Ritz method with essential boundary conditions. arXiv preprint arXiv:1912.01309.
  • [17] S. Lindskog (1997) Structure and mechanism of carbonic anhydrase. Pharmacol. Therapeut. 74, pp. 1–20.
  • [18] T. Luo, Z. Ma, Z. J. Xu, and Y. Zhang (2019) Theory of the frequency principle for general deep neural networks. arXiv preprint arXiv:1906.09235.
  • [19] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville (2019) On the spectral bias of neural networks. In International Conference on Machine Learning, pp. 5301–5310.
  • [20] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
  • [21] C. M. Strofer, J. Wu, H. Xiao, and E. Paterson (2019) Data-driven, physics-based feature extraction from fluid flow fields using convolutional neural networks. Communications in Computational Physics 25 (3), pp. 625–650.
  • [22] Z. Wang and Z. Zhang (2020) A mesh-free method for interface problems using the deep learning approach. Journal of Computational Physics 400, pp. 108963.
  • [23] Z. J. Xu, Y. Zhang, T. Luo, Y. Xiao, and Z. Ma (2019) Frequency principle: Fourier analysis sheds light on deep neural networks. Accepted by Communications in Computational Physics, arXiv:1901.06523.
  • [24] Z. J. Xu, Y. Zhang, and Y. Xiao (2019) Training behavior of deep neural network in frequency domain. In Neural Information Processing, Lecture Notes in Computer Science, pp. 264–274.
  • [25] W. Geng, S. Yu, and G. W. Wei (2007) Treatment of geometric singularities in implicit solvent models. J. Chem. Phys. 126, pp. 244108.
  • [26] Y. Zhang, Z. J. Xu, T. Luo, and Z. Ma (2019) Explicitizing an implicit bias of the frequency principle in two-layer neural networks. arXiv preprint arXiv:1905.10264.