During the past decade, neural network is gaining increasingly more attention from the big data and artificial intelligence communityGoodfellow et al. (2016); LeCun et al. (2015); Schmidhuber (2015), and playing a central role in a broad range of applications related to, e.g., image processing Razzak et al. (2018)2018)2018). However, in scientific and engineering computing, the studying of neural networks is still at an early stage. Recently, thanks to the introduction of deep neural networks, successes have been achieved in a series of challenging tasks, including turbulence modeling Ling et al. (2016); Duraisamy et al. (2019), molecular dynamics simulations Zhang et al. (2018); Li et al. (2016), and solutions of stochastic and high-dimensional partial differential equations E et al. (2017); Han et al. (2018); Raissi (2018); Beck et al. (2019). In addition to that, a great deal of efforts are also made to build the connections between neural networks and traditional numerical methods such as finite element He et al. (2019), wavelet Fan et al. (2019), multigrid He and Xu (2019), hierarchical matrices Fan et al. (2019a, b) and domain decomposition Li et al. (2019) methods. However, it is still far from clear whether neural networks can solve ordinary/partial differential equations and truly exhibit advantages as compared to classical discretization schemes in accuracy, flexibility and robustness.
An early attempt on utilizing neural network methods to solve differential equations can be traced back to three decades ago Lee and Kang (1990), in which a Hopfield neural network was employed to represent the discretized solution. Shortly afterwards, methodologies to construct closed form numerical solutions with neural networks were proposed Meade Jr and Fernandez (1994); van Milligen et al. (1995). Since then, more and more efforts were made in solving differential equations with various types of neural networks, such as feedforward neural network Lagaris et al. (1998, 2000); McFall and Mahan (2009); McFall (2013), radial basis network Jianyu et al. (2002); Mai-Duy and Tran-Cong (2001), finite element network Ramuhalli et al. (2005), cellular network Chua and Yang (1988), and wavelet network Li et al. (2010). These early studies have in certain ways illustrated the feasibility and effectiveness of neural network based methods, but are limited to handling model problems with regular geometry in low space dimensions. Moreover, the networks utilized in these works are relatively shallow, usually containing only one hidden layer, whereas the potential merits of the deep neural networks are not fully revealed.
Nowadays, with the advent of the deep learning techniqueNielsen (2015); Aggarwal (2018); Higham and Higham (2019), neural networks with substantially more hidden layers have become valuable assets. Various challenging problems modeled by complex differential equations have been taken into considerations and successfully handled by neural network based methods E and Yu (2018); Liao and Ming (2019); Zang et al. (2020); Raissi et al. (2020, 2019); Sirignano and Spiliopoulos (2018); Berg and Nyström (2018); Long et al. (2019); Kani and Elsheikh (2017); Khoo et al. (2017). These remarkable progresses have demonstrated the advantages of deep neural networks in terms of the strong representation capability. However, the effectiveness of the neural network based methods are still hindered by several factors. Specifically, the accuracy of the neural network based approximation as well as the efficiency of the training task usually have a strong dependency on the properties of the problems, such as the nonlinearity of the equation, the irregularity of the solution, the shape of the boundary, the dimension of the domain, among other factors. It is therefore of paramount importance to improve the robustness and flexibility of the neural network based approaches. For instance, several methods E and Yu (2018); Liao and Ming (2019); Zang et al. (2020); Li et al. (2019)
have been recently proposed to transform the original problem into a corresponding weak form, thus lowering the smoothness requirement and possibly reducing the cost of the training process. Unfortunately, in these works extra penalty terms due to the boundary constraints are included in the loss function, which could eventually have a negative effect on the training process and the achievable accuracy. On the other hand, a number of effortsLagaris et al. (1998, 2000); McFall and Mahan (2009); McFall (2013) were made to explicitly manufacture the numerical solution so that the boundary conditions can be automatically satisfied. However, the construction processes are usually limited to problems with simple geometries, not flexible enough to generalize to arbitrary domains with arbitrary boundary conditions.
In this work, we propose PFNN, a penalty-free neural network method, for solving a class of second-order boundary-value problems on complex geometries. In order to reduce the smoothness requirement, we reformulate the original problem to a weak form so that the approximation of high-order derivatives of the solution is avoided. In order to adapt with various boundary constrains without adding any penalty terms, we employ two networks, rather than just one, to construct the approximate solution, with one network satisfying the essential boundary conditions and the other handling the rest part of the domain. And in order to disentangle the two networks, a length factor function is introduced to eliminate the interference between them so as to make the method applicable to arbitrary geometries. We prove that the proposed method converges to the true solution of the boundary-value problem as the number of hidden units of the networks increases, and show by numerical experiments that PFNN can be applied to a series of linear and nonlinear second-order boundary value problems. Compared to existing approaches, the proposed method is able to produce more accurate solutions with fewer unknowns and less training costs.
The remainder of the paper is organized as follows. In Section 2, the basic framework of the PFNN method is presented. Following that we provide some theoretical analysis results on the accuracy of the PFNN method in Section 3. Some further comparisons between the present work and several recently proposed methods can be found in Section 4. We report numerical results on various second-order boundary value problems in Section 5. The paper is concluded in Section 6.
2 The PFNN method
Consider the following boundary-value problem:
where is the outward unit normal, and .
It is well understood that under certain conditions (1) can be seen as the Euler-Lagrange equation of the energy functional
Introducing a hypothesis space constructed with neural networks, the approximate solution is obtained by solving the following minimization problem:
is the loss function representing the discrete counterpart of , denotes a set of sampling points on , and is the size of .
In this work, the hypothesis space is not an arbitrary space formed by neural networks. Instead, it is designed to encapsulate the essential boundary conditions. To construct the approximate solution , we employ two neural networks, and , instead of one, such that
where is the collection of the weights and biases of the two networks, and is a length factor function for measuring the distance to , which satisfies
With the help of the length factor function, the neural networks and are utilized to approximate the true solution on the essential boundary and the rest of the domain, respectively. The training of and are conducted separately, where is trained on the essential boundary to minimize the following energy functional
and is trained to approximate on the rest of the domain by minimizing the loss function (5) By this means, would not produce any influence on , and the interference between the two networks are eliminated.
To construct the length factor function , the boundary of the domain is divided into segments: (, ), with each segment either belonging to or to . For any boundary segment , select another segment (, ), which is not the neighbor of , as its companion. After that, for each , we construct a spline function that satisfies
Then is defined as:
Here a hyper-parameter is introduced to adjust the shape of . A suggested value is , i.e., the number of the boundary segments on , since in this way the average value of is kept in a proper range and would not decrease dramatically with the increase of .
For the purpose of flexibility, we employ the radial basis interpolationWright (2003), among other choices, to construct . Taking distinct points as the interpolation nodes, is defined as:
is an inverse multiquadric radial basis function. Here the parameteris used to adjust the shape of , To maintain scale invariance, which is important for solving differential equations, we set , where is the radius of the minimal circle that encloses all the interpolation nodes. Similar configurations of have been suggested in several previous studies such as Franke (1982); Foley (1987). The coefficients and are determined by solving a small linear system defined by the following constraints:
In particular, if and are two parallel hyper-planes, is reduced to a linear polynomial, which can be derived analytically without interpolation.
To further illustrate how the length factor function is constructed, consider an example that is a slotted disk as shown in Figure 1. Suppose the whole boundary is divided into four segments such that and , where and are the opposite sides of and , respectively. One can build and according to (11)-(13) and then construct with (10).
3 Theoretical analysis of PFNN method
In this section, we provide a theoretical proof to verify that under certain conditions the approximate solution obtained by the PFNN method is able to converge to the true solution of the boundary-value problem (1) as the number of hidden units in the neural networks and increases.
We suppose that the true solution of (1) belongs to the function space
where , . Further, it is assumed that the function in (1) satisfies the following conditions:
is strictly increasing on and .
there exist constants , and , such that for all , .
is Hölder continuous. If , there exists a constant , such that
otherwise, there exist constants , and , such that
If , there exist constants , and , such that
otherwise, there exists a constant , such that
We list two useful lemmas here for subsequent usage, among which the second one is the famous universal approximation property of the neural networks.
Especially, if , then . In this case, there exists a constant such that . Then we have
where , if and if .
Lemma 2 (K. Hornik, 1991 Hornik (1991)) Let
be the space consisting of all neural networks with single hidden layer of no more than hidden units, where , is the input weight, is the output weight and is the bias. If is nonconstant and all its derivatives up to order are bounded, then for all and , there exists an integer and a function , such that
The main convergence theorem of the PFNN method is given below.
Theorem 1 Suppose that the energy functional is in the form of (2), , , , where , then
Proof For the sake of brevity, we drop the subscripts in , and and utilize , and to represent them, respectively. We only need to consider the case of since the proof for the case of is analogous. Combining the Friedrichs’ inequality with Lemma 1, we have
where , are constants.
First, we prove that as , for arbitrary . According to Lemma 2, for all , there exists an integer and a function satisfying . Due to , the following relationship hold:
We then prove that as , for arbitrary . Since , , and , in , we have . According to Lemma 2, for all and , there exists an integer , which is dependent on and therefore relies on , and a function , such that
Correspondingly, the function satisfies
It then follows that
Finally, we make use of reduction to absurdity to prove that can be arbitrarily small so long as and are large enough. Suppose that for all and , there exists a constant , such that . If , according to (25) and (28), there exist integers and ( is dependent on ), such that and . Then we have
which is contradictory to (24). Hence, is not a lower bound of . Similarly, if , there exist an integer and a corresponding , such that , and
which is also in contradiction with (24). Therefore we can conclude that as and .
We remark there that although only neural networks with single hidden layers are considered here, similar conclusions can be made for the case of multi-layer neural networks. The details are omitted for brevity. It is also worth noting that the assumptions in the analysis are only sufficient conditions for the converge of the PFNN method. Later numerical experiments with cases that the assumptions are not fully satisfied will show that the proposed PFNN method still works well.
4 Comparison with other methods
Most of the existing neural network methodologies employ a penalty method to deal with the essential boundary conditions. For the boundary-value problem (1), a straightforward way is to minimize the following energy functional in least-squares form:
where and are penalty coefficients, which can also be seen as Lagrangian multipliers that help transform a constrained optimization problem into an unconstrained one.
Such least-squares approach has not only been adopted in many early studies van Milligen et al. (1995); Lagaris et al. (1998); McFall and Mahan (2009); Jianyu et al. (2002); Mai-Duy and Tran-Cong (2001); Li et al. (2010), but also employed by a number of recent works based on deep neural networks. An example is the physics-informed neural networks Raissi et al. (2019) in which deep neural networks are applied to solve forward and inverse problems involving nonlinear partial differential equations arising from thermodynamic, fluid mechanics and quantum mechanics. Following it, in the work of the hidden fluid mechanics Raissi et al. (2020) impressive results are obtained by extracting velocity and pressure information from the data of flow visualizations. Another example of using least-squares energy functionals is the work of the Deep Galerkin method Sirignano and Spiliopoulos (2018), in which efforts are made to merge deep learning techniques with the Galerkin method for efficiently solving several high-dimensional differential equations. Overall, a major difficulty by using the least-squares approach is that approximations have to be made to the high-order derivatives of the true solution in some way, which could eventually lead to high training cost and low accuracy.
To avoid approximating high-order derivatives, several methods have been proposed to transform the original problem into the corresponding weak forms. Examples include the Deep Ritz method E and Yu (2018) which employs an energy functional of the Ritz type:
and the Deep Nitsche method Liao and Ming (2019) which is based on an energy functional in the sense of Nitsche:
The transformations to weak forms done in the two methods can effectively reduce the smoothness requirement of the approximate solution. However, the penalty terms due to the essential boundary condition still persist, which would lead to extra training cost. Moreover, there is little clue on how to set the penalty factor ; improper values will cause negative influence on the accuracy of the approximate solution or even lead to training failure.
There are also several attempts Lagaris et al. (1998, 2000); McFall and Mahan (2009); McFall (2013) made to eliminate the penalty terms by explicitly manufacturing an approximate solution in the following form:
where satisfies the essential boundary condition and serves as a length factor function. It is worth noting that the scopes of applications of these methods are rather narrow, despite that the approximate solutions do share some similarities with the PFNN method. In particular, these methods usually construct function either in analytic forms Lagaris et al. (1998, 2000), or through spline interpolations McFall and Mahan (2009); McFall (2013), and are therefore only suitable to simple geometries in low dimensions. To establish the length factor function , these methods usually rely on mapping the original domain to a hyper-sphere, which is again not flexible and efficient for problems with complex geometries and high dimensions.
The proposed PFNN method can effectively combine the advantages of the aforementioned state-of-the-art while overcoming their drawbacks. It reduces the smoothness requirements as well as removes the penalty term in the loss function, effectively converting the original hard problem to a relatively easy one. By introducing two neural networks, instead of only one, the approximations made to the true solutions are separated to the essential boundaries and the rest of the domain, respectively. The original training task is divided into two simpler ones, which can substantially reduce the training cost as well as enhance the accuracy of the approximation. To eliminate the interference between the two neural networks, a systematic approach is further proposed to construct the length factor function in a most flexible manner. As we will show later in the numerical experiments, the PFNN method is applicable to a wide range of problems on complex geometries in arbitrary dimensions and is able to achieve higher accuracy with lower training cost as compared with other approaches.
5 Numerical experiments
|2||minimal surface||2||Koch snowflake||0|
A series of numerical experiments are conducted to examine the numerical behaviors of the proposed PFNN method as well as several previous state-of-the-art approaches. We design four test cases covering different types of problems that have the same form of (1) but vary in and , as shown in Table 1. In particular, the computational domains of test cases 2 and 3 are further illustrated in Figure 2, both exhibiting very complex geometries. We employ the ResNet model He et al. (2016)
with sinusoid activation functions to build the neural networks in both PFNN and other neural network based methods. The Adam optimizerKingma and Ba (2014)
is utilized for training, with the initial learning rate set to 0.01. Unless mentioned otherwise, the maximum number of iteration is set to 5,000 epochs. The relative
-norm is used to estimate the error of the approximate solution. In all tests except those with the traditional finite element method, we perform ten independent tests and collect the results for further analysis.
5.1 Anisotropic diffusion equation on a square
The first test case is an anisotropic diffusion equation:
The corresponding energy functional to minimize is:
In the experiment, we set the essential boundary to and set the exact solution to various forms including , , , and , where , discontinuous as such, is a weak solution of the partial differential equation.
For comparison purpose, we examine the performance of several approaches including the linear finite element method, the least-squares neural network method, the Deep Ritz method, the Deep Nitsche method and our proposed PFNN method. A uniform mesh of cells is used by the bilinear finite element method, leading to
unknowns. For the penalty-based neural network methods, the network adopted is comprised of 4 ResNet blocks, each of which contains 2 fully connected layers with 10 units and a residual connection, resulted in totallyundecided parameters. And for the PFNN method, the network and consist of 1 and 3 ResNet blocks, respectively, corresponding to a total of parameters. The penalty coefficients in the penalty-based approaches are all set to three typical values: , and .
The experiment results are illustrated in Figure 3
. To get detailed information of the achievable accuracy of every approach, we draw in the figure the box-plots that contain the five-number summary including the smallest observation, lower quartile, median, upper quartile, and largest observation for each set of tests. From the figure, it can be observed that the performance of the classical least-squares neural network method is usually the worst, due in large part to the approximations made to high-oder derivatives. By introducing the corresponding weak forms, Deep Ritz and Deep Nitsche methods can deliver better results than the least-squares method does, but the performance still has a strong dependency on the specific values of the penalty coefficients and the performance is in many cases not competitive to the traditional finite element method. The advantages of the PFNN method are quite clear: it is able to outperform all above approaches in all tested problems in terms of the sustained accuracy and is much more robust than the penalty-based methods.
5.2 Minimal surface equation on a Koch snowflake
Consider a minimal surface equation Giusti and Williams (1984):
which represents the area of the surface of on .
The minimal surface problem is to find a surface with minimal area under the given boundary conditions. It can be verified that the catenoid surface
satisfies the equation (38) for all satisfying in . In particular, for the Koch snowflake, the condition becomes . The difficulty of the problem is increased as (especially, is unbounded on when ). To examine how well various methods can deal with such kind of problem, we set to a relatively challenging value: .
|Method||Deep Ritz||Deep Nitsche||PFNN|
In the experiment, we set the essential boundary to be the left half part of the boundary and gradually increase the fractal level of the Koch polygon. With increased, the number of segments on the boundary of the domain grows exponentially Koch (1904), posing severe challenges to classical approaches such as the finite element method. Thanks to the meshfree nature, neural network methods could be more suitable to tackle this problem. We investigate the performance of the Deep Ritz, Deep Nitsche and PFNN methods with , and
. The configurations of the neural networks are the same to those in the previous experiment. We report both mean values and standard deviations of the solution errors of all the approaches in Table2, from which we can see that PFNN can achieve the most accurate results with the least number of parameters and is much less susceptible to the change of domain boundary.
5.3 -Liouville-Bratu equation on a twisted torus
The next test case is a Dirichlet boundary-value problem governed by a -Liouville-Bratu equation:
where and . This equation can be transformed to a minimization problem of the following energy functional:
The computational domain is a twisted torus as shown in Figure 2 (b), for which the parameter equation of the boundary is:
, , , and .
When , the -Liouville-Bratu equation is degenerated to the well-known Liouville-Bratu equation Bratu (1914). The nonlinearity of the problem is increased as deviates from . In particular, if , the diffusivity as . And if (especially for ), the diffusivity grows dramatically as increases. In both cases, the corresponding -Liouville-Bratu equation is difficult to solve, let alone that the computational domain is also very complex. In addition, the nonlinearity of the problem is also raised as the Bratu parameter becomes larger. One thing worth noting is that here is a decreasing function. Arbitrarily large could make the solution of (41) not be the minimal point of the energy functional (42). However, if is restricted in a proper range, it is still feasible to obtain a solution by minimizing the equivalent optimization problem.
In the experiment, we set the exact solution to be and conduct two groups of tests to investigate the performance of various neural network methods in solving the -Liouville-Bratu equation (41) with parameter and , respectively. In each group of tests, we examine the influence of the Bratu parameter with and , respectively. Again, the configurations of the neural networks are the same to those in the previous experiment. The test results are reported in Table 3. From the table, we can clearly see that the proposed PFNN method can outperform the other two methods with less parameters used. In particular, for the case of , both Deep Ritz and Deep Nitsche suffer greatly from the high nonlinearity of the problem and could not produce accurate results, while PFNN can perform equally well with the change of both and .
|Method||Deep Ritz||Deep Nitsche||PFNN|
5.4 -Helmhotz equation on a 100D hypercube
In the last experiment, consider a mixed boundary-value problem governed by a -Helmholtz equation (, ):
where is a 100D hypercube and represents the Robin boundary. For this problem, the corresponding energy functional to minimize is:
In the experiment we set the exact solution to