MIM: A deep mixed residual method for solving high-order partial differential equations

06/07/2020 · Liyao Lyu et al. · Michigan State University

In recent years, a significant amount of attention has been paid to solving partial differential equations (PDEs) by deep learning. For example, the deep Galerkin method (DGM) uses the PDE residual in the least-squares sense as the loss function and a deep neural network (DNN) to approximate the PDE solution. In this work, we propose a deep mixed residual method (MIM) to solve PDEs with high-order derivatives. In MIM, we first rewrite a high-order PDE into a first-order system, very much in the same spirit as the local discontinuous Galerkin method and the mixed finite element method in classical numerical analysis. We then use the residual of the first-order system in the least-squares sense as the loss function, which is in close connection with the least-squares finite element method. For the aforementioned classical methods, the choice of trial and test functions is important for stability and accuracy in many cases. MIM shares this property when DNNs are employed to approximate the unknown functions in the first-order system: in one case, we use nearly the same DNN to approximate all unknown functions, and in the other case, we use totally different DNNs for different unknown functions. In most cases, MIM provides better approximations (not only for the high-order derivatives of the PDE solution but also for the PDE solution itself) than DGM with nearly the same DNN and the same execution time, sometimes by more than one order of magnitude. When different DNNs are used, in many cases, MIM provides even better approximations than MIM with only one DNN, sometimes by more than one order of magnitude. Therefore, we expect MIM to open up a possibly systematic way to understand and improve deep learning for solving PDEs from the perspective of classical numerical analysis.


1 Introduction

Solving partial differential equations (PDEs) is the most ubiquitous way to simulate complicated phenomena in applied sciences and engineering. Classical numerical methods include the finite difference method LeVeque (2007), the finite element method (FEM) Elman et al. (2014), the discontinuous Galerkin method Cockburn et al. (2000), and the spectral method Shen et al. (2011), which are typically designed for low-dimensional PDEs and are well understood in terms of stability and accuracy. However, there are high-dimensional PDEs such as the Schrödinger equation in the quantum many-body problem Dirac (1981), the Hamilton-Jacobi-Bellman equation in stochastic optimal control Bardi and Capuzzo-Dolcetta (2008), and the nonlinear Black-Scholes equation for pricing financial derivatives Hull (2009). Solving these equations is far beyond the capability of classical numerical methods due to the curse of dimensionality, i.e., the number of unknowns grows exponentially as the dimension increases.

Very recently, deep-learning-based methods have been developed to solve these high-dimensional PDEs; see E et al. (2017); Giuseppe and Matthias (2017); E and Yu (2018); Han et al. (2018); Raissi (2018); Sirignano and Spiliopoulos (2018); Hutzenthaler et al. (2019); Raissi et al. (2019); Beck et al. (2019); Cervera (2019); Fan et al. (2019); Khoo et al. (2019); Beck et al. (2020); Wang et al. (2020); Zang et al. (2020); Discacciati et al. (2020) for examples. Typically, a deep-learning method for solving PDEs has three main ingredients (stages): (1) modeling: the loss (objective) function to be optimized; (2) architecture: the deep neural network (DNN) used for function approximation; (3) optimization: finding the set of parameters in the DNN that minimizes the loss function. By design, the number of parameters in a DNN grows at most polynomially with the dimension, and the possibly high-dimensional integrals in the loss function are approximated by the Monte Carlo method. Therefore, deep learning overcomes the curse of dimensionality by construction. In practice, deep learning performs well for the Schrödinger equation Giuseppe and Matthias (2017); Han et al. (2019), the Hamilton-Jacobi-Bellman equation Han et al. (2018); E et al. (2017), and the nonlinear Black-Scholes equation Beck et al. (2019); Cervera (2019).

Typically, deep learning solves a PDE in the following way. For the given PDE, the loss function is modeled as the equation residual in the least-squares sense Sirignano and Spiliopoulos (2018) or as the variational form, if one exists E and Yu (2018). ResNet is often used as the network architecture He et al. (2015), since it helps to overcome the notorious problem of vanishing/exploding gradients. Afterwards, a stochastic gradient descent method is used to find the set of parameters in the ResNet that minimizes the loss function; the ResNet with this optimal set of parameters then gives an approximation of the PDE solution.
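To make this pipeline concrete, the following is a minimal PyTorch sketch (not the authors' code) of a DGM-style least-squares residual loss for a Poisson problem -Δu = f with a Dirichlet penalty; the small fully-connected network, the right-hand side f, the boundary data g, the sampling, and the penalty weight are all illustrative placeholders.

```python
import torch

torch.manual_seed(0)
d = 4                                    # problem dimension (illustrative)
net = torch.nn.Sequential(               # small fully-connected surrogate for u_theta
    torch.nn.Linear(d, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def f(x):                                # placeholder right-hand side
    return torch.ones(x.shape[0], 1)

def g(x):                                # placeholder Dirichlet boundary data
    return torch.zeros(x.shape[0], 1)

def dgm_loss(x_in, x_bd, lam=1.0):
    """Least-squares residual of -Laplace(u) = f plus a boundary penalty."""
    x_in = x_in.requires_grad_(True)
    u = net(x_in)
    grad_u = torch.autograd.grad(u.sum(), x_in, create_graph=True)[0]
    lap_u = 0.0
    for i in range(d):                   # second derivatives, one coordinate at a time
        lap_u = lap_u + torch.autograd.grad(
            grad_u[:, i].sum(), x_in, create_graph=True)[0][:, i:i + 1]
    residual = (-lap_u - f(x_in)).pow(2).mean()
    boundary = (net(x_bd) - g(x_bd)).pow(2).mean()
    return residual + lam * boundary

x_in = torch.rand(256, d)                # interior samples in an assumed unit cube
x_bd = torch.rand(256, d); x_bd[:, 0] = 0.0   # crude stand-in for boundary samples
print(dgm_loss(x_in, x_bd))
```

The Laplacian is assembled from first derivatives with `create_graph=True`, so the resulting loss remains differentiable with respect to the network parameters.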

In this work, we propose a deep mixed residual method (MIM) for solving high-order PDEs. In the modeling stage, by rewriting a given PDE into a first-order system, we obtain a larger problem in the sense that both the PDE solution and its high-order derivatives are unknown functions to be approximated. This has analogs in classical numerical methods, such as the local discontinuous Galerkin method Cockburn et al. (2000) and the mixed finite element method Boffi et al. (2013). Compared to DGM, there are two additional degrees of freedom in MIM:

  • In the modeling (loss function) stage, one can choose which high-order derivatives to include in the set of unknown functions. Take the biharmonic equation as an example: the set of unknown functions can include the PDE solution and its derivatives up to third order, or only the PDE solution and its second-order derivatives, and both choices have analogs in the discontinuous Galerkin method Yan and Shu (2002); Cockburn et al. (2009); see the illustration after this list. We then write the loss function as the sum of the equation residuals in the least-squares sense, very much in the same spirit as the least-squares finite element method Bochev and Gunzburger (2015).

  • In the architecture stage, one can choose how many networks are used to approximate the set of unknown functions. In one case, one DNN approximates the PDE solution and separate DNNs approximate its high-order derivatives; in the other case, the PDE solution and its derivatives share nearly the same DNN.
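As an illustration of the first degree of freedom, here is a hedged sketch, in generic notation that need not match the paper's, of two possible splittings of the biharmonic equation Δ²u = f: one introducing auxiliary variables for all derivatives up to third order, and one introducing only a second-order quantity, as in classical mixed formulations.

```latex
% Generic illustration (not necessarily the paper's notation) of two splittings
% of the biharmonic equation \Delta^2 u = f.
\begin{align*}
 &\text{(i) full set of auxiliary variables:} &
   p &= \nabla u, \quad q = \nabla\cdot p, \quad r = \nabla q, \quad \nabla\cdot r = f,\\
 &\text{(ii) partial set of auxiliary variables:} &
   q &= \Delta u, \quad \Delta q = f.
\end{align*}
```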

These two degrees of freedom allow MIM to produce better approximations than DGM in all our examples, including the Poisson equation, the Monge-Ampère equation, the biharmonic equation, and a Korteweg-de Vries (KdV) equation. In particular, MIM provides better approximations not only for the high-order derivatives but also for the PDE solution itself. It is worth mentioning that the use of a mixed residual in deep learning was first introduced for surrogate modeling and uncertainty quantification of a second-order elliptic equation Zhu et al. (2019) and was later adopted in a deep domain decomposition method Li et al. (2019).

The paper is organized as follows. In Section 2, we introduce MIM and DGM (for comparison purpose). In Section 3, numerical results for four types of high-order PDEs are provided. Conclusions and discussions are drawn in Section 4.

2 Deep mixed residual method

In this section, we introduce MIM and discuss its differences from DGM in terms of the loss function and the neural network structure.

2.1 Loss function

Consider a potentially time-dependent nonlinear PDE over a bounded domain $\Omega \subset \mathbb{R}^d$:

(1)
$$\partial_t u + \mathcal{L}u = f \quad \text{in } \Omega\times(0,T], \qquad u(\cdot,0) = h \quad \text{in } \Omega, \qquad \mathcal{B}u = g \quad \text{on } \partial\Omega\times(0,T],$$

where $\partial\Omega$ denotes the boundary of $\Omega$, $\mathcal{L}$ is a (possibly nonlinear) differential operator, and $\mathcal{B}$ encodes the boundary condition. In DGM, the loss function is defined as the PDE residual in the least-squares sense,

(2)
$$L(u) = \|\partial_t u + \mathcal{L}u - f\|_{2,\Omega\times(0,T]}^2 + \lambda_1\|u(\cdot,0) - h\|_{2,\Omega}^2 + \lambda_2\|\mathcal{B}u - g\|_{2,\partial\Omega\times(0,T]}^2,$$

where $\lambda_1$ and $\lambda_2$ are penalty parameters given a priori. The three terms in (2) measure how well the approximate solution satisfies the PDE, the initial condition, and the boundary condition, respectively.

In the absence of temporal derivatives, (1) reduces to
$$\mathcal{L}u = f \quad \text{in } \Omega, \qquad \mathcal{B}u = g \quad \text{on } \partial\Omega,$$
and the corresponding loss function in DGM becomes

(3)
$$L(u) = \|\mathcal{L}u - f\|_{2,\Omega}^2 + \lambda\,\|\mathcal{B}u - g\|_{2,\partial\Omega}^2.$$

Table 1 lists four PDEs with their corresponding loss functions in DGM and Table 2 lists different boundary conditions, the initial condition and their contributions to loss functions in DGM and MIM. More boundary conditions can be treated in this way. Interested readers may refer to Chen et al. (2020) for details.

Equation Explicit form Loss function
Poisson
Monge-Ampère
Biharmonic
KdV
Table 1: Loss functions for four types of PDEs in the deep Galerkin method.
Condition Explicit form Contribution to the loss function
Dirichlet
Neumann or
Initial
Table 2: Contributions to the loss function for the initial condition and different types of boundary conditions used in the deep Galerkin method and the deep mixed residual method.

In MIM, we first rewrite high-order derivatives into low-order ones by introducing auxiliary variables. For notational convenience, the auxiliary variables represent the derivatives of the solution as in

(4)

For the KdV equation, a different auxiliary variable is used in place of the second formula in (4). With these auxiliary variables, we define the loss functions for the four types of PDEs in Table 3. Since one can include only a subset of the high-order derivatives in the set of unknown functions, more than one loss function is possible in MIM. For the biharmonic equation, there are two commonly used sets of auxiliary variables in the local discontinuous Galerkin method and the weak Galerkin finite element method: one with all high-order derivatives Yan and Shu (2002) and the other with only part of them Cockburn et al. (2009); Mu et al. (2015). Correspondingly, MIM admits two loss functions for the biharmonic equation, one for each choice. In Section 2.2, we discuss how to equip these loss functions with different DNNs: either a single DNN approximates the PDE solution and its derivatives jointly, or different DNNs are used for different unknown functions. In Section 3, the different loss functions listed in Table 1, Table 2, and Table 3 are tested and discussed. By default, all penalty parameters are set to the same value.

Equation Explicit form Loss function
Poisson
Monge-Ampère
Biharmonic
KdV
Table 3: Loss functions in the deep mixed residual method for the four types of equations. For the biharmonic equation, two different loss functions are listed, one including all high-order derivatives and one including only part of them.
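For concreteness, the following is a minimal PyTorch sketch, not taken from the paper, of a single-network mixed residual loss for the Poisson case, assuming the auxiliary variable is p = ∇u so that -Δu = f becomes p = ∇u, -∇·p = f; the network size, right-hand side, boundary data, sampling, and penalty weight are illustrative.

```python
import torch

d = 4                                         # problem dimension (illustrative)
# One network with 1 + d outputs: the first for u_theta, the rest for p_theta ≈ grad(u_theta).
net = torch.nn.Sequential(
    torch.nn.Linear(d, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1 + d),
)

def f(x):                                     # placeholder right-hand side
    return torch.ones(x.shape[0], 1)

def g(x):                                     # placeholder Dirichlet boundary data
    return torch.zeros(x.shape[0], 1)

def mim_loss(x_in, x_bd, lam=1.0):
    """Mixed residual for -div(p) = f with p = grad(u), plus a Dirichlet penalty on u."""
    x_in = x_in.requires_grad_(True)
    out = net(x_in)
    u, p = out[:, :1], out[:, 1:]
    grad_u = torch.autograd.grad(u.sum(), x_in, create_graph=True)[0]
    div_p = sum(torch.autograd.grad(p[:, i].sum(), x_in, create_graph=True)[0][:, i:i + 1]
                for i in range(d))
    res_constitutive = (grad_u - p).pow(2).sum(dim=1).mean()   # p = grad(u) residual
    res_equation = (-div_p - f(x_in)).pow(2).mean()            # -div(p) = f residual
    res_boundary = (net(x_bd)[:, :1] - g(x_bd)).pow(2).mean()  # boundary penalty on u only
    return res_constitutive + res_equation + lam * res_boundary

x_in, x_bd = torch.rand(256, d), torch.rand(256, d)
x_bd[:, 0] = 0.0                              # crude stand-in for boundary samples
print(mim_loss(x_in, x_bd))
```

Note that only first-order derivatives of the network outputs are required here, in contrast to the corresponding DGM loss, which needs second-order derivatives for the Poisson equation.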

2.2 Neural network architecture

ResNet He et al. (2015) is used to approximate the PDE solution and its high-order derivatives. It consists of blocks of the following form:

(5)
$$s_{k+1} = \sigma\big(W_{k,2}\,\sigma(W_{k,1} s_k + b_{k,1}) + b_{k,2}\big) + s_k, \qquad k = 0,\dots,n-1.$$

Here $s_k \in \mathbb{R}^m$, $W_{k,1}, W_{k,2} \in \mathbb{R}^{m\times m}$, and $b_{k,1}, b_{k,2} \in \mathbb{R}^m$; $n$ is the depth of the network, $m$ is the width of the network, and $\sigma$ is the (scalar) activation function applied componentwise. Explicit formulas of the activation functions used in this work are given in Table 4. The last term on the right-hand side of (5) is called the shortcut connection or residual connection. Each block thus has two linear transforms, two activation functions, and one shortcut; see Figure 1 for an illustration. Such a structure helps to alleviate the notorious problem of vanishing/exploding gradients He et al. (2016).

Activation function Formula
Square σ(x) = x²
ReLU σ(x) = max(x, 0)
ReQU σ(x) = max(x, 0)²
ReCU σ(x) = max(x, 0)³
Table 4: Activation functions used in numerical tests.
Figure 1: One block of ResNet. A deep neural network contains a sequence of blocks, each of which consists of two fully-connected layers and one shortcut connection.

Since the input $x$ lies in $\mathbb{R}^d$ rather than $\mathbb{R}^m$, we pad it with a zero vector to obtain the network input $s_0 \in \mathbb{R}^m$; a linear transform could be used instead without much difference. Meanwhile, the ResNet output $s_n$ has $m$ components, which cannot be used directly for the PDE solution and the derivatives employed in the loss function. Therefore, a final linear transform is applied to $s_n$ to map it to the suitable output dimension. Let $\theta$ denote the whole set of parameters, which includes the parameters in the ResNet blocks and those in the final linear transform. Note that the output dimension in MIM depends on both the PDE problem and the chosen mixed residual loss. As an example, we illustrate the network structures for the biharmonic equation in Figure 2.
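The following PyTorch sketch, written in generic notation and not taken from the paper's code, assembles such a ResNet: zero-padding of the input, a sequence of blocks of the form (5), and a final linear output layer whose dimension is 1 for DGM and 1 plus the number of auxiliary components for a single-network MIM.

```python
import torch

class ResBlock(torch.nn.Module):
    """One ResNet block: two linear transforms, two activations, and a shortcut connection."""
    def __init__(self, width, activation):
        super().__init__()
        self.fc1 = torch.nn.Linear(width, width)
        self.fc2 = torch.nn.Linear(width, width)
        self.act = activation

    def forward(self, s):
        return s + self.act(self.fc2(self.act(self.fc1(s))))   # shortcut is the last term

class ResNet(torch.nn.Module):
    """Zero-pad the d-dimensional input to the network width m, apply n blocks,
    then apply a final linear transform to the required output dimension."""
    def __init__(self, dim_in, width, depth, dim_out, activation):
        super().__init__()
        assert width >= dim_in
        self.pad = width - dim_in
        self.blocks = torch.nn.Sequential(*[ResBlock(width, activation) for _ in range(depth)])
        self.out = torch.nn.Linear(width, dim_out)

    def forward(self, x):
        s0 = torch.nn.functional.pad(x, (0, self.pad))          # zero-padding of the input
        return self.out(self.blocks(s0))

requ = lambda t: torch.relu(t) ** 2   # ReQU from Table 4; ReCU would be torch.relu(t) ** 3

# e.g. a 4-dimensional Poisson problem: DGM needs 1 output (u); a single-network MIM
# needs 1 + 4 outputs (u and its gradient).
dgm_net = ResNet(dim_in=4, width=32, depth=3, dim_out=1, activation=requ)
mim_net = ResNet(dim_in=4, width=32, depth=3, dim_out=5, activation=requ)
```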

Figure 2: Network structures for the biharmonic equation with the deep Galerkin method and the deep mixed residual method. DGM approximates only the solution. One MIM variant approximates the solution together with part of its derivatives, while the other approximates the solution and all of the derivatives appearing in the equation. When separate networks are used, the variant with all derivatives employs four networks and the variant with part of the derivatives employs two. Each network has a similar structure with a different output dimension.

From Figure 2, we see that DGM has a single output, whereas the MIM variants have several outputs, since the derivatives are part of the network output. In Figure 3, we illustrate the network structures of the two MIM variants for the Poisson equation; when multiple DNNs are used for the Poisson equation, two are employed, one to approximate the solution and the other to approximate its derivatives. It is clear from Figure 2 that the network structures in DGM and in single-network MIM differ only in the output layer, and thus they have comparable numbers of parameters to be optimized. To be precise, we list the numbers of parameters in Table 5: the counts for DGM and single-network MIM are close, while the count for multi-network MIM is nearly doubled for the Poisson equation, the Monge-Ampère equation, and the biharmonic equation with a partial set of derivatives, tripled for the KdV equation, and quadrupled for the biharmonic equation with the full set of derivatives. In Section 3, we observe from the numerical results a better performance of MIM for all four equations, not only for the derivatives of the PDE solution but also for the solution itself.

Method Equation Size of the parameter set
DGM Four equations
MIM Poisson
Monge-Ampère
Biharmonic (MIM)
Biharmonic (MIM)
KdV
MIM Poisson
Monge-Ampère
Biharmonic (MIM)
Biharmonic (MIM)
KdV
Table 5: Number of parameters for the different network structures used for different equations and loss functions, in terms of the network width, the network depth, and the problem dimension. The number of parameters in DGM and in single-network MIM is close; for multi-network MIM it is nearly doubled for the Poisson equation, the Monge-Ampère equation, and the biharmonic equation with a partial set of derivatives, tripled for the KdV equation, and quadrupled for the biharmonic equation with the full set of derivatives.

2.3 Stochastic Gradient Descent

For completeness, we also briefly introduce the stochastic gradient descent method. For the loss function defined in (3), at each iteration we generate two sets of points uniformly distributed over $\Omega$ and $\partial\Omega$: $N_1$ points in $\Omega$ and $N_2$ points on $\partial\Omega$. The parameters are then updated by a gradient step on the Monte-Carlo estimate of the loss,

(6)
$$\theta^{k+1} = \theta^{k} - \eta\, \nabla_\theta \Big( \frac{|\Omega|}{N_1} \sum_{i=1}^{N_1} \big| \mathcal{L} u_{\theta^k}(x_i) - f(x_i) \big|^2 + \lambda\, \frac{|\partial\Omega|}{N_2} \sum_{j=1}^{N_2} \big| \mathcal{B} u_{\theta^k}(y_j) - g(y_j) \big|^2 \Big),$$

where $\eta$ is the learning rate, $|\Omega|$ and $|\partial\Omega|$ are the measures of $\Omega$ and $\partial\Omega$, respectively, and $u_\theta$ is the DNN approximation of the PDE solution parameterized by $\theta$. The sampling points $\{x_i\}$ and $\{y_j\}$ are updated at each iteration. In our implementation, we use the ADAM optimizer Kingma and Ba (2014) and automatic differentiation Paszke et al. (2017) for derivatives in PyTorch.
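A minimal PyTorch training loop in the spirit of (6) is sketched below; it reuses a single-network Poisson mixed residual (the same construction as in Section 2.1) as a stand-in for the loss, and the learning rate, batch sizes, iteration count, and boundary sampler are illustrative choices rather than the paper's settings.

```python
import torch

d = 4                                       # problem dimension (illustrative)
net = torch.nn.Sequential(                  # single network with outputs (u, p), cf. Section 2.1
    torch.nn.Linear(d, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1 + d),
)

def loss_fn(x_in, x_bd):
    """Monte-Carlo mixed residual for -Laplace(u) = 1 with a homogeneous Dirichlet penalty."""
    x_in = x_in.requires_grad_(True)
    out = net(x_in)
    u, p = out[:, :1], out[:, 1:]
    grad_u = torch.autograd.grad(u.sum(), x_in, create_graph=True)[0]
    div_p = sum(torch.autograd.grad(p[:, i].sum(), x_in, create_graph=True)[0][:, i:i + 1]
                for i in range(d))
    return ((grad_u - p) ** 2).sum(1).mean() + ((-div_p - 1.0) ** 2).mean() \
        + (net(x_bd)[:, :1] ** 2).mean()

def sample_interior(n):                     # uniform samples in the assumed unit-cube domain
    return torch.rand(n, d)

def sample_boundary(n):                     # push each sample onto a randomly chosen face
    x = torch.rand(n, d)
    face = torch.randint(0, d, (n,))
    x[torch.arange(n), face] = torch.randint(0, 2, (n,)).float()
    return x

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)   # learning rate is an illustrative choice
for step in range(1000):
    loss = loss_fn(sample_interior(256), sample_boundary(256))   # fresh points every iteration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Resampling both point sets at every iteration is what makes the gradient estimate stochastic, as described above.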

3 Numerical results

In this section, we present numerical results of MIM for the four types of equations. For comparison, we use the relative errors of the solution and its derivatives defined in Table 6. In all figures, relative errors are plotted on a logarithmic scale.

Quantity DGM MIM
Table 6: Relative errors used in deep Galerkin method and deep mixed residual method.
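As a hedged sketch of how such relative errors can be estimated in practice, assuming they are relative L2-type errors evaluated by Monte-Carlo sampling over an assumed unit-cube domain, the routine below compares a learned quantity with the corresponding exact one; derivative errors are computed analogously, using the auxiliary outputs in MIM or automatic differentiation in DGM.

```python
import torch

def relative_l2(approx, exact, n=100_000, d=4):
    """Monte-Carlo relative L2 error over the (assumed) unit cube.
    `approx` and `exact` map a batch of points of shape (n, d) to values of shape (n, 1)."""
    x = torch.rand(n, d)
    with torch.no_grad():
        num = (approx(x) - exact(x)).pow(2).mean().sqrt()
        den = exact(x).pow(2).mean().sqrt()
    return (num / den).item()
```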

3.1 Poisson Equation

Consider the following Neumann problem

(7)

with a prescribed exact solution. The neural network structure in DGM is the same as that for the biharmonic equation shown in Figure 2. Following Table 1 and Table 2, the loss function for (7) is

(8)

Since both the solution and its derivatives are explicitly approximated, one more advantage of MIM is the enforcement of boundary conditions: for (7), we multiply the derivative outputs by factors that make them satisfy the Neumann boundary condition automatically; see Figure 3. DGM has only the solution as its unknown function, and thus it is unclear how the exact Neumann boundary condition could be imposed.

(a) MIM with one network to approximate the PDE solution and its derivatives.
(b) MIM with multiple networks to approximate the PDE solution and its derivatives.
Figure 3: Detailed network structures of the two MIM variants for solving the Poisson equation. The DNN part is the same as that in Figure 2. Multipliers applied to the derivative outputs make both variants satisfy the exact Neumann boundary condition.

Therefore, for the DNNs in Figure 3, the boundary penalty term can be dropped and the loss function in MIM simplifies to

(9)

We emphasize that the Dirichlet boundary condition can be imposed exactly in DGM Berg and Nyström (2018), so that no penalty term is needed. For the Neumann boundary condition, mixed boundary conditions, and the Robin boundary condition, however, it is difficult to build a DNN representation that satisfies the exact boundary condition. Building a DNN approximation that satisfies the exact boundary condition has several advantages Chen et al. (2020): 1) it eases the training process by avoiding unnecessary divergence; 2) it improves the approximation accuracy; 3) it saves execution time. In MIM, however, we have direct access to both the solution and its derivatives, so all these boundary conditions can, in principle, be imposed exactly. This will be presented in a subsequent work Lyu et al. (2020).
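A hedged sketch of the multiplier construction in Figure 3, assuming a homogeneous Neumann condition on the unit cube (0,1)^d (the multiplier used in the paper may differ), is:

```python
import torch

d = 4
core = torch.nn.Sequential(                 # shared trunk producing (u, p) as in Figure 3
    torch.nn.Linear(d, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1 + d),
)

def u_and_p(x):
    """Return u_theta and p_theta with p rescaled componentwise by x_i * (1 - x_i),
    so that the normal component of p vanishes on every face of the unit cube,
    i.e. a homogeneous Neumann condition is satisfied exactly."""
    out = core(x)
    u, p = out[:, :1], out[:, 1:]
    p = p * x * (1.0 - x)                   # multiplier vanishing on the boundary faces
    return u, p
```

With such a construction, the boundary penalty term becomes unnecessary, which is the simplification leading to (9).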

For (7), the average errors of the solution and its gradient over the final iterations are recorded in Table 7. The network depth and the activation function are fixed, and the network width is chosen according to the problem dimension. Time is recorded as the average CPU time per iteration. It is not surprising that single-network MIM costs less time than DGM, since the DNN approximation in MIM satisfies the Neumann boundary condition automatically and both methods have similar network structures. It is surprising, however, that multi-network MIM also costs less time than DGM, since its number of parameters is about twice that of DGM. In terms of execution time, single-network MIM is the fastest, followed by multi-network MIM and then DGM.

d Method Relative error () Time (s)
2 DGM 0.3676 0.3714 0.04374
MIM 0.2941 0.1639 0.02925
MIM 0.0565 0.0236 0.03514
4 DGM 1.0022 1.3272 0.07455
MIM 0.3751 0.3290 0.03603
MIM 0.2294 0.0690 0.04141
8 DGM 2.0022 2.6551 0.13081
MIM 0.9049 0.6423 0.06642
MIM 0.7261 0.1499 0.08716
16 DGM 3.9796 5.0803 0.25621
MIM 1.7631 1.0041 0.11082
MIM 0.0787 0.0236 0.15125
Table 7: Relative errors of the solution and its gradient in DGM and MIM for the Poisson equation defined in (7).

Figure 4 and Figure 5 plot the training processes of DGM and MIM in terms of the relative errors of the solution and its gradient. Generally speaking, in terms of approximation error, multi-network MIM outperforms single-network MIM, which in turn outperforms DGM, as expected. Therefore, MIM provides a better strategy than DGM, with better approximations in terms of relative errors for both the solution and its gradient. For the solution, the improvement of MIM over DGM is about several times, and that of multi-network MIM over single-network MIM is about one order of magnitude. For the gradient, the improvement is about several times. Moreover, a dimensional dependence is observed for both quantities: the higher the dimension, the better the approximation.

Figure 4: Relative error of the solution versus iteration number for the Poisson equation defined in (7); panels (a)-(c) show the 4D, 8D, and 16D cases.
Figure 5: Relative error of the gradient versus iteration number for the Poisson equation defined in (7); panels (a)-(c) show the 4D, 8D, and 16D cases.

Table 8 records the approximation errors of MIM and DGM with respect to the activation function and the network depth in a fixed dimension. MIM provides better approximations for both the solution and its gradient. It is not surprising that ReLU is not a suitable activation function for DGM because of the high-order derivatives in its loss function, whereas it is suitable in MIM since only first-order derivatives are present there.

Relative error ()
DGM MIM MIM
ReLU 1 0.9197 0.9259 0.0890 0.0444 0.0264 0.0080
2 0.9210 0.9230 0.0245 0.0104 0.0265 0.0068
3 0.9208 0.9216 0.0258 0.0113 0.0258 0.0084
ReQU 1 0.0684 0.1003 0.0182 0.0127 0.0107 0.0042
2 0.0057 0.0118 0.0113 0.0047 0.0049 0.0017
3 0.0124 0.0140 0.0040 0.0029 0.0042 0.0031
ReCU 1 0.4642 0.4644 0.0288 0.0159 0.0100 0.0033
2 0.0281 0.0170 0.0071 0.0055 0.0048 0.0013
3 0.0028 0.0031 0.0049 0.0036 0.0049 0.0013
Table 8: Performance of MIM and DGM with respect to network depth and activation function for the Poisson equation; the network width is fixed.

3.2 Monge-Ampère equation

Consider the nonlinear Monge-Ampère equation

(10)

with a prescribed exact solution. Following Tables 1, 2, and 3, we obtain the loss functions in DGM and in MIM, respectively. For (10), the Dirichlet boundary condition can be enforced exactly in both DGM and MIM; for comparison purposes, however, we use a penalty term in both methods. Imposing exact boundary conditions is nevertheless always encouraged in practice.
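A hedged PyTorch sketch of the interior part of such a mixed residual, assuming the equation takes the form det(D²u) = f and the auxiliary variable is p = ∇u (so that the Hessian is replaced by the Jacobian of p), is given below; the network size and right-hand side are placeholders, and the Dirichlet penalty term is omitted for brevity.

```python
import torch

d = 4
net = torch.nn.Sequential(                     # single network outputting (u, p), p ≈ grad(u)
    torch.nn.Linear(d, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1 + d),
)

def monge_ampere_residual(x, f):
    """Interior mixed residual for det(D^2 u) = f, written as p = grad(u), det(grad p) = f;
    boundary terms are handled separately."""
    x = x.requires_grad_(True)
    out = net(x)
    u, p = out[:, :1], out[:, 1:]
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    rows = [torch.autograd.grad(p[:, i].sum(), x, create_graph=True)[0] for i in range(d)]
    hess = torch.stack(rows, dim=1)            # (N, d, d), row i holds the gradient of p_i
    res_p = (grad_u - p).pow(2).sum(dim=1).mean()
    res_eq = (torch.linalg.det(hess) - f(x).reshape(-1)).pow(2).mean()
    return res_p + res_eq

f = lambda x: torch.ones(x.shape[0])           # placeholder right-hand side
print(monge_ampere_residual(torch.rand(128, d), f))
```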

In this example, we fix the network depth and the activation function. Relative errors over the final iterations with respect to the network width in different dimensions are recorded in Table 9, and Figure 6 plots the errors in terms of the network width for different dimensions. The advantage of MIM is evident from these results.

d Relative error ()
DGM MIM MIM
2 10 0.1236 0.7430 0.1023 0.3433 0.1251 0.5218
20 1.1100 3.1940 0.0922 0.3804 0.0784 0.0221
30 0.0913 0.5656 0.0522 0.1740 0.1075 0.0219
4 20 0.0981 0.7764 0.1095 0.6359 0.1230 0.3977
30 0.0921 0.7731 0.0903 0.4399 0.1063 0.2802
40 0.0943 0.6174 0.0636 0.3127 0.1287 0.2480
8 30 0.3584 3.3902 0.1435 1.6318 0.1155 0.5170
40 0.1179 1.4663 0.1344 1.0721 0.1330 0.4873
50 0.0997 1.2483 0.0977 0.8289 0.0917 0.4174
Table 9: Relative errors over the final iterations with respect to the network width for the Monge-Ampère equation defined in (10) in different dimensions. The network depth and the activation function are fixed.
Figure 6: Relative errors of the solution and its derivatives for the Monge-Ampère equation defined in (10); panels show the 2D, 4D, and 8D cases.

3.3 Biharmonic equation

Consider the biharmonic equation

(11)

with a prescribed exact solution. The loss function in DGM follows Table 1. The loss function of MIM with one choice of auxiliary variables is

(12)

and the loss function with the other choice is

(13)

Again, the exact boundary condition can be enforced in MIM but not in DGM; for comparison purposes, we use penalty terms in both methods.
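A hedged PyTorch sketch of the interior residual for the variant with the full set of auxiliary variables, assuming the splitting p = ∇u, q = ∇·p, r = ∇q, ∇·r = f (generic notation, boundary terms omitted), is:

```python
import torch

d = 4
net = torch.nn.Sequential(                       # single network with outputs (u, p, q, r)
    torch.nn.Linear(d, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 2 + 2 * d),              # 1 + d + 1 + d components
)

def biharmonic_residual(x, f):
    """Interior mixed residual for Laplace^2(u) = f via p = grad(u), q = div(p),
    r = grad(q), div(r) = f. Only first-order derivatives of the outputs appear."""
    x = x.requires_grad_(True)
    out = net(x)
    u, p = out[:, :1], out[:, 1:1 + d]
    q, r = out[:, 1 + d:2 + d], out[:, 2 + d:]
    grad = lambda v: torch.autograd.grad(v.sum(), x, create_graph=True)[0]
    div = lambda w: sum(grad(w[:, i:i + 1])[:, i:i + 1] for i in range(d))
    return ((grad(u) - p) ** 2).sum(1).mean() + ((div(p) - q) ** 2).mean() \
        + ((grad(q) - r) ** 2).sum(1).mean() + ((div(r) - f(x).reshape(-1, 1)) ** 2).mean()

f = lambda x: torch.ones(x.shape[0])             # placeholder right-hand side
print(biharmonic_residual(torch.rand(128, d), f))
```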

The network width and depth are chosen according to the dimension. Table 10 records the averaged errors over the last 1000 iterations.

d Method Relative error ( ) Time (s)
2 DGM 0.1656 0.6454 1.2333 8.8001 0.1034
MIM 0.1501 0.1929 0.1564 0.3067 0.1219
MIM 0.0769 0.1155 0.1504 0.4984 0.1636
MIM 0.0526 0.2066 0.2937 1.6821 0.1393
MIM 0.0424 0.1417 0.3625 2.2231 0.2164
4 DGM 0.1330 0.6454 1.2333 8.8008 0.3292
MIM 0.4117 0.1929 0.1563 0.3066 0.2784
MIM 0.0845 0.1155 0.1504 0.4984 0.4692
MIM 0.1039 0.2066 0.2937 1.6821 0.2883
MIM 0.1111 0.1417 0.3625 2.2301 0.5919
8 DGM 0.2488 1.0514 1.4594 13.4003 0.3292
MIM 0.3719 2.3855 0.6797 3.1015 0.2784
MIM 0.1856 0.6909 0.7840 4.7209 0.4692
MIM 0.1475 1.6657 1.2922 6.9594 0.8051
MIM 0.2881 0.9223 0.9981 6.4658 6.5148
Table 10: Relative errors for the biharmonic equation defined in (11). The MIM variants use the loss functions defined in (12) and (13), respectively.

Relative errors of the solution and its derivatives in terms of iteration number are plotted in Figure 7.

Figure 7: Relative errors of the solution and its derivatives in terms of iteration number for the biharmonic equation. In some variants the solution and its derivatives are approximated by the same network, while in others different networks are used for the solution and its derivatives; the variants further differ in whether all derivatives or only a subset of them are approximated.

Generally speaking, MIM provides better approximations than DGM for the solution and all of its recorded derivatives. Between the two choices of auxiliary variables, the variant with only a subset of derivatives achieves accuracy comparable to, and at times slightly better than, the variant with all derivatives, even though the latter has more outputs. These results are of interest since they are connected with results for the local discontinuous Galerkin method, where the formulation with a subset of derivatives also shows better numerical performance Yan and Shu (2002); Cockburn et al. (2009). We point out that MIM has the additional advantage that the exact boundary condition can be enforced, although we use penalty terms for this example.

3.4 KdV equation

Consider a time-dependent linear KdV-type equation

(14)

defined over a space-time domain, with a prescribed exact solution. We first rewrite it into a first-order system.
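A hedged illustration of such a rewrite, assuming the KdV-type equation has the form ∂_t u + Σᵢ ∂³u/∂xᵢ³ = f (the paper's exact equation may differ), is:

```latex
% Generic first-order rewrite of a KdV-type equation with third derivatives along
% each coordinate direction (assumed form, not necessarily the paper's):
\begin{align*}
 p_i &= \partial_{x_i} u, \qquad q_i = \partial_{x_i} p_i, \qquad i = 1, \dots, d,\\
 \partial_t u &+ \sum_{i=1}^{d} \partial_{x_i} q_i = f .
\end{align*}
```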

The loss functions in DGM and in MIM follow Table 1 and Table 3, respectively; here $\{e_i\}_{i=1}^{d}$ denotes the standard basis of $\mathbb{R}^d$.

Relative errors of the solution and its derivatives are recorded in Table 11. Again, as in the previous examples, MIM provides better results than DGM, especially with the ReQU activation function. No obvious improvement of one MIM variant over the other is observed.

Method Relative error ()
1 ReQU DGM 34.9171 20.6788 34.3661
MIM 0.5705 5.3709 0.5369
MIM 1.2920 0.8129 1.9244
ReCU DGM 0.7603 0.4785 0.5977
MIM 0.0991 0.7313 0.0128
MIM 0.5035 0.5804 0.1229
2 ReQU DGM 84.8708 85.8114 85.8954
MIM 2.9393 1.9996 2.9443
MIM 2.1820 2.5591 2.1383
ReCU DGM 2.5483 2.1856 2.4431
MIM 1.5410 2.3865 1.5645
MIM 5.5900 5.7440 5.8957
3 ReQU DGM 168.1755 168.1697 169.3528
MIM 4.0421 4.0987 3.8496
MIM 7.7027 8.8787 9.1058
ReCU DGM 1.9132 1.4846 1.7970
MIM 1.5410 2.3865 1.5645
MIM 5.5900 5.7440 5.8957
Table 11: Relative errors for KdV equation defined in (14).

4 Conclusion and Discussion

Motivated by classical numerical methods such as local discontinuous Galerkin method, mixed finite element method, and least-squares finite element method, we develop a deep mixed residual method to solve high-order PDEs in this paper. The deep mixed residual method inherits several advantages of classical numerical methods:

  • Flexibility for the choice of loss function;

  • Larger solution space with flexible choice of deep neural networks;

  • Enforcement of exact boundary conditions;

  • Better approximations of high-order derivatives with almost the same cost.

Meanwhile, the deep mixed residual method also provides a better approximation of the PDE solution itself. These features make the deep mixed residual method suitable for solving high-order PDEs in high dimensions.

The treatment of boundary conditions is another important issue when solving PDEs by DNNs. Enforcement of exact boundary conditions not only makes the training process easier, but also improves the approximation accuracy; see Berg and Nyström (2018); Chen et al. (2020) for examples. The deep mixed residual method has the potential to impose exact boundary conditions such as the Neumann boundary condition, mixed boundary conditions, and the Robin boundary condition, none of which can be enforced exactly in the deep Galerkin method. This will be investigated in a subsequent work Lyu et al. (2020).

So far, only experiences from classical numerical methods at a basic level have been transferred into deep learning in the deep mixed residual method, and we have already seen clear advantages. To further improve the method, we need to transfer experience from classical numerical analysis at a deeper level. For example, in the least-squares finite element method the choice of solution space relies heavily on the choice of residual in order to maximize performance Bochev and Gunzburger (2015); many other connections exist in the discontinuous Galerkin method Cockburn et al. (2000) and the mixed finite element method Boffi et al. (2013). Moreover, since only first-order derivatives appear in the deep mixed residual method, ReLU works well for all time-independent equations we have tested, although it does not work well for the KdV equation; this calls for a theoretical understanding of the proposed method in the language of the linear finite element method He et al. (2018). Another possible connection is to use a weak formulation of the mixed residual instead of the least-squares loss, as done in deep learning by Zang et al. (2020) and in the discontinuous Galerkin method by Cockburn et al. (2000). Realizing these connections in the deep mixed residual method will allow for a systematic way to understand and improve deep learning for solving PDEs.

5 Acknowledgments

This work was supported by National Key R&D Program of China (No. 2018YFB0204404) and National Natural Science Foundation of China via grant 11971021. We thank Qifeng Liao and Xiang Zhou for helpful discussions.

References

  • M. Bardi and I. Capuzzo-Dolcetta (2008) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer Science & Business Media. Cited by: §1.
  • C. Beck, W. E, and A. Jentzen (2019) Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. Journal of Nonlinear Science 29 (4), pp. 1563–1619. Cited by: §1.
  • C. Beck, L. Gonon, and A. Jentzen (2020) Overcoming the curse of dimensionality in the numerical approximation of high-dimensional semilinear elliptic partial differential equations. arXiv preprint arXiv:2003.00596. External Links: 2003.00596, Link Cited by: §1.
  • J. Berg and K. Nyström (2018) A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317, pp. 28–41. External Links: ISSN 0925-2312, Link, Document Cited by: §3.1, §4.
  • P. Bochev and M. Gunzburger (2015) Least Squares Finite Element Methods. Springer, Berlin, Heidelberg. External Links: Document Cited by: item 1, §4.
  • D. Boffi, F. Brezzi, and M. Fortin (2013) Mixed Finite Element Methods and Applications. Springer, Berlin, Heidelberg. External Links: ISSN 0179-3632, ISBN 978-3-642-36518-8, Document Cited by: §1, §4.
  • J. A. G. Cervera (2019) Solution of the black-scholes equation using artificial neural networks. Journal of Physics: Conference Series 1221, pp. 012044. Cited by: §1.
  • J. Chen, R. Du, and K. Wu (2020) A comprehensive study of boundary conditions when solving PDEs by DNNs. arXiv preprint arXiv:2005.04554. External Links: 2005.04554, Link Cited by: §2.1, §3.1, §4.
  • B. Cockburn, B. Dong, and J. Guzman (2009) A hybridizable and superconvergent discontinuous galerkin method for biharmonic problems. Journal of Scientific Computing 40 (1), pp. 141–187. Cited by: item 1, §2.1, §3.3.
  • B. Cockburn, G. E. Karniadakis, and C. Shu (2000) Discontinuous Galerkin Methods - Theory, Computation and Applications. Springer-Verlag Berlin Heidelberg. External Links: ISBN 978-3-642-64098-8, Document Cited by: §1, §1, §4.
  • P. A. M. Dirac (1981) The principles of quantum mechanics. Oxford university press. Cited by: §1.
  • N. Discacciati, J. S. Hesthaven, and D. Ray (2020) Controlling oscillations in high-order discontinuous galerkin schemes using artificial viscosity tuned by neural networks. Journal of Computational Physics 409, pp. 109304. Cited by: §1.
  • W. E, J. Han, and A. Jentzen (2017) Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5 (4), pp. 349–380. Cited by: §1.
  • W. E and B. Yu (2018) The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems. Communications in Mathematics and Statistics 6 (1), pp. 1–12. External Links: ISSN 2194-671X, Document Cited by: §1, §1.
  • H. Elman, D. Silvester, and A. Wathen (2014) Finite Elements and Fast Iterative Solvers: with Applications in Incompressible Fluid Dynamics. Oxford University Press. External Links: ISBN 978-019178074-5, Document Cited by: §1.
  • Y. Fan, L. Lin, L. Ying, and L. Zepeda-Núnez (2019) A multiscale neural network based on hierarchical matrices. Multiscale Modeling & Simulation 17 (4), pp. 1189–1213. Cited by: §1.
  • C. Giuseppe and T. Matthias (2017) Solving the quantum many-body problem with artificial neural networks. Science 355 (6325), pp. 602–606. Cited by: §1.
  • J. Han, A. Jentzen, and W. E (2018) Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences of the United States of America 115 (34), pp. 8505–8510. Cited by: §1.
  • J. Han, L. Zhang, and W. E (2019) Solving many-electron schrödinger equation using deep neural networks. Journal of Computational Physics 399, pp. 108929. Cited by: §1.
  • J. He, L. Li, J. Xu, and C. Zheng (2018) ReLU deep neural networks and linear finite elements. arXiv preprint arXiv:1807.03973. Cited by: §4.
  • K. He, X. Zhang, S. Ren, and J. Sun (2015) Deep residual learning for image recognition. CoRR 1512.03385. External Links: Link, 1512.03385 Cited by: §1, §2.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. Cited by: §2.2.
  • C. J. Hull (2009) Options, futures and other derivatives. Upper Saddle River, NJ: Prentice Hall. Cited by: §1.
  • M. Hutzenthaler, A. Jentzen, T. Kruse, and T. A. Nguyen (2019) A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. arXiv preprint arXiv:1901.10854. External Links: 1901.10854, Link Cited by: §1.
  • Y. Khoo, J. Lu, and L. Ying (2019) Solving for high-dimensional committor functions using artificial neural networks. Research in the Mathematical Sciences 6, pp. 1. Cited by: §1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.3.
  • R. J. LeVeque (2007) Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems. Society for Industrial and Applied Mathematics. Cited by: §1.
  • K. Li, K. Tang, T. Wu, and Q. Liao (2019) D3M: A Deep Domain Decomposition Method for Partial Differential Equations. IEEE Access 8, pp. 5283–5294. External Links: ISSN 2169-3536, Document Cited by: §1.
  • L. Lyu, K. Wu, R. Du, and J. Chen (2020) Enforcing exact boundary and initial conditions in the deep mixed residual method. In preparation. Cited by: §3.1, §4.
  • L. Mu, J. Wang, and X. Ye (2015) A weak Galerkin finite element method with polynomial reduction. Journal of Computational and Applied Mathematics 285, pp. 45–58. External Links: ISSN 0377-0427, Document Cited by: §2.1.
  • A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. Note: [Online; accessed 13. May 2020] External Links: Link Cited by: §2.3.
  • M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.
  • M. Raissi (2018) Deep hidden physics models: deep learning of nonlinear partial differential equations. Journal of Machine Learning Research 19 (1), pp. 932–955. Cited by: §1.
  • J. Shen, T. Tang, and L. Wang (2011) Spectral methods: algorithms, analysis and applications. Vol. 41, Springer Science & Business Media. Cited by: §1.
  • J. A. Sirignano and K. Spiliopoulos (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364. Cited by: §1, §1.
  • Y. Wang, S. W. Cheung, E. T. Chung, Y. Efendiev, and M. Wang (2020) Deep multiscale model learning. Journal of Computational Physics 406, pp. 109071–109071. Cited by: §1.
  • J. Yan and C. Shu (2002) Local Discontinuous Galerkin Methods for Partial Differential Equations with Higher Order Derivatives. Journal of Scientific Computing 17 (1), pp. 27–47. External Links: ISSN 1573-7691, Document Cited by: item 1, §2.1, §3.3.
  • Y. Zang, G. Bao, X. Ye, and H. Zhou (2020) Weak adversarial networks for high-dimensional partial differential equations. Journal of Computational Physics 411, pp. 109409. Cited by: §1, §4.
  • Y. Zhu, N. Zabaras, P. Koutsourelakis, and P. Perdikaris (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394, pp. 56–81. External Links: ISSN 0021-9991, Document Cited by: §1.