Accelerating Physics-Informed Neural Network Training with Prior Dictionaries

04/17/2020 ∙ by Wei Peng, et al. ∙ NetEase, Inc.

Physics-Informed Neural Networks (PINNs) can be regarded as general-purpose PDE solvers, but it might be slow to train PINNs on particular problems, and there is no theoretical guarantee of corresponding error bounds. In this manuscript, we propose a variant called Prior Dictionary based Physics-Informed Neural Networks (PD-PINNs). Equipped with task-dependent dictionaries, PD-PINNs enjoy enhanced representation power on the tasks, which helps to capture features provided by dictionaries so that the proposed neural networks can achieve faster convergence in the process of training. In various numerical simulations, compared with existing PINN methods, combining prior dictionaries can significantly enhance convergence speed. In terms of theory, we obtain the error bounds applicable to PINNs and PD-PINNs for solving elliptic partial differential equations of second order. It is proved that under certain mild conditions, the prediction error made by neural networks can be bounded by expected loss of PDEs and boundary conditions.




1 Introduction

Neural networks are widely used to solve various scientific computing problems [10, 2, 23], and a sequence of recent works has successfully applied them to solving PDEs [12, 19, 13]. We consider the following partial differential equation:


the solution of which can be approximated by a neural network. Based on this motivation, Physics-Informed Neural Networks (PINNs) [20] construct such neural networks penalized by the discrepancy between the right-hand side (RHS) and the left-hand side (LHS) of problem (1.1). To make the so-called physics information learnable by the network, the loss function usually consists of three parts: the PDE structure loss (PDE loss), the boundary condition loss (BC loss), and the initial condition loss (IC loss). The structure of a PINN is shown in Figure 1.


Figure 1: Illustration of a PINN

Denote a PINN by F, parameterized by a collection of trainable weights. When there is no ambiguity, we regard the IC as a BC. The expected total loss during training then consists of two parts:


The expected PDE loss is taken over a random variable uniformly distributed on the domain, and the expected BC loss over a random variable uniformly distributed on the boundary. The optimization problem of training is then reformulated as minimizing the total expected loss (1.5), i.e., the sum of the expected PDE loss and the expected BC loss.


To accelerate finding a minimum of (1.5), [8] considers adjusting the general structure of the networks and introduces a trainable variable that scales the activation functions adaptively. The subsequent work [7] uses locally adaptive activation functions. From a data-driven perspective, the idea of leveraging prior structural information is also widely applied to accelerate training, e.g., wavelet representations [26], periodic structures [18], symplectic structures [21], and energy-preserving tensors [11]. These methods employ specially designed neural networks and are applicable to particular classes of problems.

Motivated by these works, in this manuscript we introduce Prior Dictionary based Physics-Informed Neural Networks (PD-PINNs), which integrate prior information into PINNs to accelerate training. As shown in Figure 2, compared with a PINN, a PD-PINN has an additional dictionary fusion layer, which combines prior information with the output layer of the neural network via an inner product. See Section 2.1 for a detailed description.

Figure 2: Illustration of a PD-PINN

The idea of PD-PINNs is motivated by two observations. On the one hand, our method derives from traditional spectral methods [6, 5], which decompose the ground truth over an orthogonal basis. These methods enjoy guarantees of spectral convergence, and their basis functions can be regarded as a special type of prior dictionary. Nevertheless, a finite basis expansion results in truncation error. PD-PINNs exploit the universal approximation ability of neural networks to compensate for the truncation error, which allows high accuracy with a prior dictionary consisting of only a small number of basis functions. On the other hand, since the essence of training neural networks is learning representations, we can embed priors into the network before the training stage. A natural way to achieve this “pre-training” is to construct a prior dictionary based PINN, and thus to accelerate training.

Another issue with PINNs is the lack of reliable error bounds: there is no guarantee that small PDE loss and BC loss in (1.5) lead to a small prediction error. To partially address this problem, we derive an error bound for PINNs solving elliptic PDEs of second order under some mild conditions.

The main contribution of this manuscript is twofold.

  1. First, we propose a variant of PINNs, namely PD-PINNs, which employ prior information to accelerate the training of neural networks. For various PDE problems defined on different types of domains, we construct corresponding prior dictionaries. The numerical simulations illustrate the accelerated convergence of PD-PINNs, which can even recover the true solutions of some problems where PINNs hardly converge.

  2. Second, we prove that the error between the neural network and the ground truth, in the sense of the infinity norm, can be bounded by the two terms on the RHS of (1.5). Accordingly, it is guaranteed that minimizing the RHS of (1.2) makes the neural network approach the true solution.

The rest of the manuscript is organized as follows. Section 2 introduces the method and provides a theoretical error bound for PINNs on elliptic PDEs of second order. Four numerical simulations on synthetic problems are conducted in Section 3. Section 4 concludes the manuscript.

2 Methodology

We first provide construction details of PD-PINNs in Subsection 2.1 and then provide theoretical guarantees on PINN error bounds in Subsection 2.2.

2.1 PD-PINN

Let the neural network be a mapping from the input domain into the output space, parameterized by a collection of trainable weights. We employ a Multi-Layer Perceptron (MLP) with a nonlinear activation function. The MLP is the trainable part of our networks. Besides this trainable part, we define the prior dictionary as a vector-valued function whose component functions are called the word functions of the dictionary; prior information is thus encoded in these word functions. Combining the trainable part and the given prior, we formulate a PD-PINN as


the structure of which has several advantages:

  • Plug and play. Prior dictionaries are kept separate from the trainable neural network, so there is no need to design special network architectures to learn the priors for different problems; only the dictionary in the fusion step needs to be updated.

  • Interpretability. Physics-informed priors are fused with the otherwise uninterpretable network via a simple inner product, which falls into the class of generalized linear models, and the linear form usually carries physical meaning. For example, if the dictionary is a family of trigonometric functions, one may interpret each element of the coefficient vector as the magnitude of a certain frequency.

  • Flexible prior selection. Since the dictionary and the neural network are independent before the final fusion, there is no restriction on the choice of dictionaries: a variety of word functions are available and can be flexibly selected for specific problems.
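As a minimal sketch of the dictionary fusion described above (a hypothetical PyTorch implementation; the width, depth, and trigonometric dictionary below are illustrative assumptions, not the authors' exact configuration):

```python
import torch
import torch.nn as nn

class PDPINN(nn.Module):
    """Sketch of a PD-PINN: an MLP maps the input x to a coefficient vector,
    which is fused with the dictionary values D(x) by an inner product."""

    def __init__(self, dim_in, n_words, dictionary, width=64, depth=3):
        super().__init__()
        layers, d = [], dim_in
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.Tanh()]
            d = width
        layers.append(nn.Linear(d, n_words))
        self.mlp = nn.Sequential(*layers)
        self.dictionary = dictionary  # callable: (batch, dim_in) -> (batch, n_words)

    def forward(self, x):
        coeff = self.mlp(x)         # trainable coefficients
        words = self.dictionary(x)  # fixed prior word functions
        return (coeff * words).sum(dim=-1, keepdim=True)

# Example: a 1d trigonometric dictionary with 4 word functions.
trig = lambda x: torch.cat([torch.sin((k + 1) * x) for k in range(4)], dim=-1)
model = PDPINN(dim_in=1, n_words=4, dictionary=trig)
```

Because the dictionary is an arbitrary callable, swapping priors requires no change to the trainable part, matching the plug-and-play property above.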

We can construct a dictionary based on the following considerations:

  1. Spatial-based dictionaries. Such dictionaries are designed to encode local magnitudes in their word functions. For example, to solve an equation whose solution has a localized support, we may construct word functions supported on that region.

  2. Frequency-based dictionaries. These dictionaries embed frequency priors into the word functions so that the network enjoys representation ability in both the spatial and frequency domains. Since convergence in the frequency domain appears vital in training neural networks [24, 14], frequency-based dictionaries may accelerate the training stage, especially for periodic ground-truth functions. Our numerical simulations employ this kind of dictionary: we consider a 1d Fourier basis in Sections 3.1 and 3.4, a 2d Fourier basis in Section 3.2, and a spherical harmonic basis in Section 3.3.

  3. Orthogonality. There is no mandatory orthogonality requirement on dictionaries. However, since we have not included any normalization techniques, dictionaries with orthogonal word functions are used in our simulations for stability.

  4. Learnable dictionaries. Instead of assigning word functions manually, dictionary construction can also be driven by data. In practice, we may need to solve the same equation several times with varying boundary conditions. These solutions may share common features, which could be learned by Principal Component Analysis (PCA) [17], Nonnegative Matrix Factorization (NMF) [3, 1], and other dictionary learning techniques [16].
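For instance, a frequency-based dictionary on an interval can be assembled from a truncated Fourier basis; the sketch below (with an assumed period of 2π and K harmonic pairs) is illustrative, not the exact dictionary used in the experiments:

```python
import numpy as np

def fourier_dictionary(K):
    """Return a callable mapping points x (shape (N,)) to the values of
    2K + 1 word functions: the constant 1, then sin(kx), cos(kx) for k <= K."""
    def D(x):
        x = np.asarray(x, dtype=float)
        words = [np.ones_like(x)]
        for k in range(1, K + 1):
            words.append(np.sin(k * x))
            words.append(np.cos(k * x))
        return np.stack(words, axis=-1)  # shape (N, 2K + 1)
    return D

D = fourier_dictionary(3)
print(D(np.array([0.0, np.pi])).shape)  # (2, 7)
```

These word functions are mutually orthogonal on a full period, consistent with the stability consideration in item 3.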

2.2 Error Bounds of PINNs

In this subsection, we provide an error bound on the discrepancy between a trained network and the ground truth under mild assumptions. Consider equation (1.1) with the second-order operator:

We denote and . If holds for some function on , we say is strictly elliptic on . In the following theorem, for simplicity, we suppose that is a uniform lower bound of , is upper bounded, and over , where denotes the closure of . Please refer to [4] for a detailed explanation of the symbols used in this subsection.

Theorem 2.1 (Error bounds of PINNs on elliptic PDEs).

Suppose that is a bounded domain, is strictly elliptic and is a solution to (1.1). If the neural network satisfies that

  1. ;

  2. ;

  3. ,

then the error of over is bounded by


where is a positive constant depending on and .


Denote and . Since and fall in , we have . By Theorem 3.7 in [4], we obtain

where is a positive constant depending only on and the diameter of . It immediately follows that inequality (2.2) holds. ∎

For Poisson’s equations, the second-order operator degenerates into the Laplace operator , where , , and . Thus we have the following corollary:

Corollary 2.2.

Suppose that is a bounded domain, and the ground truth . If a neural network satisfies that

  1. ;

  2. ;

  3. ;

  4. lies between two parallel planes a distance apart,

then the error of over is bounded by


The proof is similar to Corollary 3.8 in [4], and we omit it. ∎

We discuss the assumptions in Theorem 2.1 and Corollary 2.2.

  • If we regard the network as an input-output black box, it seems impossible to verify whether it satisfies conditions (1) and (2) directly. In practice, we can sample sufficiently many points in the domain and on its boundary to estimate the expected loss, which is therefore a more reasonable quantity to control. In Theorem 2.4, we will give an error bound in the expectation sense instead.

  • Conditions (1) and (2) imply that PINNs can solve elliptic PDEs stably under noise. Suppose that and . The error bound (2.2) then becomes

  • Condition (4) in Corollary 2.2 implies that a narrow domain may reduce the error of PINNs.

In the following, we measure the discrepancy via an expected loss function to derive the error bound. To this end, we choose a smooth dictionary and smooth activation functions such that . We then obtain that is -Lipschitz continuous on for some constant , and we additionally assume that is also -Lipschitz continuous. Before proposing the final theorem, we establish a relationship between and in the following lemma:

Lemma 2.3.

Let be a domain. Define the regularity of as

where and is the Lebesgue measure of a set . Suppose that is bounded and . Let be an -Lipschitz continuous function on . Then


According to the definition of -Lipschitz continuity, we have

which implies


where . Without loss of generality, we assume that and . Denote

It is obvious that . Note that the Lebesgue measure of a ball in with radius is . Then (2.5) becomes

which leads to (2.4). ∎

It is obvious that always holds. For various domains in practice, we have . For example, a square domain has , and for a circular domain the regularity is lower bounded by .

If we adopt smooth activation functions such as and sigmoid, the derivatives of the neural networks are also smooth on . Therefore, the Lipschitz continuity of can be guaranteed. Further analysis and estimation of the Lipschitz properties of neural networks can be found in [22].

Theorem 2.4 (Error bounds of PINNs on elliptic PDEs).

Suppose that is a bounded domain, is strictly elliptic, and is a solution to (1.1). If the neural network satisfies that

  1. where the random variable is uniformly distributed on ,

  2. where the random variable is uniformly distributed on ,

  3. are -Lipschitz continuous on ,

then the error of over is bounded by


where is a positive constant depending on and ,


Note that and ; both are -Lipschitz continuous on . Combining Theorem 2.1 and Lemma 2.3 leads to (2.6). ∎

In Theorem 2.4, we have proved that as the tractable training loss decreases to zero in expectation, the neural network approaches the ground truth.

3 Numerical Experiments

Our implementation is heavily inspired by the framework DeepXDE [13]. We also adopt the following standard technical settings in the numerical simulations:

  • Initialization. The initialization of the weights might be vital, but this topic is beyond the scope of our discussion; we employ the standard initialization instead, in which each entry of the weights and biases is uniformly and independently distributed on the interval .

  • Sampling. We focus on the comparison between PINNs [13] and PD-PINNs in this section, where the uniform distribution is used in all simulations. Indeed, one could employ more sophisticated sampling strategies with adaptive adjustment [13] in training both PINNs and PD-PINNs.

  • Optimizer. The popular optimizer Adam [9] with a learning rate of is employed in this section.

  • Loss. Since the multi-layer neural networks are regarded as black boxes in this context, computing analytic forms of the losses is intractable. As illustrated in (1.3) and (1.4), at each iteration we estimate the expected PDE loss via an empirical loss function over i.i.d. samples uniformly distributed on the domain, and we estimate the expected BC loss analogously with i.i.d. samples uniformly distributed on the boundary. Since it is also hard to track the exact prediction error between the network and the ground truth during training, at each iteration we estimate it in a Monte Carlo fashion with i.i.d. uniform samples. The sample sizes are fixed throughout this section.
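For a concrete 1d Poisson problem −u″(x) = f(x) on (0, 1), the empirical PDE loss above can be sketched with automatic differentiation; the sampler, batch size, and test functions below are illustrative assumptions:

```python
import math
import torch

def empirical_pde_loss(u, f, n_samples=256):
    """Monte Carlo estimate of the expected PDE loss for -u'' = f on (0, 1):
    draw uniform samples, differentiate u twice with autograd, and average
    the squared residual."""
    x = torch.rand(n_samples, 1, requires_grad=True)
    ux = torch.autograd.grad(u(x).sum(), x, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    return ((-uxx - f(x)) ** 2).mean()

# Sanity check: u(x) = sin(pi x) solves -u'' = pi^2 sin(pi x), so the
# empirical PDE loss of the exact solution is numerically zero.
u = lambda x: torch.sin(math.pi * x)
f = lambda x: math.pi ** 2 * torch.sin(math.pi * x)
loss = empirical_pde_loss(u, f)
```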

All simulations in this section are conducted with PyTorch. The code to reproduce all the results is available online.
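A minimal training loop consistent with the settings above might look as follows; this is a sketch under assumed interfaces (the two loss callables are expected to resample fresh uniform points on every call), not the authors' released code:

```python
import torch

def train(net, pde_loss_fn, bc_loss_fn, n_iters=1000, lr=1e-3):
    """At each iteration, re-estimate the empirical PDE and BC losses on
    fresh samples and take one Adam step on their sum."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        loss = pde_loss_fn(net) + bc_loss_fn(net)
        loss.backward()
        opt.step()
    return net
```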


3.1 1d Poisson’s Equation

First, we consider a one-dimensional Poisson’s equation with Dirichlet boundary conditions on both ends. Though the 1d problem seems simpler than its higher-dimensional versions, it is actually hard for neural networks to learn. The value at an interior point is determined by two paths connecting the interior point to the two boundary ends, and a slightly large error on either path results in a large error in the predictions of interior values.

Consider the ground truth:

which is smooth and combines two different frequency components with a linear term. Its graph is shown in Figure 3.

Figure 3: Illustration of .

The corresponding 1d Poisson’s equation is formulated as follows:

We employ a frequency based dictionary with word functions:

Take , and the boundary condition at the two ends is included in the loss function in each iteration. The results are shown in Figure 4. PINNs implemented by MLPs fail to find the ground truth, though the learned curve shares a similar tendency with it. The failure might be caused by the perturbed propagation of boundary information. With the dictionary integrated, however, the PD-PINNs can represent higher frequencies even at the initial iterations, and this ability might allow information to be broadcast instantly via the frequency domain instead of being transmitted gradually through the spatial domain.

Figure 4: The first row shows the result of an MLP with 3 hidden layers. The second row displays the result of an MLP with 4 hidden layers. The third row employs a PD-PINN with 3 hidden layers and . The fourth row employs a PD-PINN with 3 hidden layers and . With the ground truth marked in red, the first column displays initial response curves while the second column shows response curves of after 1000 iterations.

3.2 2d Poisson’s Equation

Define the ground truth on :

The graph of is shown in Figure 5.

Figure 5: Illustration of

We formulate the 2d Poisson’s equation as

We construct a dictionary via

Take and . Setting , we have word functions in this dictionary. The result is shown in Figure 6: the PD-PINN clearly outperforms the PINN on this problem.

Figure 6: The first row shows the result of an MLP with 4 hidden layers. The second row employs a PD-PINN with 3 hidden layers and dictionary . The first column displays the response surface after 100 iterations while the second column shows that of after 1000 iterations.
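A 2d frequency dictionary of this kind can be sketched as a tensor product of 1d sine words; the frequency ranges and construction below are assumptions for illustration, not the exact dictionary of this experiment:

```python
import numpy as np

def fourier2d_dictionary(K):
    """Word functions sin(k*x) * sin(l*y) for 1 <= k, l <= K, an assumed
    tensor-product construction giving K * K words per input point."""
    def D(xy):
        x, y = xy[..., 0:1], xy[..., 1:2]
        return np.concatenate(
            [np.sin(k * x) * np.sin(l * y)
             for k in range(1, K + 1) for l in range(1, K + 1)],
            axis=-1,
        )
    return D

D = fourier2d_dictionary(4)
print(D(np.zeros((3, 2))).shape)  # (3, 16)
```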

3.3 Spherical Poisson’s Equation

We consider the solution of Poisson’s equation on a sphere and PD-PINNs with the spherical harmonic basis as a dictionary.

Let be a scalar function on a sphere, where the location of a point is indicated by the colatitude and longitude . We employ the special form [25] in the experiment. Let the ground truth be

Its Mercator projection is displayed in Figure 7.

Figure 7: The Mercator projection of

We formulate the Poisson’s equation on the sphere as:



Note that (3.4) is the boundary condition; although it constrains only a single point, it is enough to make the solution unique. We also alter the structure of the neural network employed in this subsection. As shown in Figure 8, we put a lifting layer right after the input layer, which lifts the angular input into a higher-dimensional representation via

Figure 8: Illustration of PD-PINN with lifting.
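The exact lifting map is not reproduced here; a common choice, which we assume for illustration, embeds the two angles into Cartesian coordinates on the unit sphere so the network never sees the 2π discontinuity in the longitude:

```python
import numpy as np

def lift(theta, phi):
    """Assumed lifting layer: map colatitude theta and longitude phi to the
    corresponding point (x, y, z) on the unit sphere."""
    return np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])
```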

To construct the dictionary, we employ real spherical harmonic basis functions [15] as the word functions,

where are the associated Legendre polynomials and are normalization constants. Set , and (3.4) is taken into account in each iteration. The results are shown in Figure 9. The PINN fails to recover the ground truth within the given number of iterations, while the PD-PINN recovers it with an error below 0.001.

Figure 9: The first row shows the result produced by an MLP with 4 hidden layers. The second row shows the result produced by the PD-PINN.

3.4 Diffusion Equation

The last simulation is conducted on a parabolic equation. Define the ground truth

which is illustrated in Figure 10:

Figure 10: Illustration of

Consider the one-dimensional diffusion equation:


Though the input is two-dimensional, we can employ a dictionary that depends on only one of the dimensions:

We employ with words involved. Take inside. Note that we regard the initial condition (3.5) as a boundary condition and take . As shown in Figure 11, the PD-PINN outperforms the PINN. As emphasized earlier in the manuscript, the loss curve in the last subfigure suggests that the rapid vanishing of the PDE and BC losses does not necessarily imply an equivalent decline in the prediction error.

Figure 11: The first row is the result produced by an MLP with 4 hidden layers. The second row is produced by the PD-PINN with .

4 Conclusion

In this manuscript, we have proposed a novel PINN structure, which combines PINNs with prior dictionaries. With a proper choice of word functions, we illustrated that PD-PINNs outperform PINNs in simulations with various settings. We also noted that the convergence of PINNs lacked a theoretical guarantee and thus proposed an error bound for elliptic PDEs of second order. To our knowledge, this is the first theoretical error analysis of PINNs.

However, to make PINNs more practical and universal PDE solvers, we still need to understand how PINNs learn physics information. Error bounds for other types of PDEs besides elliptic ones should also be established.


We thank Dr. Wenjie Lu and Dr. Dong Cao for their insightful suggestions. This work was supported in part by the National Natural Science Foundation of China under Grants 51675525 and 11725211.


  • [1] Michael W Berry, Murray Browne, Amy N Langville, V Paul Pauca, and Robert J Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational statistics & data analysis, 52(1):155–173, 2007.
  • [2] Chris HQ Ding and Inna Dubchak. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17(4):349–358, 2001.
  • [3] Ernie Esser, Michael Moller, Stanley Osher, Guillermo Sapiro, and Jack Xin. A convex model for nonnegative matrix factorization and dimensionality reduction on physical space. IEEE Transactions on Image Processing, 21(7):3239–3252, 2012.
  • [4] David Gilbarg and Neil S Trudinger. Elliptic partial differential equations of second order. Springer, 2015.
  • [5] David Gottlieb and Steven A Orszag. Numerical analysis of spectral methods: theory and applications, volume 26. SIAM, 1977.
  • [6] Peter Henrici. Fast Fourier methods in computational complex analysis. SIAM Review, 21(4):481–527, 1979.
  • [7] Ameya D Jagtap, Kenji Kawaguchi, and George Em Karniadakis. Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks. arXiv preprint arXiv:1909.12228, 2019.
  • [8] Ameya D. Jagtap, Kenji Kawaguchi, and George Em Karniadakis. Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. Journal of Computational Physics, 2020.
  • [9] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  • [10] Xing Lin, Yair Rivenson, Nezih T Yardimci, Muhammed Veli, Yi Luo, Mona Jarrahi, and Aydogan Ozcan. All-optical machine learning using diffractive deep neural networks. Science, 361(6406):1004–1008, 2018.
  • [11] Julia Ling, Andrew Kurzawski, and Jeremy Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155–166, 2016.
  • [12] Zichao Long, Yiping Lu, and Bin Dong. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. Journal of Computational Physics, 399:108925, 2019.
  • [13] Lu Lu, Xuhui Meng, Zhiping Mao, and George E Karniadakis. DeepXDE: A deep learning library for solving differential equations. arXiv preprint arXiv:1907.04502, 2019.
  • [14] Tao Luo, Zheng Ma, Zhi-Qin John Xu, and Yaoyu Zhang. Theory of the frequency principle for general deep neural networks. arXiv preprint arXiv:1906.09235, 2019.
  • [15] Stefan Maintz, Marc Esser, and Richard Dronskowski. Efficient rotation of local basis functions using real spherical harmonics. Acta Physica Polonica B, 47(4), 2016.
  • [16] Julien Mairal, Francis Bach, and Jean Ponce. Task-driven dictionary learning. IEEE transactions on pattern analysis and machine intelligence, 34(4):791–804, 2011.
  • [17] Mohammad Reza Mohammadi, Emad Fatemizadeh, and Mohammad H Mahoor. Pca-based dictionary building for accurate facial expression recognition via sparse representation. Journal of Visual Communication and Image Representation, 25(5):1082–1092, 2014.
  • [18] José Danilo Rairán Antolines. Reconstruction of periodic signals using neural networks. Tecnura, 18(39):34–46, 2014.
  • [19] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [20] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.
  • [21] Wieslaw Sienko, Wieslaw M Citko, and Bogdan M Wilamowski. Hamiltonian neural nets as a universal signal processor. In IEEE 2002 28th Annual Conference of the Industrial Electronics Society. IECON 02, volume 4, pages 3201–3204. IEEE, 2002.
  • [22] Aladin Virmaux and Kevin Scaman. Lipschitz regularity of deep neural networks: analysis and efficient estimation. In Advances in Neural Information Processing Systems, pages 3835–3844, 2018.
  • [23] DH Wang and WH Liao. Modeling and control of magnetorheological fluid dampers using neural networks. Smart materials and structures, 14(1):111, 2004.
  • [24] Zhi-Qin John Xu, Yaoyu Zhang, and Yanyang Xiao. Training behavior of deep neural network in frequency domain. In International Conference on Neural Information Processing, pages 264–274. Springer, 2019.
  • [25] Samuel YK Yee. Solution of Poisson’s equation on a sphere by truncated double Fourier series. Monthly Weather Review, 109(3):501–505, 1981.
  • [26] Jun Zhang, Gilbert G Walter, Yubo Miao, and Wan Ngai Wayne Lee. Wavelet neural networks for function learning. IEEE transactions on Signal Processing, 43(6):1485–1497, 1995.