Learning To Solve Differential Equations Across Initial Conditions

03/26/2020 · by Shehryar Malik, et al. · Information Technology University, Georgia State University

Recently, there has been a lot of interest in using neural networks for solving partial differential equations. A number of neural network-based partial differential equation solvers have been formulated which provide performance comparable, and in some cases even superior, to classical solvers. However, these neural solvers, in general, need to be retrained each time the initial conditions or the domain of the partial differential equation changes. In this work, we pose the problem of approximating the solution of a fixed partial differential equation for any arbitrary initial conditions as learning a conditional probability distribution. We demonstrate the utility of our method on Burgers' equation.




1 Introduction

Partial differential equations (PDEs) are of great importance in various fields such as science, engineering and economics. Despite this, it is generally not possible to obtain analytic solutions for them. Instead, one has to resort to numerical schemes that approximate these solutions. Such schemes are both slow and computationally intensive, especially in higher dimensions. Furthermore, the higher the dimension, the greater the error in the calculation of the derivatives required to approximate the solution.

Recently, Raissi et al. (2019) proposed to use neural networks to approximate solutions of PDEs. They do so by forcing neural networks to produce outputs that satisfy the PDE. The derivatives required to enforce this condition are computed using automatic differentiation and are exact up to the precision of the computing machine. Furthermore, this approach is mesh-free, i.e. one does not need to discretize the domain of the solution as is required in methods such as finite element analysis.
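To make the mechanics concrete, the residual computation can be sketched with a small PyTorch network. The network, viscosity value and sample points below are illustrative stand-ins, not those of Raissi et al.; the point is only that automatic differentiation yields the exact partial derivatives of the network output.

```python
# Sketch of a physics-informed residual: automatic differentiation gives
# exact derivatives of the network output, so no mesh is needed.
# Network size and viscosity are hypothetical.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def burgers_residual(x, t, nu=0.01 / torch.pi):
    """Residual u_t + u*u_x - nu*u_xx of Burgers' equation at points (x, t)."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = net(torch.stack([x, t], dim=-1)).squeeze(-1)
    ones = torch.ones_like(u)
    u_x, = torch.autograd.grad(u, x, ones, create_graph=True)
    u_t, = torch.autograd.grad(u, t, ones, create_graph=True)
    u_xx, = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)
    return u_t + u * u_x - nu * u_xx

x = torch.linspace(-1.0, 1.0, 50)
t = torch.rand(50)
loss_pde = burgers_residual(x, t).pow(2).mean()  # minimized during training
```

Training then consists of driving this residual (together with initial- and boundary-condition penalties) to zero at sampled collocation points.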

In parallel, we have seen tremendous improvements in the generalization capacities of machine learning methods such as generative adversarial networks and flow-based models. These models have demonstrated a remarkable ability to capture probability distributions even when only representative samples are available [Kingma and Dhariwal (2018), Brock et al. (2019), Oord et al. (2016)].

In this work, we generalize the framework of Yang and Perdikaris (2018) over the distribution of initial conditions and demonstrate that even if the model is trained on a subset of them, it is able to generalize well and produce solutions for any arbitrary initial conditions. Furthermore, our model also provides uncertainty quantifications which loosely correlate with the error in the solution.

This paper is structured as follows: in Section 2, we provide an overview of related works on partial differential equations. Section 3 presents our method and Section 4 talks about our experiments and results. Finally, Section 5 concludes the paper.

2 Related Works

PDEs have traditionally been solved through numerical methods such as finite differences, finite element methods and spectral methods (see, for example, Tadmor (2012)). Lagaris et al. (1998) made the first attempt at using neural networks to approximate solutions of PDEs. However, the idea did not catch on. The recent renaissance of neural networks and their remarkable successes in computer vision and natural language processing [Krizhevsky et al. (2012), Devlin et al. (2018)] have sparked new interest in scientific machine learning, i.e. in applying data-driven methods to problems in the natural sciences [Han et al. (2018), Senior et al. (2020), Iqbal et al. (2019)]. Since a huge number of problems in the natural and physical sciences are described by PDEs, this has inevitably aroused interest in using machine learning to solve PDEs. In this regard, the physics-informed neural network method of Raissi et al. (2019) was a critical development. It made two major contributions: (a) it showed that the automatic differentiation machinery of machine learning frameworks such as TensorFlow allows the computation of partial derivatives to a higher order of accuracy than finite differences, and (b) it empirically showed that neural networks can be used to approximate solutions of PDEs by simply forcing them to produce outputs that satisfy the PDE. This framework has since been extended to fractional and stochastic PDEs [Pang et al. (2018), Zhang et al. (2019)].

Neural network based solvers do not, in general, provide any convergence guarantees. Hence, it is important to have an uncertainty estimate for the solution produced by these methods.

Yang and Perdikaris (2018) and Yang and Perdikaris (2019) formulate the method in Raissi et al. (2019) in the framework of generative adversarial networks [Goodfellow et al. (2014)] and show that it is possible to obtain uncertainty estimates as well.

Concurrent to this has been considerable work on combining classical solvers with machine learning methods to obtain more efficient solvers, such as combining the Runge-Kutta method with convolutional networks in Zhu et al. (2018) and the Galerkin method with deep learning in Sirignano and Spiliopoulos (2018).

However, one area which has attracted considerably less attention is attempting to learn a ‘general’ PDE solver through neural networks. General, in this context, refers to a PDE solver which does not need to be retrained if one or more constraints such as the initial or boundary condition or the domain are changed. Rather, the general PDE solver relies on the generalization capacity of neural networks to approximate the solution of a PDE across all possible sets of constraints. In this regard, Hsieh et al. (2019) and Farimani et al. (2017) have, for example, demonstrated that it is possible to learn a general PDE solver for some simple linear and elliptic PDEs. However, developing a single general PDE solver for all types of PDEs (parabolic, elliptic and hyperbolic) still remains an open problem.

3 Methodology

We consider partial differential equations (PDEs) of the form

\[
\frac{\partial u}{\partial t} + \mathcal{N}[u] = 0, \qquad x \in \Omega, \; t \in [0, T], \tag{1}
\]

where $u(x, t)$ is the solution of the PDE and $\mathcal{N}$ is a known (possibly nonlinear) function of $u$ and its spatial derivatives. Since the solution of this PDE depends on the initial conditions $u_0(x) = u(x, 0)$, we make this dependence explicit by writing $u(x, t; u_0)$. We denote the distribution of initial conditions with $p(u_0)$.

We propose to train a single Generative Adversarial Network (GAN) [Goodfellow et al. (2014)] for solving a PDE of the form of Equation 1 for any initial conditions drawn from the distribution $p(u_0)$. Note that the PDE is fixed (i.e. $\mathcal{N}$ is fixed) and only the initial conditions are varied. The generator takes in as input a random noise vector drawn from a fixed prior, along with a particular instance of the initial conditions $u_0$, and generates an approximation $\hat{u}(x, t; u_0)$ of $u(x, t; u_0)$. The discriminator takes two tuples, one containing the true solution and one containing the generated approximation at the same coordinates, and is asked to identify the tuple generated by the generator (as in a typical GAN setup).

The generator is composed of three networks: the encoder, the approximator and the reconstructor. The encoder takes in as input a particular instance of the initial conditions $u_0$ and encodes it into a latent vector $z$. The approximator then takes in as input the latent vector $z$ along with some spatio-temporal coordinates $(x, t)$ and outputs an approximation of $u(x, t; u_0)$. Finally, the reconstructor takes in as input the outputs of the approximator for several different spatio-temporal coordinates but for the same initial conditions, and tries to reconstruct the latent representation of the initial conditions from these approximations. The reconstructor (inspired, in part, by the InfoGAN architecture [Chen et al. (2016)]) forces the approximator to condition all of its outputs on the initial conditions, since otherwise the approximator might learn to produce samples from just one field irrespective of the initial conditions provided to it.
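A minimal PyTorch sketch of this three-part generator follows. The layer widths, noise and latent dimensions, and the number K of shared coordinates are hypothetical placeholders, not the paper's actual architecture.

```python
import torch

torch.manual_seed(0)

class Generator(torch.nn.Module):
    """Encoder + approximator + reconstructor, as described above.
    All sizes here are illustrative, not the paper's."""
    def __init__(self, n_ic=64, d_latent=16, d_noise=8, K=16):
        super().__init__()
        self.d_noise = d_noise
        self.encoder = torch.nn.Sequential(          # (u0, noise) -> z
            torch.nn.Linear(n_ic + d_noise, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, d_latent))
        self.approximator = torch.nn.Sequential(     # (x, t, z) -> u
            torch.nn.Linear(2 + d_latent, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, 1))
        self.reconstructor = torch.nn.Sequential(    # K outputs -> z_hat
            torch.nn.Linear(K, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, d_latent))

    def forward(self, u0, xt):
        # xt: (K, 2) spatio-temporal points sharing the initial condition u0
        noise = torch.randn(self.d_noise)
        z = self.encoder(torch.cat([u0, noise]))             # latent code
        zs = z.expand(xt.shape[0], -1)                       # tile over points
        u = self.approximator(torch.cat([xt, zs], dim=-1))   # (K, 1) field values
        z_hat = self.reconstructor(u.squeeze(-1))            # recovered latent
        return u, z_hat, z

gen = Generator()
u, z_hat, z = gen(torch.randn(64), torch.rand(16, 2))
recon_loss = (z_hat - z).pow(2).mean()  # ties outputs to the initial condition
```

Penalizing `recon_loss` is what forces every approximator output to carry information about the initial condition it was conditioned on.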

In addition to minimizing the typical GAN loss [Goodfellow et al. (2014)] given by

\[
\mathcal{L}_{\text{GAN}} = \mathbb{E}\left[\log D(x, t, u)\right] + \mathbb{E}\left[\log\left(1 - D(x, t, \hat{u}(x, t; u_0))\right)\right],
\]

we force the samples produced by the generator to satisfy Equation 1 by minimizing the residual error

\[
\mathcal{L}_{\text{PDE}} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\partial \hat{u}}{\partial t}(x_i, t_i; u_0) + \mathcal{N}[\hat{u}](x_i, t_i; u_0) \right|^2.
\]
As noted before, the derivatives can readily be obtained via automatic differentiation.

Furthermore, we also force the true and approximated initial conditions to be equal by minimizing

\[
\mathcal{L}_{\text{IC}} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{u}(x_i, 0; u_0) - u_0(x_i) \right|^2.
\]

Note that $t = 0$ is fixed here.

Lastly, we enforce the boundary conditions. These can vary from equation to equation. For Burgers' equation we enforce agreement between the two spatial boundaries:

\[
\mathcal{L}_{\text{BC}} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{u}(x_u, t_i; u_0) - \hat{u}(x_l, t_i; u_0) \right|^2,
\]

where $x_u$ and $x_l$ denote the upper and lower boundary values of $x$ respectively.

Our overall objective function is therefore given by

\[
\mathcal{L} = \mathcal{L}_{\text{GAN}} + \lambda_1 \mathcal{L}_{\text{PDE}} + \lambda_2 \mathcal{L}_{\text{IC}} + \lambda_3 \mathcal{L}_{\text{BC}},
\]

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are scalars.
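How the terms combine can be sketched in a few lines of Python. The individual loss values and the lambda weights below are hypothetical placeholders for the paper's tunable scalars.

```python
def ic_loss(u_pred_t0, u0):
    # mean squared mismatch between the generated field at t = 0 and the true u0
    return sum((a - b) ** 2 for a, b in zip(u_pred_t0, u0)) / len(u0)

def bc_loss(u_upper, u_lower):
    # penalize disagreement between the fields at the two spatial boundaries
    return sum((a - b) ** 2 for a, b in zip(u_upper, u_lower)) / len(u_upper)

def total_loss(l_gan, l_pde, l_ic, l_bc, lam=(1.0, 1.0, 1.0)):
    # lambda weights are hypothetical stand-ins for the paper's scalars
    return l_gan + lam[0] * l_pde + lam[1] * l_ic + lam[2] * l_bc

loss = total_loss(0.5, 0.1,
                  ic_loss([0.0] * 10, [0.0] * 10),
                  bc_loss([1.0] * 5, [1.0] * 5))  # -> 0.6
```

In practice the adversarial, residual, initial-condition and boundary terms are evaluated on separately sampled point sets at each training step and summed with these weights.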

Figure 1: Results on some test fields.

4 Experiments

In this section, we demonstrate our model on Burgers' equation, which is given by

\[
\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2},
\]

for initial conditions $u_0(x)$ drawn from a fixed parametric family whose three coefficients are real numbers.

We generate solutions for 120 different initial conditions by choosing different combinations of the three coefficients from three finite sets. A fixed subset of these solutions constitutes the test set (kept fixed for all experiments). The rest of the fields are randomly split for each experiment into training and validation sets in the ratio 85:15. The data is generated via the Chebfun package for MATLAB [Driscoll et al. (2014)] for solving PDEs, with $x$ and $t$ discretized on a regular grid.
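The dataset construction can be sketched as follows. The three coefficient sets are hypothetical stand-ins (the paper's actual values are not reproduced here), but the 120-field count, the fixed held-out test set, and the 85:15 train/validation split follow the text; the test-set size is illustrative.

```python
import itertools
import random

# Hypothetical coefficient sets; only their product size (120) is grounded.
set_a = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
set_b = [1, 2, 3, 4]
set_c = [0.0, 0.25, 0.5, 0.75, 1.0]

combos = list(itertools.product(set_a, set_b, set_c))  # 6 * 4 * 5 = 120 fields
random.seed(0)
random.shuffle(combos)

test = combos[:20]               # fixed held-out test set (size illustrative)
rest = combos[20:]
n_train = int(0.85 * len(rest))  # 85:15 train/validation split
train, val = rest[:n_train], rest[n_train:]
```

Each coefficient triple indexes one initial condition, whose field is then solved numerically (via Chebfun in the paper) to produce the corresponding training target.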

We concatenate the values of $u_0(x)$ at each of the discretized values of $x$ into a vector. These are our initial conditions. We feed these initial conditions into our model as described in the previous section. Table 1 describes the model architecture. The reconstructor is fed the approximator's outputs at several different spatio-temporal coordinates. We trained the model parameters using the Adam optimizer. At each iteration, we feed the model a batch of initial points to enforce the initial conditions, boundary points to enforce the boundary conditions, and collocation points to enforce the PDE residual and adversarial losses.

We report the relative error of our model on the test set. Figure 1 shows some of the fields generated by the model for initial conditions in the test set. Since the noise vector fed to the generator is a random variable, each time the model is run, it generates a slightly different field. As such, we also plot the variance of the model's outputs. Note that the variance loosely correlates with the residual (error) field and that both of these are high in regions close to discontinuities. This means that even though our model does not approximate discontinuous regions entirely correctly, it nevertheless succeeds in identifying these areas.
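The uncertainty estimate described above (running the stochastic model several times and taking the per-coordinate variance of its outputs) can be sketched as follows, with a hypothetical noisy function standing in for the trained generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the trained generator: any stochastic map over coordinates.
# In the paper this role is played by the GAN generator with resampled noise.
def generator_sample(x, t):
    noise = rng.normal()
    return np.sin(x) * np.exp(-t) + 0.05 * noise

x = np.linspace(-1.0, 1.0, 100)
t = np.full(100, 0.5)
samples = np.stack([generator_sample(x, t) for _ in range(64)])  # 64 draws
mean_field = samples.mean(axis=0)  # point estimate of the solution
var_field = samples.var(axis=0)    # per-coordinate uncertainty estimate
```

Regions where `var_field` is large flag coordinates (such as those near discontinuities) where the approximation is least trustworthy.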

While these results are preliminary, they do show the effectiveness of our method as a general PDE solver.

| Component     | Layers | Neurons | Activation | Output Neurons |
|---------------|--------|---------|------------|----------------|
| Encoder       |        |         | ReLU       |                |
| Approximator  |        |         | Tanh       |                |
| Reconstructor |        |         | Tanh       |                |
| Discriminator |        |         | ReLU       |                |

Table 1: Model architecture

5 Conclusion

We have demonstrated that a generative model can learn to predict solutions for unseen initial conditions when trained on only a subset of them for a given PDE. This is a first step towards learning a general neural PDE solver which can provide reliable solutions for novel initial conditions. Furthermore, we have shown that, as a result of the probabilistic nature of our model, uncertainty estimates become freely available and, in our observation, correlate loosely with the error in the approximated solution of the PDE.


  • A. Brock, J. Donahue, and K. Simonyan (2019) Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, Cited by: §1.
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Advances in neural information processing systems, pp. 2172–2180. Cited by: §3.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §2.
  • T. A. Driscoll, N. Hale, and L. N. Trefethen (2014) Chebfun guide. Pafnuty Publications. External Links: Link Cited by: §4.
  • A. B. Farimani, J. Gomes, and V. S. Pande (2017) Deep Learning the Physics of Transport Phenomena. Cited by: §2.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §2, §3, §3.
  • J. Han, A. Jentzen, and W. E (2018) Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115 (34), pp. 8505–8510. External Links: Document, ISSN 0027-8424 Cited by: §2.
  • J. Hsieh, S. Zhao, S. Eismann, L. Mirabella, and S. Ermon (2019) Learning neural PDE solvers with convergence guarantees. In International Conference on Learning Representations, Cited by: §2.
  • A. Iqbal, R. Khan, and T. Karayannis (2019) Developing a brain atlas through deep learning. Nature Machine Intelligence 1 (6), pp. 277–287. Cited by: §2.
  • D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 10215–10224. Cited by: §1.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), pp. 1097–1105. Cited by: §2.
  • I. E. Lagaris, A. Likas, and D. I. Fotiadis (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks 9 (5), pp. 987–1000. Cited by: §2.
  • A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu (2016) WaveNet: a generative model for raw audio. External Links: Link Cited by: §1.
  • G. Pang, L. Lu, and G. E. Karniadakis (2018) fPINNs: Fractional Physics-Informed Neural Networks. pp. 1–29. External Links: Link Cited by: §2.
  • M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. External Links: Document, ISSN 10902716 Cited by: §1, §2, §2.
  • A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, and D. Hassabis (2020) Improved protein structure prediction using potentials from deep learning. Nature 577 (7792), pp. 706–710. External Links: ISSN 0028-0836, Document Cited by: §2.
  • J. Sirignano and K. Spiliopoulos (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364. Cited by: §2.
  • E. Tadmor (2012) A review of numerical methods for nonlinear partial differential equations. Bulletin of the American Mathematical Society 49 (4), pp. 507–554. Cited by: §2.
  • Y. Yang and P. Perdikaris (2018) Physics-informed deep generative models. arXiv preprint arXiv:1812.03511. Cited by: §1, §2.
  • Y. Yang and P. Perdikaris (2019) Adversarial uncertainty quantification in physics-informed neural networks. Journal of Computational Physics 394, pp. 136–152. External Links: Document, ISSN 10902716 Cited by: §2.
  • D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis (2019) Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. Journal of Computational Physics 397, pp. 108850. Cited by: §2.
  • M. Zhu, B. Chang, and C. Fu (2018) Convolutional neural networks combined with Runge-Kutta methods. arXiv preprint arXiv:1802.08831. Cited by: §2.