Partial differential equations (PDEs) are of great importance in fields such as science, engineering and economics. It is, however, generally not possible to obtain analytic solutions for them; instead, one has to resort to numerical schemes for approximating these solutions. These numerical schemes are slow and computationally intensive, especially in higher dimensions. Furthermore, the higher the dimension, the greater the error in the calculation of the derivatives required to approximate the solution.
Recently, Raissi et al. (2019) proposed to use neural networks to approximate solutions of PDEs. They do so by forcing neural networks to produce outputs that satisfy the PDE. The derivatives required to enforce this condition are computed using automatic differentiation and are exact up to the precision of the computing machine. Furthermore, this approach is mesh-free, i.e. one does not need to discretize the domain of the solution as is required in methods such as finite element analysis.
In parallel, we have seen tremendous improvements in the generalization capacities of machine learning methods such as generative adversarial networks and flow-based models. These models have demonstrated a remarkable ability to capture probability distributions even when only representative samples are available [Kingma and Dhariwal (2018), Brock et al. (2019), Oord et al. (2016)].
In this work, we generalize the framework of Yang and Perdikaris (2018) over a distribution of initial conditions and demonstrate that even when the model is trained on only a subset of them, it generalizes well and produces solutions for arbitrary initial conditions. Furthermore, our model also provides uncertainty estimates which loosely correlate with the error in the solution.
2 Related Work
Lagaris et al. (1998) made the first attempt at using neural networks to approximate solutions of PDEs. However, the idea did not catch on at the time. The recent renaissance of neural networks and their remarkable successes in computer vision and natural language processing [Krizhevsky et al. (2012), Devlin et al. (2018)] have sparked a new interest in scientific machine learning, i.e. in applying data-driven methods to problems in the natural sciences [Han et al. (2018), Senior et al. (2020), Iqbal et al. (2019)]. Since a huge number of problems in the natural and physical sciences are described by PDEs, this has inevitably aroused interest in using machine learning to solve PDEs. In this regard, the physics-informed neural network method of Raissi et al. (2019) was a critical development. It made two major contributions: (a) it showed that the automatic differentiation machinery of machine learning frameworks such as TensorFlow allows the computation of partial derivatives to a higher order of accuracy than finite differences, and (b) it empirically showed that neural networks can be used to approximate solutions of PDEs simply by forcing them to produce outputs that satisfy the PDE. This framework has since been extended to fractional and stochastic PDEs [Pang et al. (2018), Zhang et al. (2019)].
Neural network based solvers do not, in general, provide any convergence guarantees. Hence, it is important to have an uncertainty estimate for the solution produced by these methods. Yang and Perdikaris (2018) and Yang and Perdikaris (2019) formulate the method of Raissi et al. (2019) in the framework of generative adversarial networks [Goodfellow et al. (2014)] and show that it is possible to obtain uncertainty estimates as well.
Concurrently, there has been considerable work on combining classical solvers with machine learning methods to obtain more efficient solvers, such as combining the Runge-Kutta method with convolutional networks in Zhu et al. (2018) and the Galerkin method with deep learning in Sirignano and Spiliopoulos (2018).
However, one area which has attracted considerably less attention is attempting to learn a ‘general’ PDE solver through neural networks. General, in this context, refers to a PDE solver which does not need to be retrained if one or more constraints such as the initial or boundary condition or the domain are changed. Rather, the general PDE solver relies on the generalization capacity of neural networks to approximate the solution of a PDE across all possible sets of constraints. In this regard, Hsieh et al. (2019) and Farimani et al. (2017) have, for example, demonstrated that it is possible to learn a general PDE solver for some simple linear and elliptic PDEs. However, developing a single general PDE solver for all types of PDEs (parabolic, elliptic and hyperbolic) still remains an open problem.
3 Method

We consider partial differential equations (PDEs) of the form

$$\frac{\partial u}{\partial t} = f\left(u, \frac{\partial u}{\partial x}, \frac{\partial^2 u}{\partial x^2}, \ldots\right), \quad x \in \Omega, \; t \in [0, T], \tag{1}$$

where $x \in \Omega \subset \mathbb{R}$, $t \in [0, T]$, $u(x, t)$ is the solution of the PDE and $f$ is a known function. Since the solution of this PDE depends on the initial condition $h(x) = u(x, 0)$, we make this dependence explicit by writing $u(x, t; h)$. We denote the distribution of initial conditions with $\mathcal{H}$.
We propose to train a single Generative Adversarial Network (GAN) [Goodfellow et al. (2014)] for solving a PDE of the form of Equation 1 for any initial condition $h$ drawn from the distribution $\mathcal{H}$. Note that the PDE is fixed (i.e. $f$ is fixed) and only the initial conditions are varied. The generator takes in as input a random noise vector $z$ and a particular instance of the initial conditions $h$, and generates an approximation $\hat{u}(x, t; h)$ of $u(x, t; h)$. The discriminator takes two tuples, $(h, u(x, t; h))$ and $(h, \hat{u}(x, t; h))$, and is asked to identify the tuple generated by the generator (as in a typical GAN setup).
The generator is composed of three networks: the encoder, the approximator and the reconstructor. The encoder takes in as input a particular instance of the initial conditions $h$ and encodes it into a latent vector $\tilde{h}$. The approximator then takes in as input the latent vector $\tilde{h}$ along with some spatio-temporal coordinates $(x, t)$ and outputs an approximation of $u(x, t; h)$. Finally, the reconstructor takes in as input the outputs of the approximator for several different spatio-temporal coordinates but the same initial conditions, and tries to reconstruct the latent representation $\tilde{h}$ of the initial conditions from these approximations. The reconstructor (inspired, in part, by the InfoGAN architecture [Chen et al. (2016)]) forces the approximator to condition all of its outputs on the initial conditions, since otherwise the approximator may learn to produce samples from just one field irrespective of the initial conditions provided to it.
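The three-network composition of the generator can be sketched as follows. This is an illustrative data-flow sketch only: every layer size is a placeholder, and a single random linear map stands in for each trained sub-network (the actual architecture is given in Table 1).

```python
import random

random.seed(0)

def linear(n_in, n_out):
    """A random linear map standing in for a trained sub-network (illustrative)."""
    W = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return lambda v: [sum(w * x for w, x in zip(row, v)) for row in W]

N_H, N_LATENT, N_NOISE = 8, 4, 2                  # hypothetical sizes

encoder = linear(N_H, N_LATENT)                   # h -> latent code
approximator = linear(N_LATENT + N_NOISE + 2, 1)  # (code, z, x, t) -> u_hat
reconstructor = linear(3, N_LATENT)               # 3 samples of u_hat -> code

def generator(h, z, coords):
    """Approximate u(x, t; h) at each (x, t) in coords, conditioned on h."""
    code = encoder(h)
    return [approximator(code + z + [x, t])[0] for (x, t) in coords]

h = [0.5] * N_H                                   # a dummy discretized initial condition
z = [random.gauss(0.0, 1.0) for _ in range(N_NOISE)]
u_hats = generator(h, z, [(0.1, 0.0), (0.2, 0.5), (0.3, 1.0)])
code_rec = reconstructor(u_hats)                  # should recover encoder(h) once trained
```

During training, the reconstruction of `code_rec` against `encoder(h)` is what penalizes any approximator output that ignores the initial conditions.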
In addition to minimizing the typical GAN loss [Goodfellow et al. (2014)] given by

$$\mathcal{L}_{\mathrm{GAN}} = \mathbb{E}_{h, u}\left[\log D(h, u)\right] + \mathbb{E}_{h, z}\left[\log\left(1 - D(h, G(z, h))\right)\right],$$

we force the samples produced by the generator to satisfy Equation 1 by minimizing the residual error

$$\mathcal{L}_{r} = \frac{1}{N_r} \sum_{i=1}^{N_r} \left| \frac{\partial \hat{u}}{\partial t}(x_i, t_i; h) - f\left(\hat{u}, \frac{\partial \hat{u}}{\partial x}, \frac{\partial^2 \hat{u}}{\partial x^2}, \ldots\right) \right|^2$$

over a set of $N_r$ collocation points $(x_i, t_i)$.
As noted before, the derivatives can readily be obtained via automatic differentiation.
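To illustrate why automatic differentiation yields machine-precision derivatives (in practice one would use a framework such as TensorFlow rather than the toy implementation here), the following is a minimal forward-mode sketch using dual numbers; `Dual`, `d_dx` and `d_dt` are our own illustrative names:

```python
import math

class Dual:
    """Forward-mode autodiff value: carries a number and its derivative (a + b*eps, eps^2 = 0)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    def sin(self):
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def d_dx(u, x, t):
    """Exact partial derivative of u with respect to x at (x, t)."""
    return u(Dual(x, 1.0), Dual(t)).dot

def d_dt(u, x, t):
    """Exact partial derivative of u with respect to t at (x, t)."""
    return u(Dual(x), Dual(t, 1.0)).dot

# u(x, t) = sin(x - t) solves the advection equation u_t + u_x = 0,
# so its residual vanishes to machine precision.
u = lambda x, t: (x - t).sin()
residual = d_dt(u, 0.3, 0.7) + d_dx(u, 0.3, 0.7)
```

Unlike finite differences, no step size is involved, so the derivatives carry no truncation error.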
Furthermore, we also force the true and approximated initial conditions to be equal by minimizing

$$\mathcal{L}_{\mathrm{IC}} = \frac{1}{N_i} \sum_{i=1}^{N_i} \left| \hat{u}(x_i, 0; h) - h(x_i) \right|^2.$$

Note that $t = 0$ is fixed here.
Lastly, we enforce the boundary conditions. These can vary from equation to equation. For Burgers' equation we enforce the following:

$$\mathcal{L}_{\mathrm{BC}} = \frac{1}{N_b} \sum_{j=1}^{N_b} \left| \hat{u}(x_{ub}, t_j; h) - \hat{u}(x_{lb}, t_j; h) \right|^2,$$

where $x_{ub}$ and $x_{lb}$ denote the upper and lower boundary values of $x$ respectively.
Our overall objective function is therefore given by

$$\mathcal{L} = \mathcal{L}_{\mathrm{GAN}} + \lambda_r \mathcal{L}_{r} + \lambda_{\mathrm{IC}} \mathcal{L}_{\mathrm{IC}} + \lambda_{\mathrm{BC}} \mathcal{L}_{\mathrm{BC}},$$

where $\lambda_r$, $\lambda_{\mathrm{IC}}$ and $\lambda_{\mathrm{BC}}$ are scalars.
4 Experiments

In this section, we demonstrate our model on Burgers' equation, which is given by

$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2},$$

where $\nu$ is the viscosity coefficient, for initial conditions $h(x)$ drawn from a fixed parametric family whose coefficients are real numbers.
We generate solutions for 120 different initial conditions by choosing different combinations of these coefficients from fixed finite sets. A subset of these solutions constitutes the test set (which is kept fixed for all experiments). The rest of the fields are randomly split for each experiment into training and validation sets in the ratio 85:15. The data is generated via the Chebfun package for MATLAB [Driscoll et al. (2014)] for solving PDEs. We discretize $x$ and $t$ into a fine grid of spatio-temporal points.
We concatenate the values of $h(x)$ at each of the discretized values of $x$ into a vector. These vectors are our initial conditions. We feed these initial conditions into our model as described in the previous section. Table 1 describes the model architecture. The reconstructor is fed the approximator's outputs at several different spatio-temporal coordinates. We trained the model parameters using the Adam optimizer. At each iteration, we feed the model a batch of initial points to enforce $\mathcal{L}_{\mathrm{IC}}$, boundary points to enforce $\mathcal{L}_{\mathrm{BC}}$, and collocation points to enforce $\mathcal{L}_{\mathrm{GAN}}$ and $\mathcal{L}_{r}$.
We achieved a low relative error on the test set. Figure 1 shows some of the fields generated by the model for initial conditions in the test set. Since the latent variable $z$ in the generator is a random variable, each time the model is run it generates a slightly different field. As such, we also plot the variance of the model's predictions. Note that the variance loosely correlates with the residual (error) field, and that both are high in regions close to discontinuities. This means that even though our model does not approximate discontinuous regions entirely correctly, it is nevertheless successful in identifying these areas.
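The variance estimate itself is cheap to compute: run the trained generator several times with fresh noise and take the pointwise sample mean and variance. A minimal sketch, where `generator` is any callable with the hypothetical signature shown (our actual model takes a noise vector rather than a scalar):

```python
import random
import statistics

def predict_with_uncertainty(generator, h, x, t, n_samples=100, seed=0):
    """Monte-Carlo mean and variance of the generated field at (x, t),
    obtained by resampling the generator's noise input z."""
    rng = random.Random(seed)
    samples = [generator(rng.gauss(0.0, 1.0), h, x, t) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.variance(samples)

# With a noise-independent stand-in generator the variance collapses to zero.
dummy = lambda z, h, x, t: x + t
mean, var = predict_with_uncertainty(dummy, None, 0.25, 0.5)
```

High-variance regions flag spatio-temporal locations, such as those near discontinuities, where the approximation should be trusted less.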
While these results are preliminary, they do show the effectiveness of our method as a general PDE solver.
5 Conclusion

We have demonstrated that a generative model can learn to predict solutions for unseen initial conditions when trained on only a subset of them for a given PDE. This is a first step towards learning a general neural PDE solver that can provide reliable solutions for novel initial conditions. Furthermore, we have shown that, as a result of the probabilistic nature of our model, uncertainty estimates become freely available and, in our observations, correlate loosely with the error in the approximated solution of the PDE.
References

- Brock et al. (2019). Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations.
- Chen et al. (2016). InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2172–2180.
- Devlin et al. (2018). BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Driscoll et al. (2014). Chebfun guide. Pafnuty Publications.
- Farimani et al. (2017). Deep learning the physics of transport phenomena. arXiv preprint.
- Goodfellow et al. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- Han et al. (2018). Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115 (34), pp. 8505–8510.
- Hsieh et al. (2019). Learning neural PDE solvers with convergence guarantees. In International Conference on Learning Representations.
- Iqbal et al. (2019). Developing a brain atlas through deep learning. Nature Machine Intelligence 1 (6), pp. 277–287.
- Kingma and Dhariwal (2018). Glow: generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems 31, pp. 10215–10224.
- Krizhevsky et al. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pp. 1097–1105.
- Lagaris et al. (1998). Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks 9 (5), pp. 987–1000.
- Oord et al. (2016). WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499.
- Pang et al. (2018). fPINNs: fractional physics-informed neural networks. arXiv preprint, pp. 1–29.
- Raissi et al. (2019). Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
- Senior et al. (2020). Improved protein structure prediction using potentials from deep learning. Nature 577 (7792), pp. 706–710.
- Sirignano and Spiliopoulos (2018). DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364.
- Tadmor (2012). A review of numerical methods for nonlinear partial differential equations. Bulletin of the American Mathematical Society 49 (4), pp. 507–554.
- Yang and Perdikaris (2018). Physics-informed deep generative models. arXiv preprint arXiv:1812.03511.
- Yang and Perdikaris (2019). Adversarial uncertainty quantification in physics-informed neural networks. Journal of Computational Physics 394, pp. 136–152.
- Zhang et al. (2019). Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. Journal of Computational Physics 397, 108850.
- Zhu et al. (2018). Convolutional neural networks combined with Runge-Kutta methods. arXiv preprint arXiv:1802.08831.