Physics-Informed Neural Operator for Learning Partial Differential Equations

11/06/2021 ∙ by Zongyi Li, et al.

Machine learning methods have recently shown promise in solving partial differential equations (PDEs). They can be classified into two broad categories: approximating the solution function and learning the solution operator. The Physics-Informed Neural Network (PINN) is an example of the former while the Fourier neural operator (FNO) is an example of the latter. Both these approaches have shortcomings. The optimization in PINN is challenging and prone to failure, especially on multi-scale dynamic systems. FNO does not suffer from this optimization issue since it carries out supervised learning on a given dataset, but obtaining such data may be too expensive or infeasible. In this work, we propose the physics-informed neural operator (PINO), where we combine the operator-learning and function-optimization frameworks. This integrated approach improves convergence rates and accuracy over both PINN and FNO models. In the operator-learning phase, PINO learns the solution operator over multiple instances of the parametric PDE family. In the test-time optimization phase, PINO optimizes the pre-trained operator ansatz for the querying instance of the PDE. Experiments show PINO outperforms previous ML methods on many popular PDE families while retaining the extraordinary speed-up of FNO compared to solvers. In particular, PINO accurately solves challenging long temporal transient flows and Kolmogorov flows where other baseline ML methods fail to converge.







1 Introduction

Machine learning-based methods are starting to show promise in scientific computing and especially in solving partial differential equations (PDEs). They have demonstrated advantages in both efficiency and accuracy compared to conventional solvers. They are even able to tackle previously intractable problems such as higher-dimensional, multi-scale, high-contrast, and chaotic PDE systems [33, 3, 8, 22, 12, 2]. Broadly, ML-based approaches for PDEs can be divided into two categories: optimizing to solve for a specific solution function of PDE vs. learning the solution operator over a family of PDEs.

Optimization of solution function and PINN.

Most ML-based methods, as well as the conventional solvers, fall into this category. Conventional solvers such as FDM and FEM usually discretize the domain into a grid and optimize/approximate the solution function on the grid, which imposes a truncation error. Physics-Informed Neural Network (PINN)-type methods were proposed to overcome this discretization issue [28]. They use a neural network as the ansatz of the solution function and take advantage of auto-differentiation to compute exact, mesh-free derivatives. Recently, researchers have developed numerous variations of PINN with promising results on inverse problems and partially observed tasks [23, 40, 31]. However, compared to conventional solvers, PINNs face several optimization issues: (1) the challenging optimization landscape arising from soft physics or PDE constraints [35], (2) the difficulty of propagating information from the initial or boundary conditions to unseen parts of the interior or to future times [7], and (3) the sensitivity to hyper-parameter selection [32]. As a result, PINNs are still unable to compete with conventional solvers in most cases, and they often fail to converge on high-frequency or multi-scale PDEs [37, 9, 29]. In this work, we propose to overcome these optimization challenges by integrating operator learning with PINN.

Operator learning and neural operators.

A recent alternative approach is to learn the solution operator of a family of PDEs, defined by the map from the input (initial and boundary conditions) to the output (solution functions). In this case, a dataset of input–output pairs from an existing solver is usually given. There are two main aspects to consider: (a) model — designing models that can learn highly complicated PDE solution operators, and (b) data — being data-efficient and generalizing well. Recent advances in operator learning replace traditional convolutional neural networks and U-Nets from computer vision with operator-based models tailored to PDEs, with greatly improved model expressiveness [20, 24, 26, 34, 6]. Specifically, the neural operator generalizes the neural network to the operator setting, where the input and output spaces are infinite-dimensional. The framework has shown success in learning resolution-invariant solution operators for highly non-linear problems such as turbulent flow [19, 18]. However, the data challenges remain: (1) the need for training data, which assumes an existing solver or experimental setup, (2) the non-negligible generalization error, and (3) extrapolation to unseen conditions. These issues can be addressed by adding physics or PDE constraints to operator learning [40, 36, 39].

Figure 1: PINO combines operator learning and PINN optimization: (a) learn the solution operator from a family of equations; (b) use the learned operator as an ansatz to solve for a specific instance.

Our contributions.

To overcome the shortcomings of both physics-informed optimization and data-driven operator learning, we propose the physics-informed neural operator (PINO). It requires fewer or no data points to learn the operator and generalizes better. In PINO, we use the pre-trained operator as the ansatz to optimize for the solution function at test time, which reduces the generalization error. Compared to PINN, PINO has a much better optimization landscape and representation space, and hence, PINO converges faster and more accurately. Our contributions can be summarized as follows:

  • We propose the physics-informed neural operator (PINO), combining the operator-learning and physics-informed settings. We introduce the pre-training and test-time optimization schemes that utilize both the data and equation constraints (whichever are available). We develop an efficient method to compute the exact gradient for neural operators to incorporate the equation constraints.

  • By utilizing the pre-trained operator ansatz, PINO overcomes the challenge of propagating information from the initial condition to future time steps with (soft) physics constraints. It can solve the 2d transient flow over an extremely long time period, where PINN and DeepONet [24] fail to converge. Even without any pre-training, using only the PDE constraints for the given instance, PINO still outperforms PINN with a 20x smaller error and a 25x speedup on the chaotic Kolmogorov flow, demonstrating the superior expressivity of neural operators over standard neural networks.

  • By utilizing the equation constraints, PINO requires fewer or no training data and generalizes better compared to FNO [20]. On average it has a 7% smaller error on the transient and Kolmogorov flows, while matching the 400x speedup of FNO over the GPU-based pseudo-spectral solver [13]. Further, the PINO model pre-trained on the Navier-Stokes equation can be easily transferred to different Reynolds numbers using test-time optimization.

  • We propose the forward and backward PINO models for inverse problems. Our approach accurately recovers the coefficient function in the Darcy flow and is 3000x faster than a conventional solver using accelerated MCMC [5].

PINN vs. PINO: pointwise vs. function-wise optimization.

The neural operator ansatz in PINO has an easier optimization landscape and a more expressive representation space than the neural network ansatz in PINN. The neural operator parameterizes the solution function as an aggregation of basis functions, and hence the optimization is in the function space. This is easier than optimizing a single function as in PINN. Further, we can learn these basis functions in the pre-training phase, which makes the test-time optimization on the querying instance even easier: PINO does not need to solve from scratch, it just fine-tunes the solution function parameterized by the solution operator. Thus, PINO is much faster and more accurate than PINN.

2 Preliminaries and problem settings

In this section, we first define the stationary and dynamic PDE systems that we consider. We give an overview of the physics-informed setting and operator-learning setting. In the end, we define the Fourier neural operator as a specific model for operator learning.

2.1 Problem settings

We consider two natural classes of PDEs. In the first, we consider the stationary system

𝒫(u, a) = 0, in D ⊂ ℝ^d,
u = g, in ∂D,    (1)

where D is a bounded domain, a ∈ 𝒜 is a PDE coefficient/parameter, u ∈ 𝒰 is the unknown, and 𝒫 : 𝒰 × 𝒜 → ℱ is a possibly non-linear partial differential operator, with (𝒰, 𝒜, ℱ) a triplet of Banach spaces. Usually the function g is a fixed boundary condition (potentially it can also enter as a parameter). This formulation gives rise to the solution operator 𝒢† : 𝒜 → 𝒰 defined by a ↦ u. A prototypical example is the second-order elliptic equation −∇·(a∇u) = f.
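As a concrete illustration of the residual for the elliptic prototype, the sketch below evaluates −∇·(a∇u) − f on a uniform interior grid with centered finite differences, expanding the divergence as a Δu + ∇a·∇u. The function name and test fields are our own illustration, not from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's code): pointwise residual of the
# second-order elliptic equation -div(a grad u) = f on interior grid points,
# using -div(a grad u) = -(a * lap(u) + grad(a) . grad(u)).
def elliptic_residual(a, u, f, h):
    lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
           - 4 * u[1:-1, 1:-1]) / h**2                     # 5-point Laplacian
    ax = (a[2:, 1:-1] - a[:-2, 1:-1]) / (2 * h)            # da/dx
    ay = (a[1:-1, 2:] - a[1:-1, :-2]) / (2 * h)            # da/dy
    ux = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2 * h)            # du/dx
    uy = (u[1:-1, 2:] - u[1:-1, :-2]) / (2 * h)            # du/dy
    return -(a[1:-1, 1:-1] * lap + ax * ux + ay * uy) - f[1:-1, 1:-1]
```

For example, with a ≡ 1 and u(x, y) = sin(πx)sin(πy), the residual against f = 2π²u shrinks at the expected O(h²) rate.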

In the second setting, we consider the dynamical system

du/dt = ℛ(u), in D × (0, ∞),
u = g, in ∂D × (0, ∞),
u = a, in D̄ × {0},    (2)

where a = u(0) ∈ 𝒜 is the initial condition, u(t) ∈ 𝒰 for t > 0 is the unknown, and ℛ is a possibly non-linear partial differential operator, with 𝒜 and 𝒰 Banach spaces. As before, we take g to be a known boundary condition. We assume that u(t) exists and is bounded for all time and for every a ∈ 𝒜. This formulation gives rise to the solution operator 𝒢† defined by a ↦ u. Prototypical examples include the Burgers' equation and the Navier-Stokes equation.

2.2 Solving equations using the physics-informed loss (PINN)

Given an instance a and a solution operator 𝒢† defined by equation (1) or (2), we denote by u† = 𝒢†(a) the unique ground truth. The equation-solving task is to approximate u†. This setting includes ML-enhanced conventional solvers such as learned finite element, finite difference, and multigrid solvers [16, 27, 11], as well as purely neural-network-based solvers such as Physics-Informed Neural Networks (PINNs), the Deep Galerkin Method, and the Deep Ritz Method [28, 30, 38]. In particular, PINN-type methods use a neural network u_θ with parameters θ as the ansatz to approximate the solution function u†. The parameters θ are found by minimizing the physics-informed loss with exact derivatives computed using automatic differentiation (autograd). In the stationary case, the physics-informed loss is defined by minimizing the l.h.s. of equation (1) in the squared norm of ℱ. A typical choice is ℱ = L²(D), giving the loss function

ℒ_pde(a, u_θ) = ‖𝒫(a, u_θ)‖²_{L²(D)} + α‖u_θ|_{∂D} − g‖²_{L²(∂D)}.    (3)

In the case of a dynamical system, it minimizes the residual of equation (2) in some natural norm up to a fixed final time T > 0. A typical choice is the L²((0, T]; L²(D)) norm, yielding

ℒ_pde(a, u_θ) = ‖du_θ/dt − ℛ(u_θ)‖²_{L²((0,T];L²(D))} + α‖u_θ|_{∂D} − g‖²_{L²((0,T];L²(∂D))} + β‖u_θ|_{t=0} − a‖²_{L²(D)}.    (4)

The PDE loss consists of the physics loss in the interior and the data loss on the boundary and initial conditions, with hyper-parameters α, β > 0. It can be generalized to the variational form as in [38].
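A minimal numerical sketch of this composite loss, discretized as an interior mean-square residual plus a weighted boundary misfit, might look as follows; the `residual_fn` interface, the weight `alpha`, and the discrete norms are our illustrative choices, not the paper's.

```python
import numpy as np

# Hedged sketch of the stationary physics-informed loss: squared PDE residual
# on interior grid points plus a weighted squared boundary mismatch.
# `residual_fn(u, h)` returns interior residual values (illustrative interface).
def pinn_loss(residual_fn, u, g, h, alpha=1.0):
    interior = np.mean(residual_fn(u, h) ** 2)     # ~ squared residual over D
    u_bdry = np.concatenate([u[0, :], u[-1, :], u[:, 0], u[:, -1]])
    g_bdry = np.concatenate([g[0, :], g[-1, :], g[:, 0], g[:, -1]])
    boundary = np.mean((u_bdry - g_bdry) ** 2)     # ~ boundary data misfit
    return interior + alpha * boundary
```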

Challenges of PINN.

PINNs take advantage of the universal approximation property of neural networks but, in return, suffer from a low-frequency bias. Empirically, PINNs often fail to solve challenging PDEs when the solution exhibits high-frequency or multi-scale structure [35, 37, 9, 29]. Further, as an iterative solver, PINNs have difficulty propagating information from the initial or boundary conditions to unseen parts of the interior or to future times [7]. For example, on challenging problems such as turbulence, PINNs can only solve the PDE on a relatively small domain [14], or otherwise require extra observational data which is not always available in practice [29, 4]. In this work, we propose to overcome these optimization challenges by integrating operator learning with PINNs.

2.3 Learning the solution operator (neural operator)

An alternative setting is to learn the solution operator 𝒢†. Given a PDE as defined in (1) or (2) and the corresponding solution operator 𝒢†, one can use a neural operator G_θ with parameters θ as a surrogate model to approximate 𝒢†. Usually we assume a dataset {(a_j, u_j)}_{j=1}^N is available, where u_j = 𝒢†(a_j) and the a_j are i.i.d. samples from some distribution μ supported on 𝒜. In this case, one can optimize the solution operator by minimizing the empirical data loss on a given data pair

ℒ_data(u, G_θ(a)) = ‖u − G_θ(a)‖²_𝒰,    (5)

where we assume the setting of (1) for simplicity of the exposition. The operator data loss is defined as the average error across all possible inputs,

𝒥_data(G_θ) = 𝔼_{a∼μ}[ℒ_data(𝒢†(a), G_θ(a))] ≈ (1/N) Σ_{j=1}^N ‖u_j − G_θ(a_j)‖²_𝒰.    (6)

Similarly, one can define the operator PDE loss as

𝒥_pde(G_θ) = 𝔼_{a∼μ}[ℒ_pde(a, G_θ(a))].    (7)

In general, it is non-trivial to compute the derivatives of G_θ(a) with respect to x and t for the model G_θ. In the following section, we discuss how to compute these derivatives for the Fourier neural operator.
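As a sketch, the empirical operator data loss can be estimated by averaging relative errors over the dataset; `model` here is any callable surrogate mapping an input array to a predicted solution array (an assumption), not a specific neural operator.

```python
import numpy as np

# Illustrative estimate of the operator data loss: mean relative L2 error of a
# surrogate over input/output pairs (a_j, u_j). `model` is any callable a -> u.
def operator_data_loss(model, inputs, outputs):
    errs = [np.linalg.norm(model(a) - u) / np.linalg.norm(u)
            for a, u in zip(inputs, outputs)]
    return float(np.mean(errs))
```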

2.4 Fourier neural operators

In this work, we focus on the neural operator model designed for the operator-learning problem. The neural operator, proposed in [20], is formulated as a generalization of standard deep neural networks to the operator setting. A neural operator composes linear integral operators 𝒦 with pointwise non-linear activation functions σ to approximate highly non-linear operators.

Definition 1 (Neural operator G_θ)

Define the neural operator

G_θ := Q ∘ (W_L + 𝒦_L) ∘ ⋯ ∘ σ(W_1 + 𝒦_1) ∘ P,    (8)

where P, Q are pointwise neural networks that encode the lower-dimensional input function into a higher-dimensional space and decode the higher-dimensional function back to the lower-dimensional output space. The model stacks L layers of σ(W_l + 𝒦_l), where the W_l are pointwise linear operators (matrices), the 𝒦_l are integral kernel operators, and σ is a fixed activation function. The parameters θ consist of all the parameters in P, Q, W_l, 𝒦_l.

Recently, Li et al. [18] proposed the Fourier neural operator (FNO), which deploys a convolution operator for 𝒦. In this case, the Fast Fourier Transform (FFT) can be applied to compute 𝒦 efficiently, leading to a fast architecture that obtains state-of-the-art results on PDE problems.

Definition 2 (Fourier convolution operator 𝒦)

Define the Fourier convolution operator

(𝒦v)(x) := ℱ⁻¹(R · ℱv)(x),    (9)

where R is part of the parameters θ to be learned.
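In a single-channel 1-d discretization, the Fourier convolution operator amounts to: FFT the input function, multiply a truncated set of low modes by learned complex weights R, and inverse-FFT back. A minimal sketch (our simplification, omitting channels and higher dimensions):

```python
import numpy as np

# Sketch of the Fourier convolution in 1-d with one channel: keep the lowest
# `modes` frequencies, scale them by learned complex weights R, zero out the
# rest, and transform back to the grid.
def fourier_conv(v, R):
    modes = R.shape[0]
    v_hat = np.fft.rfft(v)                    # forward FFT of the input
    out_hat = np.zeros_like(v_hat)
    out_hat[:modes] = R * v_hat[:modes]       # multiply the kept modes by R
    return np.fft.irfft(out_hat, n=v.size)    # back to physical space
```

With R ≡ 1 the layer passes low frequencies through unchanged and filters out everything above the mode cutoff.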

Challenges of operator learning.

Operator learning is similar to supervised learning in computer vision and language, where data plays a very important role. One needs to assume that the training points and testing points follow the same problem setting and the same distribution. In particular, an FNO model trained on one coefficient (e.g. Reynolds number) or one geometry cannot easily be generalized to another. Moreover, for more challenging PDEs where the solver is very slow, or where no solver even exists, it is hard to gather a representative dataset. On the other hand, since FNO does not use any knowledge of the equation, it cannot get arbitrarily close to the ground truth by using higher resolutions, as conventional solvers can, leaving a generalization-error gap. These challenges limit FNO's applications beyond accelerating solvers. In the following section, we introduce the PINO framework, which overcomes these problems by using the equation constraints.

3 Physics-informed neural operator (PINO)

We propose the PINO framework, which uses one neural operator model for both operator-learning problems and equation-solving problems. It consists of two phases:

  • Pre-train the solution operator: learn a neural operator G_θ approximating the solution operator 𝒢† using the data loss, the PDE loss, or both, where the PDE loss can be imposed on instances beyond the given dataset.

  • Test-time optimization: use G_θ(a) as the ansatz to approximate u†, optimized with the PDE loss and/or an additional operator (anchor) loss obtained from the pre-training phase.

3.1 Pre-train: operator learning with PDE loss

Given a fixed amount of data, the data loss ℒ_data offers a stronger constraint than the PDE loss ℒ_pde. However, the PDE loss does not require a pre-existing dataset: one can sample virtual PDE instances by drawing additional initial conditions or coefficient functions for training, as shown in Algorithm 1. In this sense, we have access to an unlimited dataset by sampling new instances in each iteration.

Data: input–output function pairs {(a_j, u_j)}
Result: neural operator G_θ
for each training epoch do
      Compute ℒ_data and ℒ_pde on the data pairs; update the neural operator G_θ;
      for k = 1 to K do
            Sample a virtual instance a from the distribution μ;
            Compute ℒ_pde on a; update the neural operator G_θ;
      end for
end for
Algorithm 1: Pre-training scheme for physics-informed neural operator learning
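The control flow above can be sketched schematically as below; `data_loss`, `pde_loss`, `sample_instance`, and the gradient-update `step` are placeholders we introduce for illustration, since the actual model, losses, and optimizer are problem-specific.

```python
# Schematic of Algorithm 1 (illustrative placeholders, not the paper's code):
# alternate updates on the supervised data pairs with updates on freshly
# sampled "virtual" PDE instances that only use the equation loss.
def pretrain(theta, dataset, sample_instance, data_loss, pde_loss, step,
             epochs=2, virtual_per_epoch=3):
    sampled = []
    for _ in range(epochs):
        for a, u in dataset:                    # supervised + physics pass
            theta = step(theta, data_loss(theta, a, u) + pde_loss(theta, a))
        for _ in range(virtual_per_epoch):      # equation-only pass
            a_virtual = sample_instance()       # draw a new instance each time
            theta = step(theta, pde_loss(theta, a_virtual))
            sampled.append(a_virtual)
    return theta, sampled
```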

3.2 Test-time optimization: solving equation with operator ansatz

Given a learned operator G_θ, we use u_θ = G_θ(a) as the ansatz to solve for u†. The optimization procedure is similar to that of PINNs, computing the PDE loss ℒ_pde on a, except that we propose to use a neural operator instead of a neural network. Since the PDE loss is a soft constraint and challenging to optimize, we also add an optional operator loss (anchor loss) ℒ_op(G_θ(a), G_θ₀(a)) to keep the further-optimized model close to the pre-trained model, where G_θ₀ is the model obtained at the end of the pre-training phase. We update the operator G_θ using the loss ℒ_pde + ℒ_op. It is possible to further apply optimization techniques such as fine-tuning only the last few layers of the neural operator, and progressive training that gradually increases the grid resolution, using the finer resolution at test time.

Optimization landscape.

Using the operator as the ansatz has two major advantages. (1) PINN does pointwise optimization, while PINO does optimization in the space of functions. In the linear integral operation, the operator parameterizes the solution function as a sum of basis functions, so the optimization is over the set of coefficients and bases, which is easier than optimizing a single function as in PINNs. (2) We can learn these basis functions in the operator-learning phase, which makes the later test-time optimization even easier. In PINO, we do not need to propagate the information from the initial and boundary conditions to the interior; we just fine-tune the solution function parameterized by the solution operator.


Trade-offs.

(1) Complexity and accuracy: test-time optimization is an option to spend more computation in exchange for better accuracy. When speed is most desired, one should use the pre-trained model to directly output the prediction. However, to solve a specific instance accurately, one can use test-time optimization to correct the generalization error incurred in operator learning. (2) Resolution effects on the optimization landscape and truncation error: using a higher resolution and a finer grid reduces the truncation error, but may make the optimization unstable. Using hard constraints such as the anchor loss alleviates this problem.

3.3 Derivatives of neural operators

In order to use the equation loss ℒ_pde, one of the major technical challenges is to efficiently compute the derivatives of the neural operator's output. In this section, we discuss three efficient methods to compute the derivatives of the neural operator as defined in (8).

Numerical differentiation.

A simple but efficient approach is to use conventional numerical gradients such as finite differences or the Fourier gradient [40, 10]. These numerical gradient methods are fast and memory-efficient: given an n-point grid, finite differences require O(n) operations and the Fourier method requires O(n log n). However, they face the same challenges as the corresponding numerical solvers: finite difference methods require a fine-resolution uniform grid, and spectral methods require smoothness and uniform grids. Moreover, these numerical errors on the gradient are amplified on the output solution.
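The two numerical gradients mentioned above can be compared on a periodic function; for a smooth, band-limited signal the spectral derivative is exact to machine precision, while centered differences carry an O(h²) error (function names are our own):

```python
import numpy as np

# Periodic centered finite difference: O(n) work, O(h^2) accuracy.
def fd_derivative(u, h):
    return (np.roll(u, -1) - np.roll(u, 1)) / (2 * h)

# Spectral (Fourier) derivative: O(n log n) work, exact for band-limited u.
def fourier_derivative(u, length=1.0):
    k = 2j * np.pi * np.fft.fftfreq(u.size, d=length / u.size)
    return np.fft.ifft(k * np.fft.fft(u)).real
```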


Autograd.

Similar to PINN [28], the most general way to compute the exact gradient is to use the auto-differentiation library of neural networks (autograd). To apply autograd, one needs a neural network parameterization of the solution function u_θ. However, it is not straightforward to write out the solution function for a neural operator, which directly outputs the numerical solution on a grid, especially for FNO, which uses the FFT. To apply autograd, we design a query function that inputs x and outputs u(x). Recall u = Q ∘ v_L with v_L = W_L v_{L−1} + 𝒦_L v_{L−1}. Since Q is pointwise,

u(x) = Q(v_L(x)).    (10)

To query any location x, we need to evaluate the Fourier convolution (𝒦v)(x) and the bias part (Wv)(x). For the first part, we can directly query the Fourier representation at the spatial location x without doing the inverse FFT. For the second part, we can define the query function as an interpolation or a low-rank integral operator as in [17]. Then we can use autograd to take the exact derivative of (10). The autograd method is general and exact; however, it is less efficient: since the number of parameters is usually much greater than the grid size, the numerical methods are significantly faster in practice. Empirically, the autograd method is usually slower and more memory-consuming.

Exact gradients.

We develop an efficient and exact gradient method based on the architecture of the neural operator. The idea is similar to autograd, but we explicitly write out the gradient in Fourier space and apply the chain rule. Given the explicit form (10), the derivative of the Fourier convolution part 𝒦v can be computed directly in Fourier space, and the linear part Wv can be interpolated using the Fourier method. Therefore, to exactly compute the derivative of v_L, one just needs to run the numerical Fourier gradient. Then we apply the chain rule for the Jacobian and Hessian of the pointwise operator Q; since Q is pointwise, this is much simplified. In the experiments, we mostly use the exact gradient and the numerical Fourier gradient.
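The key property this method exploits is that a function held as Fourier coefficients can be evaluated and differentiated exactly at any query point, without an inverse FFT, by summing c_k e^{2πikx} and (2πik) c_k e^{2πikx}. A 1-d sketch (our own illustration):

```python
import numpy as np

# Evaluate a function, given its DFT coefficients u_hat, at an arbitrary
# (off-grid) point x in [0, 1), and its exact derivative, by direct summation.
def query(u_hat, x):
    n = u_hat.size
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    return (u_hat * np.exp(2j * np.pi * k * x)).sum().real / n

def query_grad(u_hat, x):
    n = u_hat.size
    k = np.fft.fftfreq(n, d=1.0 / n)
    return (u_hat * 2j * np.pi * k * np.exp(2j * np.pi * k * x)).sum().real / n
```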

4 Experiments

In this section, we conduct empirical experiments to examine the efficacy of the proposed PINO. In Section 4.1, we show that physics constraints help operator learning with fewer data and better generalization. In Section 4.2, we investigate how PINO uses the operator ansatz to solve harder equations with improved speed and accuracy. We study three concrete PDE cases: the Burgers' equation, the Darcy equation, and the Navier-Stokes equation. The implementation details of PINO and the baseline methods are listed in Appendix A.

(a) The long-temporal transient flow. PINO outputs the full trajectory in one step, which leads to a 400x speedup compared to the GPU solver. PINN cannot converge to a reasonable error rate due to the long time window. (b) The chaotic Kolmogorov flow. PINO converges faster than PINN, but both convergence rates under gradient descent are less effective than using higher resolutions in the GPU solver.

Figure 2: The accuracy-complexity trade-off of PINO, PINN, and the GPU-based pseudo-spectral solver.

Burgers' Equation.

The 1-d Burgers' equation is a non-linear PDE with periodic boundary conditions:

∂_t u(x, t) + ∂_x (u²(x, t)/2) = ν ∂_xx u(x, t), x ∈ (0, 1), t ∈ (0, 1],
u(x, 0) = u₀(x), x ∈ (0, 1),    (11)

where u₀ is the initial condition and ν is the viscosity coefficient. We aim to learn the operator mapping the initial condition to the solution, u₀ ↦ u.

Darcy Flow.

The 2-d steady-state Darcy flow equation on the unit box is the second-order linear elliptic PDE

−∇·(a(x)∇u(x)) = f(x), x ∈ (0, 1)²,
u(x) = 0, x ∈ ∂(0, 1)²,    (12)

with a Dirichlet boundary, where a is a piecewise-constant diffusion coefficient and f is a fixed forcing function. We are interested in learning the operator mapping the diffusion coefficient to the solution, a ↦ u. Note that although the PDE is linear, the solution operator is not.

Navier-Stokes Equation.

We consider the 2-d Navier-Stokes equation for a viscous, incompressible fluid in vorticity form on the unit torus:

∂_t ω(x, t) + u(x, t)·∇ω(x, t) = ν Δω(x, t) + f(x), x ∈ (0, 1)², t ∈ (0, T],
∇·u(x, t) = 0,
ω(x, 0) = ω₀(x),

where u is the velocity field, ω = ∇ × u is the vorticity, ω₀ is the initial vorticity, ν is the viscosity coefficient, and f is the forcing function. We want to learn the operator mapping the initial vorticity to the full solution, ω₀ ↦ ω.


Specifically, we consider three problem settings:

  • Long temporal transient flow: we study the build-up of the flow from a near-zero initial velocity to the ergodic state, as in Li et al. [18]. The main challenge is to predict over the long time interval.

  • Chaotic Kolmogorov flow: in this case, the flow lies in the attractor, with an arbitrary starting time, similar to Li et al. [21]. The main challenge is to capture the small-scale details that evolve chaotically.

  • Lid cavity flow: in this case, we assume the no-slip boundary condition, with zero velocity on the left, bottom, and right walls and a driven lid on top, similar to Bruneau and Saad [1]. The main challenge is to handle the boundary using the velocity-pressure formulation.

Figure 3: PINO on Kolmogorov flow (left) and Lid-cavity flow (right)

4.1 Operator learning with PDE losses

We first show the effectiveness of applying the physics constraints to learn the solution operator.

Burgers equation and Darcy equation.

PINO can learn the solution operator without any data on simpler problems such as Burgers' and Darcy. Compared to other PDE-constrained operators, PINO is more expressive and thereby achieves better accuracy. On Burgers (11), PINN-DeepONet achieves a 1.38% error [36], while PINO achieves 0.37%. Similarly, on Darcy flow (12), PINO outperforms FNO by utilizing physics constraints, as shown in Table 3. For these simpler equations, test-time optimization may not be needed. The implementation details and the parameter search spaces are included in Appendices A.1.1 and A.1.2.

4.2 Solve equation using operator ansatz

# data samples # additional PDE instances Solution error Equation error
0 100k 74.36% 0.3741
0.4k 0 33.32% 1.8779
0.4k 40k 31.74% 1.8179
0.4k 160k 31.32% 1.7840
4k 0 25.15% 1.8223
4k 100k 24.15% 1.6112
4k 400k 24.22% 1.4596

Table 1: Operator learning on the Kolmogorov flow. Each relative test error is averaged over 300 instances. Complete results at other resolutions are reported in Table 4.
Long temporal transient flow.

It is extremely challenging to propagate the information from the initial condition to future time steps over such a long interval using only the soft physics constraint. Neither PINN nor PINO trained from scratch (without pre-training) can handle this case, whether solving the full interval at once or in smaller steps. However, when pre-training data are available for PINO, we can use the learned neural operator ansatz and the anchor loss ℒ_op. The anchor loss is a hard constraint that makes the optimization much easier. Given training data, PINO without test-time optimization achieves a 2.87% error, lower than FNO's 3.04%, and it retains a 400x speedup compared to the GPU-based pseudo-spectral solver [13], matching FNO. Further performing test-time optimization with the anchor loss and PDE loss, PINO reduces the error to 1.84%.

Chaotic Kolmogorov flow.

We show an empirical study of how PINO can improve the generalization of neural operators by enforcing more physics. For this experiment, the training set consists of 4000 data points of initial conditions and corresponding solutions. Using Algorithm 1, we can sample unlimited additional initial conditions from a Gaussian random field at any resolution. Tables 1 and 4 compare the generalization error of neural operators trained by different schemes with different amounts of simulated data. The results show that training the neural operator with additional PDE instances consistently improves the generalization error at all three resolutions we evaluate.

Based on the solution operators learned in the operator-learning section above, we continue with test-time optimization. The results are shown in Figure 2 and Table 2. Overall, PINO outperforms PINN with a 20x smaller error and a 25x speedup. Using a pre-trained model makes PINO converge faster. The implementation details and the parameter search space are included in Appendix A.2.1.

Method # data samples # additional PDE instances Solution error Time cost
PINNs - - 18.7% 4577s
PINO 0 0 0.9% 608s
PINO 4k 0 0.9% 536s
PINO 4k 160k 0.9% 473s
Table 2: Equation-solving on Kolmogorov flow .
Transfer Reynolds numbers.

The extrapolation to different parameters and conditions is one of the biggest challenges for ML-based methods. Through test-time optimization, the PINO model pre-trained on the Kolmogorov flow can be easily transferred to different Reynolds numbers, as shown in Table 5. This property envisions broad applications.

Lid cavity flow.

We demonstrate an additional example of using PINO to solve the lid-cavity flow. In this case, we do not have a pre-training phase and directly solve the equation (test-time optimization). We use PINO with the velocity-pressure formulation and the Fourier numerical gradient. It takes minutes to achieve a small relative error. Figure 3 shows the ground truth and the prediction of the velocity field, where PINO accurately predicts the ground truth.

Figure 4: Darcy inverse problem

4.3 Inverse problem

One of the major advantages of the physics-informed method is the ability to solve inverse problems. Here we investigate PINO on the inverse problem of the Darcy equation: recovering the coefficient function a from the given solution function u. We assume a dataset {(a_j, u_j)} is available to pre-train the operator. We propose two formulations of PINO for the inverse problem:

  • Forward model: learn the forward operator G_θ : a ↦ u with data. Optimize the coefficient function a so that G_θ(a) matches the observed solution, using the data misfit, the PDE loss, and a regularization term R(a).

  • Backward model: learn the backward operator F_θ : u ↦ a with data. Use F_θ(u) to approximate the coefficient function, and optimize it using the PDE loss and the regularization term R(a).

Here R(a) is the regularization term. We use the PDE loss to deal with the small error in the learned operator and the ill-posedness of the inverse problem.
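A schematic of such an inverse objective, combining data misfit, PDE residual, and a total-variation regularizer on the recovered coefficient (the weights and names are our illustrative placeholders, not the paper's exact formulation):

```python
import numpy as np

# Total variation of a 2-d field: sum of absolute jumps along both axes.
# Favors piecewise-constant coefficients like the two-media Darcy example.
def total_variation(a):
    return np.abs(np.diff(a, axis=0)).sum() + np.abs(np.diff(a, axis=1)).sum()

# Hedged sketch of an inverse objective: fit the observed solution, penalize
# the PDE residual, and regularize the coefficient. `forward` maps a -> u.
def inverse_objective(a, u_obs, forward, pde_residual, beta=1e-3):
    u_pred = forward(a)
    data = np.mean((u_pred - u_obs) ** 2)
    pde = np.mean(pde_residual(a, u_pred) ** 2)
    return data + pde + beta * total_variation(a)
```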

We perform a showcase example on the Darcy equation, where we are given the full-field observation of u (clean data) to recover the coefficient function a. The coefficient function is piecewise constant (representing two types of media), so the inverse problem can be viewed as a classification problem. We define the regularization term as the total variation of a.

As shown in Figure 4, the PINO backward model has the best performance: it has a 2.29% relative l2 error on the output u and 97.10% classification accuracy on the input a; the forward model has a 6.43% error on the output and 95.38% accuracy on the input. As a reference, we compare the PINO inverse frameworks with PINN and a conventional solver using the accelerated MCMC method with 500,000 steps [5]. The posterior mean of the MCMC has a 4.52% error and 90.30% accuracy, respectively (notice that the Bayesian method outputs the posterior distribution, which goes beyond obtaining a maximum a posteriori estimate). Meanwhile, the PINO methods are 3000x faster than MCMC. PINN does not converge in this case. The major advantage of the PINO backward model over the PINO forward model is that it uses a neural operator as the ansatz for the coefficient function. Similar to the forward problem, the operator ansatz has an easier optimization landscape while being expressive.

5 Conclusion and Future works

In this work, we develop the physics-informed neural operator (PINO) that bridges the gap between physics-informed optimization and data-driven neural operator learning. We introduce the pre-training and test-time optimization schemes for PINO to utilize both the data and physics constraints. In the pre-training phase, PINO learns an operator ansatz over multiple instances of a parametric PDE family. The test-time optimization scheme allows us to take advantage of the learned neural operator ansatz and solve for the solution function on the querying instance faster and more accurately.

There are many exciting future directions. Most of the techniques and analysis developed for PINN can be transferred to PINO. It is also interesting to ask how to overcome the hard trade-off between accuracy and complexity, and how the PINO model transfers across different geometries. Furthermore, we can develop a software library of pre-trained models. PINO's excellent extrapolation property allows it to be applied to a broad set of conditions, as shown in the transfer Reynolds number experiments.


Acknowledgments

Z. Li gratefully acknowledges the financial support from the Kortschak Scholars Program. N. Kovachki is partially supported by the Amazon AI4Science Fellowship. A. Anandkumar is supported in part by the Bren endowed chair, Microsoft, Google, Adobe faculty fellowships, and the DE Logi grant. The authors thank Sifan Wang for meaningful discussions.


References

  • [1] C. Bruneau and M. Saad (2006) The 2d lid-driven cavity problem revisited. Computers & Fluids 35 (3), pp. 326–348. Cited by: §A.2.1, 3rd item.
  • [2] O. P. Bruno, J. S. Hesthaven, and D. V. Leibovici (2021) FC-based shock-dynamics solver with neural-network localized artificial-viscosity assignment. arXiv preprint arXiv:2111.01315. Cited by: §1.
  • [3] S. L. Brunton, B. R. Noack, and P. Koumoutsakos (2020) Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52, pp. 477–508. Cited by: §1.
  • [4] S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis (2021) Physics-informed neural networks (pinns) for fluid mechanics: a review. arXiv preprint arXiv:2105.09506. Cited by: §2.2.
  • [5] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White (2013) MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science 28 (3), pp. 424–446. Cited by: 4th item, §4.3.
  • [6] J. Duvall, K. Duraisamy, and S. Pan (2021) Non-linear independent dual system (nids) for discretization-independent surrogate modeling over complex geometries. External Links: 2109.07018 Cited by: §1.
  • [7] V. Dwivedi and B. Srinivasan (2020) Physics informed extreme learning machine (pielm)–a rapid method for the numerical solution of partial differential equations. Neurocomputing 391, pp. 96–118. Cited by: §1, §2.2.
  • [8] Y. Fan, L. Lin, L. Ying, and L. Zepeda-Núnez (2018) A multiscale neural network based on hierarchical matrices. arXiv preprint arXiv:1807.01883. Cited by: §1.
  • [9] O. Fuks and H. A. Tchelepi (2020) Limitations of physics informed machine learning for nonlinear two-phase transport in porous media. Journal of Machine Learning for Modeling and Computing 1 (1). Cited by: §1, §2.2.
  • [10] H. Gao, L. Sun, and J. Wang (2021) PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state pdes on irregular domain. Journal of Computational Physics 428, pp. 110079. Cited by: §3.3.
  • [11] D. Greenfeld, M. Galun, R. Basri, I. Yavneh, and R. Kimmel (2019) Learning to optimize multigrid pde solvers. In International Conference on Machine Learning, pp. 2415–2423. Cited by: §2.2.
  • [12] J. Han, A. Jentzen, and E. Weinan (2018) Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115 (34), pp. 8505–8510. Cited by: §1.
  • [13] Y. He and W. Sun (2007) Stability and convergence of the crank–nicolson/adams–bashforth scheme for the time-dependent navier–stokes equations. SIAM Journal on Numerical Analysis 45 (2), pp. 837–869. Cited by: 3rd item, §4.2.
  • [14] X. Jin, S. Cai, H. Li, and G. E. Karniadakis (2021) NSFnets (navier-stokes flow nets): physics-informed neural networks for the incompressible navier-stokes equations. Journal of Computational Physics 426, pp. 109951. Cited by: §A.1.4, §A.2.1, §2.2.
  • [15] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §A.1.4.
  • [16] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer (2021) Machine learning accelerated computational fluid dynamics. arXiv preprint arXiv:2102.01010. Cited by: §2.2.
  • [17] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar (2021) Neural operator: learning maps between function spaces. arXiv preprint arXiv:2108.08481. Cited by: §3.3.
  • [18] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2020) Fourier neural operator for parametric partial differential equations. External Links: 2010.08895 Cited by: §A.1.3, §1, §2.4, 1st item.
  • [19] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2020) Multipole graph neural operator for parametric partial differential equations. External Links: 2006.09535 Cited by: §1.
  • [20] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2020) Neural operator: graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485. Cited by: §A.1.2, Table 3, 3rd item, §1, §2.4.
  • [21] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2021) Markov neural operators for learning chaotic systems. arXiv preprint arXiv:2106.06898. Cited by: §A.1.4, 2nd item.
  • [22] Z. Long, Y. Lu, X. Ma, and B. Dong (2018) Pde-net: learning pdes from data. In International Conference on Machine Learning, pp. 3208–3216. Cited by: §1.
  • [23] D. Lu, H. Wang, M. Chen, L. Lin, R. Car, E. Weinan, W. Jia, and L. Zhang (2021) 86 pflops deep potential molecular dynamics simulation of 100 million atoms with ab initio accuracy. Computer Physics Communications 259, pp. 107624. Cited by: §1.
  • [24] L. Lu, P. Jin, and G. E. Karniadakis (2019) DeepONet: learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193. Cited by: §A.1.2, Table 3, 2nd item, §1.
  • [25] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis (2021) DeepXDE: a deep learning library for solving differential equations. SIAM Review 63 (1), pp. 208–228. External Links: Document Cited by: §A.1.4.
  • [26] R. G. Patel, N. A. Trask, M. A. Wood, and E. C. Cyr (2021) A physics-informed operator regression framework for extracting data-driven continuum models. Computer Methods in Applied Mechanics and Engineering 373, pp. 113500. Cited by: §1.
  • [27] J. Pathak, M. Mustafa, K. Kashinath, E. Motheau, T. Kurth, and M. Day (2021) ML-pde: a framework for a machine learning enhanced pde solver. Bulletin of the American Physical Society. Cited by: §2.2.
  • [28] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1, §2.2, §3.3.
  • [29] M. Raissi, A. Yazdani, and G. E. Karniadakis (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030. Cited by: §1, §2.2.
  • [30] J. Sirignano and K. Spiliopoulos (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of computational physics 375, pp. 1339–1364. Cited by: §2.2.
  • [31] J. D. Smith, Z. E. Ross, K. Azizzadenesheli, and J. B. Muir (2021) HypoSVI: hypocenter inversion with stein variational inference and physics informed neural networks. arXiv. Cited by: §1.
  • [32] L. Sun, H. Gao, S. Pan, and J. Wang (2020) Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Computer Methods in Applied Mechanics and Engineering 361, pp. 112732. Cited by: §1.
  • [33] K. Um, P. Holl, R. Brand, N. Thuerey, et al. (2020) Solver-in-the-loop: learning from differentiable physics to interact with iterative pde-solvers. arXiv preprint arXiv:2007.00016. Cited by: §1.
  • [34] R. Wang, K. Kashinath, M. Mustafa, A. Albert, and R. Yu (2020) Towards physics-informed deep learning for turbulent flow prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1457–1466. Cited by: §1.
  • [35] S. Wang, Y. Teng, and P. Perdikaris (2021) Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43 (5), pp. A3055–A3081. Cited by: §1, §2.2.
  • [36] S. Wang, H. Wang, and P. Perdikaris (2021) Learning the solution operator of parametric partial differential equations with physics-informed deeponets. arXiv preprint arXiv:2103.10974. Cited by: §A.1.1, §1, §4.1.
  • [37] S. Wang, X. Yu, and P. Perdikaris (2020) When and why pinns fail to train: a neural tangent kernel perspective. arXiv preprint arXiv:2007.14527. Cited by: §1, §2.2.
  • [38] E. Weinan and B. Yu (2018) The deep ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 6 (1), pp. 1–12. Cited by: §2.2, §2.2.
  • [39] L. Zhang, Z. J. Xu, and Y. Zhang (2021) Data-informed deep optimization. arXiv preprint arXiv:2107.08166. Cited by: §1.
  • [40] Y. Zhu, N. Zabaras, P. Koutsourelakis, and P. Perdikaris (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394, pp. 56–81. Cited by: §1, §1, §3.3.

Appendix A Appendix

A.1 Implementation details

In this section, we list the detailed experimental setup and hyperparameter search for each experiment in Section 4. Unless otherwise specified, we use a Fourier neural operator backbone with GeLU activation. The numerical experiments are performed on Nvidia V100 GPUs.

A.1.1 Burgers Equation

We train the PINO solution operator on sampled initial conditions, using GeLU activation and computing derivatives with the numerical (Fourier) method. We use the Adam optimizer with a learning rate that is halved at fixed epoch intervals, training on a single Nvidia 3090 GPU. PINO achieves 0.37% relative L2 error averaged over 200 test instances. PINN-DeepONet achieves 1.38%, as reported by Wang et al. [36] on the same problem setting.
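The numerical (Fourier) method for derivatives on a periodic domain can be sketched with an FFT-based spectral derivative; the grid size and domain length below are illustrative, not the exact training configuration.

```python
import numpy as np

def spectral_derivative(u, L=2 * np.pi):
    """First derivative of a periodic 1-D signal via the FFT.

    Multiplying each Fourier coefficient by i*k differentiates exactly
    for band-limited signals (spectral accuracy)."""
    n = u.shape[-1]
    ik = 2j * np.pi * np.fft.fftfreq(n, d=L / n)  # wavenumbers i*k
    return np.fft.ifft(np.fft.fft(u) * ik).real

# d/dx sin(x) = cos(x), recovered to machine precision on a periodic grid
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
du = spectral_derivative(np.sin(x))
```

The PDE residual terms in the physics loss are assembled from derivatives of exactly this kind.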

A.1.2 Darcy Flow

We train the solution operator on piecewise-constant coefficient functions that take one of two values. The zero boundary condition is enforced by multiplying the output by a mollifier for all methods. The PINO hyperparameters on Darcy flow are the same as for the Burgers equation above. Regarding the baselines: for FNO, we use the same hyperparameters as the original paper [20]; DeepONet [24] did not study Darcy flow, so we grid-search its hyperparameters, with depth from 2 to 12 and width from 50 to 100. The best DeepONet result is achieved at depth 8 and width 50. The results are shown in Table 3.
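As an illustration of the mollifier trick, a common choice on the unit square is the product sin(pi x) sin(pi y), which vanishes on the boundary; whether this is the exact mollifier used here is an assumption.

```python
import numpy as np

def apply_mollifier(u):
    """Multiply a network output on the unit square by
    m(x, y) = sin(pi x) sin(pi y), so the result satisfies the zero
    Dirichlet boundary condition by construction (the exact mollifier
    in the paper may differ)."""
    n = u.shape[-1]
    x = np.linspace(0, 1, n)
    m = np.outer(np.sin(np.pi * x), np.sin(np.pi * x))
    return u * m

out = apply_mollifier(np.random.default_rng(0).random((32, 32)))
```

Because the boundary condition holds exactly, the boundary loss term can be dropped and the optimization only has to fit the interior residual.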

Method Solution error Equation error
DeepONet with data [24] -
FNO with data [20]
PINO with data
PINO w/o data
Table 3: Operator learning on Darcy Flow.

A.1.3 Long temporal transient flow

We study the build-up of the flow from a near-zero initial velocity to the ergodic state, following the setup of Li et al. [18]. The initial condition is sampled from a Gaussian random field with periodic boundary conditions, and the forcing is kept fixed. We compare FNO, PINO without test-time optimization, and PINO with test-time optimization, reporting the relative L2 error on the last time step averaged over 5 test instances.
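Sampling a periodic Gaussian random field can be sketched by coloring white noise in Fourier space; the spectral-decay hyperparameters alpha and tau below are illustrative, not the paper's exact choices.

```python
import numpy as np

def gaussian_random_field(n, alpha=2.5, tau=7.0, seed=0):
    """Sample a periodic 2-D Gaussian random field by scaling white
    noise with a power-law spectrum ~ (|k|^2 + tau^2)^(-alpha/2).
    Larger alpha gives smoother samples (hyperparameters illustrative)."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    coef = (kx**2 + ky**2 + tau**2) ** (-alpha / 2)
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    field = np.fft.ifft2(coef * noise).real
    return field - field.mean()  # zero-mean initial vorticity

w0 = gaussian_random_field(64)
```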

A.1.4 Chaotic Kolmogorov flow

For this experiment, the initial condition lies on the attractor, in a setup similar to Li et al. [21]. The training set consists of 4000 initial condition functions and the corresponding solution functions. Extra initial conditions are generated from a Gaussian random field. We estimate the generalization error of the operator on a test set of 300 instances of Kolmogorov flow and report the averaged relative L2 error. Each neural operator is trained with the 4k data points plus a number of additionally sampled initial conditions. The Reynolds number in this problem is 500.

Comparison study.

The PINN baseline is implemented with the DeepXDE library [25] using a TensorFlow backend. We use the two-step optimization strategy (Adam [15] followed by L-BFGS), following the same practice as NSFNet [14], which applies PINNs to the Navier-Stokes equations. For all PINN experiments, we grid-search the hyperparameters: network depth from 4 to 6, width from 50 to 100, learning rate from 0.01 to 0.0001, and the weight of the boundary loss from 10 to 100. The comparison between PINO and PINNs on test-time optimization is averaged over 20 instances of the Navier-Stokes equation with Reynolds number 500. The best result is obtained by PINO using the pre-trained operator ansatz and virtual sampling; the neural operator ansatz used here is trained on a set of 400 data points. We acknowledge that more sophisticated PINN variants may perform better on our test cases.
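The two-step strategy, an Adam warm-up followed by L-BFGS refinement, can be sketched on a toy non-convex loss; the Rosenbrock function stands in for the PINN loss, and all step sizes and iteration counts below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def loss(p):
    # Toy stand-in for the PINN loss (Rosenbrock, minimum 0 at (1, 1))
    return (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2

def grad(p):
    return np.array([
        -2 * (1 - p[0]) - 400 * p[0] * (p[1] - p[0] ** 2),
        200 * (p[1] - p[0] ** 2),
    ])

# Stage 1: Adam warm-up (robust early on, tolerant of noisy curvature)
p = np.array([-1.0, 1.0])
m = v = np.zeros(2)
b1, b2, lr, eps = 0.9, 0.999, 1e-2, 1e-8
for t in range(1, 2001):
    g = grad(p)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    p -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)

# Stage 2: L-BFGS refinement from the Adam iterate (fast local convergence)
res = minimize(loss, p, jac=grad, method="L-BFGS-B")
```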

# data samples # additional PDE instances Resolution Solution error Equation error
400 0 128x128x65 33.32% 1.8779
33.31% 1.8830
30.61% 1.8421
400 40k 128x128x65 31.74% 1.8179
31.72% 1.8227
29.60% 1.8296
400 160k 128x128x65 31.32% 1.7840
31.29% 1.7864
29.28% 1.8524
4k 0 0.2515 1.8223
0.2516 1.8257
0.2141 1.8468
4k 100k 0.2415 1.6112
0.2411 1.6159
0.2085 1.8251
4k 400k 0.2422 1.4596
0.2395 1.4656
0.2010 1.9146
0 100k 0.7436 0.3741
0.7438 0.3899
0.7414 0.5226
Table 4: Each neural operator is trained with 4k data points plus additionally sampled initial conditions. The Reynolds number is 500. The reported generalization error is averaged over 300 instances. Training on additional initial conditions boosts the generalization ability of the operator.

A.2 Transfer learning across Reynolds numbers

We study test-time optimization across different Reynolds numbers on the 1s Kolmogorov flow. On the higher-Reynolds-number problems, the pre-trained operator attains better convergence accuracy, and in all cases it converges faster, as demonstrated in Figure 5. The results are shown in Table 5, where the error is averaged over 40 instances. Each row is a test case and each column a pre-trained operator.

Testing RE From scratch 100 200 250 300 350 400 500
500 0.0493 0.0383 0.0393 0.0315 0.0477 0.0446 0.0434 0.0436
400 0.0296 0.0243 0.0245 0.0244 0.0300 0.0271 0.0273 0.0240
350 0.0192 0.0210 0.0211 0.0213 0.0233 0.0222 0.0222 0.0212
300 0.0168 0.0161 0.0164 0.0151 0.0177 0.0173 0.0170 0.0160
250 0.0151 0.0150 0.0153 0.0151 0.016 0.0156 0.0160 0.0151
200 0.00921 0.00913 0.00921 0.00915 0.00985 0.00945 0.00923 0.00892
100 0.00234 0.00235 0.00236 0.00235 0.00239 0.00239 0.00237 0.00237
Table 5: Reynolds number transfer learning. Each row is a test set of PDEs with the corresponding Reynolds number. Each column represents the operator ansatz used as the starting point of test-time optimization; for example, the column header '100' means the operator ansatz was trained over a set of PDEs with Reynolds number 100. The relative L2 error is averaged over 40 instances of the corresponding test set.
Figure 5: Test relative error versus update step for the Kolmogorov flow with Re=500, T=1s. All the operator ansatzes pre-trained on PDE instances with different Reynolds numbers boost the accuracy and speed of test-time optimization compared to training from scratch.

A.2.1 Lid cavity flow

We assume no-slip boundaries with zero velocity at the left, bottom, and right walls and a prescribed horizontal velocity on top, similar to Bruneau and Saad [1]. We use the velocity-pressure formulation as in Jin et al. [14], where the neural operator outputs the velocity components and the pressure field. The learning rate decreases by half at fixed iteration intervals. We use the Fourier method with Fourier continuation to compute numerical gradients, and we minimize the residual error on the velocity equation, the divergence-free condition, and the initial and boundary conditions. The weight parameters between the different error terms are all chosen as 1. Figure 3 shows the ground truth and the predicted velocity field, where PINO accurately matches the ground truth.
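The weighted combination of residual terms can be sketched as follows; the equal weights match the setup above, while the term names are only labels.

```python
import numpy as np

def pino_loss(residuals, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of mean-squared residuals: momentum equation,
    divergence-free condition, initial condition, boundary condition.
    All weights are 1, matching the setup above."""
    return sum(w * np.mean(r ** 2) for w, r in zip(weights, residuals))

terms = [np.ones(8), np.zeros(8), 0.5 * np.ones(8), np.ones(8)]
total = pino_loss(terms)  # 1.0 + 0.0 + 0.25 + 1.0 = 2.25
```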