1 Introduction
Machine learning-based methods are starting to show promise in scientific computing, especially for solving partial differential equations (PDEs). They have demonstrated advantages in both efficiency and accuracy compared to conventional solvers, and are even able to tackle previously intractable problems such as higher-dimensional, multi-scale, high-contrast, and chaotic PDE systems [33, 3, 8, 22, 12, 2]. Broadly, ML-based approaches for PDEs fall into two categories: optimizing to solve for a specific solution function of a PDE vs. learning the solution operator over a family of PDEs.
Optimization of solution function and PINN.
Most ML-based methods, as well as the conventional solvers, fall into this category. Conventional solvers such as the finite difference method (FDM) and the finite element method (FEM) usually discretize the domain into a grid and optimize/approximate the solution function on the grid, which imposes a truncation error. Physics-Informed Neural Network (PINN)-type methods were proposed to overcome this discretization issue [28]. They use a neural network as the ansatz of the solution function and take advantage of auto-differentiation to compute exact, mesh-free derivatives. Recently, researchers have developed numerous variations of PINN with promising results on inverse problems and partially observed tasks [23, 40, 31]. However, compared to conventional solvers, PINNs face several optimization issues: (1) the challenging optimization landscape induced by soft physics or PDE constraints [35], (2) the difficulty of propagating information from the initial or boundary conditions to unseen parts of the interior or to future times [7], and (3) the sensitivity to hyperparameter selection [32]. As a result, PINNs are still unable to compete with conventional solvers in most cases, and they often fail to converge on high-frequency or multi-scale PDEs [37, 9, 29]. In this work, we propose to overcome these optimization challenges by integrating operator learning with PINN.
Operator learning and neural operators.
A recent alternative approach is to learn the solution operator of a family of PDEs, defined by the map from the input (initial and boundary conditions) to the output (solution functions). In this case, a dataset of input-output pairs from an existing solver is usually given. There are two main aspects to consider: (a) model: designing models that can learn highly complicated PDE solution operators, and (b) data: being data-efficient and improving generalization. Recent advances in operator learning replace traditional convolutional neural networks and U-Nets from computer vision with operator-based models tailored to PDEs, with greatly improved model expressiveness [20, 24, 26, 34, 6]. Specifically, the neural operator generalizes the neural network to the operator setting, where the input and output spaces are infinite-dimensional. The framework has shown success in learning resolution-invariant solution operators for highly nonlinear problems such as turbulent flow [19, 18]. However, the data challenges remain: (1) the need for training data, which assumes an existing solver or experimental setup, (2) the non-negligible generalization error, and (3) extrapolation to unseen conditions. These issues can be addressed by adding physics or PDE constraints to operator learning [40, 36, 39].

Our contributions.
To overcome the shortcomings of both physicsinformed optimization and datadriven operator learning, we propose the physicsinformed neural operator (PINO). It requires fewer or no data points to learn the operator and generalizes better. In PINO, we use the pretrained operator as the ansatz to optimize for the solution function at test time, which reduces the generalization error. Compared to PINN, PINO has a much better optimization landscape and representation space, and hence, PINO converges faster and more accurately. Our contributions can be summarized as follows:

We propose the physics-informed neural operator (PINO), combining the operator-learning and physics-informed settings. We introduce pre-training and test-time optimization schemes that utilize both the data and equation constraints (whichever are available). We develop an efficient method to compute the exact gradient of neural operators in order to incorporate the equation constraints.

By utilizing the pretrained operator ansatz, PINO overcomes the challenge of propagating information from the initial condition to future time steps with (soft) physics constraints. It can solve the 2d transient flow over an extremely long time period, where PINN and DeepONet [24] fail to converge. Even without any pre-training and using only the PDE constraint of the given instance, PINO still outperforms PINN with a 20x smaller error and a 25x speedup on the chaotic Kolmogorov flow, demonstrating the superior expressivity of neural operators over standard neural networks.

By utilizing the equation constraints, PINO requires fewer or no training data and generalizes better compared to FNO [20]. On average it has a 7% smaller error on the transient and Kolmogorov flows, while matching FNO's 400x speedup over the GPU-based pseudo-spectral solver [13]. Further, the PINO model pretrained on the Navier-Stokes equation can be easily transferred to different Reynolds numbers using test-time optimization.

We propose forward and backward PINO models for inverse problems. Our approach accurately recovers the coefficient function in the Darcy flow and is 3000x faster than the conventional solver using accelerated MCMC [5].
PINN vs. PINO: point-wise vs. function-wise optimization.
The neural operator ansatz in PINO has an easier optimization landscape and a more expressive representation space compared to the neural network ansatz in PINN. The neural operator parameterizes the solution function as an aggregation of basis functions, and hence the optimization is over a function space. This is easier than optimizing a single function as in PINN. Further, we can learn these basis functions in the pre-training phase, which makes the test-time optimization on the query instance even easier: PINO does not need to solve the equation from scratch; it just fine-tunes the solution function parameterized by the solution operator. Thus, PINO is much faster and more accurate than PINN.
2 Preliminaries and problem settings
In this section, we first define the stationary and dynamic PDE systems that we consider, and give an overview of the physics-informed and operator-learning settings. Finally, we define the Fourier neural operator as a specific model for operator learning.
2.1 Problem settings
We consider two natural classes of PDEs. In the first, we consider the stationary system
$$\mathcal{P}(u, a) = 0, \quad \text{in } D \subset \mathbb{R}^d, \qquad u = g, \quad \text{in } \partial D \tag{1}$$
where $D$ is a bounded domain, $a \in \mathcal{A}$ is a PDE coefficient/parameter, $u \in \mathcal{U}$ is the unknown, and $\mathcal{P} : \mathcal{U} \times \mathcal{A} \to \mathcal{F}$ is a possibly nonlinear partial differential operator, with $(\mathcal{U}, \mathcal{A}, \mathcal{F})$ a triplet of Banach spaces. Usually the function $g$ is a fixed boundary condition (potentially it can be entered as a parameter). This formulation gives rise to the solution operator $\mathcal{G}^\dagger : \mathcal{A} \to \mathcal{U}$ defined by $a \mapsto u$. A prototypical example is the second-order elliptic equation $-\nabla \cdot (a \nabla u) = f$.
In the second setting, we consider the dynamical system
$$\frac{\partial u}{\partial t} = \mathcal{R}(u, t), \quad \text{in } D \times (0, T], \qquad u = g, \quad \text{in } \partial D \times (0, T], \qquad u = a, \quad \text{in } \bar{D} \times \{0\} \tag{2}$$
where $a \in \mathcal{A}$ is the initial condition, $u(t) \in \mathcal{U}$ for $t > 0$ is the unknown, and $\mathcal{R}$ is a possibly nonlinear partial differential operator, with $\mathcal{A}$ and $\mathcal{U}$ Banach spaces. As before, we take $g$ to be a known boundary condition. We assume that $u$ exists and is bounded for all time and for every $a \in \mathcal{A}$. This formulation gives rise to the solution operator $\mathcal{G}^\dagger : \mathcal{A} \to C\big((0, T]; \mathcal{U}\big)$ defined by $a \mapsto u$. Prototypical examples include the Burgers' equation and the Navier-Stokes equation.
2.2 Solving equation using the physicsinformed loss (PINN)
Given an instance $a$ and a solution operator $\mathcal{G}^\dagger$ defined by equation (1) or (2), we denote by $u^\dagger = \mathcal{G}^\dagger(a)$ the unique ground truth. The equation-solving task is to approximate $u^\dagger$. This setting includes ML-enhanced conventional solvers such as learned finite element, finite difference, and multigrid solvers [16, 27, 11], as well as purely neural-network-based solvers such as Physics-Informed Neural Networks (PINNs), the Deep Galerkin Method, and the Deep Ritz Method [28, 30, 38]. In particular, PINN-type methods use a neural network $u_\theta$ with parameters $\theta$ as the ansatz to approximate the solution function $u^\dagger$. The parameters are found by minimizing the physics-informed loss with exact derivatives computed using automatic differentiation (autograd). In the stationary case, the physics-informed loss is defined by minimizing the l.h.s. of equation (1) in the squared norm of $\mathcal{F}$. A typical choice is $\mathcal{F} = L^2(D)$, giving the loss function
$$\mathcal{L}_{pde}(a, u_\theta) = \big\|\mathcal{P}(a, u_\theta)\big\|^2_{L^2(D)} + \alpha\,\big\|u_\theta|_{\partial D} - g\big\|^2_{L^2(\partial D)} \tag{3}$$
In the case of a dynamical system, it minimizes the residual of equation (2) in some natural norm up to a fixed final time $T > 0$. A typical choice is the $L^2\big((0, T]; L^2(D)\big)$ norm, yielding
$$\mathcal{L}_{pde}(a, u_\theta) = \Big\|\frac{\partial u_\theta}{\partial t} - \mathcal{R}(u_\theta)\Big\|^2_{L^2((0,T];\,L^2(D))} + \alpha\,\big\|u_\theta|_{\partial D} - g\big\|^2_{L^2((0,T];\,L^2(\partial D))} + \beta\,\big\|u_\theta|_{t=0} - a\big\|^2_{L^2(D)} \tag{4}$$
The PDE loss consists of the physics loss in the interior and the data loss on the boundary and initial conditions, with hyperparameters $\alpha, \beta > 0$. It can be generalized to the variational form as in [38].
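As a concrete illustration, the composite loss in equation (4) can be assembled on a discrete space-time grid. The following is a minimal sketch for the 1d Burgers' equation, using central finite differences as a stand-in for the exact autograd derivatives that PINN-type methods would use; the function names, grid layout, and default weights are illustrative.

```python
import numpy as np

def burgers_residual(u, nu, dx, dt):
    """Interior residual u_t + u*u_x - nu*u_xx on a (time, space) grid,
    using central finite differences as a stand-in for autograd derivatives."""
    u_t = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2 * dt)
    u_x = (u[1:-1, 2:] - u[1:-1, :-2]) / (2 * dx)
    u_xx = (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2
    return u_t + u[1:-1, 1:-1] * u_x - nu * u_xx

def pinn_style_loss(u, u0, g_left, g_right, nu, dx, dt, alpha=1.0, beta=1.0):
    """Composite loss: interior PDE residual plus boundary and initial terms,
    weighted by the hyperparameters alpha and beta as in equation (4)."""
    interior = np.mean(burgers_residual(u, nu, dx, dt) ** 2)
    boundary = np.mean((u[:, 0] - g_left) ** 2) + np.mean((u[:, -1] - g_right) ** 2)
    initial = np.mean((u[0] - u0) ** 2)
    return interior + alpha * boundary + beta * initial
```

A constant field satisfying constant boundary and initial data yields zero loss, which is a useful sanity check for any such implementation.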
Challenges of PINN.
PINNs take advantage of the universal approximability of neural networks but, in return, suffer from a low-frequency induced bias. Empirically, PINNs often fail to solve challenging PDEs whose solutions exhibit high-frequency or multi-scale structure [35, 37, 9, 29]. Further, as an iterative solver, PINNs have difficulty propagating information from the initial or boundary conditions to unseen parts of the interior or to future times [7]. For example, on challenging problems such as turbulence, PINNs are only able to solve the PDE on a relatively small domain [14], or otherwise require extra observational data which is not always available in practice [29, 4]. In this work, we propose to overcome these optimization challenges by integrating operator learning with PINNs.
2.3 Learning the solution operator (neural operator)
An alternative setting is to learn the solution operator $\mathcal{G}^\dagger$. Given a PDE as defined in (1) or (2) and the corresponding solution operator $\mathcal{G}^\dagger$, one can use a neural operator $\mathcal{G}_\theta$ with parameters $\theta$ as a surrogate model to approximate $\mathcal{G}^\dagger$. Usually we assume a dataset $\{a_j, u_j\}_{j=1}^N$ is available, where $\mathcal{G}^\dagger(a_j) = u_j$ and the $a_j$ are i.i.d. samples from some distribution $\mu$ supported on $\mathcal{A}$. In this case, one can optimize the solution operator by minimizing the empirical data loss on a given data pair
$$\mathcal{L}_{data}\big(u, \mathcal{G}_\theta(a)\big) = \big\|u - \mathcal{G}_\theta(a)\big\|^2_{\mathcal{U}} = \int_D \big|u(x) - \mathcal{G}_\theta(a)(x)\big|^2 \, dx \tag{5}$$
where we assume the setting of (1) for simplicity of the exposition. The operator data loss is defined as the average error across all possible inputs
$$\mathcal{J}_{data}(\mathcal{G}_\theta) = \mathbb{E}_{a \sim \mu}\big[\mathcal{L}_{data}\big(\mathcal{G}^\dagger(a), \mathcal{G}_\theta(a)\big)\big] \approx \frac{1}{N} \sum_{j=1}^N \big\|u_j - \mathcal{G}_\theta(a_j)\big\|^2_{\mathcal{U}} \tag{6}$$
Similarly, one can define the operator PDE loss as
$$\mathcal{J}_{pde}(\mathcal{G}_\theta) = \mathbb{E}_{a \sim \mu}\big[\mathcal{L}_{pde}\big(a, \mathcal{G}_\theta(a)\big)\big] \approx \frac{1}{N} \sum_{j=1}^N \mathcal{L}_{pde}\big(a_j, \mathcal{G}_\theta(a_j)\big) \tag{7}$$
In general, it is non-trivial to compute the spatial and temporal derivatives of $\mathcal{G}_\theta(a)$ for the model $\mathcal{G}_\theta$. In the following section, we discuss how to compute these derivatives for the Fourier neural operator.
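In practice, the expectation in equation (7) is estimated by a Monte-Carlo average over sampled inputs. A minimal sketch, where `model` and `pde_residual` are hypothetical callables standing in for $\mathcal{G}_\theta$ and the residual of equation (1):

```python
import numpy as np

def operator_pde_loss(inputs, model, pde_residual):
    """Monte-Carlo estimate of the operator PDE loss: average the squared
    physics-informed residual of the predicted solutions over sampled inputs."""
    losses = [np.mean(pde_residual(a, model(a)) ** 2) for a in inputs]
    return float(np.mean(losses))
```

If the surrogate exactly satisfies the equation on every sampled input, the estimated loss is zero; the residual callable encapsulates whichever derivative computation (numerical, autograd, or exact) is chosen.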
2.4 Fourier neural operators
In this work, we focus on the neural operator model designed for the operator-learning problem. The neural operator, proposed in [20], is formulated as a generalization of standard deep neural networks to the operator setting. A neural operator composes linear integral operators with pointwise non-linear activation functions to approximate highly non-linear operators.

Definition 1 (Neural operator $\mathcal{G}_\theta$)
Define the neural operator
$$\mathcal{G}_\theta := \mathcal{Q} \circ (W_L + \mathcal{K}_L) \circ \sigma(W_{L-1} + \mathcal{K}_{L-1}) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1) \circ \mathcal{P} \tag{8}$$
where $\mathcal{P}$ and $\mathcal{Q}$ are pointwise neural networks that encode the lower-dimensional input function into a higher-dimensional space and decode the higher-dimensional function back to the lower-dimensional output space, respectively. The model stacks $L$ layers of the form $\sigma(W_l + \mathcal{K}_l)$, where the $W_l$ are pointwise linear operators (matrices), the $\mathcal{K}_l$ are integral kernel operators, and $\sigma$ is a fixed activation function. The parameters $\theta$ consist of all the parameters in $\mathcal{P}$, $\mathcal{Q}$, $W_l$, and $\mathcal{K}_l$.
Recently, Li et al. [18] proposed the Fourier neural operator (FNO), which deploys a convolution operator for the integral kernel $\mathcal{K}$. In this case, the Fast Fourier Transform (FFT) can be applied to compute $\mathcal{K}$ efficiently, leading to a fast architecture that obtains state-of-the-art results on PDE problems.

Definition 2 (Fourier convolution operator $\mathcal{K}$)
Define the Fourier convolution operator
$$(\mathcal{K} v)(x) = \mathcal{F}^{-1}\big(R \cdot \mathcal{F}(v)\big)(x) \tag{9}$$
where $R$ is part of the parameters $\theta$ to be learned.
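A minimal 1d numpy sketch of this operator may clarify the mechanics (the actual FNO uses learned complex-valued weights per retained mode, channel mixing, and multi-dimensional FFTs; the shapes and names here are illustrative):

```python
import numpy as np

def fourier_conv_1d(v, R):
    """Apply (K v)(x) = IFFT( R . FFT(v) )(x), truncating to the lowest
    `modes` Fourier modes. v has shape (n, width); R has shape
    (modes, width, width) and mixes channels on each retained mode."""
    n, width = v.shape
    modes = R.shape[0]
    v_hat = np.fft.rfft(v, axis=0)               # (n//2 + 1, width) spectrum
    out_hat = np.zeros_like(v_hat)
    # channel-mixing multiplication on the retained low-frequency modes only
    out_hat[:modes] = np.einsum("kio,ki->ko", R, v_hat[:modes])
    return np.fft.irfft(out_hat, n=n, axis=0)
```

With identity weights on the retained modes, the operator acts as a low-pass filter, so a pure low-frequency input passes through unchanged.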
Challenges of Operator learning.
Operator learning is similar to supervised learning in computer vision and language, where data plays a very important role. One needs to assume that the training and testing points follow the same problem setting and the same distribution. In particular, a previous FNO model trained on one coefficient (e.g. one Reynolds number) or one geometry cannot be easily generalized to another. Moreover, for more challenging PDEs where the solver is very slow or even non-existent, it is hard to gather a representative dataset. On the other hand, since FNO does not use any knowledge of the equation, it cannot get arbitrarily close to the ground truth by using a higher resolution, as conventional solvers can, leaving a generalization-error gap. These challenges limit FNO's applications beyond accelerating solvers. In the following section, we introduce the PINO framework, which overcomes these problems by using the equation constraints.
3 Physicsinformed neural operator (PINO)
We propose the PINO framework, which uses one neural operator model for solving both operator-learning problems and equation-solving problems. It consists of two phases:

Pre-train the solution operator: learn a neural operator $\mathcal{G}_\theta$ to approximate $\mathcal{G}^\dagger$ using the data loss and/or the PDE loss, which can supplement the dataset.

Test-time optimization: use $\mathcal{G}_\theta(a)$ as the ansatz to approximate $u^\dagger$ with the PDE loss and/or an additional operator loss obtained from the pre-training phase.
3.1 Pretrain: operator learning with PDE loss
Given a fixed amount of data $\{a_j, u_j\}$, the data loss $\mathcal{L}_{data}$ offers a stronger constraint than the PDE loss $\mathcal{L}_{pde}$. However, the PDE loss does not require a pre-existing dataset: one can sample virtual PDE instances by drawing additional initial conditions or coefficient functions for training, as shown in Algorithm 1. In this sense, we have access to an unlimited dataset by sampling new instances in each iteration.
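As an illustration of the sampling step, periodic Gaussian random field initial conditions can be synthesized spectrally. The covariance parameters `tau` and `alpha` and the normalization below are illustrative rather than the exact setup of Algorithm 1:

```python
import numpy as np

def sample_grf_1d(n, rng, tau=3.0, alpha=2.0):
    """Draw a periodic 1d Gaussian random field on an n-point grid, with a
    covariance of the common form ~ (-Laplacian + tau^2 I)^(-alpha); the
    hyperparameters tau and alpha control smoothness and length scale."""
    k = np.fft.rfftfreq(n, d=1.0 / n)                    # wavenumbers 0..n/2
    sqrt_eig = (4 * np.pi**2 * k**2 + tau**2) ** (-alpha / 2)
    noise = rng.standard_normal(k.shape) + 1j * rng.standard_normal(k.shape)
    coeffs = sqrt_eig * noise
    coeffs[0] = 0.0                                      # enforce zero mean
    return np.fft.irfft(coeffs, n=n) * n                 # real-space sample
```

Since no solver is invoked, such virtual instances are essentially free: only the PDE loss is evaluated on them during pre-training.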
3.2 Testtime optimization: solving equation with operator ansatz
Given a learned operator $\mathcal{G}_\theta$, we use $\mathcal{G}_\theta(a)$ as the ansatz to solve for $u^\dagger$. The optimization procedure is similar to PINNs, where the PDE loss $\mathcal{L}_{pde}$ is computed on the prediction, except that we use a neural operator instead of a neural network. Since the PDE loss is a soft constraint and challenging to optimize, we also add an optional operator loss (anchor loss) $\mathcal{L}_{op}$ that bounds the further-optimized model away from the pre-trained model, comparing the model at the current training epoch against the pre-trained prediction. We update the operator $\mathcal{G}_\theta$ using the sum of the PDE loss and the anchor loss. It is possible to further apply optimization techniques such as fine-tuning only the last few layers of the neural operator, and progressive training that gradually increases the grid resolution and uses the finest resolution at test time.

Optimization landscape.
Using the operator as the ansatz has two major advantages: (1) PINN does pointwise optimization, while PINO optimizes in a space of functions. In the linear integral operation $\mathcal{K}$, the operator parameterizes the solution function as a sum of basis functions; optimizing the set of coefficients and basis functions is easier than optimizing a single function as in PINNs. (2) We can learn these basis functions in the operator-learning phase, which makes the later test-time optimization even easier. With PINO, we do not need to propagate information from the initial and boundary conditions to the interior; we only need to fine-tune the solution function parameterized by the solution operator.
Tradeoff.
(1) Complexity and accuracy: test-time optimization is an option to spend more computation in exchange for better accuracy. When speed is most desired, one should use the pre-trained model to directly output the prediction. However, to solve a specific instance accurately, one can use test-time optimization to correct the generalization error incurred in operator learning. (2) Resolution effects on the optimization landscape and truncation error: using a higher resolution and a finer grid reduces the truncation error, but may make the optimization unstable. Using hard constraints such as the anchor loss relieves this problem.
3.3 Derivatives of neural operators
In order to use the equation loss $\mathcal{L}_{pde}$, one of the major technical challenges is to efficiently compute the derivatives of neural operators. In this section, we discuss three methods to compute the derivatives of a neural operator $\mathcal{G}_\theta$ as defined in (8).
Numerical differentiation.
A simple but efficient approach is to use conventional numerical gradients such as finite differences or the Fourier gradient [40, 10]. These numerical gradients are fast and memory-efficient. However, they face the same challenges as the corresponding numerical solvers: finite-difference methods require a fine-resolution uniform grid; spectral methods require smoothness and uniform grids. In particular, numerical errors on the gradient are amplified on the output solution.
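For reference, the Fourier gradient on a periodic uniform grid takes only a few lines, and is exact for band-limited functions, which is why its numerical error appears only when the solution is under-resolved or non-smooth:

```python
import numpy as np

def fourier_derivative(u, L=1.0):
    """Spectral derivative of a periodic function sampled on a uniform grid:
    multiply each Fourier coefficient by 2*pi*i*k/L, then transform back."""
    n = u.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)
    return np.real(np.fft.ifft(2j * np.pi * k / L * np.fft.fft(u)))
```

For example, differentiating a sampled sine wave recovers the cosine to machine precision, whereas a central finite difference on the same grid carries an O(dx^2) error.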
Autograd.
Similar to PINN [28], the most general way to compute the exact gradient is to use the auto-differentiation library of neural networks (autograd). To apply autograd, one needs a neural network parameterization of the solution function $u$. However, it is not straightforward to write out the solution function for a neural operator, which directly outputs the numerical solution on a grid, especially for FNO, which uses the FFT. To apply autograd, we design a query function that takes an arbitrary location $x$ as input and outputs $u(x)$. Since the decoder $\mathcal{Q}$ is pointwise,

$$u(x) = \mathcal{Q}\big(v_L(x)\big), \qquad v_L(x) = (\mathcal{K} v_{L-1})(x) + (W v_{L-1})(x) \tag{10}$$

To query any location $x$, we need to evaluate the Fourier convolution $(\mathcal{K} v_{L-1})(x)$ and the bias part $(W v_{L-1})(x)$. For the first part, we can directly query the Fourier representation at the spatial location $x$ without performing the inverse FFT. For the second part, we can define the query function as an interpolation or a low-rank integral operator, as in [17]. Then we can use autograd to take the exact derivative of (10). The autograd method is general and exact; however, it is less efficient. Since the number of parameters is usually much greater than the grid size, the numerical methods are significantly faster. Empirically, the autograd method is usually slower and more memory-consuming.

Exact gradients.
We develop an efficient and exact gradient method based on the architecture of the neural operator. The idea is similar to autograd, but we explicitly write out the gradient in Fourier space and apply the chain rule. Given the explicit form (10), the derivative of the Fourier convolution part can be computed directly in Fourier space, and the linear part can be interpolated using the Fourier method. Therefore, to exactly compute the derivative of the last-layer representation, one just needs to run the numerical Fourier gradient. Then we apply the chain rule for the Jacobian and Hessian of the decoder $\mathcal{Q}$; since $\mathcal{Q}$ is pointwise, this is much simplified. In the experiments, we mostly use the exact gradient and the numerical Fourier gradient.

4 Experiments
In this section, we conduct empirical experiments to examine the efficacy of the proposed PINO. In 4.1, we show that the physics constraints help operator learning with fewer data and better generalization. Then in 4.2, we investigate how PINO uses the operator ansatz to solve harder equations with improved speed and accuracy. We study three concrete PDEs: the Burgers' equation, the Darcy equation, and the Navier-Stokes equation. The implementation details of PINO and the baseline methods are listed in Appendix A.
Burgers’ Equation.
The 1d Burgers' equation is a non-linear PDE with periodic boundary conditions, where $u_0$ is the initial condition and $\nu > 0$ is the viscosity coefficient. We aim to learn the operator mapping the initial condition to the solution, $\mathcal{G}^\dagger : u_0 \mapsto u$.

$$\partial_t u(x, t) + \partial_x \big(u^2(x, t)/2\big) = \nu\, \partial_{xx} u(x, t), \qquad x \in (0, 1),\ t \in (0, 1], \qquad u(x, 0) = u_0(x) \tag{11}$$
Darcy Flow.
The 2d steadystate Darcy Flow equation on the unit box which is the second order linear elliptic PDE with a Dirichlet boundary where is a piecewise constant diffusion coefficient and is a fixed forcing function. We are interested in learning the operator mapping the diffusion coefficient to the solution, . Note that although the PDE is linear, the operator is not.
(12) 
NavierStokes Equation.
We consider the 2d NavierStokes equation for a viscous, incompressible fluid in vorticity form on the unit torus, where for any is the velocity field, is the vorticity, is the initial vorticity, is the viscosity coefficient, and is the forcing function. We want to learn the operator mapping the vorticity from the initial condition to the full solution .
(13) 
Specifically, we consider the following problem settings:

Long temporal transient flow: we study the build-up of the flow from a near-zero initial velocity until it reaches the ergodic state, with the setting chosen as in Li et al. [18]. The main challenge is to predict over the long time interval.

Chaotic Kolmogorov flow: in this case the flow lies in the attractor, starting from an arbitrary time, with the setting chosen similar to Li et al. [21]. The main challenge is to capture the small details that evolve chaotically.

Lid cavity flow: in this case, we assume the no-slip boundary condition, with zero velocity on the left, bottom, and right walls and a driven lid on top, similar to Bruneau and Saad [1]. The main challenge is to address the boundary conditions using the velocity-pressure formulation.
4.1 Operator learning with PDE losses
We first show the effectiveness of applying the physics constraints to learn the solution operator.
Burgers' equation and Darcy equation.
PINO can learn the solution operator without any data on simpler problems such as the Burgers' and Darcy equations. Compared to other PDE-constrained operators, PINO is more expressive and thereby achieves better accuracy. On the Burgers' equation (11), PINN-DeepONet achieves a 1.38% error [36]; PINO achieves 0.37%. Similarly, on the Darcy flow (12), PINO outperforms FNO by utilizing the physics constraints, as shown in Table 3. For these simpler equations, test-time optimization may not be needed. The implementation details and the parameter search space are included in Appendix A.1.1 and A.1.2.
4.2 Solve equation using operator ansatz
Table 1: Generalization error on the chaotic Kolmogorov flow with different amounts of training data and additional sampled PDE instances.

# data samples    # additional PDE instances    Solution error    Equation error
0                 100k                          74.36%            0.3741
0.4k              0                             33.32%            1.8779
0.4k              40k                           31.74%            1.8179
0.4k              160k                          31.32%            1.7840
4k                0                             25.15%            1.8223
4k                100k                          24.15%            1.6112
4k                400k                          24.22%            1.4596
Long temporal transient flow.
It is extremely challenging to propagate information from the initial condition to future time steps over such a long interval using only the soft physics constraint. Neither PINN nor PINO trained from scratch (without pre-training) can handle this case, whether solving the full interval at once or solving in smaller steps. However, when pre-training data is available for PINO, we can use the learned neural operator ansatz and the anchor loss. The anchor loss is a hard constraint that makes the optimization much easier. Given the training data, PINO without test-time optimization achieves a 2.87% error, lower than FNO's 3.04%, and it retains the 400x speedup over the GPU-based pseudo-spectral solver [13], matching FNO. Further applying test-time optimization with the anchor loss and the PDE loss, PINO reduces the error to 1.84%.
Chaotic Kolmogorov flow.
We show an empirical study of how PINO can improve the generalization of neural operators by enforcing more physics. For this experiment, the training set consists of 4000 data points of initial conditions and corresponding solutions. Using Algorithm 1, we can sample unlimited additional initial conditions from a Gaussian random field at any resolution. Tables 1 and 4 compare the generalization error of neural operators trained by different schemes and different amounts of simulated data. The results show that training the neural operator with additional PDE instances consistently improves the generalization error on all three resolutions we evaluate.
Based on the solution operators learned in the operator-learning section above, we continue with test-time optimization. The results are shown in the corresponding figure and in Table 2. Overall, PINO outperforms PINN with a 20x smaller error and a 25x speedup. Using a pre-trained model makes PINO converge faster. The implementation details and the parameter search space are included in Appendix A.2.1.
Table 2: Test-time optimization on the chaotic Kolmogorov flow.

Method    # data samples    # additional PDE instances    Solution error    Time cost
PINNs     -                 -                             18.7%             4577s
PINO      0                 0                             0.9%              608s
PINO      4k                0                             0.9%              536s
PINO      4k                160k                          0.9%              473s
Transfer Reynolds numbers.
Extrapolation to different parameters and conditions is one of the biggest challenges for ML-based methods. Through test-time optimization, the PINO model pre-trained on the Kolmogorov flow can be easily transferred to a range of different Reynolds numbers, as shown in Table 5. This property envisions broad applications.
Lid cavity flow.
We demonstrate an additional example using PINO to solve the lid cavity flow. In this case, we do not have a pre-training phase and directly solve the equation with test-time optimization. We use PINO with the velocity-pressure formulation and the Fourier numerical gradient, and it converges to a low relative error within minutes. Figure 3 shows the ground truth and the predicted velocity field; PINO accurately predicts the ground truth.
4.3 Inverse problem
One of the major advantages of the physics-informed method is its ability to solve inverse problems. We investigate PINO on the inverse problem of the Darcy equation: recovering the coefficient function from a given solution function. We assume a dataset $\{a_j, u_j\}$ is available to pre-train the operator. We propose two formulations of PINO for the inverse problem:

Forward model: learn the forward operator $\mathcal{G}_\theta : a \mapsto u$ with data. Initialize a candidate coefficient to approximate the true one, and optimize it using the PDE loss, the data loss through the forward operator, and the regularization term.

Backward model: learn the backward operator $\mathcal{F}_\theta : u \mapsto a$ with data. Use $\mathcal{F}_\theta(u)$ to approximate the true coefficient, and optimize it using the PDE loss and the regularization term.
Here $R$ is the regularization term. We use the PDE loss to deal with the small error of the learned operator and the ill-posedness of the inverse problem.
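For a piecewise-constant coefficient, a natural and common choice for the regularization term $R$ is total variation. A minimal sketch, assuming an anisotropic, grid-based discretization:

```python
import numpy as np

def total_variation_2d(a, dx=1.0):
    """Anisotropic total variation of a 2d field: sum of absolute jumps
    between neighboring grid cells, which favors piecewise-constant fields."""
    jumps_x = np.abs(np.diff(a, axis=0)).sum()
    jumps_y = np.abs(np.diff(a, axis=1)).sum()
    return dx * (jumps_x + jumps_y)
```

A constant field has zero total variation, while each sharp interface contributes in proportion to its length and jump height, which is why this penalty preserves the two-media structure instead of smearing it.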
We perform a showcase example on the Darcy equation, where we are given the full-field observation of the solution (clean data) and aim to recover the coefficient function. The coefficient function is piecewise constant (representing two types of media), so the inverse problem can be viewed as a classification problem, and we use total variation as the regularization term. As shown in Figure 4, the PINO backward model has the best performance: it has a 2.29% relative $l_2$ error on the output and 97.10% classification accuracy on the input; the forward model has a 6.43% error on the output and 95.38% accuracy on the input. As a reference, we compare the PINO inverse frameworks with PINN and with the conventional solver using the accelerated MCMC method with 500,000 steps [5]. The posterior mean of the MCMC has a 4.52% error and 90.30% accuracy, respectively (note that the Bayesian method outputs the posterior distribution, which goes beyond obtaining a maximum a posteriori estimate). Meanwhile, the PINO methods are 3000x faster than MCMC. PINN does not converge in this case. The major advantage of the PINO backward model over the PINO forward model is that it uses a neural operator as the ansatz for the coefficient function. Similar to the forward problem, the operator ansatz has an easier optimization landscape while being expressive.

5 Conclusion and Future works
In this work, we develop the physics-informed neural operator (PINO), which bridges the gap between physics-informed optimization and data-driven neural operator learning. We introduce the pre-training and test-time optimization schemes for PINO to utilize both the data and physics constraints. In the pre-training phase, PINO learns an operator ansatz over multiple instances of a parametric PDE family. The test-time optimization scheme allows us to take advantage of the learned neural operator ansatz and solve for the solution function of the query instance faster and more accurately.
There are many exciting future directions. Most of the techniques and analyses developed for PINN can be transferred to PINO. It is also interesting to ask how to overcome the hard trade-off between accuracy and complexity, and how the PINO model transfers across different geometries. Furthermore, one could develop a software library of pre-trained models. PINO's excellent extrapolation property allows it to be applied to a broad set of conditions, as shown in the transfer Reynolds number experiments.
Acknowledgement
Z. Li gratefully acknowledges the financial support from the Kortschak Scholars Program. N. Kovachki is partially supported by the Amazon AI4Science Fellowship. A. Anandkumar is supported in part by Bren endowed chair, Microsoft, Google, Adobe faculty fellowships, and DE Logi grant. The authors want to thank Sifan Wang for meaningful discussions.
References
 [1] (2006) The 2d liddriven cavity problem revisited. Computers & fluids 35 (3), pp. 326–348. Cited by: §A.2.1, 3rd item.
 [2] (2021) FC-based shock-dynamics solver with neural-network localized artificial-viscosity assignment. arXiv preprint arXiv:2111.01315. Cited by: §1.
 [3] (2020) Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52, pp. 477–508. Cited by: §1.
 [4] (2021) Physics-informed neural networks (PINNs) for fluid mechanics: a review. arXiv preprint arXiv:2105.09506. Cited by: §2.2.
 [5] (2013) MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science 28 (3), pp. 424–446. Cited by: 4th item, §4.3.
 [6] (2021) Nonlinear independent dual system (nids) for discretizationindependent surrogate modeling over complex geometries. External Links: 2109.07018 Cited by: §1.
 [7] (2020) Physics informed extreme learning machine (PIELM): a rapid method for the numerical solution of partial differential equations. Neurocomputing 391, pp. 96–118. Cited by: §1, §2.2.
 [8] (2018) A multiscale neural network based on hierarchical matrices. arXiv preprint arXiv:1807.01883. Cited by: §1.
 [9] (2020) Limitations of physics informed machine learning for nonlinear twophase transport in porous media. Journal of Machine Learning for Modeling and Computing 1 (1). Cited by: §1, §2.2.
 [10] (2021) PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. Journal of Computational Physics 428, pp. 110079. Cited by: §3.3.
 [11] (2019) Learning to optimize multigrid pde solvers. In International Conference on Machine Learning, pp. 2415–2423. Cited by: §2.2.

 [12] (2018) Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115 (34), pp. 8505–8510. Cited by: §1.
 [13] (2007) Stability and convergence of the Crank–Nicolson/Adams–Bashforth scheme for the time-dependent Navier–Stokes equations. SIAM Journal on Numerical Analysis 45 (2), pp. 837–869. Cited by: 3rd item, §4.2.
 [14] (2021) NSFnets (Navier-Stokes flow nets): physics-informed neural networks for the incompressible Navier-Stokes equations. Journal of Computational Physics 426, pp. 109951. Cited by: §A.1.4, §A.2.1, §2.2.
 [15] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §A.1.4.
 [16] (2021) Machine learning accelerated computational fluid dynamics. arXiv preprint arXiv:2102.01010. Cited by: §2.2.
 [17] (2021) Neural operator: learning maps between function spaces. arXiv preprint arXiv:2108.08481. Cited by: §3.3.
 [18] (2020) Fourier neural operator for parametric partial differential equations. External Links: 2010.08895 Cited by: §A.1.3, §1, §2.4, 1st item.
 [19] (2020) Multipole graph neural operator for parametric partial differential equations. External Links: 2006.09535 Cited by: §1.
 [20] (2020) Neural operator: graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485. Cited by: §A.1.2, Table 3, 3rd item, §1, §2.4.
 [21] (2021) Markov neural operators for learning chaotic systems. arXiv preprint arXiv:2106.06898. Cited by: §A.1.4, 2nd item.
 [22] (2018) PDE-Net: learning PDEs from data. In International Conference on Machine Learning, pp. 3208–3216. Cited by: §1.
 [23] (2021) 86 pflops deep potential molecular dynamics simulation of 100 million atoms with ab initio accuracy. Computer Physics Communications 259, pp. 107624. Cited by: §1.
 [24] (2019) DeepONet: learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193. Cited by: §A.1.2, Table 3, 2nd item, §1.
 [25] (2021) DeepXDE: a deep learning library for solving differential equations. SIAM Review 63 (1), pp. 208–228. External Links: Document Cited by: §A.1.4.
 [26] (2021) A physics-informed operator regression framework for extracting data-driven continuum models. Computer Methods in Applied Mechanics and Engineering 373, pp. 113500. Cited by: §1.
 [27] (2021) ML-PDE: a framework for a machine learning enhanced PDE solver. Bulletin of the American Physical Society. Cited by: §2.2.
 [28] (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1, §2.2, §3.3.
 [29] (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030. Cited by: §1, §2.2.
 [30] (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364. Cited by: §2.2.
 [31] (2021) HypoSVI: hypocenter inversion with Stein variational inference and physics-informed neural networks. arXiv. Cited by: §1.
 [32] (2020) Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Computer Methods in Applied Mechanics and Engineering 361, pp. 112732. Cited by: §1.
 [33] (2020) Solver-in-the-loop: learning from differentiable physics to interact with iterative PDE-solvers. arXiv preprint arXiv:2007.00016. Cited by: §1.
 [34] (2020) Towards physics-informed deep learning for turbulent flow prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1457–1466. Cited by: §1.
 [35] (2021) Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43 (5), pp. A3055–A3081. Cited by: §1, §2.2.
 [36] (2021) Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. arXiv preprint arXiv:2103.10974. Cited by: §A.1.1, §1, §4.1.
 [37] (2020) When and why PINNs fail to train: a neural tangent kernel perspective. arXiv preprint arXiv:2007.14527. Cited by: §1, §2.2.
 [38] (2018) The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 6 (1), pp. 1–12. Cited by: §2.2.
 [39] (2021) Data-informed deep optimization. arXiv preprint arXiv:2107.08166. Cited by: §1.
 [40] (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394, pp. 56–81. Cited by: §1, §1, §3.3.
Appendix A
A.1 Implementation details
In this section, we list the detailed experiment setup and hyperparameter search for each experiment in Section 4. Unless otherwise specified, we use the Fourier neural operator backbone with GeLU activation. The numerical experiments are performed on Nvidia V100 GPUs.
A.1.1 Burgers Equation
We train the solution operator with PINO on a family of sampled initial conditions, using the Fourier neural operator backbone with GeLU activation. We use the numerical method to compute the gradients. We use the Adam optimizer with a learning rate that decays by half on a fixed epoch schedule, training on a single Nvidia 3090 GPU. PINO achieves 0.37% relative L2 error averaged over 200 testing instances. PINN-DeepONet achieves 1.38%, as reported by Wang et al. [36], who use the same problem setting.
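The relative L2 error reported throughout the appendix can be computed as, for instance (a minimal sketch; the function name and batch convention are illustrative, not the paper's code):

```python
import numpy as np

def relative_l2_error(pred, true):
    # Per-instance relative L2 error ||pred - true||_2 / ||true||_2,
    # averaged over the leading batch dimension.
    pred = pred.reshape(pred.shape[0], -1)
    true = true.reshape(true.shape[0], -1)
    err = np.linalg.norm(pred - true, axis=1) / np.linalg.norm(true, axis=1)
    return float(err.mean())
```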
A.1.2 Darcy Flow
We train the solution operator on sampled piecewise-constant coefficient functions. The zero boundary condition is enforced for all methods by multiplying the network output with a mollifier. The parameters of PINO on Darcy flow are the same as in the Burgers equation above. Regarding the implementation details of the baselines: for FNO, we use the same hyperparameters as its paper [20]; DeepONet [24] did not study Darcy flow, so we grid-search its hyperparameters, with depth from 2 to 12 and width from 50 to 100. The best DeepONet result is achieved with depth 8 and width 50. The results are shown in Table 3.
A.1.3 Long temporal transient flow
We study the build-up of the flow from an initial condition with near-zero velocity to the ergodic state, choosing the setting as in Li et al. [18]. The initial condition is generated from a Gaussian random field with periodic boundary conditions, and the forcing is kept fixed. We compare FNO, PINO without test-time optimization, and PINO with test-time optimization, and report the relative L2 error on the last time step averaged over 5 testing instances.
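Gaussian random field initial conditions of this kind can be sampled spectrally; a minimal periodic 2-D sketch, where the spectrum constants `alpha` and `tau` are illustrative rather than the values used in the experiments:

```python
import numpy as np

def sample_periodic_grf(n, alpha=2.5, tau=7.0, seed=0):
    # Sample a mean-zero Gaussian random field on a periodic n x n grid by
    # colouring white noise in Fourier space with the spectrum of
    # (-Laplacian + tau^2 I)^(-alpha). alpha and tau are illustrative
    # constants, not necessarily those used in the paper's experiments.
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    sqrt_eig = (4.0 * np.pi**2 * (kx**2 + ky**2) + tau**2) ** (-alpha / 2.0)
    sqrt_eig[0, 0] = 0.0                             # remove the mean mode
    white = np.fft.fft2(rng.standard_normal((n, n)))
    return np.fft.ifft2(sqrt_eig * white).real
```

Because the multiplier is symmetric in wavenumber, the coloured spectrum keeps the Hermitian symmetry of the transformed real noise, so the inverse FFT is real up to round-off.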
A.1.4 Chaotic Kolmogorov flow
For this experiment, the initial condition lies in the attractor, with a setup similar to Li et al. [21]. The training set consists of 4000 initial condition functions and the corresponding solution functions; extra initial conditions are generated from a Gaussian random field. Each neural operator is trained on the 4000 data points plus a number of extra sampled initial conditions. The Reynolds number in this problem is 500. We estimate the generalization error of the operator on a test set of 300 instances of Kolmogorov flow and report the relative L2 error averaged over these instances.
Comparison study.
The baseline PINN is implemented using the DeepXDE library [25] with TensorFlow as the backend. We use the two-step optimization strategy (Adam [15] followed by L-BFGS), following the same practice as NSFnets [14], which apply PINNs to solving the Navier-Stokes equations. For all PINN experiments, we grid-search the hyperparameters: network depth from 4 to 6, width from 50 to 100, learning rate from 0.01 to 0.0001, and the weight of the boundary loss from 10 to 100.
Comparison between PINO and PINNs on test-time optimization: the results are averaged over 20 instances of the Navier-Stokes equation with Reynolds number 500. The best result is obtained by PINO using the pretrained operator ansatz and virtual sampling; the neural operator ansatz used here is trained on a set of 400 data points. We acknowledge that more sophisticated variants of PINN may perform better on our test cases.
| # data samples | # additional PDE instances | Resolution | Solution error | Equation error |
| 400 | 0 | 128x128x65 | 33.32% | 1.8779 |
| 400 | 0 | 128x128x65 | 33.31% | 1.8830 |
| 400 | 0 | 128x128x65 | 30.61% | 1.8421 |
| 400 | 40k | 128x128x65 | 31.74% | 1.8179 |
| 400 | 40k | 128x128x65 | 31.72% | 1.8227 |
| 400 | 40k | 128x128x65 | 29.60% | 1.8296 |
| 400 | 160k | 128x128x65 | 31.32% | 1.7840 |
| 400 | 160k | 128x128x65 | 31.29% | 1.7864 |
| 400 | 160k | 128x128x65 | 29.28% | 1.8524 |
| 4k | 0 | | 0.2515 | 1.8223 |
| 4k | 0 | | 0.2516 | 1.8257 |
| 4k | 0 | | 0.2141 | 1.8468 |
| 4k | 100k | | 0.2415 | 1.6112 |
| 4k | 100k | | 0.2411 | 1.6159 |
| 4k | 100k | | 0.2085 | 1.8251 |
| 4k | 400k | | 0.2422 | 1.4596 |
| 4k | 400k | | 0.2395 | 1.4656 |
| 4k | 400k | | 0.2010 | 1.9146 |
| 0 | 100k | | 0.7436 | 0.3741 |
| 0 | 100k | | 0.7438 | 0.3899 |
| 0 | 100k | | 0.7414 | 0.5226 |
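The two-step optimization strategy used for the PINN baseline above (Adam, then L-BFGS) can be sketched in PyTorch; `two_step_fit` and its arguments are illustrative names, not the paper's implementation:

```python
import torch

def two_step_fit(model, loss_fn, adam_steps=200, lbfgs_steps=50):
    # Stage 1: Adam makes robust progress from a rough initialization.
    adam = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(adam_steps):
        adam.zero_grad()
        loss_fn(model).backward()
        adam.step()
    # Stage 2: L-BFGS refines the solution to a sharper optimum.
    lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=lbfgs_steps)
    def closure():
        lbfgs.zero_grad()
        loss = loss_fn(model)
        loss.backward()
        return loss
    lbfgs.step(closure)
    return loss_fn(model).item()
```

Here `loss_fn` stands in for the weighted sum of PDE residual and boundary losses; the closure is required because L-BFGS re-evaluates the loss several times per step.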
A.2 Transfer learning across Reynolds numbers
We study test-time optimization with different Reynolds numbers on the 1s Kolmogorov flow. For the higher Reynolds number problems, the pretrained operator shows better convergence accuracy, and in all cases it shows better convergence speed, as demonstrated in Figure 5. The results are shown in Table 5, where the error is averaged over 40 instances; each row is a testing case and each column is a pretrained operator.
| Testing RE | From scratch | 100 | 200 | 250 | 300 | 350 | 400 | 500 |
| 500 | 0.0493 | 0.0383 | 0.0393 | 0.0315 | 0.0477 | 0.0446 | 0.0434 | 0.0436 |
| 400 | 0.0296 | 0.0243 | 0.0245 | 0.0244 | 0.0300 | 0.0271 | 0.0273 | 0.0240 |
| 350 | 0.0192 | 0.0210 | 0.0211 | 0.0213 | 0.0233 | 0.0222 | 0.0222 | 0.0212 |
| 300 | 0.0168 | 0.0161 | 0.0164 | 0.0151 | 0.0177 | 0.0173 | 0.0170 | 0.0160 |
| 250 | 0.0151 | 0.0150 | 0.0153 | 0.0151 | 0.0160 | 0.0156 | 0.0160 | 0.0151 |
| 200 | 0.00921 | 0.00913 | 0.00921 | 0.00915 | 0.00985 | 0.00945 | 0.00923 | 0.00892 |
| 100 | 0.00234 | 0.00235 | 0.00236 | 0.00235 | 0.00239 | 0.00239 | 0.00237 | 0.00237 |
Reynolds number transfer learning: each row is a test set of PDEs with the corresponding Reynolds number, and each column represents the operator ansatz used as the starting point of test-time optimization. For example, the column header '100' means the operator ansatz was trained over a set of PDEs with Reynolds number 100. The relative L2 errors are averaged over 40 instances of the corresponding test set.
A.2.1 Lid Cavity flow
We assume a no-slip boundary condition, with zero velocity at the left, bottom, and right walls and a prescribed velocity on the top lid, similar to Bruneau and Saad [1]. We use the velocity-pressure formulation as in Jin et al. [14], where the neural operator outputs the velocity field in the x and y directions as well as the pressure field. We train with a learning rate that decreases by half on a fixed iteration schedule. We use the Fourier method with Fourier continuation to compute the numerical gradients and minimize the residual error on the velocity, the divergence-free condition, and the initial and boundary conditions. The weight parameters between the different error terms are all chosen as 1. Figure 3 shows the ground truth and the predicted velocity field, where PINO accurately matches the ground truth.
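The periodic building block of the Fourier differentiation used above can be sketched as follows; the Fourier-continuation step needed for the non-periodic cavity domain is omitted, and the function name is illustrative:

```python
import numpy as np

def fourier_derivative(u, length=2.0 * np.pi):
    # Spectral derivative of a periodic, uniformly sampled 1-D function:
    # multiply each Fourier mode by i*k, then transform back. For smooth
    # periodic functions this is exact up to round-off.
    n = u.shape[-1]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.fft.ifft(1j * k * np.fft.fft(u)).real
```

Divergence and vorticity residuals follow by applying the same operator along each spatial axis.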