1 Introduction
Multiphase flow in porous media is important for many geoscience applications, including contaminant transport bear2010modeling, carbon capture and storage (CCS) pachauri2014climate, hydrogen storage hashemi2021pore, oil and gas extraction aziz1979petroleum, and nuclear waste storage amaziane2012numerical. Due to the multiphysics, nonlinear, and multiscale nature of these processes, numerical simulation is the primary approach for solving the mass and energy conservation equations that govern them orr2007theory. These simulations are often very time consuming and computationally intensive since they require fine spatial and temporal discretization to accurately capture the flow processes Doughty2010; Wen2019. Meanwhile, the inherent uncertainty in the property distributions of heterogeneous porous media necessitates probabilistic assessment and inverse modeling to aid engineering decisions Kitanidis2015; Strandli2014. Both procedures require large numbers of forward simulation runs and are often prohibitively expensive NationalAcademiesofSciencesEngineering2018.
A number of machine learning-based methods have been proposed over the past few years to provide faster alternatives to numerical simulation. Most existing machine learning-based methods fall into the following two categories: (1) data-driven finite-dimensional operators that learn Euclidean-space mappings from numerical simulation data zhu2018bayesian; mo2019deep; Zhong2019; tang2020deep; WEN2021103223; WEN2021104009, and (2) physics-informed/physics-constrained/neural finite difference learning methods that parameterize the solution functions with a neural network raissi2019physics; Zhu2019; haghighat2021sciann. The first type, finite-dimensional operators, is often implemented with convolutional neural networks (CNNs). These CNN-based models have been successful in providing fast and accurate predictions for high-dimensional and complex multiphase flow problems WEN2021103223; jiang2021deep; WEN2021104009; tang2021deep; wu2020physics. However, CNN-based methods are prone to overfitting, and therefore require large numerical simulation data sets that can become unmanageable as the problem dimension grows. Also, the results produced by these models are tied to the specific spatial and temporal meshes used in the numerical simulation data set. The second approach, using neural finite difference methods, requires separate training for each new instance of the parameters or coefficients raissi2019physics (e.g., a new permeability map or injection rate). Therefore, these methods require as much computational effort as traditional numerical solvers, if not more. Furthermore, these methods often struggle with multiphase flow problems with shock fronts, which are common in many classes of geoscience problems fuks2020physics; almajid2021prediction.
Recently, a novel approach, the neural operator, has been proposed that directly learns the infinite-dimensional mapping from any functional parametric dependence to the solution lu2019deeponet; li2020multipole; li2020neural; bhattacharya2020model. Unlike neural finite difference methods, neural operators are data-driven and therefore require training only once. Meanwhile, neural operators are mesh-independent, so they can be trained and evaluated on different grids. Due to the cost of evaluating global neural integral operators, previously proposed neural operators had not been able to achieve the desired degree of computational efficiency li2020fourier. However, one type of neural operator, the Fourier neural operator (FNO), alleviates this issue through the implementation of a Fast Fourier Transform li2020fourier. The FNO has shown excellent performance on single-phase flow problems with great generalization ability, and is significantly more data efficient than CNN-based methods li2020fourier. Here we extend the FNO-based architecture to multiphase flow problems. We find that while the FNO's testing accuracy is generally higher than that of CNN-based models, its training accuracy is sometimes lower due to the regularization effect of the FNO architecture. To improve upon this, we present an enhanced Fourier neural operator, named UFNO, that combines the advantages of FNO-based and CNN-based models to provide results that are both highly accurate and data efficient. Through the implementation of the newly proposed UFourier layer, we show that the UFNO model architecture produces superior performance over both the original FNO li2020fourier and a state-of-the-art CNN benchmark WEN2021104009. We apply the UFNO architecture to the highly complex CO2-water multiphase flow problem in the context of CO2 geological storage to predict dynamic pressure buildup and gas saturation. The trained UFNO models provide an alternative to numerical simulation for 2D-radial CO2 injection problems with wide ranges of permeability and porosity heterogeneity, anisotropy, reservoir conditions, injection configurations, flow rates, and multiphase flow properties.
2 Methods
The goal of a neural operator is to learn an infinite-dimensional-space mapping from a finite collection of input-output observations. To formulate the problem, we define the domain D ⊂ ℝ^d to be a bounded, open set; A to be the input function space; and U to be the output function space. A and U are separable Banach spaces of functions defined on D that take values in ℝ^{d_a} and ℝ^{d_u}, respectively. G†: A → U is a nonlinear map that satisfies the governing PDEs. Suppose we have N observations {(a_j, u_j)}_{j=1}^{N}, where the a_j are drawn from a probability measure μ in A and u_j = G†(a_j). We aim to build an operator G_θ that learns an approximation of G† by minimizing the following problem using a cost function C:

min_θ 𝔼_{a∼μ} [ C(G_θ(a), G†(a)) ].   (1)

Since a_j and u_j are both functions, we use point discretizations to numerically represent a_j and u_j. We demonstrate in this section that the proposed UFNO architecture learns the infinite-dimensional-space mapping from a finite collection of (a_j, u_j) pairs. A table of notation is included in Appendix A.
2.1 UFNO architecture
The UFNO architecture contains the following three steps:

1. Lift the input observation a to a higher-dimensional representation v_0(x) = P(a(x)) through a fully connected neural network transformation P.

2. Apply L iterative Fourier layers followed by M iterative UFourier layers: v_0 ↦ v_1 ↦ … ↦ v_{L+M}, where the v_j for j = 0, 1, …, L + M are sequences of functions taking values in ℝ^{d_v} for channel dimension d_v.

3. Project v_{L+M} back to the original space using a fully connected neural network transformation Q: u(x) = Q(v_{L+M}(x)).
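The three steps can be sketched end to end as follows. This is a minimal NumPy sketch under simplifying assumptions, not the authors' implementation: the Fourier and UFourier layers are stubbed as plain pointwise activations, and the lift P and projection Q are bare matrices rather than full networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def lift(a, P):
    """Step 1: pointwise lift of the input to a higher channel dimension."""
    return a @ P

def fourier_layer(z):
    """Stub for a Fourier layer (spectral conv + bias would go here)."""
    return np.maximum(z, 0.0)  # placeholder: activation only

def u_fourier_layer(z):
    """Stub for a UFourier layer (adds a UNet path to the Fourier layer)."""
    return np.maximum(z, 0.0)  # placeholder: activation only

def project(z, Q1, Q2):
    """Step 3: pointwise projection back to the output dimension."""
    return np.maximum(z @ Q1, 0.0) @ Q2

# toy example: 10 grid points, 3 input channels, width-8 latent space
a = rng.standard_normal((10, 3))
P = rng.standard_normal((3, 8))
Q1, Q2 = rng.standard_normal((8, 16)), rng.standard_normal((16, 1))

z = lift(a, P)
for _ in range(3):          # L Fourier layers ...
    z = fourier_layer(z)
for _ in range(3):          # ... followed by M UFourier layers
    z = u_fourier_layer(z)
u = project(z, Q1, Q2)
assert u.shape == (10, 1)
```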
Figure 1A provides a schematic of the UFNO architecture. Within each newly proposed UFourier layer, we have

v_{l+1}(x) := σ( K(v_l)(x) + U(v_l)(x) + W v_l(x) ),  ∀x ∈ D,   (2)

where K is a kernel integral transformation parameterized by a neural network, U is a UNet CNN operator, and W is a linear operator, all of which are learnable. σ is a nonlinear activation function. Refer to Li et al. li2020fourier for the formulation of the original Fourier layer.
2.2 Integral kernel operator in the Fourier space
The integral kernel operator K in Equation 2 is defined by

K(v_l)(x) := ∫_D κ(x, y) v_l(y) dy,  ∀x ∈ D.   (3)
To efficiently parameterize the kernel κ, the FNO method considers the representation of κ (and also v_l) in the Fourier space and utilizes the Fast Fourier Transform (FFT) li2020fourier. By letting κ(x, y) = κ(x − y) in Equation 3 and applying the convolution theorem, we can obtain

K(v_l)(x) = F⁻¹( F(κ) · F(v_l) )(x),  ∀x ∈ D,   (4)
where F denotes the Fourier transform of a function and F⁻¹ is its inverse. Now, we can parameterize κ directly by its Fourier coefficients:

K(v_l)(x) = F⁻¹( R · F(v_l) )(x),  ∀x ∈ D,   (5)
where R is the Fourier transform of the periodic function κ. Since we assume that κ is periodic, we can apply a Fourier series expansion and work in the discrete modes of the Fourier transform.
We first truncate the Fourier series at a maximum number of modes k_max, and then parameterize R directly as a complex-valued (k_max × d_v × d_v) tensor with the truncated Fourier coefficients. As a result, multiplication by the learnable weight tensor R is

( R · F(v_l) )_{k,i} = Σ_{j=1}^{d_v} R_{k,i,j} (F(v_l))_{k,j},  k = 1, …, k_max;  i = 1, …, d_v.   (6)
By replacing F with the FFT and implementing R using a direct linear parameterization, we obtain the Fourier operator as illustrated in Figure 1B and C with almost linear complexity.
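As an illustration of Equations 4 to 6, the following sketch applies a spectral convolution to a 1D signal. This is a simplified, single-channel analogue of the paper's 3D, multi-channel operator, and the weights R here are illustrative (identity weights, so retained modes pass through unchanged).

```python
import numpy as np

def spectral_conv1d(v, R, k_max):
    """Fourier integral operator on a 1-D signal: FFT, keep the lowest
    k_max modes, multiply by the weights R, then inverse FFT.
    Single-channel sketch of Equation 6 (no channel mixing)."""
    v_hat = np.fft.rfft(v)               # forward FFT
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = R * v_hat[:k_max]  # truncate and apply weights
    return np.fft.irfft(out_hat, n=v.size)

n, k_max = 64, 8
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
v = np.sin(x) + 0.5 * np.sin(5 * x)      # modes 1 and 5, both below k_max
R = np.ones(k_max, dtype=complex)        # illustrative identity weights
out = spectral_conv1d(v, R, k_max)
assert np.allclose(out, v, atol=1e-10)   # low modes pass through unchanged
```

With identity weights, content above k_max is filtered out entirely, which is the regularization effect discussed in Section 4: `spectral_conv1d(np.sin(20 * x), R, k_max)` returns (numerically) zero.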
2.3 Characteristics of the UFourier layer
In contrast to the original Fourier layer in FNO li2020fourier, the UFNO architecture proposed here appends a UNet path in each UFourier layer. The UNet performs local convolutions to enrich the representation power of the UFNO with higher-frequency information. The numbers of Fourier and UFourier layers, L and M, are hyperparameters that can be optimized for the specific problem. For the multiphase flow problem considered here, we found that the architecture with half Fourier layers and half UFourier layers achieves the best performance, compared to architectures with all Fourier layers or all UFourier layers.
Note that although the Fourier neural operator is an infinite-dimensional operator, when we append the UNet block we sacrifice the flexibility of training and testing the model at different discretizations. We made this choice because the CO2-water multiphase flow problem is very sensitive to numerical dispersion and numerical dissolution, which are both tied to a specific grid resolution. When training and testing at different grid dimensions, the numerical noise is often transformed in a non-physical way. As a result, for this problem, we prioritize achieving higher training and testing accuracy, which the UFNO provides.
3 Problem setting
3.1 Governing equation
We consider a multiphase flow problem with CO2 and water in the context of geological storage of CO2. The CO2 and water are immiscible but have mutual solubility. The general forms of mass accumulation for component κ = CO2 or w (water) are written as Pruess1999:

d/dt ∫_{V_n} M^κ dV = ∫_{Γ_n} F^κ · n dΓ + ∫_{V_n} q^κ dV,   (7)

M^κ = φ Σ_p S_p ρ_p X_p^κ.   (8)
Here p denotes the phase: w (wetting) or n (non-wetting). In the siliciclastic rocks present at most geological storage sites, water is the wetting phase pini2012capillary. However, due to the mutual solubility of water and CO2, there is a small amount of CO2 in the water phase and a small amount of water in the CO2 phase. Here φ is the porosity, S_p is the saturation of phase p, and X_p^κ is the mass fraction of component κ in phase p.
For both components, the advective mass flux F^κ is obtained by summing over the phases p,

F^κ = Σ_p X_p^κ ρ_p u_p,  u_p = −k (k_{r,p}/μ_p) (∇P_p − ρ_p g),   (9)

where each individual phase flux u_p is governed by the multiphase flow extension of Darcy's law. k denotes the absolute permeability, k_{r,p} is the relative permeability of phase p, which depends nonlinearly on S_p, μ_p is the viscosity of phase p, which depends on the phase composition, and g is the gravitational acceleration.
Due to the effect of capillarity, the fluid pressure of each phase is

P_n = P,   (10)

P_w = P − P_c,   (11)

where the capillary pressure P_c is a nonlinear function of S_w. Additionally, porosity φ, density ρ_p, and the solubility of CO2 in Equation 7 and Equation 8 are also nonlinear functions that depend on P.
To simplify the problem setting, our simulations do not explicitly include molecular diffusion or hydrodynamic dispersion. However, some unavoidable numerical dispersion, resulting from approximating spatial gradients with the two-point upstream algorithm eclipse, is intrinsic to the numerical simulations used for the neural network training.
3.2 Numerical simulation setting
We use the numerical simulator ECLIPSE (e300) to develop the multiphase flow data set for CO2 geological storage. ECLIPSE is a full-physics simulator that uses the finite difference method with upstream weighting for spatial discretization and the adaptive IMplicit method for temporal discretization eclipse. We inject supercritical CO2 at a constant rate into a radially symmetrical system through a vertical injection well with a radius of 0.1 m. The well can be perforated over the entire thickness of the reservoir or limited to a selected depth interval. We simulate CO2 injection for 30 years at a constant rate ranging from 0.2 to 2 Mt/year. The thickness of the reservoir ranges from 12 to 200 m with no-flow boundaries on the top and bottom. We use a vertical cell dimension of 2.08 m to capture the vertical heterogeneity of the reservoir. The radius of the reservoir is 100,000 m. The outer boundary is closed, but it is sufficiently distant from the injection well that the system behaves like an infinite-acting reservoir. Two hundred gradually coarsened grid cells are used in the radial direction. Grid sensitivity studies show that this grid is sufficiently refined to capture the CO2 plume migration and pressure buildup, while remaining computationally tractable Wen2019. Simulated values of the gas saturation and pressure buildup fields at 24 gradually coarsened time snapshots are used for training the neural nets. Refer to Appendix B for the detailed spatial and temporal discretizations.
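The gradually coarsened radial grid can be reproduced from the Appendix B values (200 cells, 3.6 m minimum width, amplification factor 1.035012): cell widths grow geometrically away from the well, and their geometric sum recovers a reservoir radius on the order of the stated 100,000 m.

```python
import numpy as np

# Radial grid sketch from the Appendix B parameters.
n_cells = 200
min_width = 3.6        # m, innermost cell width
amp = 1.035012         # width ratio between consecutive cells

widths = min_width * amp ** np.arange(n_cells)   # geometric coarsening
outer_radius = widths.sum()                      # ~1e5 m reservoir radius
```

The same geometric-amplification construction applies to the 24 time snapshots (minimum step 1 day, amplification factor 1.421245, totaling 30 years).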
3.3 Variable sampling scheme
We sample two types of variables for each numerical simulation case: field variables and scalar variables. As shown in Figure 2, the field variables include the horizontal permeability map, the vertical permeability map, the porosity map, and the injection perforation map. The reservoir thickness is randomly sampled in each simulation case and controls the reservoir dimension for each of the following field variables:
variable type  sampling parameter  notation  distribution  unit 

field  horizontal permeability field  heterogeneous*    
# of anisotropic materials    
material anisotropy ratio    
porosity (perturbation)    
reservoir thickness  m  
perforation thickness  m  
perforation location    randomly placed    
scalar  injection rate  Mt/y  
initial pressure  bar  
isothermal reservoir temperature  °C  
irreducible water saturation    
van Genuchten scaling factor   

: The Stanford Geostatistical Modeling Software (SGeMS) sgems is used to generate the heterogeneous maps. SGeMS produces a permeability map according to required input parameters such as the correlation lengths in the vertical and radial directions, the medium appearance (Appendix C), and the permeability mean and standard deviation. A wide variety of permeability maps representing different depositional environments are included in the data set, and the permeability values range widely, from 10 Darcy to 0.001 mD. Appendix C summarizes the statistical parameters that characterize the permeability maps. 
: The vertical permeability map is calculated by multiplying the horizontal permeability map by an anisotropy map. To generate the anisotropy map, the horizontal permeability values are binned into materials, each of which is assigned a randomly sampled anisotropy ratio. Note that the anisotropy ratio is uncorrelated with the magnitude of the radial permeability. This procedure roughly mimics a facies-based approach for assigning anisotropy values.
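A minimal sketch of this binning procedure follows; the variable names, the number of materials, and the anisotropy-ratio distribution are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def vertical_permeability(kh, n_materials, ratio_sampler):
    """Bin the horizontal permeability into materials, assign each material
    one randomly sampled anisotropy ratio, and multiply to obtain k_v."""
    # interior quantiles split the kh values into n_materials groups
    bins = np.quantile(kh, np.linspace(0, 1, n_materials + 1)[1:-1])
    material = np.digitize(kh, bins)        # material index per cell
    ratios = ratio_sampler(n_materials)     # one ratio per material
    return kh * ratios[material]

# illustrative log-normal horizontal permeability field, mD
kh = 10 ** rng.normal(1.0, 1.0, size=(96, 200))
kv = vertical_permeability(kh, n_materials=4,
                           ratio_sampler=lambda n: rng.uniform(0.01, 1.0, n))
assert kv.shape == kh.shape and np.all(kv <= kh)
```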

: Previous studies show that porosity and permeability are loosely correlated with each other pape2000variation. Therefore, to calculate porosity, we first use the fitting relationship presented in Pape et al. pape2000variation and then perturb these values with random Gaussian noise with a mean of zero and a standard deviation of 0.005.

: The injection interval thickness is randomly sampled within the range from 12 m to the reservoir thickness of that case. We place the perforation interval on the injection well by randomly sampling the depth of the perforation top, from the top of the reservoir down to the deepest position that still accommodates the full perforation thickness.
Visualizations of the above field variables are shown in Appendix C. Table 1 summarizes the parameter sampling ranges and distributions. The sampling parameters are independent of each other, with the exception of porosity and permeability.
Scalar variables include the initial reservoir pressure at the top of the reservoir, the reservoir temperature, the injection rate, the capillary pressure scaling factor li2013influence, and the irreducible water saturation. The parameter sampling ranges and distributions are summarized in Table 1. While the scalar variables are determined independently, cases that yield unrealistic combinations of these variables are excluded. These field and scalar input variables create a very high-dimensional input space, which is very challenging for traditional CNN-based models. Nevertheless, the UFNO model architecture handles this high-dimensional input space with excellent data efficiency.
3.4 Data configuration
Each of the field variables in Figure 2A is represented by a channel in the data input. Since we use a gradually coarsened radial grid for the numerical simulations, a logarithmic conversion in the radial direction is applied in training to project the field variables onto a uniform grid that can be represented by a 96 × 200 matrix. Notice that reservoir thickness is also a variable, and 96 cells represent a 200 m thick reservoir. When the reservoir is thinner than 200 m, we use zero-padding to denote cells that are outside of the actual reservoir. For the scalar variables, the values are simply broadcast into a matrix with dimensions of 96 × 200. In addition to the input variables, we also supply the spatial grid information to the training by using one channel to denote radial cell dimensions and another channel to denote vertical cell dimensions. The temporal grid information is supplied to the network as an additional dimension. The input for each data sample is constructed by concatenating the field variables, scalar variables, spatial grids, and temporal grid together.
For the gas saturation and pressure buildup outputs shown in Figure 2B and C, we use the same logarithmic conversion to project the outputs onto a uniform grid. We then concatenate the outputs for different time snapshots to obtain a spatial-temporal 3D volume. The pressure buildup is normalized to a zero-mean and unit-variance distribution. For gas saturation, we do not normalize the data because the saturation values always range from 0 to 1. The dimensions of the inputs and outputs are shown for each model architecture (Appendices D to G).
The data set contains 5,000 input-to-output mappings. We use a 9/1 split to segregate the data set into 4,500 samples for training and 500 samples for testing.
3.5 Loss function design and training
We use a relative loss to train the deep learning models. The loss is applied to both the original output y and the first derivative of the output in the radial direction, dy/dr, and is written as:

L = ||ŷ − y||_p / ||y||_p + β ||dŷ/dr − dy/dr||_p / ||dy/dr||_p,   (12)

where ŷ is the predicted output, dŷ/dr is the first derivative of the predicted output, p is the order of the norm, and β is a hyperparameter. This relative loss has a regularization effect and is particularly effective when the data have large variances in their norms. Our experiments show that, compared to an absolute loss, a relative loss significantly improves the performance for both gas saturation and pressure buildup. The second term in Equation 12 greatly improves the quality of predictions for gas saturation at the leading edge of the plume. Similarly, this term improves the prediction of the sharp pressure buildup around the injection well. We use the l2-norm (p = 2) for gas saturation and pressure buildup since it provides faster convergence than the l1-norm.
As described in Section 3, our data set contains reservoirs with various thicknesses, and the cells outside of the reservoir are padded with zeros for both the input and output. To accommodate the variable reservoir thicknesses during training, we construct an active-cell mask for each data sample and only calculate the loss within the mask. Our experiments show that this loss calculation scheme achieves better performance than calculating the loss over the whole field because the gradients are distributed more efficiently.
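The masked relative loss can be sketched as follows (Equation 12 restricted to active cells). The first difference used for d/dr, the small denominator guard, and the default β are illustrative choices, not values from the paper.

```python
import numpy as np

def relative_loss(y_true, y_pred, mask, p=1, beta=0.5):
    """Masked relative l_p loss: data term plus radial-derivative term."""
    def rel(a, b):
        num = (np.abs(b - a) ** p).sum() ** (1.0 / p)
        den = (np.abs(a) ** p).sum() ** (1.0 / p) + 1e-12  # guard: avoid /0
        return num / den
    y_t, y_p = y_true * mask, y_pred * mask    # loss only on active cells
    # first differences along the radial axis approximate dy/dr
    return rel(y_t, y_p) + beta * rel(np.diff(y_t, axis=1),
                                      np.diff(y_p, axis=1))

y = np.random.default_rng(1).random((96, 200))
mask = np.zeros((96, 200))
mask[:48] = 1.0                                # top 48 rows are active cells
assert relative_loss(y, y, mask) == 0.0        # perfect prediction: zero loss
```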
During training, the initial learning rate is 0.001, and the learning rate gradually decreases with a constant step and reduction rate. These hyperparameters are optimized separately for the gas saturation and pressure buildup models. The training stops when the loss no longer decreases.
4 Results
This section compares four types of model architectures: the original FNO proposed in Li et al. li2020fourier, the newly proposed UFNO in this paper, a ConvFNO that uses a Conv3d in place of the UNet, and the state-of-the-art benchmark CNN used in Wen et al. WEN2021104009. All models are trained on the proposed loss function (Equation 12) and directly output the 3D gas saturation and pressure fields in space and time. Detailed parameters for each model are summarized in Appendices D to G.
4.1 Gas saturation
Figure 3A and Table 2 demonstrate that the best performance on both the training and testing data sets is achieved by the UFNO model. The testing set performance, in particular, represents the true predictive ability of the model on unseen data. The low relative loss clearly indicates the superior performance of the proposed UFNO.
Interestingly, we notice that although the original FNO has a higher training relative loss than the CNN benchmark, its testing relative loss is lower than that of the CNN benchmark. This indicates that the FNO has excellent generalization ability and achieves better testing performance than the CNN even with a higher training relative loss. Nevertheless, the original FNO has the highest relative loss on the training set due to the inherent regularization effect of using a finite set of truncated Fourier basis functions. The ConvFNO and UFNO architectures are therefore designed to enhance expressiveness by processing the higher-frequency information that is not picked up by the Fourier basis. We can observe from Figure 3A that the training loss is significantly improved even by simply adding a plain Conv3d in the ConvFNO case. When the FNO layer is combined with a UNet in the UFNO case, the model takes advantage of both architectures and consistently produces the lowest relative loss throughout the entire training (Figure 3A).
Model  Gas saturation (R²)  Pressure buildup (R²)  

Train  Test  Train  Test  
mean  std  mean  std  mean  std  mean  std  
CNN  0.994  0.007  0.986  0.018  0.990  0.029  0.988  0.023 
FNO  0.990  0.013  0.986  0.017  0.991  0.014  0.990  0.017 
ConvFNO  0.993  0.011  0.990  0.013  0.992  0.018  0.991  0.018 
UFNO  0.997  0.005  0.994  0.009  0.994  0.014  0.993  0.015 
In addition to considering the average performance over the entire training and testing sets, we compare model predictions for four cases with varying degrees of complexity in Figure 4. For each case, Figure 4 shows a comparison between the predicted and true values of the CO2 saturation for each grid cell in the model over the entire 30-year injection period. The UFNO has superior performance compared to the CNN for all of these examples, as quantified by the higher R² values and narrower 95% prediction bands. Cases b and d are especially obvious examples in which the UFNO successfully predicts the complicated horizontal saturation variations, whereas the CNN ignores the heterogeneity and simply predicts more uniform saturation fields.
4.2 Pressure buildup
For pressure buildup, the UFNO also achieves the lowest relative error on both the training and testing data sets. As shown in Figure 3B, the training and testing relative errors for the UFNO are consistently low throughout the training process. Generally, pressure buildup models are less prone to overfitting than gas saturation models since pressure buildup fields are less heterogeneous. By comparing the R² scores for the training and testing sets in Table 2, we can observe that all FNO-based models produce nearly negligible overfitting.
The superior performance of the UFNO for pressure buildup predictions is also demonstrated by the four examples shown in Figure 5. In each case, the UFNO has higher R² values and narrower 95% prediction bands. Unlike the gas saturation outputs, pressure buildup distributions are challenging to predict since they have a larger radius of influence and larger differences between cases. For example, the maximum pressure buildup in the four examples shown in Figure 5 varies from 20 bar to 220 bar. Notice that the CNN model especially struggles with cases that have a large radius of influence (e.g., case d), while the UFNO model maintains excellent accuracy at locations that are far away from the injection well.
5 Discussion
5.1 Training size analysis
The results in Section 4 demonstrate the excellent generalization ability of all FNO-based architectures. To further investigate the relationship between training set size and overfitting for the newly proposed UFNO model, we run a set of comparisons for the UFNO model trained with 500, 2,500, and 4,500 samples (see Figure 6). Each model is trained for the same number of epochs. From this comparison, we can observe that the gas saturation models are more prone to overfitting, as indicated by the wider gaps in means and standard deviations. For pressure buildup, the training size of 4,500 produces very similar scores on the training and testing sets. Therefore, a training set of 4,500 samples is already sufficient for pressure buildup prediction, and more training data will not significantly improve the performance.
5.2 Computational efficiency analysis
We summarize the computational efficiency of the CNN, FNO, ConvFNO, and UFNO in Table 3. The training and testing times are both evaluated on an Nvidia A100-SXM GPU. For the comparison, we run ECLIPSE simulations on an Intel® Xeon® Processor E5-2670 CPU. Each simulation case can utilize a fully dedicated CPU, and the average run time for 1,000 random cases is 10 minutes. We ran the comparison with the CPU readily available to us; other CPU models would not change the simulation time by orders of magnitude.
Model  # Parameters  Training (s/epoch)  Gas saturation testing (s)  Pressure buildup testing (s)  Speedup vs. numerical simulation (times)  
CNN  33,316,481  562  0.050  0.050  1.2×10⁴ 
FNO  31,117,541  711  0.005  0.005  1.2×10⁵ 
ConvFNO  31,222,625  1,135  0.006  0.006  1.0×10⁵ 
UFNO  33,097,829  1,872  0.010  0.010  6×10⁴ 
All of the neural network models are at least four orders of magnitude faster than conventional numerical simulation at prediction time. Notice that the FNO-based models are significantly faster than the CNN model at testing time, although slower at training time. In this problem, we prioritize prediction accuracy and testing time over training time, which favors the UFNO. For problems that are more sensitive to training time, one could instead use the ConvFNO, which provides both high accuracy and relatively fast training.
5.3 Fourier kernel visualization
As described in Section 2, the Fourier path within each UFourier layer contains a trainable kernel that is parameterized in the Fourier space. Here we provide visualizations of a random selection of the Fourier kernels in the trained gas saturation and pressure buildup models. Notice that unlike traditional CNN kernels, which are generally small, Fourier kernels are full-field kernels that can be interpreted on any grid discretization. The kernels in this paper are 3D kernels, and the examples shown in Figure 7 are directional slices evaluated using the data discretization. Both the gas saturation and pressure buildup models contain a wide variety of kernels from low to high frequency. We hypothesize that the asymmetry in the radial direction might be related to the gradually coarsening radial grid resolution, while the asymmetry in the vertical direction might be related to the effects of buoyancy, since CO2 is less dense than water and tends to migrate to the top of the reservoir.
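The grid independence of a Fourier-space kernel can be illustrated by zero-padding the truncated coefficients to any desired resolution before inverting. This is a hypothetical 2D, single-channel analogue of the paper's 3D kernels, with random illustrative weights in place of trained ones.

```python
import numpy as np

def kernel_on_grid(R, shape):
    """Evaluate a truncated Fourier-space kernel on an arbitrary grid:
    place the k_max x k_max learned coefficients in the low modes of a
    zero array of the target resolution and invert with the FFT."""
    k1, k2 = R.shape
    padded = np.zeros(shape, dtype=complex)
    padded[:k1, :k2] = R            # low modes carry the learned weights
    return np.fft.ifft2(padded).real

rng = np.random.default_rng(3)
R = rng.standard_normal((10, 10)) + 1j * rng.standard_normal((10, 10))
coarse = kernel_on_grid(R, (24, 24))   # same kernel, two discretizations
fine = kernel_on_grid(R, (96, 96))
assert coarse.shape == (24, 24) and fine.shape == (96, 96)
```

Because NumPy's `ifft2` normalizes by the grid size, the two renderings agree up to that factor (e.g., `coarse[0, 0] * 24 * 24` equals `fine[0, 0] * 96 * 96`), which is the sense in which one kernel is shared across resolutions.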
6 Conclusion
This paper presents UFNO, an enhanced Fourier neural operator for solving multiphase flow problems. We demonstrate that UFNO predicts highly accurate flow outputs for a complex CO2-water multiphase flow problem in the context of CO2 geological storage.
Through comparisons with the original FNO architecture li2020fourier and a state-of-the-art CNN benchmark WEN2021104009, we show that the newly proposed UFNO architecture provides the best performance for both gas saturation and pressure buildup predictions. The UFNO architecture enhances the training accuracy of the original FNO while maintaining its excellent generalizability. For the CO2-water multiphase flow application described here, our goal is to optimize the accuracy of the gas saturation and pressure fields, for which the UFNO provides the highest performance.
The trained UFNO model generates gas saturation and pressure buildup predictions roughly four orders of magnitude faster than a traditional numerical solver. This significant improvement in computational efficiency can support many engineering tasks that require repetitive forward simulations. For example, the trained UFNO model can serve as an alternative to full-physics numerical simulators in probabilistic assessment, inversion, and site selection, tasks that were previously prohibitively expensive to perform with numerical simulation at the desired grid resolution.
Code and data availability
The Python code for the UFNO model architecture and the data set used in training will be released upon the publication of this manuscript, together with a web app hosting the pre-trained models for gas saturation and pressure buildup.
Acknowledgments
G. Wen and S. M. Benson gratefully acknowledge the support of ExxonMobil through the Strategic Energy Alliance at Stanford University and the Stanford Center for Carbon Storage. Z. Li gratefully acknowledges the financial support of the Kortschak Scholars Program. A. Anandkumar is supported in part by the Bren endowed chair, LwLL grants, Beyond Limits, Raytheon, Microsoft, Google, Adobe faculty fellowships, and a DE Logi grant. The authors thank Peter Messmer for his comments and for pointing out typos.
Appendix A Table of notations
Notation  Meaning  
Operator learning  D  The spatial domain for the problem  
a  Input coefficient functions  
u  Target solution functions  
G†  The operator mapping from coefficients to solutions  
n  The size of the discretization  
x  Points in the spatial domain  
D_j  The discretization of the domain  
G_θ  An approximation of G†  
μ  A probability measure from which a_j is sampled  
C  Cost function  
UFNO  a  The discretized data input  
u  The discretized data output  
v_0, …, v_L  High-dimensional representations of a in the Fourier layers  
v_L, …, v_{L+M}  High-dimensional representations of a in the UFourier layers  
P  The lifting neural network  
Q  The projection neural network  
UFourier layer  K  The kernel integral operator applied on v_l  
R  The linear transformation applied on the lower Fourier modes 
W  The linear transformation (bias term) applied on the spatial domain  
U  The UNet operator applied on v_l  
σ  The activation function  
F, F⁻¹  The Fourier transform and its inverse  
κ  The kernel function learned from data  
k_max  The maximum number of modes  
d_v  The number of channels  
Governing equation  κ  Components: CO2 and water  
p  Phases: wetting (w) and non-wetting (n)  
V_n  The pore volume  
t  Time  
S_p  The saturation of phase p  
ρ_p  The density of phase p  
X_p^κ  The mass fraction of component κ in phase p  
F^κ  The flux  
q^κ  The source term  
P_p  The pressure of phase p  
k  The absolute permeability  
k_{r,p}  The relative permeability of phase p  
μ_p  The viscosity of phase p  
g  Gravitational acceleration  
Sampling variable  refer to Table 1 
Appendix B Grid discretization
Dimension  Parameter  Notation  Value  Unit 

Vertical ()  box boundary  12 to 200  m  
grid count  6 to 96    
grid thickness  2.08  m  
Radial ()  box boundary  100,000  m  
grid count  200    
minimum grid width  3.6  m  
amplification factor  1.035012    
well radius  0.1  m  
Temporal ()  total length  30  years  
step count  24    
minimum step  1  day  
amplification factor  1.421245   
Appendix C Heterogeneous permeability map statistical parameters and visualizations
Medium  Parameter  Mean  Std  Max  Min  Unit 
A. Gaussian  Field average  30.8  58.3  1053  0.3  mD 
Vertical correlation  7.3  3.6  12.5  2.1  m  
Horizontal correlation  2190  1432  6250  208  m  
Contrast ratio  4.01 10  2.19 10  3.00 10  1.01    
B. von Karman  Field average  39.9  54.4  867.9  1.8  mD 
carpentier2009conservation  Vertical correlation  7.2  3.5  12.5  2.1  m 
Horizontal correlation  2.15 10  1.40 10  6.23 10  208  m  
Contrast ratio  2.66 10  1.54 10  2.12 10  1.00    
C. Discontinuous  Field average  80.8  260.2  5281  2.0  mD 
Vertical correlation  7.2  3.6  12.5  2.1  m  
Horizontal correlation  2176  1429  6250  208  m  
Contrast ratio  2.17 10  1.51 10  2.68 10  1.01    
D. Homogeneous  Field permeability  327.7  478.1  1216  4.0  mD 
Appendix D CNN benchmark model architecture
Part  Layer  Output Shape 

Input    (96,200,24,1) 
Encode 1  Conv3D/BN/ReLu  (48,100,12,32) 
Encode 2  Conv3D/BN/ReLu  (48,100,12,64) 
Encode 3  Conv3D/BN/ReLu  (24,50,6,128) 
Encode 4  Conv3D/BN/ReLu  (24,50,6,128) 
Encode 5  Conv3D/BN/ReLu  (12,25,3,256) 
Encode 6  Conv3D/BN/ReLu  (12,25,3,256) 
ResConv 1  Conv3D/BN/Conv3D/BN/ReLu/Add  (12,25,3,256) 
ResConv 2  Conv3D/BN/Conv3D/BN/ReLu/Add  (12,25,3,256) 
ResConv 3  Conv3D/BN/Conv3D/BN/ReLu/Add  (12,25,3,256) 
ResConv 4  Conv3D/BN/Conv3D/BN/ReLu/Add  (12,25,3,256) 
ResConv 5  Conv3D/BN/Conv3D/BN/ReLu/Add  (12,25,3,256) 
Decode 6  UnSampling/Padding/Conv3D/BN/Relu  (12,25,3,256) 
Decode 5  UnSampling/Padding/Conv3D/BN/Relu  (24,50,6,256) 
Decode 4  UnSampling/Padding/Conv3D/BN/Relu  (24,50,6,128) 
Decode 3  UnSampling/Padding/Conv3D/BN/Relu  (48,100,12,128) 
Decode 2  UnSampling/Padding/Conv3D/BN/Relu  (48,100,12,64) 
Decode 1  UnSampling/Padding/Conv3D/BN/Relu  (96,200,24,32) 
Output  Conv3D  (96,200,24,1) 
BN denotes a batch normalization layer; ReLu denotes a rectified linear unit layer; Add denotes an addition with the identity; UnSampling denotes an upsampling layer that expands the matrix dimension using the nearest-neighbor method; and Padding denotes a padding layer using the reflection padding technique. In this model, the total number of parameters is 33,316,481, with 33,305,857 trainable parameters and 10,624 non-trainable parameters. To ensure a fair comparison with the FNO-based models, we performed hyperparameter optimization on the CNN benchmark model and trained it with the same loss function (Equation 12) as the FNO-based models.
Appendix E FNO model architecture
Part  Layer  Output Shape 

Input    (96,200,24,12) 
Padding  Padding  (104, 208, 32, 12) 
Lifting  Linear  (104, 208, 32, 36) 
Fourier 1  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 2  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 3  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 4  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 5  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 6  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Projection 1  Linear  (104, 208, 32, 128) 
Projection 2  Linear  (104, 208, 32, 1) 
Depadding    (96, 200, 24, 1) 
Appendix F ConvFNO model architecture
Part  Layer  Output Shape 

Input    (96,200,24,12) 
Padding  Padding  (104, 208, 32, 12) 
Lifting  Linear  (104, 208, 32, 36) 
Fourier 1  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 2  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 3  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
ConvFourier 1  Fourier3d/Conv1d/Conv3d/Add/ReLu  (104, 208, 32, 36) 
ConvFourier 2  Fourier3d/Conv1d/Conv3d/Add/ReLu  (104, 208, 32, 36) 
ConvFourier 3  Fourier3d/Conv1d/Conv3d/Add/ReLu  (104, 208, 32, 36) 
Projection 1  Linear  (104, 208, 32, 128) 
Projection 2  Linear  (104, 208, 32, 1) 
Depadding    (96, 200, 24, 1) 
Appendix G UFNO model architecture
Part  Layer  Output Shape 

Input    (96,200,24,12) 
Padding  Padding  (104, 208, 32, 12) 
Lifting  Linear  (104, 208, 32, 36) 
Fourier 1  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 2  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
Fourier 3  Fourier3d/Conv1d/Add/ReLu  (104, 208, 32, 36) 
UFourier 1  Fourier3d/Conv1d/UNet3d/Add/ReLu  (104, 208, 32, 36) 
UFourier 2  Fourier3d/Conv1d/UNet3d/Add/ReLu  (104, 208, 32, 36) 
UFourier 3  Fourier3d/Conv1d/UNet3d/Add/ReLu  (104, 208, 32, 36) 
Projection 1  Linear  (104, 208, 32, 128) 
Projection 2  Linear  (104, 208, 32, 1) 
Depadding    (96, 200, 24, 1) 