U-FNO – an enhanced Fourier neural operator-based deep learning model for multiphase flow

by   Gege Wen, et al.

Numerical simulation of multiphase flow in porous media is essential for many geoscience applications. However, due to the multi-physics, non-linear, and multi-scale nature of these problems, such simulations are very expensive at desirable grid resolutions, and the computational cost often impedes rigorous engineering decision-making. Machine learning methods provide faster alternatives to traditional simulators by training neural network models on numerical simulation data. Traditional convolutional neural network (CNN)-based models are accurate yet data-intensive and are prone to overfitting. Here we present a new architecture, U-FNO, an enhanced Fourier neural operator for solving multiphase flow problems. The U-FNO is designed based on the Fourier neural operator (FNO), which learns an integral kernel in Fourier space. Through a systematic comparison of a CNN benchmark and three FNO variants on a CO2-water multiphase flow problem in the context of CO2 geological storage, we show that the U-FNO architecture combines the advantages of the traditional CNN and the original FNO, providing significantly more accurate and efficient performance than previous architectures. The trained U-FNO provides gas saturation and pressure buildup predictions with a 10,000-fold speedup over traditional numerical simulators while maintaining similar accuracy.








1 Introduction

Multiphase flow in porous media is important for many geoscience applications, including contaminant transport bear2010modeling, carbon capture and storage (CCS) pachauri2014climate, hydrogen storage hashemi2021pore, oil and gas extraction aziz1979petroleum, and nuclear waste storage amaziane2012numerical. Due to the multi-physics, non-linear, and multi-scale nature of these processes, numerical simulation is the primary approach used to solve the mass and energy conservation equations for these applications orr2007theory. These numerical simulations are often very time consuming and computationally intensive since they require fine spatial and temporal discretization to accurately capture the flow processes Doughty2010; Wen2019. Meanwhile, the inherent uncertainty in the property distributions of heterogeneous porous media necessitates probabilistic assessments and inverse modeling to aid engineering decisions Kitanidis2015; Strandli2014. Both of these procedures require large numbers of forward numerical simulation runs and are often prohibitively expensive NationalAcademiesofSciencesEngineering2018.

A number of machine learning-based methods have been proposed over the past few years to provide faster alternatives to numerical simulation. Most existing machine learning-based methods fall into the following two categories: (1) data-driven finite-dimensional operators that learn Euclidean space mappings from numerical simulation data zhu2018bayesian; mo2019deep; Zhong2019; tang2020deep; WEN2021103223; WEN2021104009, and (2) physics-informed / physics-constrained / neural finite-difference learning methods that parameterize the solution functions with a neural network raissi2019physics; Zhu2019; haghighat2021sciann. The first type, finite-dimensional operators, is often implemented with convolutional neural networks (CNN). These CNN-based models have been successful in providing fast and accurate predictions for high-dimensional and complex multiphase flow problems WEN2021103223; jiang2021deep; WEN2021104009; tang2021deep; wu2020physics. However, CNN-based methods are prone to overfitting, therefore requiring large numerical simulation data sets that can become unmanageable as the problem dimension grows. Also, the results produced by these models are tied to the specific spatial and temporal meshes used in the numerical simulation data set. The second approach, neural finite-difference methods, requires separate training for any new instance of the parameters or coefficients raissi2019physics (e.g., a new permeability map or injection rate). Therefore, these methods require as much computational effort as traditional numerical solvers, if not more. Furthermore, these methods often struggle with multiphase flow problems with shock fronts, which are common for many classes of geoscience problems fuks2020physics; almajid2021prediction.

Recently, a novel approach, the neural operator, has been proposed that directly learns the infinite-dimensional mapping from any functional parametric dependence to the solution lu2019deeponet; li2020multipole; li2020neural; bhattacharya2020model. Unlike neural finite-difference methods, neural operators are data-driven and therefore require training only once. Meanwhile, neural operators are mesh-independent, so they can be trained and evaluated on different grids. Due to the cost of evaluating global neural integral operators, previously proposed neural operators had not been able to achieve the desired degree of computational efficiency li2020fourier. However, one type of neural operator, the Fourier neural operator (FNO), alleviates this issue through the use of the Fast Fourier Transform li2020fourier. The FNO has shown excellent performance on single-phase flow problems with great generalization ability, and is significantly more data efficient than CNN-based methods li2020fourier.

Here we extend the FNO-based architecture to multiphase flow problems. We find that while the FNO's testing accuracy is generally higher than that of CNN-based models, its training accuracy is sometimes lower due to the regularization effect of the FNO architecture. To improve upon this, we present an enhanced Fourier neural operator, named U-FNO, that combines the advantages of FNO-based and CNN-based models to provide results that are both highly accurate and data efficient. Through the implementation of the newly proposed U-Fourier layer, we show that the U-FNO model architecture produces superior performance over both the original FNO li2020fourier and a state-of-the-art CNN benchmark WEN2021104009. We apply the U-FNO architecture to the highly complex CO2-water multiphase flow problem in the context of CO2 geological storage to predict dynamic pressure buildup and gas saturation. The trained U-FNO models provide an alternative to numerical simulation for 2D-radial CO2 injection problems with wide ranges of permeability and porosity heterogeneity, anisotropy, reservoir conditions, injection configurations, flow rates, and multiphase flow properties.

2 Methods

The goal of a neural operator is to learn an infinite-dimensional-space mapping from a finite collection of input-output observations. To formulate the problem, let the domain $D \subset \mathbb{R}^d$ be a bounded and open set; let $\mathcal{A}$ be the input function space and $\mathcal{U}$ the output function space. $\mathcal{A} = \mathcal{A}(D; \mathbb{R}^{d_a})$ and $\mathcal{U} = \mathcal{U}(D; \mathbb{R}^{d_u})$ are separable Banach spaces of functions defined on $D$ that take values in $\mathbb{R}^{d_a}$ and $\mathbb{R}^{d_u}$ respectively. $G^{\dagger} : \mathcal{A} \to \mathcal{U}$ is a non-linear map that satisfies the governing PDEs. Suppose we have observations $\{a_j, u_j\}_{j=1}^{N}$, where the $a_j$ are drawn from a probability measure $\mu$ in $\mathcal{A}$ and $u_j = G^{\dagger}(a_j)$. We aim to build an operator $G_{\theta}$ that learns an approximation of $G^{\dagger}$ by minimizing the following problem using a cost function $C$:

$$\min_{\theta} \; \mathbb{E}_{a \sim \mu} \left[ C\big( G_{\theta}(a), G^{\dagger}(a) \big) \right].$$

Since $a_j$ and $u_j$ are both functions, we use an $n$-point discretization to numerically represent $a_j$ and $u_j$. We demonstrate in this section that the proposed U-FNO architecture learns the infinite-dimensional-space mapping from a finite collection of $a_j$ and $u_j$ pairs. A table of notation is included in Appendix A.
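As a concrete illustration of this operator-learning objective, the sketch below fits the simplest possible parametric operator (a single linear map, standing in for $G_{\theta}$) to $n$-point discretizations of input-output function pairs and evaluates the empirical risk under a relative $l_2$ cost. The toy antiderivative operator and all names here are ours, purely for illustration:

```python
import numpy as np

# Hypothetical sketch: learn an operator G: a(x) -> u(x) from n-point
# discretizations of function pairs by minimizing the empirical risk
# (1/N) * sum_j C(G(a_j), u_j) with a relative L2 cost.

rng = np.random.default_rng(0)
n = 64                                  # n-point discretization of the domain
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

# Ground-truth operator: cumulative integration (antiderivative).
def G_true(a):
    return np.cumsum(a) * dx

# Draw input functions a_j ~ mu and compute their images u_j = G_true(a_j).
A = rng.standard_normal((200, n))
U = np.stack([G_true(a) for a in A])

# Simplest parametric operator class: one linear map W (n x n), fitted by
# least squares -- a stand-in for the neural operator G_theta.
W, *_ = np.linalg.lstsq(A, U, rcond=None)

def cost(u_pred, u):                    # relative L2 cost C
    return np.linalg.norm(u_pred - u) / np.linalg.norm(u)

risk = np.mean([cost(a @ W, u) for a, u in zip(A, U)])
print(risk)   # near zero: the linear operator class contains G_true
```

Because integration is itself linear, the least-squares fit recovers the true operator almost exactly; a neural operator generalizes this idea to non-linear maps between function spaces.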

2.1 U-FNO architecture

The U-FNO architecture contains the following three steps:

  1. Lift the input observation $a(x)$ to a higher-dimensional space $z_0(x) = P(a(x))$ through a fully connected neural network transformation $P$.

  2. Apply $L$ iterative Fourier layers followed by $M$ iterative U-Fourier layers: $z_0 \mapsto z_1 \mapsto \dots \mapsto z_{L+M}$, where $z_l$ for $l = 1, \dots, L+M$ are sequences of functions taking values in $\mathbb{R}^{d_v}$ for channel dimension $d_v$.

  3. Project $z_{L+M}$ back to the original space, $u(x) = Q(z_{L+M}(x))$, using a fully connected neural network transformation $Q$.
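The three steps above can be sketched structurally on a 1D grid. In this stand-in, the learned transforms $P$, $Q$, $W$ and the U-Net are replaced by fixed random or hand-rolled maps, so this shows shapes and data flow only, not the authors' implementation:

```python
import numpy as np

# Structural sketch of the three U-FNO steps (illustrative stand-ins for
# all learned components; names follow the text above).
rng = np.random.default_rng(1)
nx, da, dv, du = 32, 3, 8, 1      # grid points; input/lifted/output channels
L, M, kmax = 2, 2, 8              # Fourier layers, U-Fourier layers, modes

P = rng.standard_normal((da, dv)) * 0.1   # step 1: pointwise lifting
Q = rng.standard_normal((dv, du)) * 0.1   # step 3: pointwise projection
W = rng.standard_normal((dv, dv)) * 0.1   # linear bias term

def spectral(z):                  # F^{-1}(R . F z) with truncated modes
    zh = np.fft.rfft(z, axis=0)   # (R omitted: identity on the kept modes)
    zh[kmax:] = 0.0
    return np.fft.irfft(zh, n=nx, axis=0)

def local_smooth(z):              # crude local operator standing in for U-Net
    return (np.roll(z, 1, axis=0) + z + np.roll(z, -1, axis=0)) / 3.0

def fourier_layer(z):             # sigma(spectral(z) + z W)
    return np.maximum(spectral(z) + z @ W, 0.0)

def u_fourier_layer(z):           # sigma(spectral(z) + U-Net(z) + z W)
    return np.maximum(spectral(z) + local_smooth(z) + z @ W, 0.0)

a = rng.standard_normal((nx, da))          # input observation a(x)
z = a @ P                                  # lift
for _ in range(L):
    z = fourier_layer(z)                   # L Fourier layers
for _ in range(M):
    z = u_fourier_layer(z)                 # M U-Fourier layers
u = z @ Q                                  # project
print(u.shape)
```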

Figure 1: A. U-FNO model architecture. $a(x)$ is the input, $P$ and $Q$ are fully connected neural networks, and $u(x)$ is the output. B. Inside the Fourier layer, $\mathcal{F}$ denotes the Fourier transform, $R$ is the parameterization in Fourier space, $\mathcal{F}^{-1}$ is the inverse Fourier transform, $W$ is a linear bias term, and $\sigma$ is the activation function. C. Inside the U-FNO layer, $U$ denotes a two-step U-Net; the other notations have identical meaning as in the Fourier layer.

Figure 1A provides a schematic of the U-FNO architecture. Within each newly proposed U-Fourier layer, we have

$$z_{m+1}(x) = \sigma\big( \mathcal{K} z_m(x) + U z_m(x) + W z_m(x) \big),$$

where $\mathcal{K}$ is a kernel integral transformation parameterized by a neural network, $U$ is a U-Net CNN operator, and $W$ is a linear operator, which are all learnable. $\sigma$ is a non-linear activation function. Refer to Li et al. li2020fourier for the formulation of the original Fourier layer.

2.2 Integral kernel operator in the Fourier space

The integral kernel operator $\mathcal{K}$ in Equation 2 is defined by

$$(\mathcal{K} z_m)(x) = \int_D \kappa(x, y)\, z_m(y)\, \mathrm{d}y, \qquad x \in D.$$

To efficiently parameterize the kernel $\kappa$, the FNO method considers the representation of $z_m$ (and also $\kappa$) in the Fourier space and utilizes the Fast Fourier Transform (FFT) li2020fourier. By letting $\kappa(x, y) = \kappa(x - y)$ in Equation 3 and applying the convolution theorem, we can obtain

$$(\mathcal{K} z_m)(x) = \mathcal{F}^{-1}\big( \mathcal{F}(\kappa) \cdot \mathcal{F}(z_m) \big)(x), \qquad x \in D,$$

where $\mathcal{F}$ denotes the Fourier transform of a function and $\mathcal{F}^{-1}$ is its inverse. Now, we can parameterize $\kappa$ directly by its Fourier coefficients:

$$(\mathcal{K} z_m)(x) = \mathcal{F}^{-1}\big( R \cdot \mathcal{F}(z_m) \big)(x), \qquad x \in D,$$

where $R = \mathcal{F}(\kappa)$ is the Fourier transform of a periodic function $\kappa$. Since we assume that $\kappa$ is periodic, we can apply a Fourier series expansion and work in the discrete modes of the Fourier transform.

We first truncate the Fourier series at a maximum number of modes $k_{\max}$, and then parameterize $R$ directly as a complex-valued $(k_{\max} \times d_v \times d_v)$-tensor with the truncated Fourier coefficients. Multiplication by the learnable weight tensor $R$ is then

$$\big( R \cdot \mathcal{F}(z_m) \big)_{k,l} = \sum_{j=1}^{d_v} R_{k,l,j}\, \big(\mathcal{F}(z_m)\big)_{k,j}, \qquad k = 1, \dots, k_{\max}, \quad l = 1, \dots, d_v.$$

By replacing $\mathcal{F}$ with the FFT and implementing $R$ using a direct linear parameterization, we obtain the Fourier operator as illustrated in Figure 1B and C with nearly linear complexity.
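A minimal one-channel sketch of this Fourier-space parameterization: the kernel is specified by $k_{\max}$ truncated Fourier coefficients and applied as $\mathcal{F}^{-1}(R \cdot \mathcal{F}(z))$, which, by the convolution theorem, must agree with a direct circular convolution. Names and sizes are illustrative:

```python
import numpy as np

# One-channel sketch of the Fourier integral operator: parameterize the
# kernel by its truncated Fourier coefficients R and apply it in the
# spectral domain; verify against a direct circular convolution.
rng = np.random.default_rng(2)
n, kmax = 64, 12

z = rng.standard_normal(n)
R = np.zeros(n // 2 + 1, dtype=complex)
R[:kmax] = rng.standard_normal(kmax) + 1j * rng.standard_normal(kmax)
R[0] = R[0].real                      # keep the output real-valued

# Spectral application: F^{-1}(R . F(z)), the FNO's O(n log n) path.
out_spectral = np.fft.irfft(R * np.fft.rfft(z), n=n)

# Equivalent view: the kernel kappa = F^{-1}(R) applied by convolution,
# the O(n^2) integral that the FFT path avoids.
kappa = np.fft.irfft(R, n=n)
out_conv = np.array([sum(kappa[(i - j) % n] * z[j] for j in range(n))
                     for i in range(n)])

print(np.max(np.abs(out_spectral - out_conv)))   # ~0: the two paths agree
```

The truncation to `kmax` modes is what regularizes the FNO: coefficients above `kmax` are simply never parameterized.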

2.3 Characteristics of the U-Fourier layer

In contrast to the original Fourier layer in the FNO li2020fourier, the U-FNO architecture proposed here appends a U-Net path in each U-Fourier layer. The U-Net performs local convolutions that enrich the representation power of the U-FNO for higher-frequency information. The numbers of Fourier and U-Fourier layers, $L$ and $M$, are hyperparameters that can be optimized for the specific problem. For the multiphase flow problem considered here, we found that an architecture with half Fourier layers and half U-Fourier layers achieves the best performance, compared to architectures with all Fourier layers or all U-Fourier layers.

Note that although the Fourier neural operator is an infinite-dimensional operator, when we append the U-Net block, we sacrifice the flexibility of training and testing the model at different discretizations. We made this choice because the CO2-water multiphase flow problem is very sensitive to numerical dispersion and numerical dissolution, which are both tied to a specific grid resolution. When training and testing at different grid dimensions, the numerical noise is often transformed in a nonphysical way. As a result, for this problem, we prioritize achieving higher training and testing accuracy, which the U-FNO provides.

3 Problem setting

3.1 Governing equation

We consider a multiphase flow problem with CO2 and water in the context of geological storage of CO2. The CO2 and water are immiscible but have mutual solubility. The general form of the mass accumulation term for each component $\kappa$ (water or CO2) is written as Pruess1999:

$$M^{\kappa} = \phi \sum_{p} S_p\, \rho_p\, X_p^{\kappa}.$$

Here $p$ denotes the phase, $w$ (wetting) or $n$ (non-wetting). In the siliciclastic rocks present at most geological storage sites, water is the wetting phase pini2012capillary. However, due to the mutual solubility of water and CO2, there is a small amount of CO2 in the water phase and a small amount of water in the CO2 phase. Here $\phi$ is the porosity, $S_p$ is the saturation of phase $p$, and $X_p^{\kappa}$ is the mass fraction of component $\kappa$ in phase $p$.

For both components, the advective mass flux $\mathbf{F}^{\kappa}$ is obtained by summing over the phases $p$,

$$\mathbf{F}^{\kappa} = \sum_{p} X_p^{\kappa}\, \rho_p\, \mathbf{u}_p,$$

where each individual phase flux $\mathbf{u}_p$ is governed by the multiphase flow extension of Darcy's law:

$$\mathbf{u}_p = -k\, \frac{k_{r,p}}{\mu_p} \big( \nabla P_p - \rho_p\, \mathbf{g} \big).$$

Here $k$ denotes the absolute permeability, $k_{r,p}$ is the relative permeability of phase $p$, which non-linearly depends on $S_p$, $\mu_p$ is the viscosity of phase $p$, which depends on the phase composition, and $\mathbf{g}$ is the gravitational acceleration.

Due to the effect of capillarity, the fluid pressures of the two phases are related by

$$P_n = P_w + P_c,$$

where the capillary pressure $P_c$ is a non-linear function of $S_w$. Additionally, the porosity $\phi$, the density $\rho_p$, and the component solubilities in Equation 7 and Equation 8 are also non-linear functions that depend on the phase pressure.
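Since the paper samples a van Genuchten scaling factor (Table 1), one common concrete form of the non-linear capillary pressure curve $P_c(S_w)$ is the van Genuchten model, sketched below. The parameter values here are hypothetical, not the paper's:

```python
import numpy as np

# Illustrative van Genuchten capillary pressure curve P_c(S_w); p0, m, and
# the irreducible water saturation swr are hypothetical example values.
def van_genuchten_pc(sw, p0=0.2, m=0.5, swr=0.25):
    """Capillary pressure [bar] vs. wetting-phase saturation."""
    s_eff = np.clip((sw - swr) / (1.0 - swr), 1e-6, 1.0)   # effective sat.
    return p0 * (s_eff ** (-1.0 / m) - 1.0) ** (1.0 - m)

sw = np.linspace(0.3, 1.0, 8)
pc = van_genuchten_pc(sw)
print(pc)   # monotonically decreasing: drier rock -> higher capillary pressure
```

The strong non-linearity of this curve near the irreducible saturation is one reason the multiphase problem is so much harder for surrogates than single-phase flow.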

To simplify the problem setting, our simulation does not explicitly include molecular diffusion or hydrodynamic dispersion. However, some unavoidable numerical dispersion, resulting from approximating spatial gradients using the two-point upstream algorithm eclipse, is intrinsic to the numerical simulations used for the neural network training.

3.2 Numerical simulation setting

We use the numerical simulator ECLIPSE (e300) to develop the multiphase flow data set for CO2 geological storage. ECLIPSE is a full-physics simulator that uses the finite difference method with upstream weighting for spatial discretization and the adaptive implicit method for temporal discretization eclipse. We inject super-critical CO2 at a constant rate into a radially symmetrical system through a vertical injection well with a radius of 0.1 m. The well can be perforated over the entire thickness of the reservoir or limited to a selected depth interval. We simulate CO2 injection for 30 years at a constant rate ranging from 0.2 to 2 Mt/year. The thickness of the reservoir ranges from 12 to 200 m with no-flow boundaries on the top and bottom. We use a vertical cell dimension of 2.08 m to capture the vertical heterogeneity of the reservoir. The radius of the reservoir is 100,000 m. The outer boundary is closed, but is sufficiently distant from the injection well that it behaves like an infinite-acting reservoir. Two hundred gradually coarsened grid cells are used in the radial direction. Grid sensitivity studies show that this grid is sufficiently refined to capture the CO2 plume migration and pressure buildup, while remaining computationally tractable Wen2019. Simulated values of the gas saturation ($S_g$) and pressure buildup ($\Delta P$) fields at 24 gradually coarsening time snapshots are used for training the neural nets. Refer to Appendix B for detailed spatial and temporal discretizations.
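One plausible construction of a gradually coarsened radial grid of this kind (200 cells between the 0.1 m well radius and the 100,000 m outer boundary) is logarithmic spacing; the paper's exact discretization is given in its Appendix B, so this is only an assumed illustration:

```python
import numpy as np

# Assumed sketch of a gradually coarsened radial grid: 200 log-spaced cells
# from the 0.1 m well radius to the 100,000 m outer boundary.
r_edges = np.logspace(np.log10(0.1), np.log10(100_000.0), 201)
dr = np.diff(r_edges)               # cell widths grow toward the boundary
print(len(dr), dr[0], dr[-1])       # 200 cells; fine near the well
```

The monotonically growing cell widths concentrate resolution near the well, where saturation and pressure gradients are sharpest.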

Figure 2: Example of the mapping from A. input to B. output gas saturation and C. pressure buildup. A. Field and scalar channels for each case. Note that the scalar variables are broadcast into a channel with the same dimensions as the field channels. B. Gas saturation evolution for 6 out of 24 time snapshots. C. Pressure buildup evolution for 6 out of 24 time snapshots.

3.3 Variable sampling scheme

We sample two types of variables for each numerical simulation case: field variables and scalar variables. As shown in Figure 2, the field variables include the horizontal permeability map, the vertical permeability map, the porosity map, and the injection perforation map. The reservoir thickness is randomly sampled in each simulation case and controls the reservoir dimension in each of the following field variables:

variable type sampling parameter notation distribution unit
field horizontal permeability field heterogeneous* -
# of anisotropic materials -
material anisotropy ratio -
porosity (perturbation) -
reservoir thickness m
perforation thickness m
perforation location - randomly placed -
scalar injection rate Mt/year
initial pressure bar
iso-thermal reservoir temperature °C
irreducible water saturation -
van Genuchten scaling factor -
Table 1: Summary of input variable types, sampling ranges, distributions, and units. All inputs are sampled independently with the exception of the porosity and vertical permeability maps. *: refer to Appendix C for a detailed summary of the statistical parameters used to generate the heterogeneous maps.
  • Horizontal permeability map: The Stanford Geostatistical Modeling Software (SGeMS) sgems is used to generate the heterogeneous maps. SGeMS produces permeability maps according to required input parameters such as correlation lengths in the vertical and radial directions, medium appearance, and the permeability mean and standard deviation. A wide variety of permeability maps representing different depositional environments are included in the data set, and the permeability values range widely from 10 Darcy to 0.001 mD. Appendix C summarizes the statistical parameters that characterize the permeability maps.

  • Vertical permeability map: The vertical permeability map is calculated by multiplying the horizontal permeability map by an anisotropy map. To generate the anisotropy map, the horizontal permeability values are binned into materials, each of which is assigned a randomly sampled anisotropy ratio. Note that the anisotropy ratio is uncorrelated with the magnitude of the radial permeability. This procedure roughly mimics a facies-based approach for assigning anisotropy values.

  • Porosity map: Previous studies show that porosity and permeability are loosely correlated with each other pape2000variation. Therefore, to calculate porosity we first use the fitting relationship presented in Pape et al. pape2000variation and then perturb these values with random Gaussian noise with a mean of zero and a standard deviation of 0.005.

  • Injection perforation map: The injection interval thickness is randomly sampled within the range from 12 m to the specific reservoir thickness of that case. We place the perforation interval on the injection well by randomly sampling the depth of the perforation top between 0 m and the reservoir thickness minus the perforation thickness.

Visualizations of the above field variables are shown in Appendix C. Table 1 summarizes the parameter sampling ranges and distributions. The sampling parameters are independent of each other with the exception of porosity and permeability.
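The porosity sampling step described above (a fitted porosity-permeability relation plus zero-mean Gaussian noise with standard deviation 0.005) can be sketched as follows; the power-law fit below is a hypothetical stand-in for the Pape et al. relationship, and the clipping bounds are ours:

```python
import numpy as np

# Sketch of the porosity sampling step: map permeability to porosity with a
# fitted relation, then perturb with zero-mean Gaussian noise (std 0.005).
rng = np.random.default_rng(3)

def porosity_from_perm(k_md, rng):
    # Hypothetical power-law stand-in for the Pape et al. fit.
    phi_fit = 0.1 * np.clip(k_md, 1e-3, 1e4) ** 0.12
    noise = rng.normal(0.0, 0.005, k_md.shape)      # std 0.005, as in text
    return np.clip(phi_fit + noise, 0.01, 0.4)      # assumed physical bounds

# Permeability spanning 0.001 mD to 10 Darcy on the 96 x 200 grid.
k_map = 10.0 ** rng.uniform(-3, 4, size=(96, 200))
phi_map = porosity_from_perm(k_map, rng)
print(phi_map.min(), phi_map.max())
```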

Scalar variables include the initial reservoir pressure at the top of the reservoir, the reservoir temperature, the injection rate, the capillary pressure scaling factor li2013influence, and the irreducible water saturation. The parameter sampling ranges and distributions are summarized in Table 1. While the scalar variables are sampled independently, cases that yield unrealistic combinations of these variables are excluded. These field and scalar input variables create a very high-dimensional input space, which is very challenging for traditional CNN-based models. Nevertheless, the U-FNO model architecture handles the high-dimensional input space with excellent data efficiency.

3.4 Data configuration

Each of the field variables in Figure 2A is represented by a channel in the data input. Since we use a gradually coarsening radial grid for the numerical simulations, a logarithmic conversion in the radial direction is applied in training to project the field variables onto a uniform grid that can be represented by a 96 × 200 matrix. Notice that the reservoir thickness is also a variable, and 96 cells represent a 200 m thick reservoir. When the reservoir is thinner than 200 m, we use zero-padding to denote cells that are outside of the actual reservoir. For the scalar variables, the values are simply broadcast into a matrix with the same 96 × 200 dimensions.
In addition to the input variables, we also supply the spatial grid information to the training by using one channel to denote radial cell dimensions and another channel to denote vertical cell dimensions. The temporal grid information is supplied into the network as an additional dimension. The input to each data sample is constructed by concatenating the field variables, scalar variables, spatial grids, and temporal grid together.
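Assembling one input sample as described (field channels, broadcast scalar channels, spatial grid channels, and zero-padding for thin reservoirs) might look like the following sketch; the channel names and values are illustrative, not the paper's exact configuration:

```python
import numpy as np

# Sketch of assembling one input sample on the (96, 200) uniform grid.
nz, nr = 96, 200
rng = np.random.default_rng(4)

thickness_cells = 48                           # e.g. a ~100 m reservoir
fields = rng.random((4, nz, nr))               # kx, kz, phi, perforation
fields[:, thickness_cells:, :] = 0.0           # zero-pad below the reservoir

scalars = {"inj_rate": 1.0, "p_init": 150.0}   # illustrative scalar values
scalar_ch = [np.full((nz, nr), v) for v in scalars.values()]  # broadcast

# Spatial grid information as two extra channels (assumed radial coarsening).
dr_grid = np.tile(np.logspace(-1, 5, nr), (nz, 1))   # radial cell sizes
dz_grid = np.full((nz, nr), 2.08)                    # vertical cell size [m]

x = np.stack([*fields, *scalar_ch, dr_grid, dz_grid])  # (channels, nz, nr)
print(x.shape)
```

In the full model the temporal grid adds one more dimension, yielding a spatial-temporal input volume per sample.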

For the gas saturation and pressure buildup outputs shown in Figure 2B and C, we use the same logarithmic conversion to project the outputs onto a uniform grid. We then concatenate the outputs for different time snapshots to obtain a spatial-temporal 3D volume. The pressure buildup is normalized to a zero-mean and unit-variance distribution. For gas saturation, we do not normalize the data because the saturation values always range from 0 to 1. The dimensions of the inputs and outputs are shown for each model architecture (Appendices D to G).

The data set contains 5,000 input-to-output mappings. We use a 9/1 split to segregate the data set into 4,500 samples for training and 500 samples for testing.

3.5 Loss function design and training

We use a relative $l_p$-loss to train the deep learning models. The loss is applied to both the original output $y$ and the first derivative of the output in the $r$-direction, $\mathrm{d}y/\mathrm{d}r$, and is written as:

$$\mathcal{L} = \frac{\lVert y - \hat{y} \rVert_p}{\lVert y \rVert_p} + \beta\, \frac{\lVert \mathrm{d}y/\mathrm{d}r - \mathrm{d}\hat{y}/\mathrm{d}r \rVert_p}{\lVert \mathrm{d}y/\mathrm{d}r \rVert_p},$$

where $\hat{y}$ is the predicted output, $\mathrm{d}\hat{y}/\mathrm{d}r$ is the first derivative of the predicted output, $p$ is the order of the norm, and $\beta$ is a hyper-parameter. This relative loss has a regularization effect and is particularly effective when the data have large variances in their norms. Our experiments show that, compared to a non-relative loss, a relative loss significantly improves the performance for both gas saturation and pressure buildup. The second term in Equation 12 greatly improves the quality of the predictions for gas saturation at the leading edge of the plume. Similarly, this term improves the prediction of the sharp pressure buildup around the injection well. We use the $l_2$-loss for both gas saturation and pressure buildup since it provides faster convergence than the $l_1$-loss.
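A minimal implementation of this relative loss with the derivative term, using $p = 2$, a finite-difference $r$-derivative, and an illustrative $\beta$, could read:

```python
import numpy as np

# Minimal sketch of the relative l_p loss described above: a relative term
# on the output plus a weighted relative term on its r-derivative.
# beta = 0.5 is illustrative, not the paper's tuned value.
def relative_loss(y_pred, y_true, beta=0.5, p=2):
    def rel(a, b):
        return (np.abs(a - b) ** p).sum() ** (1 / p) / \
               (np.abs(b) ** p).sum() ** (1 / p)
    dy_pred = np.diff(y_pred, axis=-1)      # finite-difference d/dr
    dy_true = np.diff(y_true, axis=-1)
    return rel(y_pred, y_true) + beta * rel(dy_pred, dy_true)

y = np.random.default_rng(5).random((96, 200))
print(relative_loss(y, y))                  # exact match -> 0.0
print(relative_loss(y + 0.01, y))           # small perturbation -> small loss
```

Because both terms are normalized by the norm of the target, samples with small pressure buildup contribute on the same scale as samples with large buildup, which is the regularization effect noted above.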

As described in Section 3, our data set contains reservoirs with various thicknesses, and the cells outside of the reservoir are padded with zeros for both input and output. To accommodate the variable reservoir thicknesses during training, we construct an active-cell mask for each data sample and only calculate the loss within the mask. Our experiments show that this loss calculation scheme achieves better performance than calculating the loss over the whole field because the gradients are distributed more efficiently.

During training, the initial learning rate is 0.001, and the learning rate gradually decreases with a constant step and reduction rate. These hyper-parameters are optimized separately for the gas saturation and pressure buildup models. Training stops when the loss no longer decreases.
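A step-decay schedule of the kind described (constant step and reduction rate) can be written as below; the step and rate values are illustrative, not the tuned hyper-parameters:

```python
# Sketch of a step-decay learning-rate schedule: start at 0.001 and multiply
# by a reduction rate every fixed number of epochs (values illustrative).
def learning_rate(epoch, lr0=1e-3, step=50, rate=0.75):
    return lr0 * rate ** (epoch // step)

print([round(learning_rate(e), 6) for e in (0, 49, 50, 100, 200)])
```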

4 Results

This section compares four model architectures: the original FNO proposed in Li et al. li2020fourier, the newly proposed U-FNO of this paper, a Conv-FNO that uses a conv3d layer in place of the U-Net, and the state-of-the-art benchmark CNN used in Wen et al. WEN2021104009. All models are trained on the proposed loss function (Equation 12) and directly output the 3D gas saturation and pressure fields in space and time. Detailed parameters for each model are summarized in Appendices D to G.

4.1 Gas saturation

Figure 3: Training and testing relative loss evolution vs. epoch for the U-FNO, FNO, Conv-FNO, and CNN benchmark for A. gas saturation and B. pressure buildup.

Figure 3A and Table 2 demonstrate that the best performance on both the training and testing data sets is achieved by the U-FNO model. The testing set performance, in particular, represents the true predictive ability of the model on unseen data. The low relative testing loss clearly indicates the superior performance of the proposed U-FNO.

Interestingly, we notice that although the original FNO has a higher training relative loss than the CNN benchmark, its testing relative loss is lower than that of the CNN benchmark. This indicates that the FNO has excellent generalization ability and achieves better performance than the CNN even though the FNO has a higher training relative loss. Nevertheless, the original FNO has the highest relative loss on the training set due to the inherent regularization effect of using a finite set of truncated Fourier basis functions. The Conv-FNO and U-FNO architectures are therefore designed to enhance expressiveness by processing the higher-frequency information that is not picked up by the Fourier basis. We can observe from Figure 3A that the training loss is significantly improved even by simply adding a plain conv3d in the Conv-FNO case. When the FNO layer is combined with a U-Net in the U-FNO case, the model takes advantage of both architectures and consistently produces the lowest relative loss throughout the entire training (Figure 3A).

Model Gas saturation (R²) Pressure buildup (R²)
Train Test Train Test
mean std mean std mean std mean std
CNN 0.994 0.007 0.986 0.018 0.990 0.029 0.988 0.023
FNO 0.990 0.013 0.986 0.017 0.991 0.014 0.990 0.017
Conv-FNO 0.993 0.011 0.990 0.013 0.992 0.018 0.991 0.018
U-FNO 0.997 0.005 0.994 0.009 0.994 0.014 0.993 0.015
Table 2: Average training and testing data set R² scores over 500 random samples for the 4 models. The R² score measures the similarity between the prediction and the numerical simulation output; an R² of 1 indicates an exact match.
Figure 4: Visualizations and scatter plots for examples a to d. In each example, visualizations show the true gas saturation, the U-FNO prediction, the U-FNO absolute error, the CNN prediction, and the CNN absolute error. The mean absolute error is labeled on the U-FNO and CNN absolute error plots. The scatter plots show the numerical simulation values vs. those predicted by the U-FNO and CNN models at each grid cell. The legend for all of the scatter plots is shown in the bottom right.

In addition to considering the average performance over the entire training and testing sets, we compare model predictions for four cases with varying degrees of complexity in Figure 4. For each case, Figure 4 shows a comparison between the predicted and true values of the CO2 saturation for each grid cell in the model over the entire 30-year injection period. The U-FNO has superior performance compared to the CNN for all of these examples, as quantified by the higher R² values and narrower 95% prediction bands. Cases b and d are especially obvious examples in which the U-FNO successfully predicts the complicated horizontal saturation variations where the CNN ignores the heterogeneity and simply predicts more uniform saturation fields.
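The R² metric reported in Table 2 and in these scatter plots can be sketched as follows, assuming the standard coefficient of determination:

```python
import numpy as np

# Standard coefficient of determination: 1 - SS_res / SS_tot between a
# prediction and the numerical-simulation output; 1.0 is an exact match.
def r2_score(y_pred, y_true):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.random.default_rng(6).random(1000)
noisy = y + 0.05 * np.random.default_rng(7).random(1000)
print(r2_score(y, y))                 # 1.0
print(round(r2_score(noisy, y), 3))   # slightly below 1 for a noisy prediction
```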

4.2 Pressure buildup

Figure 5: Visualizations and scatter plots for examples a to d. In each example, visualizations show the true pressure buildup, the U-FNO prediction, the U-FNO relative error, the CNN prediction, and the CNN relative error. The relative errors are defined as in tang2021deep; the mean relative error is labeled on the U-FNO and CNN relative error plots. The scatter plots show the numerical simulation values vs. those predicted by the U-FNO and CNN models at each grid cell. The legend for all of the scatter plots is shown in the bottom right.

For pressure buildup, the U-FNO also achieves the lowest relative error on both the training and testing data sets. As shown in Figure 3B, the training and testing relative errors for the U-FNO are consistently low throughout the training process. Generally, pressure buildup models are less prone to overfitting than gas saturation models since pressure buildup fields are less heterogeneous. By comparing the R² scores for the training and testing sets in Table 2, we can observe that all FNO-based models produce nearly negligible overfitting.

The superior performance of the U-FNO for pressure buildup predictions is also demonstrated by the four examples shown in Figure 5. In each case, the U-FNO has higher R² values and narrower 95% prediction bands. Unlike the gas saturation outputs, pressure buildup distributions are challenging to predict since they have a larger radius of influence and larger differences between cases. For example, the maximum pressure buildup in the 4 examples shown in Figure 5 varies from 20 bar to 220 bar. Notice that the CNN model especially struggles with cases that have a large radius of influence (e.g., case d), while the U-FNO model maintains excellent accuracy at locations far away from the injection well.

5 Discussion

5.1 Training size analysis

The results in Section 4 demonstrate the excellent generalization ability of all FNO-based architectures. To further investigate the relationship between training size and overfitting for the newly proposed U-FNO model, we run a set of comparisons for the U-FNO model trained with 500, 2,500, and 4,500 samples (see Figure 6). Each model is trained for the same number of epochs. From this comparison, we can observe that the gas saturation models are more prone to overfitting, as indicated by the wider gaps between the training and testing means and standard deviations. For pressure buildup, the training size of 4,500 produces very similar R² scores on the training and testing sets. Therefore, a training size of 4,500 is already sufficient for pressure buildup prediction, and more training data will not significantly improve the performance.

Figure 6: The training and testing set R² mean and standard deviation vs. the number of training samples for A. gas saturation and B. pressure buildup.

5.2 Computational efficiency analysis

We summarize the computational efficiency of the CNN, FNO, Conv-FNO, and U-FNO in Table 3. The training and testing times are both evaluated on an Nvidia A100-SXM GPU. For comparison, we run ECLIPSE simulations on an Intel Xeon E5-2670 CPU. Each simulation case utilizes a fully dedicated CPU, and the average run time over 1,000 random cases is 10 minutes. We ran the comparison on the CPU readily available to us; other CPU types would not change the simulation time by a significant margin.

# Parameter Training Testing
Gas saturation Pressure Speed-up vs. numerical
(-) (s/epoch) (s) buildup (s) simulation (times)
CNN 33,316,481 562 0.050 0.050 1
FNO 31,117,541 711 0.005 0.005 1
Conv-FNO 31,222,625 1,135 0.006 0.006 1
U-FNO 33,097,829 1,872 0.010 0.010 6
Table 3: Summary of the number of parameters, training times, and testing times for all four models. The testing times are calculated by averaging over 500 random cases. The speed-up is computed relative to the average numerical simulation run time of 10 minutes.

All of the neural network models are orders of magnitude faster than conventional numerical simulation at prediction time. Notice that the FNO-based models are significantly faster than the CNN model at testing time, though slower at training time. In this problem, we prioritize prediction accuracy and testing time over training time, for which the U-FNO is the best choice. For problems that are more sensitive to training time, one could also use the Conv-FNO, which provides both high accuracy and relatively fast training.

5.3 Fourier kernel visualization

Figure 7: Visualizations of random selections of directional kernels for trained A. gas saturation and B. pressure buildup models.

As described in Section 2, the Fourier path within each U-Fourier layer contains a trainable kernel that is parameterized in the Fourier space. Here we provide visualizations of a random selection of the Fourier kernels in the trained gas saturation and pressure buildup models. Notice that unlike traditional CNN kernels, which are generally small (e.g., 3 × 3), Fourier kernels are full-field kernels that can be evaluated at any grid discretization. The kernels in this paper are 3D kernels, and the examples shown in Figure 7 are directional slices evaluated at the data discretization. Both the gas saturation and pressure buildup models contain a wide variety of kernels ranging from low to high frequency. We hypothesize that the asymmetry in the $r$-direction might be related to the gradually coarsening radial grid resolution, while the asymmetry in the $z$-direction might be related to the effects of buoyancy, since CO2 is less dense than water and tends to migrate to the top of the reservoir.
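The claim that Fourier kernels are full-field and resolution-free can be illustrated directly: the same truncated coefficients define one underlying function that can be evaluated on any grid by zero-padding before the inverse FFT. The example below is an assumed 1D sketch, not the paper's visualization code:

```python
import numpy as np

# The same kmax truncated Fourier coefficients evaluated on two grids: the
# coarse-grid samples coincide with every 4th fine-grid sample, unlike a
# fixed-size CNN kernel, which is tied to one discretization.
rng = np.random.default_rng(8)
kmax = 10
coeffs = rng.standard_normal(kmax) + 1j * rng.standard_normal(kmax)
coeffs[0] = coeffs[0].real            # keep the kernel real-valued

def kernel_on_grid(coeffs, n):
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[:len(coeffs)] = coeffs       # zero-pad the unparameterized modes
    return np.fft.irfft(full, n=n) * n   # scale so values match across n

k64 = kernel_on_grid(coeffs, 64)
k256 = kernel_on_grid(coeffs, 256)
print(np.max(np.abs(k64 - k256[::4])))   # ~0: same function, two resolutions
```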

6 Conclusion

This paper presents U-FNO, an enhanced Fourier neural operator for solving multiphase flow problems. We demonstrate that U-FNO predicts highly accurate flow outputs for a complex CO2-water multiphase flow problem in the context of CO2 geological storage.

Through comparisons with the original FNO architecture li2020fourier and a state-of-the-art CNN benchmark WEN2021104009, we show that the newly proposed U-FNO architecture provides the best performance for both gas saturation and pressure buildup predictions. The U-FNO architecture enhances the training accuracy of the original FNO while maintaining its excellent generalizability. For the CO2-water multiphase flow application described here, our goal is to optimize the accuracy of the gas saturation and pressure fields, and the U-FNO provides the highest performance.

The trained U-FNO model generates gas saturation and pressure buildup predictions that are 10,000 times faster than a traditional numerical solver. This significant improvement in computational efficiency can support many engineering tasks that require repetitive forward numerical simulations. For example, the trained U-FNO model can serve as an alternative to full-physics numerical simulators in probabilistic assessment, inversion, and site selection, tasks that were previously prohibitively expensive using numerical simulation at desirable grid resolutions.

Code and data availability

The Python code for the U-FNO model architecture and the data set used in training will be released upon the publication of this manuscript, together with a web app hosting the pre-trained models for gas saturation and pressure buildup.


G. Wen and S. M. Benson gratefully acknowledge the support from ExxonMobil through the Strategic Energy Alliance at Stanford University and the Stanford Center for Carbon Storage. Z. Li gratefully acknowledges the financial support from the Kortschak Scholars Program. A. Anandkumar is supported in part by the Bren endowed chair, LwLL grants, Beyond Limits, Raytheon, Microsoft, Google, and Adobe faculty fellowships, and a DE Logi grant. The authors would like to thank Peter Messmer for his comments and for pointing out typos.

Appendix A Table of notations

Notation Meaning
Operator learning The spatial domain for the problem
Input coefficient functions
Target solution functions
The operator mapping from coefficients to solutions
n The size of the discretization
x Points in the spatial domain
The discretization of
An approximation of
A probability measure where is sampled from
Cost function
U-FNO The discretized data input
The discretized data output
High dimensional representation of in Fourier layers
High dimensional representation of in U-Fourier layers
The lifting neural network
The projection neural network
U-Fourier layer The kernel integral operator applied on and
The linear transformation applied on the lower Fourier modes
The linear transformation (bias term) applied on the spatial domain
The U-Net operator applied on and
The activation function
Fourier transformation and its inverse
The kernel function learned from data
The maximum number of modes
The number of channels
Governing equation Components of CO and water
Phases of wetting and non-wetting
The pore volume
The saturation of phase
The density of phase
The mass fraction of phase
The source term
The pressure of phase
The absolute permeability
The relative permeability of phase
The viscosity of phase
Gravitational acceleration
Sampling variables Refer to Table 1
Table 4: Table of notations.

Appendix B Grid discretization

Dimension Parameter Notation Value Unit
Vertical (z) box boundary 12 to 200 m
grid count 6 to 96 -
grid thickness 2.08 m
Radial (r) box boundary 1,000,000 m
grid count 200 -
minimum grid width 3.6 m
amplification factor 1.035012 -
well radius 0.1 m
Temporal (t) total length 30 years
step count 24 -
minimum step 1 day
amplification factor 1.421245 -
Table 5: Vertical, radial, and temporal grid discretization for ECLIPSE numerical simulation runs. The radial grid width gradually coarsens as Δr_{i+1} = 1.035012 Δr_i, for i = 1, ..., 199, starting from the minimum width of 3.6 m. The temporal step size gradually coarsens as Δt_{j+1} = 1.421245 Δt_j, for j = 1, ..., 23, starting from the minimum step of 1 day.
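The geometric coarsening can be reproduced directly from the values in Table 5. The short sketch below (plain Python, all parameter values taken from the table) rebuilds the temporal steps and radial widths, and recovers the 30-year simulation horizon as a consistency check.

```python
def geometric_steps(minimum, factor, count):
    """Step i is the minimum step scaled by factor**i (geometric coarsening)."""
    return [minimum * factor**i for i in range(count)]

# Temporal grid: 24 steps, 1-day minimum, amplification factor 1.421245.
t_steps_days = geometric_steps(1.0, 1.421245, 24)
total_years = sum(t_steps_days) / 365.0   # should recover ~30 years

# Radial grid: 200 cells, 3.6 m minimum width, amplification factor 1.035012.
r_widths_m = geometric_steps(3.6, 1.035012, 200)
```

Summing the 24 temporal steps gives approximately 30 years, matching the total simulation length in the table.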

Appendix C Heterogeneous permeability map statistical parameters and visualizations

Figure 8: Horizontal permeability map, anisotropy map, and porosity map for A. Gaussian, B. von Karman, C. Discontinuous, and D. Homogeneous medium appearances.
Medium Parameter Mean Std Max Min Unit
A. Gaussian Field average 30.8 58.3 1053 0.3 mD
Vertical correlation 7.3 3.6 12.5 2.1 m
Horizontal correlation 2190 1432 6250 208 m
Contrast ratio 4.01 10 2.19 10 3.00 10 1.01 -
B. von Karman Field average 39.9 54.4 867.9 1.8 mD
carpentier2009conservation Vertical correlation 7.2 3.5 12.5 2.1 m
Horizontal correlation 2.15 × 10^3 1.40 × 10^3 6.23 × 10^3 208 m
Contrast ratio 2.66 10 1.54 10 2.12 10 1.00 -
C. Discontinuous Field average 80.8 260.2 5281 2.0 mD
Vertical correlation 7.2 3.6 12.5 2.1 m
Horizontal correlation 2176 1429 6250 208 m
Contrast ratio 2.17 10 1.51 10 2.68 10 1.01 -
D. Homogeneous Field permeability 327.7 478.1 1216 4.0 mD
Table 6: Statistical parameters of horizontal permeability maps generated by the Stanford Geostatistical Modeling Software (SGeMS) sgems. We defined the medium appearance, spatial correlation, mean, standard deviation, and contrast ratio in each map to create a large variety of permeability maps.
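For readers without access to SGeMS, a correlated Gaussian field of the general kind summarized in Table 6 can be sketched by spectral filtering of white noise. This is a generic illustration only (NumPy, with placeholder correlation lengths and a placeholder 30 mD scale), not the SGeMS algorithm used to build the paper's data set.

```python
import numpy as np

def gaussian_field(nz, nr, lz, lr, seed=0):
    """Anisotropically correlated Gaussian field via spectral filtering."""
    rng = np.random.default_rng(seed)
    kz = np.fft.fftfreq(nz)[:, None]
    kr = np.fft.fftfreq(nr)[None, :]
    # Gaussian spectral filter: a longer correlation length damps high
    # frequencies more strongly in that direction.
    filt = np.exp(-((kz * lz) ** 2 + (kr * lr) ** 2))
    noise = np.fft.fft2(rng.standard_normal((nz, nr)))
    field = np.fft.ifft2(noise * filt).real
    return (field - field.mean()) / field.std()   # zero mean, unit std

logk = gaussian_field(96, 200, lz=4, lr=40)   # log-permeability proxy
k_mD = 30.0 * np.exp(0.5 * logk)              # map to a positive mD scale
```

Exponentiating the normalized field yields a strictly positive, log-normally distributed permeability proxy, mirroring the heavy-tailed mean/max statistics in the table.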

Appendix D CNN benchmark model architecture

Part Layer Output Shape
Input - (96,200,24,1)
Encode 1 Conv3D/BN/ReLu (96,200,24,32)
Encode 2 Conv3D/BN/ReLu (48,100,12,64)
Encode 3 Conv3D/BN/ReLu (24,50,6,128)
Encode 4 Conv3D/BN/ReLu (24,50,6,128)
Encode 5 Conv3D/BN/ReLu (12,25,3,256)
Encode 6 Conv3D/BN/ReLu (12,25,3,256)
ResConv 1 Conv3D/BN/Conv3D/BN/ReLu/Add (12,25,3,256)
ResConv 2 Conv3D/BN/Conv3D/BN/ReLu/Add (12,25,3,256)
ResConv 3 Conv3D/BN/Conv3D/BN/ReLu/Add (12,25,3,256)
ResConv 4 Conv3D/BN/Conv3D/BN/ReLu/Add (12,25,3,256)
ResConv 5 Conv3D/BN/Conv3D/BN/ReLu/Add (12,25,3,256)
Decode 6 UpSampling/Padding/Conv3D/BN/ReLu (12,25,3,256)
Decode 5 UpSampling/Padding/Conv3D/BN/ReLu (24,50,6,256)
Decode 4 UpSampling/Padding/Conv3D/BN/ReLu (24,50,6,128)
Decode 3 UpSampling/Padding/Conv3D/BN/ReLu (48,100,12,128)
Decode 2 UpSampling/Padding/Conv3D/BN/ReLu (48,100,12,64)
Decode 1 UpSampling/Padding/Conv3D/BN/ReLu (96,200,24,32)
Output Conv3D (96,200,24,1)
Table 7: CNN architecture. Conv3D denotes a 3D convolutional layer; BN denotes a batch normalization layer; ReLu denotes a rectified linear layer; Add denotes an addition with the identity; UpSampling denotes an upsampling layer that expands the matrix dimension using the nearest neighbor method; and Padding denotes a padding layer using the reflection padding technique. In this model, the number of total parameters is 33,316,481, with 33,305,857 trainable parameters and 10,624 non-trainable parameters. To ensure a fair comparison with the FNO-based models, we performed hyper-parameter optimization on the CNN benchmark model and trained it with the same loss function (Equation 12) as the FNO-based models.

Appendix E FNO model architecture

Part Layer Output Shape
Input - (96,200,24,12)
Padding Padding (104, 208, 32, 12)
Lifting Linear (104, 208, 32, 36)
Fourier 1 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 2 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 3 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 4 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 5 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 6 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Projection 1 Linear (104, 208, 32, 128)
Projection 2 Linear (104, 208, 32, 1)
De-padding - (96, 200, 24, 1)
Table 8: FNO model architecture. Padding denotes a padding operator that accommodates the non-periodic boundaries; Linear denotes the linear transformations that lift the input to the high-dimensional space and project it back to the original space; Fourier3d denotes the 3D Fourier operator; Conv1d denotes the bias term; Add adds the outputs together; and ReLu denotes a rectified linear layer. In this model, the number of total parameters is 31,117,541.
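The Fourier3d/Conv1d/Add/ReLu composition in each row of Table 8 can be sketched in a few lines. The single-channel NumPy version below uses random stand-in weights and keeps only the lowest corner modes for brevity; the actual implementation uses per-channel weight tensors and the conjugate-symmetric (rfft) mode layout.

```python
import numpy as np

rng = np.random.default_rng(0)
nz, nr, nt = 16, 20, 8   # toy grid (the paper's padded grid is 104 x 208 x 32)
k_max = 4                # number of retained Fourier modes per dimension
# Fourier3d: complex weights on the lowest modes (random stand-ins).
W_fourier = (rng.standard_normal((k_max, k_max, k_max))
             + 1j * rng.standard_normal((k_max, k_max, k_max)))
w_bias = 0.3             # stand-in scalar for the pointwise Conv1d bias path

def fourier_layer(v):
    v_hat = np.fft.fftn(v)
    out_hat = np.zeros_like(v_hat)
    # Transform only the lowest-frequency modes; all others are dropped.
    out_hat[:k_max, :k_max, :k_max] = v_hat[:k_max, :k_max, :k_max] * W_fourier
    spectral = np.fft.ifftn(out_hat).real
    return np.maximum(spectral + w_bias * v, 0.0)   # Add, then ReLu

v = rng.standard_normal((nz, nr, nt))
y = fourier_layer(v)
```

The output has the same shape as the input, which is why six such layers can be stacked back-to-back as in the table.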

Appendix F Conv-FNO model architecture

Part Layer Output Shape
Input - (96,200,24,12)
Padding Padding (104, 208, 32, 12)
Lifting Linear (104, 208, 32, 36)
Fourier 1 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 2 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 3 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Conv-Fourier 1 Fourier3d/Conv1d/Conv3d/Add/ReLu (104, 208, 32, 36)
Conv-Fourier 2 Fourier3d/Conv1d/Conv3d/Add/ReLu (104, 208, 32, 36)
Conv-Fourier 3 Fourier3d/Conv1d/Conv3d/Add/ReLu (104, 208, 32, 36)
Projection 1 Linear (104, 208, 32, 128)
Projection 2 Linear (104, 208, 32, 1)
De-padding - (96, 200, 24, 1)
Table 9: Conv-FNO model architecture. Padding denotes a padding operator that accommodates the non-periodic boundaries; Linear denotes the linear transformations that lift the input to the high-dimensional space and project it back to the original space; Fourier3d denotes the 3D Fourier operator; Conv1d denotes the bias term; Conv3d denotes a 3D convolutional operator; Add adds the outputs together; and ReLu denotes a rectified linear layer. In this model, the number of total parameters is 31,222,625.

Appendix G U-FNO model architecture

Part Layer Output Shape
Input - (96,200,24,12)
Padding Padding (104, 208, 32, 12)
Lifting Linear (104, 208, 32, 36)
Fourier 1 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 2 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
Fourier 3 Fourier3d/Conv1d/Add/ReLu (104, 208, 32, 36)
U-Fourier 1 Fourier3d/Conv1d/UNet3d/Add/ReLu (104, 208, 32, 36)
U-Fourier 2 Fourier3d/Conv1d/UNet3d/Add/ReLu (104, 208, 32, 36)
U-Fourier 3 Fourier3d/Conv1d/UNet3d/Add/ReLu (104, 208, 32, 36)
Projection 1 Linear (104, 208, 32, 128)
Projection 2 Linear (104, 208, 32, 1)
De-padding - (96, 200, 24, 1)
Table 10: U-FNO model architecture. Padding denotes a padding operator that accommodates the non-periodic boundaries; Linear denotes the linear transformations that lift the input to the high-dimensional space and project it back to the original space; Fourier3d denotes the 3D Fourier operator; Conv1d denotes the bias term; UNet3d denotes a two-step 3D U-Net; Add adds the outputs together; and ReLu denotes a rectified linear layer. In this model, the number of total parameters is 33,097,829.
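The U-Fourier rows of Table 10 differ from the plain Fourier layers only in the extra UNet3d path added before the ReLU. The NumPy sketch below shows where that path enters the sum; the two-step 3D U-Net is replaced by a crude average-pool/nearest-upsample stand-in, and all weights are random stand-ins, so this illustrates the wiring rather than the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
nz, nr, nt = 16, 20, 8   # toy grid with even dimensions
k_max = 4
W = (rng.standard_normal((k_max, k_max, k_max))
     + 1j * rng.standard_normal((k_max, k_max, k_max)))

def spectral(v):
    """Fourier3d path: transform the lowest modes, drop the rest."""
    v_hat = np.fft.fftn(v)
    out = np.zeros_like(v_hat)
    out[:k_max, :k_max, :k_max] = v_hat[:k_max, :k_max, :k_max] * W
    return np.fft.ifftn(out).real

def unet3d_standin(v):
    """Stand-in for UNet3d: average-pool by 2, nearest-neighbor upsample back."""
    z, r, t = v.shape
    pooled = v.reshape(z // 2, 2, r // 2, 2, t // 2, 2).mean(axis=(1, 3, 5))
    return pooled.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

def u_fourier_layer(v, w_bias=0.3):
    # output = ReLu(Fourier3d(v) + Conv1d(v) + UNet3d(v))
    return np.maximum(spectral(v) + w_bias * v + unet3d_standin(v), 0.0)

v = rng.standard_normal((nz, nr, nt))
y = u_fourier_layer(v)
```

The U-Net path processes the signal locally in the spatial domain, complementing the global spectral path, which is the intuition behind U-FNO's accuracy gain over the plain FNO.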