1 Introduction
History matching in the context of oil and gas reservoir engineering involves calibrating uncertain model parameters such that flow simulation results match observed data to within some tolerance. The uncertain variables most commonly considered are the porosity and permeability in every grid block of the model. History matching algorithms are often most effective when combined with procedures for parameterizing these geological variables. Such parameterizations, based on, e.g., principal component analysis (PCA), serve two key purposes: they reduce the number of variables that must be determined through data assimilation, and they preserve, to varying degrees, the spatial correlation structure that exists in prior geological models.
A number of geological parameterizations have been presented and assessed. However, as discussed below, existing methods have some important limitations. In this work, a new low-dimensional parameterization technique for complex geological models, involving a recently developed deep-learning-based algorithm known as fast neural style transfer (Johnson et al., 2016), is introduced. This new technique is referred to as CNNPCA, since it combines convolutional neural network (CNN) approaches from the deep-learning domain with PCA. The CNNPCA representation can be viewed as an extension of an existing parameterization procedure, optimization-based PCA (OPCA) (Vo and Durlofsky, 2014, 2015). CNNPCA shows advantages over OPCA for complex systems, especially when conditioning data are scarce.
Previous work on low-dimensional model parameterization has involved the direct use of PCA representations (Oliver, 1996; Reynolds et al., 1996; Sarma et al., 2006), the discrete wavelet transform (DWT) (Lu and Horne, 2000), the discrete cosine transform (DCT) (Jafarpour et al., 2010), level-set methods (Chang et al., 2010; Ping and Zhang, 2013), K-SVD (Tavakoli and Reynolds, 2010), and tensor decomposition (Insuasty et al., 2017) procedures. These methods address the parameterization problem in different manners, and each has advantages and limitations. For example, the DWT and DCT procedures transform model parameters based on wavelet or cosine functions, though they do not incorporate the prior covariance matrix of model parameters when constructing the basis. Therefore, conditioning DCT and DWT parameterizations to prior geological information is not straightforward. PCA is based on the eigendecomposition of the covariance matrix of prior models. As such, it honors only the two-point spatial statistics of the prior models, and is thus not directly applicable to non-Gaussian systems.

Several methods have been proposed to extend PCA to non-Gaussian systems. Pluri-PCA, developed by Chen et al. (2016), combines PCA and truncated-pluri-Gaussian (Astrakova and Oliver, 2015) representations to provide a low-dimensional model for multi-facies systems. This procedure entails truncation of the underlying PCA models and is thus non-differentiable. Hakim-Elahi and Jafarpour (2017) introduced a distance transformation to map a discrete facies model into a continuous distance model that can be parameterized with PCA. This technique, like the pluri-PCA procedure, is however currently limited to discrete systems. Kernel PCA (KPCA) generalizes PCA through a kernel formulation that acts to preserve some multipoint spatial statistics (Sarma et al., 2008). The optimization-based PCA (OPCA) procedure essentially extends PCA with a postprocessing step based on the single-point statistics (histogram) of the prior models (Vo and Durlofsky, 2014, 2015). Regularized KPCA (RKPCA) (Vo and Durlofsky, 2016) can be viewed as a combination of KPCA and OPCA. Emerick (2016) investigated the performance of PCA, KPCA, and OPCA for the history matching of channelized reservoirs using an ensemble algorithm.
In that study, OPCA performed best in terms of data assimilation and maintaining key geological features. In addition, although KPCA and RKPCA have some theoretical advantages over OPCA, it is not clear whether these more complex methods outperform OPCA in problems of practical interest.
However, as illustrated later in this work, OPCA does display limitations when applied to complex geomodels with little or no conditioning data. This is because OPCA entails a postprocessing of the underlying PCA model using what is essentially a pointwise histogram transformation. The ability of OPCA to provide models consistent with the training image depends on the quality of the underlying PCA model, which in turn depends on the complexity of the training image and the amount of hard data available. The pointwise postprocessing step does not substantially improve the ability of OPCA to capture multipoint statistics. The CNNPCA formulation introduced here employs features for capturing multipoint spatial correlations, which provides enhancement relative to OPCA for non-Gaussian systems.
Recent successes in the application of deep learning to image processing have inspired the development of geological parameterization techniques based on algorithms that use deep neural networks. Among the many deep-learning procedures, the so-called generative models, which are used for constructing new images consistent with a set of training images, are of particular interest. Mosser et al. (2017) applied a type of generative adversarial network (GAN) to generate binary porous media models for pore-scale flow simulation. The use of deep neural networks for lower-dimensional geological parameterization has been proposed by Laloy et al. (2017a, b). They applied generative deep neural network models, based on a variational autoencoder (VAE) and a spatial generative adversarial network (SGAN), to obtain lower-dimensional representations of binary facies models. Both procedures have the ability to sample a lower-dimensional variable from a simple distribution to obtain models consistent with the training image. The quality of the new models (in both 2D and 3D) was shown to be superior to those based on existing algorithms, including PCA, OPCA, and DCT. The two methods have also been successfully applied for data assimilation in flow simulation problems. In other work, Canchumuni et al. (2017) developed an autoencoder-based parameterization for representing facies models. This approach achieved results comparable to those from a method that combined truncated Gaussian simulation and PCA. The methods noted above are innovative in their use of recent developments in deep learning. They still have limitations, however, including substantial computational demands for training the neural networks and imperfect conditioning to hard data.

In this study, an alternative deep-learning-based geological parameterization method, inspired by the 'fast neural style transfer' algorithm of Johnson et al. (2016), is developed and applied. The new method, referred to as CNNPCA, can be viewed as a generalized OPCA that includes features for capturing multipoint correlations. The metrics that enter the formulation are based on statistical quantities extracted from deep convolutional neural networks. In addition, the (somewhat) computationally demanding optimization component of the method is performed in an offline preprocessing step, so the online generation of models (as required in history matching) is very fast. Although it is not guaranteed to do so (as it is in PCA and OPCA), the CNNPCA approach developed here honors hard data for the cases considered. Extensions to handle bimodal models (as opposed to strictly binary models) are also presented.
This paper proceeds as follows. In Sect. 2, the CNNPCA procedure is presented as a generalization of OPCA in which convolutional neural networks are applied. Next, in Sect. 3, both unconditional and conditional realizations generated using CNNPCA are presented for a binary channelized system. In Sect. 4, the accuracy of flow statistics obtained from CNNPCA realizations is assessed through comparison with reference geostatistical (SGeMS) realizations as well as with OPCA models. Then, in Sect. 5, history matching is performed to assess the ability of posterior CNNPCA models to provide production forecasts for both existing wells and new wells. Next, in Sect. 6, CNNPCA is extended to model a nonstationary bimodal deltaic fan system. Concluding remarks are provided in Sect. 7. The detailed architecture of the convolutional neural networks used in CNNPCA is presented in the Appendix.
2 Convolutional Neural Network-based Principal Component Analysis (CNNPCA)
In this section the new geological parameterization, CNNPCA, is developed. The method is presented as a generalized OPCA procedure coupled with new treatments based on deep convolutional neural networks.
2.1 Optimization-based Principal Component Analysis (OPCA)
The uncertain rock properties that characterize a reservoir model, such as permeability and porosity, are often represented as random fields. In this paper the geological model is denoted by $\mathbf{m} \in \mathbb{R}^{N_c}$, where $N_c$ is the number of grid blocks (computational cells). The spatial correlation structure to be captured in $\mathbf{m}$ is defined by the geological features in the reservoir.
The basic idea of PCA, also referred to as Karhunen-Loève expansion, is to represent the random field $\mathbf{m}$ using a set of variables in a lower-dimensional subspace spanned by orthogonal basis components. PCA is highly efficient for dimension reduction and, as discussed below, enables the generation of new realizations of $\mathbf{m}$ by sampling from a standard normal distribution. Here a brief description of the PCA procedure is provided. Refer to Sarma et al. (2008) and Vo and Durlofsky (2014) for further details.

The first step in PCA is to generate a set of $N_r$ realizations of $\mathbf{m}$ using a geostatistical toolbox such as SGeMS (Remy et al., 2009). Note that throughout this paper, realizations generated using geostatistical simulation algorithms within SGeMS will be referred to as 'SGeMS realizations' or 'SGeMS models.' Next, the SGeMS realizations are assembled into a centered data matrix
$Y = \frac{1}{\sqrt{N_r - 1}} \left[ \mathbf{m}_1 - \bar{\mathbf{m}}, \; \mathbf{m}_2 - \bar{\mathbf{m}}, \; \ldots, \; \mathbf{m}_{N_r} - \bar{\mathbf{m}} \right]$, (1)

where $Y \in \mathbb{R}^{N_c \times N_r}$, $\mathbf{m}_i \in \mathbb{R}^{N_c}$ denotes realization $i$, and $\bar{\mathbf{m}} \in \mathbb{R}^{N_c}$ is the mean of all $N_r$ realizations. Then a singular value decomposition is performed on $Y$, which gives $Y = U \Sigma V^T$, where $U \in \mathbb{R}^{N_c \times N_r}$ is the left singular matrix whose columns are the eigenvectors of $Y Y^T$, $\Sigma \in \mathbb{R}^{N_r \times N_r}$ is a diagonal matrix whose elements are the square roots of the corresponding eigenvalues, and $V \in \mathbb{R}^{N_r \times N_r}$ is the right singular matrix. Given $U$ and $\Sigma$, new PCA realizations $\mathbf{m}_{\text{pca}} \in \mathbb{R}^{N_c}$ can be generated by
$\mathbf{m}_{\text{pca}} = \bar{\mathbf{m}} + U_l \Sigma_l \boldsymbol{\xi}$, (2)

where $U_l \in \mathbb{R}^{N_c \times l}$ contains the $l$ columns of $U$ associated with the largest eigenvalues, $\Sigma_l \in \mathbb{R}^{l \times l}$ is a diagonal matrix containing the square roots of the corresponding eigenvalues in $\Sigma$, and $\boldsymbol{\xi} \in \mathbb{R}^{l}$ is a vector with elements drawn independently from the standard normal distribution. In practice, the variability of the random field
can often be captured using relatively few PCA components (columns of $U$), which leads to $l \ll N_c$. The choice of $l$ can be determined by applying an 'energy' criterion (Sarma et al., 2008) or through visual inspection.

PCA performs well for Gaussian random fields, whose spatial correlation structure can be fully characterized by two-point statistics (i.e., by the covariance matrix). However, for non-Gaussian models with spatial correlation characterized by multipoint statistics, PCA realizations constructed using Eq. 2 do not fully honor the geological features. OPCA was developed by Vo and Durlofsky (2014) to provide improved performance for non-Gaussian models. The OPCA formulation for binary systems will now be described. In such systems, the value of each component of $\mathbf{m}$ is either 0 or 1, where, e.g., 0 represents mud (shale) facies and 1 represents sand (channel) facies. The detailed formulation of OPCA for bimodal systems is presented in Vo and Durlofsky (2015).
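Before turning to the binary OPCA formulation, the PCA construction of Eqs. 1 and 2 can be sketched in a few lines of NumPy. This is a minimal illustration; the function and variable names are illustrative and not from the original implementation.

```python
import numpy as np

def pca_sample(realizations, l, rng=None):
    """Generate one new PCA realization (Eq. 2).

    realizations : (N_c, N_r) array, one prior (SGeMS-type) model per column
    l            : number of retained PCA components, l << N_c
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_c, n_r = realizations.shape
    m_bar = realizations.mean(axis=1, keepdims=True)
    # Centered data matrix (Eq. 1)
    y = (realizations - m_bar) / np.sqrt(n_r - 1)
    # Thin SVD: columns of u are eigenvectors of y @ y.T
    u, s, _ = np.linalg.svd(y, full_matrices=False)
    # Sample xi ~ N(0, I_l) and map back to the full model space
    xi = rng.standard_normal(l)
    return m_bar[:, 0] + u[:, :l] @ (s[:l] * xi)

# Example with synthetic "prior realizations"
prior = np.random.default_rng(0).standard_normal((400, 100))
m_pca = pca_sample(prior, l=10)
print(m_pca.shape)  # (400,)
```

Each call with a fresh $\boldsymbol{\xi}$ yields a new realization with the two-point statistics of the prior set.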
For binary systems, the OPCA model is constructed through solution of the following minimization problem:
$\mathbf{m}_{\text{opca}} = \operatorname*{argmin}_{\mathbf{x} \in [0,1]^{N_c}} \left\{ \left\| \mathbf{x} - \mathbf{m}_{\text{pca}} \right\|_2^2 + \gamma\, \mathbf{x}^T \left( \mathbf{1} - \mathbf{x} \right) \right\}$. (3)
Here $\mathbf{x} \in \mathbb{R}^{N_c}$, $x_i$ represents the value in grid block $i$, and $\mathbf{1}$ is a vector with all components equal to one. Equation 3 defines a minimization problem for finding the vector $\mathbf{x}$ that minimizes a loss function consisting of two terms, with a weight factor $\gamma$. The first term, $\|\mathbf{x} - \mathbf{m}_{\text{pca}}\|_2^2$, represents the difference between $\mathbf{x}$ and the underlying PCA model $\mathbf{m}_{\text{pca}}$ (Eq. 2), which ensures that the solution resembles the PCA description. The second term, $\mathbf{x}^T(\mathbf{1} - \mathbf{x})$, is a regularization that is at its minimum (zero) when each component of $\mathbf{x}$ is either 0 or 1. It thus acts to shift the solution towards a binary distribution. The separable (block-by-block) analytical solution of Eq. 3 is provided by Vo and Durlofsky (2014). As discussed there, OPCA realizations can be viewed as a postprocessing of the corresponding PCA realization through application of a block-wise histogram transformation.

The application of PCA and OPCA to generate new realizations of a binary facies model will now be demonstrated. The conditional channelized system described in Vo and Durlofsky (2014) is considered. Figure 1 displays the training image, which characterizes the geological features (spatial correlation structure) and thus the multipoint statistics. The training image is defined on a 2D Cartesian grid. In the first step, a total of $N_r$ conditional SGeMS realizations are generated using the 'snesim' geostatistical algorithm (Strebelle, 2002). The model contains $N_c$ grid blocks. Figure 2a, b shows one SGeMS realization and the corresponding histogram of grid-block facies type. The red and blue colors in Fig. 2a correspond to sand and mud facies, respectively. The 16 white points indicate the well locations. There are three wells in the mud facies and 13 wells in the sand facies. All SGeMS realizations are conditioned to hard data at these well locations. The training image contains overlapping sinuous sand channels with a consistent general orientation. The SGeMS realization (Fig. 2a) is seen to capture these features.
The PCA and OPCA representations are constructed using a reduced dimension of $l$ for the PCA components (Eqs. 2 and 3). In OPCA, the weighting factor $\gamma$ is set to 0.8. These values correspond to those used by Vo and Durlofsky (2014). Realizations obtained from PCA and OPCA, and the associated histograms, are shown in Fig. 2. Both the PCA and OPCA models honor all hard data. However, the PCA model displays continuous values for the facies type and does not fully capture channel connectivity. In addition, the corresponding histogram of grid-block facies type (Fig. 2e) is essentially Gaussian rather than near-binary. The OPCA model, by contrast, displays a much sharper contrast between sand and mud facies, and the associated histogram is approximately binary (Fig. 2f). In addition, the OPCA model shows channel geometry and connectivity that are reasonably consistent with those in the SGeMS realization (Fig. 2a).
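For reference, the separable analytical solution of Eq. 3 (for $\gamma < 1$, where the per-block objective is convex) can be sketched as below. The derivation sets the derivative of $(x_i - m_{\text{pca},i})^2 + \gamma x_i(1 - x_i)$ to zero and clips the result to $[0, 1]$; the function name is illustrative.

```python
import numpy as np

def opca_binary(m_pca, gamma=0.8):
    """Block-by-block solution of the binary O-PCA problem (Eq. 3).

    Each grid block independently minimizes
        (x - m_pca)^2 + gamma * x * (1 - x)   over x in [0, 1],
    which for gamma < 1 gives x = (m_pca - gamma/2) / (1 - gamma),
    clipped to [0, 1]. Values are pushed towards 0 (mud) or 1 (sand).
    """
    x = (m_pca - gamma / 2.0) / (1.0 - gamma)
    return np.clip(x, 0.0, 1.0)

m_pca = np.array([-0.1, 0.3, 0.45, 0.7, 1.2])
print(opca_binary(m_pca))  # values pushed towards 0 or 1: [0., 0., 0.25, 1., 1.]
```

This makes explicit why OPCA acts as a pointwise histogram transformation: each block is mapped independently of its neighbors.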
2.2 Generalized OPCA
In the example above, the OPCA procedure gives satisfactory results in terms of reproducing the histogram and the geological features evident in the training image. However, as mentioned earlier, OPCA essentially performs a histogram transformation of the PCA grid-block values, which ensures reproduction of the histogram only. As will be shown later, in more challenging cases with few or no conditioning (hard) data, OPCA realizations may not adequately capture the spatial correlation structure inherent in the training image.
This limitation of OPCA for non-Gaussian models is expected, since Eq. 3 does not account for higher-order spatial correlations. The first term in the OPCA equation, $\|\mathbf{x} - \mathbf{m}_{\text{pca}}\|_2^2$, penalizes OPCA realizations that do not closely resemble the underlying PCA realizations, and these only honor two-point correlations. The regularization term, $\mathbf{x}^T(\mathbf{1} - \mathbf{x})$, is a metric based only on single-point statistics (the histogram). This motivates the introduction of a generalized OPCA formulation that incorporates multipoint spatial statistics.
This generalized OPCA formulation is expressed as
$\hat{\mathbf{m}} = \operatorname*{argmin}_{\mathbf{x}} \left\{ L_c(\mathbf{x}, \mathbf{m}_{\text{pca}}) + \gamma_s\, L_s(\mathbf{x}, \mathbf{M}_{\text{ref}}) \right\}$, (4)
where $\mathbf{m}_{\text{pca}}$ denotes a PCA realization, and $\mathbf{M}_{\text{ref}}$ denotes a reference model that honors the target spatial correlation structure. The objective function consists of two loss functions. The first, $L_c(\mathbf{x}, \mathbf{m}_{\text{pca}})$, referred to as the content loss, quantifies the 'closeness' between the model $\mathbf{x}$ and $\mathbf{m}_{\text{pca}}$ in terms of content. The second, $L_s(\mathbf{x}, \mathbf{M}_{\text{ref}})$, referred to as the style loss, quantifies the degree to which $\mathbf{x}$ resembles the style of $\mathbf{M}_{\text{ref}}$. 'Style' in this case refers to channel continuity and channel width, and to the sharp contrast between sand and mud. The reference model $\mathbf{M}_{\text{ref}}$ can be either a training image or a particular SGeMS realization. When the spatial random field is stationary, the dimension of $\mathbf{M}_{\text{ref}}$ can be different than the dimension of $\mathbf{x}$. Finally, the weighting factor $\gamma_s$ is referred to as the style weight.
The original OPCA formulation in Eq. 3 can be viewed as one particular case of the generalized formulation. In this instance the content loss is based on the Euclidean distance between $\mathbf{x}$ and $\mathbf{m}_{\text{pca}}$, and the style loss is based on the histogram. The style loss in OPCA does not explicitly contain a reference model $\mathbf{M}_{\text{ref}}$, but it is essentially a metric for the mismatch between the histogram of $\mathbf{x}$ and the histogram of $\mathbf{M}_{\text{ref}}$.
In this study, a generalized OPCA formulation with new content and style losses is introduced in order to generate models that better honor $\mathbf{M}_{\text{ref}}$. The new loss terms are inspired by the neural style transfer algorithm, developed by Gatys et al. (2016) within the context of computer vision. For the 2D systems considered in this study, the new content and style losses correspond closely to those in the neural style transfer algorithm.
The high-level procedure for generalized OPCA with CNN-based loss functions is as follows. For the content loss function, let $\mathcal{F}(\mathbf{m})$ be an encoded representation of the model $\mathbf{m}$ such that, if two models $\mathbf{m}_1$ and $\mathbf{m}_2$ have the same encoded representation ($\mathcal{F}(\mathbf{m}_1) = \mathcal{F}(\mathbf{m}_2)$), then they should be perceived as having the same content. For instance, in the channelized system, if $\mathcal{F}(\mathbf{m}_1) = \mathcal{F}(\mathbf{m}_2)$, then $\mathbf{m}_1$ and $\mathbf{m}_2$ should have approximately the same locations for sand and mud. The content loss is defined as
$L_c(\mathbf{x}, \mathbf{m}_{\text{pca}}) = \frac{1}{N_{\mathcal{F}}} \left\| \mathcal{F}(\mathbf{x}) - \mathcal{F}(\mathbf{m}_{\text{pca}}) \right\|_F^2$, (5)
where $N_{\mathcal{F}}$ is the number of elements in $\mathcal{F}(\mathbf{x})$ and $\mathcal{F}(\mathbf{m}_{\text{pca}})$, and the subscript $F$ denotes the Frobenius norm. Rather than directly using $\|\mathbf{x} - \mathbf{m}_{\text{pca}}\|_2^2$ as the content loss, the encoded representations are used because they are less sensitive to the block-by-block match between $\mathbf{x}$ and $\mathbf{m}_{\text{pca}}$. In addition, this treatment has been shown to generate less noisy models.
To quantify the style loss, let $\mathcal{G}(\mathbf{m})$ denote a set of statistical metrics that can effectively capture the spatial correlation structure of the random field. In other words, if $\mathbf{m}_1$ and $\mathbf{m}_2$ produce the same outcome for all of these statistical measurements ($\mathcal{G}(\mathbf{m}_1) = \mathcal{G}(\mathbf{m}_2)$), then $\mathbf{m}_1$ and $\mathbf{m}_2$ should be perceived as having the same spatial correlation structure. Specifically, the style loss is defined as
$L_s(\mathbf{x}, \mathbf{M}_{\text{ref}}) = \frac{1}{N_{\mathcal{G}}} \left\| \mathcal{G}(\mathbf{x}) - \mathcal{G}(\mathbf{M}_{\text{ref}}) \right\|_F^2$, (6)
where $N_{\mathcal{G}}$ is the number of elements in $\mathcal{G}(\mathbf{x})$. Both the encoded representations and the statistical metrics are based on deep convolutional neural networks, as described in more detail in Sect. 2.3.
With the above content and style loss terms, the generalized OPCA procedure is as follows. Starting with a PCA model as the initial guess ($\mathbf{x} = \mathbf{m}_{\text{pca}}$), Eqs. 5 and 6 are evaluated. The content loss for the initial guess is zero, since $\mathcal{F}(\mathbf{x}) = \mathcal{F}(\mathbf{m}_{\text{pca}})$. However, the style loss for the initial guess is likely to be large, since the PCA model and the reference model have quite different styles. The model $\mathbf{x}$ is then updated to reduce the style loss while also limiting the content loss. Gradient-based optimization algorithms can be applied to update $\mathbf{x}$ because the gradient of the loss functions with respect to $\mathbf{x}$ can be readily obtained, as described below. The iterations continue until a stopping criterion is reached, such as a maximum number of iterations. The model corresponding to the optimal solution preserves the content indicated in the PCA model and matches the style of $\mathbf{M}_{\text{ref}}$.
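The iterative procedure just described can be sketched with an automatic-differentiation framework. The sketch below uses PyTorch and treats the content encoder and style metric as arbitrary differentiable callables (placeholders, not the VGG-based versions introduced later); all names are illustrative.

```python
import torch

def generalized_opca(m_pca, m_ref, F, G, gamma_s=1.0, steps=200, lr=0.05):
    """Gradient-based minimization of Eq. 4, starting from the PCA model.

    F : differentiable content encoder
    G : differentiable style metric (e.g., Gram matrices)
    """
    m = m_pca.clone().requires_grad_(True)   # initial guess: x = m_pca
    opt = torch.optim.Adam([m], lr=lr)
    f_pca = F(m_pca).detach()                # fixed content target
    g_ref = G(m_ref).detach()                # fixed style target
    for _ in range(steps):
        opt.zero_grad()
        loss_c = torch.mean((F(m) - f_pca) ** 2)   # content loss (Eq. 5)
        loss_s = torch.mean((G(m) - g_ref) ** 2)   # style loss (Eq. 6)
        (loss_c + gamma_s * loss_s).backward()
        opt.step()
    return m.detach()

# Toy check: identity encoder, mean value as a stand-in "style" metric
m_pca = torch.zeros(16)
m_ref = torch.ones(16)
m_hat = generalized_opca(m_pca, m_ref, F=lambda x: x,
                         G=lambda x: x.mean(), gamma_s=10.0)
print(m_hat.mean())  # pulled towards the reference "style" (mean near 1)
```

With the toy metrics the optimum balances staying near the PCA model against matching the reference mean, illustrating the content/style trade-off controlled by the style weight.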
2.3 CNN-based Loss Functions
The detailed treatments for the loss terms are now described, starting with the set of statistical metrics used for the style loss. The purpose of these metrics is to explicitly characterize the spatial correlation structure of a spatial random field. The covariance matrix and experimental variogram are two such metrics for Gaussian models. However, for non-Gaussian models with multipoint correlations, obtaining effective metrics for the spatial correlation is not as straightforward. Note that although there are multipoint statistics simulation algorithms that can generate realistic realizations of non-Gaussian models, these algorithms do not generally provide explicit metrics that quantify how well the spatial correlation is honored.
Theoretically, the $n$th-order joint histogram of the model can fully capture the correlation of $n$ points. However, even for models of relatively small size, the computational cost to obtain accurate estimates is prohibitive. A more practical approach, proposed by Dimitrakopoulos et al. (2010), uses high-order experimental anisotropic cumulants calculated with predefined spatial templates. This approach can be viewed as an extension of two-point variograms to multipoint statistics. However, for different geological systems, different spatial templates need to be selected to effectively capture the spatial correlation.

Rather than using high-order statistical quantities of the original spatial random field, other treatments instead use low-order summary statistics of filter responses, obtained by scanning 'templates' through the spatial random field. In Zhang et al. (2006), for example, it was shown that the histograms of linear filter responses obtained with simple templates are effective at classifying training images with different patterns. Figure 3 displays a binary channelized model $\mathbf{m}$, the linear filter response map $\tilde{\mathbf{m}}$ obtained with a $3 \times 3$ template $\mathbf{t}$ (also referred to as a filter), and the histogram of $\tilde{\mathbf{m}}$. The linear filtering process is accomplished by scanning $\mathbf{m}$ with the template $\mathbf{t}$. At each location, the sum of the elementwise product of $\mathbf{t}$ and a local block of $\mathbf{m}$ is computed to obtain the value of the corresponding element in $\tilde{\mathbf{m}}$. Away from the boundary, the mathematical formulation of this linear filtering is
$\tilde{m}_{i,j} = \sum_{p=1}^{3} \sum_{q=1}^{3} t_{p,q}\, m_{i+p-2,\, j+q-2} + b$, (7)
where $b$ is a constant referred to as the bias ($b = 0$ in this example), subscripts $i$ and $j$ are the $x$- and $y$-direction indices, and the size of the template $\mathbf{t}$ is $3 \times 3$. This scanning process is also referred to as a convolution. For the boundary, zero-padding is used, whereby one row and one column of zeros are added on each boundary of $\mathbf{m}$ such that $\tilde{\mathbf{m}}$ is of the same size as $\mathbf{m}$. In this case, the simple template acts to compute the $x$-direction gradient of $\mathbf{m}$. The nonzero values in the resulting filter response indicate the edges (excluding edges aligned with the $x$-axis) between sand and mud. The histogram of $\tilde{\mathbf{m}}$ can be used as a metric to characterize the edges in the system. This reflects an important aspect of the spatial correlation in channelized systems. A recent study by Gatys et al. (2015)
in computer vision showed that summary statistics of the filter responses obtained with pretrained deep convolutional neural networks can effectively characterize multipoint correlations in spatial random fields. In their case the spatial random fields were images, and the spatial correlation was depicted by texture images. A convolutional neural network consists of many convolutional layers. Each convolutional layer first performs linear filtering, similar to that described above, on the output from the previous layer (or on the input, in the case of the first layer). Then a nonlinear activation function, which in the model transform net described below is 'relu' (rectified linear unit, $\mathrm{relu}(x) = \max(0, x)$), is applied to the filter response maps to give the final output of the convolutional layer. This output is referred to as the 'activation' of the layer. Note that there are typically multiple filters applied at each convolutional layer. The filter response maps associated with the multiple filters are stacked into a third-order tensor. Thus the filters themselves are also often third-order tensors. For a more detailed description of CNNs, the reader is referred to Goodfellow et al. (2016).

The CNN-based method in Gatys et al. (2015) can be viewed as an extension of the linear filtering method described above. The improvements are mainly twofold. First, instead of using relatively few hand-crafted templates (filters), a CNN contains a large number of filters obtained automatically through a training process. Second, rather than using linear filtering, each convolutional layer entails a nonlinear filtering process. The activations at different convolutional layers can be viewed as a set of multiscale nonlinear filter responses. In the context of computer vision, the statistical metrics are based on the nonlinear filter responses of the models to a deep CNN pretrained for image classification. After a CNN is trained to classify images, the filters at each convolutional layer are capable of capturing different characteristics of the input image. It will be shown that CNNs trained with images are also directly applicable to 2D geological models.
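Returning to the simple linear filtering of Eq. 7, a minimal NumPy sketch is given below, using an illustrative gradient template applied to a synthetic binary "channel" (all names and the example model are illustrative).

```python
import numpy as np

def linear_filter(m, t, b=0.0):
    """Linear filtering of model m with an n x n template t (Eq. 7),
    with zero-padding so the response map has the same size as m."""
    n = t.shape[0]
    pad = n // 2
    mp = np.pad(m, pad)                    # zero-padding at the boundaries
    out = np.zeros(m.shape, dtype=float)
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            # sum of the elementwise product of t and a local block of m
            out[i, j] = np.sum(t * mp[i:i + n, j:j + n]) + b
    return out

# Illustrative gradient template (second array index taken as x)
t = np.array([[0.0, 0.0, 0.0],
              [-1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
m = np.zeros((5, 5))
m[:, 2:4] = 1.0                            # vertical sand "channel"
resp = linear_filter(m, t)
print(resp[0])  # [ 0.  0.  1.  0. -1.] -- nonzero only at channel edges
```

The histogram of `resp` would then serve as the edge-characterizing metric described above.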
Although not considered in this study, the use of CNNs to characterize multipoint statistics can be extended to 3D models. Potential approaches include layer-by-layer treatments, which may be appropriate when the correlation in the vertical direction is relatively weak (as is often observed in reservoir models). A more general technique is to train a convolutional neural network to capture the full spatial correlation structure of the 3D model. These approaches will be investigated in future work.
For 2D Cartesian models, $\mathbf{m}$ can be written as a matrix of dimensions $H \times W$, where $H$ and $W$ denote the number of grid blocks in the two coordinate directions. Following the RGB format for images, $\mathbf{m}$ is converted into a third-order tensor $\mathbf{M}$ of size $H \times W \times 3$ by first replicating $\mathbf{m}$ three times along the third dimension, and then multiplying by 255 such that, in the binary case, the elements of $\mathbf{M}$ are mapped to the range $[0, 255]$. The pretrained CNN used for extracting the nonlinear filter responses is the 16-layer VGG network (Simonyan and Zisserman, 2015), trained for image classification over the ImageNet dataset (Deng et al., 2009). Following suggestions in Johnson et al. (2016), activations are extracted at four specific layers ($k = 1, \ldots, 4$), as shown in Fig. 4. The activation at layer $k$, denoted by $A_k(\mathbf{m})$, is a third-order tensor of size $H_k \times W_k \times C_k$, where $H_k$ and $W_k$ are the dimensions of the filter response maps in the two coordinate directions, and $C_k$ is the number of filters in convolutional layer $k$. Next, $A_k(\mathbf{m})$ is reshaped into a so-called feature matrix $F_k(\mathbf{m})$ of size $C_k \times N_k$, where $N_k = H_k W_k$ is the number of elements in each filter response map.
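The two reshaping steps just described, model to RGB-style tensor and activation to feature matrix, can be sketched as follows (a minimal illustration assuming an H x W x C activation layout; names are not from the original implementation).

```python
import numpy as np

def to_image_tensor(m):
    """Replicate a 2D model (values in [0, 1]) three times along a third
    dimension and scale by 255, mimicking an RGB image for the VGG net."""
    return np.repeat(m[:, :, np.newaxis] * 255.0, 3, axis=2)

def to_feature_matrix(activation):
    """Reshape an (H_k, W_k, C_k) activation into the C_k x (H_k * W_k)
    feature matrix F_k."""
    h, w, c = activation.shape
    return activation.reshape(h * w, c).T

m = np.random.default_rng(0).integers(0, 2, (60, 60)).astype(float)
print(to_image_tensor(m).shape)        # (60, 60, 3)
a = np.random.default_rng(1).random((30, 30, 64))  # stand-in activation
print(to_feature_matrix(a).shape)      # (64, 900)
```

Each row of the feature matrix is one flattened filter response map, which is the form needed for the Gram-matrix computation below.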
The uncentered covariance matrices of the nonlinear filter responses are referred to as Gram matrices, $G_k(\mathbf{m}) \in \mathbb{R}^{C_k \times C_k}$ (Gatys et al., 2016). These are defined as

$G_k(\mathbf{m}) = \frac{1}{C_k N_k}\, F_k(\mathbf{m})\, F_k(\mathbf{m})^T$. (8)

Gram matrices thus defined have been shown to represent a set of effective statistical metrics for characterizing the multipoint correlation of $\mathbf{m}$. In fact, the style loss is defined in terms of these as follows:

$L_s(\mathbf{x}, \mathbf{M}_{\text{ref}}) = \sum_{k=1}^{4} \frac{1}{N_{G_k}} \left\| G_k(\mathbf{x}) - G_k(\mathbf{M}_{\text{ref}}) \right\|_F^2$, (9)

where $N_{G_k} = C_k^2$ is the number of elements in $G_k$. Note that the dimensions of $\mathbf{x}$ and $\mathbf{M}_{\text{ref}}$ can be different, but those of $G_k(\mathbf{x})$ and $G_k(\mathbf{M}_{\text{ref}})$ are both $C_k \times C_k$, where $C_k$ depends only on the architecture of the CNN and is invariant to the input dimension.
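A minimal NumPy sketch of the Gram matrix and the layer-wise style loss is given below. Normalization conventions vary between references; the division by the number of feature-matrix elements used here is an assumption, and the random "feature matrices" are stand-ins for actual VGG activations.

```python
import numpy as np

def gram_matrix(f):
    """Gram matrix (Eq. 8) from a feature matrix f of size C_k x N_k:
    the C_k x C_k matrix of uncentered covariances of the filter
    responses, here normalized by C_k * N_k."""
    c, n = f.shape
    return f @ f.T / (c * n)

def style_loss(feats_x, feats_ref):
    """Style loss (Eq. 9): sum over layers of the normalized squared
    Frobenius distance between Gram matrices."""
    loss = 0.0
    for fx, fr in zip(feats_x, feats_ref):
        gx, gr = gram_matrix(fx), gram_matrix(fr)
        loss += np.sum((gx - gr) ** 2) / gx.size
    return loss

rng = np.random.default_rng(0)
feats_x = [rng.random((8, 100)), rng.random((16, 25))]
feats_ref = [rng.random((8, 400)), rng.random((16, 64))]  # different N_k is fine
print(style_loss(feats_x, feats_ref))  # scalar; Gram shapes match per layer
```

The example with mismatched second dimensions illustrates the point made above: the Gram matrix size depends only on the number of filters, so models and reference images of different dimensions can be compared.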
For the content loss, the nonlinear filter responses (feature matrices) provide encoded representations of $\mathbf{m}$ that capture its content. As described in Gatys et al. (2016), with increasing layer index $k$, the feature matrices encode larger-scale content and become less sensitive to the exact elementwise values in $\mathbf{m}$. Therefore, following the treatment of Johnson et al. (2016), the nonlinear filter response at the second layer ($k = 2$) is used to construct the content loss. This was shown to provide a reasonable balance between large-scale content and small-scale details. The precise form of the content loss term is thus
$L_c(\mathbf{x}, \mathbf{m}_{\text{pca}}) = \frac{1}{N_{F_2}} \left\| F_2(\mathbf{x}) - F_2(\mathbf{m}_{\text{pca}}) \right\|_F^2$, (10)

where $N_{F_2}$ is the number of elements in $F_2$. Combining Eqs. 4, 9, and 10, the generalized OPCA formulation with CNN-based feature representation can be written as

$\hat{\mathbf{m}} = \operatorname*{argmin}_{\mathbf{x}} \left\{ \frac{1}{N_{F_2}} \left\| F_2(\mathbf{x}) - F_2(\mathbf{m}_{\text{pca}}) \right\|_F^2 + \gamma_s \sum_{k=1}^{4} \frac{1}{N_{G_k}} \left\| G_k(\mathbf{x}) - G_k(\mathbf{M}_{\text{ref}}) \right\|_F^2 \right\}$. (11)
The goal, for a given realization of $\boldsymbol{\xi}$ (and thus of $\mathbf{m}_{\text{pca}}$), is to obtain an optimal model $\hat{\mathbf{m}}$ that minimizes the weighted sum of the distance to the underlying PCA model $\mathbf{m}_{\text{pca}}$, in terms of the feature matrices, and the distance to the reference model $\mathbf{M}_{\text{ref}}$, in terms of the Gram matrices. The gradient of the loss function with respect to $\mathbf{x}$ can be readily computed using backpropagation through the VGG-16 net, so gradient-based minimization strategies can be used for this problem. The initial guess is typically set to either a random noise model or the PCA model $\mathbf{m}_{\text{pca}}$.
The optimization in Eq. 11 is still computationally demanding, since evaluating the objective function and its gradient requires forward and backward passes through a deep CNN. During history matching, where $\boldsymbol{\xi}$ is updated many times, this approach becomes inefficient, as it requires solving the optimization problem a large number of times. Therefore, in this study, Eq. 11 is not used to generate realizations. Instead, following the idea proposed in Johnson et al. (2016), a more efficient algorithm, CNNPCA, is formulated. In CNNPCA, a separate CNN is trained to form an explicit transform function that postprocesses PCA realizations very quickly, thus avoiding the time-consuming optimization in Eq. 11. The CNNPCA treatment is now described.
2.4 CNNPCA Procedure
The optimization in Eq. 11 can be interpreted as a postprocessing of the underlying PCA model to match the pattern of the target training image. In other words, Eq. 11 defines an implicit mapping from the PCA model to the postprocessed model. To map each PCA model to its corresponding postprocessed model requires solving the computationally demanding optimization in Eq. 11. It would be much more efficient if an explicit mapping function could be obtained such that the postprocessing of a PCA model required only an explicit function evaluation.
The idea here is to accomplish such a mapping through use of an alternative CNN, which takes PCA models as input and provides postprocessed models as output. This CNN is referred to as the model transform net and is denoted by $f_W$, where the subscript $W$ represents the parameters in the network. Following the network architecture proposed in Johnson et al. (2016), the model transform net consists of 16 convolutional layers and includes a total of 1,668,865 parameters. The detailed architecture of the model transform net is specified in the Appendix.
The training process for the model transform net involves a minimization problem similar to the generalized OPCA procedure described earlier (Eq. 11). The difference is that, rather than directly obtaining an optimal postprocessed model for each particular PCA realization, the idea here is to obtain optimal parameters $W$ such that the resulting models display small content and style losses. The first step of the training process is to construct a training set consisting of $N_t$ random PCA models $\mathbf{m}_{\text{pca}}^i$, $i = 1, \ldots, N_t$, generated by sampling $\boldsymbol{\xi}$ from the standard normal distribution and then applying Eq. 2. These random PCA models are then fed through the initial model transform net (with random parameters) to obtain the postprocessed models $f_W(\mathbf{m}_{\text{pca}}^i)$, $i = 1, \ldots, N_t$. Next, the content and style losses for each pair of (corresponding) PCA and postprocessed models are quantified. The parameters $W$ in the model transform net can then be modified to minimize the average combined loss over the training set.
This training process is described by the following minimization:
$W^* = \operatorname*{argmin}_{W}\; \frac{1}{N_t} \sum_{i=1}^{N_t} \left[ L_c\!\left(f_W(\mathbf{m}_{\text{pca}}^i),\, \mathbf{m}_{\text{pca}}^i\right) + \gamma_s\, L_s\!\left(f_W(\mathbf{m}_{\text{pca}}^i),\, \mathbf{M}_{\text{ref}}\right) \right]$, (12)
where the decision variable $W$ denotes the parameters in the model transform net, and the loss function in Eq. 12 is the average over the training set. For each pair of PCA and postprocessed models, the loss function is the same as in the generalized OPCA formulation (Eq. 11). Note that the training of the model transform net is an unsupervised learning process, as there is no predefined 'true' or 'reference' postprocessed model provided for each PCA model in the training set.
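The training described by Eq. 12 can be sketched as a standard unsupervised PyTorch loop. Here `transform_net`, `content_loss`, and `style_loss_fn` are placeholders for the model transform net and the VGG-based losses (the toy stand-ins at the bottom are purely illustrative, not the architecture of the actual method).

```python
import torch

def train_transform_net(transform_net, m_pca_set, m_ref, content_loss,
                        style_loss_fn, gamma_s=1.0, epochs=2, batch=8,
                        lr=1e-3):
    """Minimize the average combined loss of Eq. 12 over a set of PCA
    models, using minibatch ADAM updates. Unsupervised: no target
    postprocessed model is provided."""
    opt = torch.optim.Adam(transform_net.parameters(), lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(m_pca_set.shape[0])   # shuffle each epoch
        for k in range(0, m_pca_set.shape[0], batch):
            m_pca = m_pca_set[perm[k:k + batch]]
            m_out = transform_net(m_pca)            # f_W(m_pca)
            loss = (content_loss(m_out, m_pca)
                    + gamma_s * style_loss_fn(m_out, m_ref))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return transform_net

# Toy stand-ins: a linear "net" and squared-error "losses"
net = torch.nn.Linear(36, 36)
m_pca_set = torch.randn(32, 36)
m_ref = torch.ones(36)
c_loss = lambda a, b: torch.mean((a - b) ** 2)
s_loss = lambda a, r: torch.mean((a.mean(dim=1) - r.mean()) ** 2)
train_transform_net(net, m_pca_set, m_ref, c_loss, s_loss)
```

The offline cost is concentrated in this loop; once trained, generating a model requires only one forward pass through the net.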
Figure 5 illustrates the procedure for evaluating the content and style losses for one PCA model in the training set, for the binary channelized system. The PCA model $\mathbf{m}_{\text{pca}}^i$ is fed through the model transform net to obtain the postprocessed model $f_W(\mathbf{m}_{\text{pca}}^i)$. Next, $\mathbf{m}_{\text{pca}}^i$, $f_W(\mathbf{m}_{\text{pca}}^i)$, and the reference model $\mathbf{M}_{\text{ref}}$ are converted to grayscale images and then fed through the 'Loss Network' (VGG-16) to extract the feature matrices $F_2$ and then compute the Gram matrices $G_k$, $k = 1, \ldots, 4$. In this case, the reference model is the training image (Fig. 1), whose dimensions are different than those of the models. Note that the postprocessed model shown in Fig. 5 is obtained with the optimal model transform net. The gradient of the objective function with respect to $W$
can be readily computed using backpropagation through the CNN. The adaptive moment estimation (ADAM) algorithm, which has proven effective for optimizing various deep neural networks, is used as the optimizer (Kingma and Ba, 2014). ADAM is an extension of stochastic gradient descent in which the gradient at each iteration is approximated using minibatches of the training set rather than the entire training set at once. In other words, at each iteration, the average loss in Eq. 12 is evaluated over a small batch of $N_b$ PCA models selected from the training set, where $N_b$ is the batch size. The rate at which the decision variables are updated at each iteration is controlled by the learning rate.

After model-transform-net training, given a PCA model $\mathbf{m}_{\mathrm{pca}}$, the postprocessed model $\hat{\mathbf{m}}$ can be obtained very quickly from a forward pass of $\mathbf{m}_{\mathrm{pca}}$ through the model transform net, as shown in Fig. 6. Note that the transformation through the model transform net is differentiable. The derivative $\mathrm{d}\hat{\mathbf{m}}/\mathrm{d}\mathbf{m}_{\mathrm{pca}}$ can be obtained using backpropagation through the model transform net. This enables the direct calculation of $\mathrm{d}\hat{\mathbf{m}}/\mathrm{d}\xi$ (using the chain rule and the fact that $\mathrm{d}\mathbf{m}_{\mathrm{pca}}/\mathrm{d}\xi$ follows directly from the linear PCA mapping in Eq. 2). Although in this study gradient-free history matching is applied (as described in Sect. 5), $\mathrm{d}\hat{\mathbf{m}}/\mathrm{d}\xi$ is required for more efficient gradient-based methods. A significant advantage of this approach is that it provides the necessary derivative.

It can be seen in Fig. 6 that $\hat{\mathbf{m}}$ is not strictly binary. A final histogram transformation can be applied, if desired, to ensure the postprocessed models are binary or very close to binary. Since gradient-free history matching is used in this work, a simple nondifferentiable hard-thresholding step, with a cutoff value of 0.5, is applied here. Application of a differentiable histogram transformation would instead preserve the differentiability of the CNNPCA procedure. As discussed in the extension of CNNPCA to bimodal systems in Sect. 6, one option is to use OPCA as a final postprocessing step.
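The hard-thresholding step described above, together with one possible differentiable alternative, can be sketched as follows. The logistic sharpening and its parameter `beta` are illustrative assumptions, not specifications from the text.

```python
import numpy as np

def hard_threshold(m_hat, cutoff=0.5):
    # nondifferentiable binarization with cutoff 0.5, as used here with
    # gradient-free history matching
    return np.where(m_hat >= cutoff, 1.0, 0.0)

def soft_threshold(m_hat, cutoff=0.5, beta=50.0):
    # a differentiable alternative (logistic sharpening); beta is a
    # hypothetical sharpness parameter, not taken from the text
    return 1.0 / (1.0 + np.exp(-beta * (m_hat - cutoff)))
```

The soft version preserves differentiability of the overall mapping, at the cost of models that are only approximately binary.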
The overall CNNPCA procedure is summarized in Algorithm 1. The CNNPCA algorithm used here is modified from an open-source implementation (Kadian, 2018) of the procedure in Johnson et al. (2016), within the PyTorch Python library (Paszke et al., 2017). In PyTorch and many other deep-learning frameworks, backpropagation through neural networks for computing gradients can be readily accomplished using automatic differentiation.

2.5 Hard Data Loss
Hard data are measurements of static geological properties, such as permeability values, at particular locations (typically at exploration and production wells). PCA models will always honor hard data if the SGeMS realizations used to construct the PCA basis honor the hard data. Although the CNNPCA models are postprocessed from $\mathbf{m}_{\mathrm{pca}}$, training the model transform net with Eq. 12 does not guarantee that hard data will be honored. To address this issue, which would be of concern in practical applications, an additional hard-data loss term is introduced. The CNNPCA minimization for conditional systems is now given by
(13) $W^{*} = \operatorname*{arg\,min}_{W} \dfrac{1}{N_t} \sum_{i=1}^{N_t} \left[ L_c\left(\mathbf{m}_{\mathrm{pca}}^{i},\, f_W(\mathbf{m}_{\mathrm{pca}}^{i})\right) + \gamma_s\, L_s\left(\mathbf{M}_{\mathrm{ref}},\, f_W(\mathbf{m}_{\mathrm{pca}}^{i})\right) + \gamma_h\, L_h\left(\mathbf{m}_{\mathrm{pca}}^{i},\, f_W(\mathbf{m}_{\mathrm{pca}}^{i})\right) \right]$
Here $\gamma_h$ is a weighting factor, referred to as the hard-data weight, for the new loss $L_h$. This hard-data loss is given by
(14) $L_h\left(\mathbf{m}_{\mathrm{pca}},\, f_W(\mathbf{m}_{\mathrm{pca}})\right) = \dfrac{1}{n_h}\, \mathbf{h}^{\mathrm{T}} \left( \mathbf{m}_{\mathrm{pca}} - f_W(\mathbf{m}_{\mathrm{pca}}) \right)^{2}$
where $\mathbf{h}$ is a hard-data indicator vector, with $h_i = 1$ meaning there are hard data for grid block $i$ and $h_i = 0$ meaning there are not, the squared difference is evaluated component by component (to give a vector), and $n_h$ is the number of hard-data values specified. The value of $\gamma_h$ is found through numerical experimentation. Specifically, in this study, $\gamma_h$ is determined such that all hard data are correctly honored over a large set of CNNPCA models.
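The hard-data loss can be sketched as follows, a minimal numpy illustration. Since the PCA models honor the hard data, the PCA model supplies the target values at the flagged blocks; the exact normalization is an assumption consistent with the description above.

```python
import numpy as np

def hard_data_loss(m_post, m_pca, h):
    # Hard-data loss (cf. Eq. 14). h is the indicator vector (1 where hard
    # data exist, 0 elsewhere); n_h is the number of hard-data values.
    # m_pca supplies target values at flagged blocks, since PCA models
    # honor hard data (an assumption about the exact form).
    n_h = int(np.sum(h))
    return float(np.sum(h * (m_post - m_pca) ** 2) / n_h)
```

The loss is zero precisely when the postprocessed model reproduces the PCA model at every hard-data location; mismatches elsewhere do not contribute.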
3 Model Generation Using CNNPCA
In this and the following sections, CNNPCA performance is assessed through a variety of evaluations. These assessments follow those presented in Vo and Durlofsky (2014), in that the (visual) quality of random CNNPCA realizations is first considered, then the flow responses for an ensemble of models are assessed, and finally history matching results are evaluated. In the assessment in this section, CNNPCA will be applied to generate new models for the binary system considered in Sect. 2.1. Both unconditional and conditional models will be compared with OPCA realizations.
3.1 Unconditional Realizations
Conditional realizations generated from SGeMS, PCA, and OPCA for a binary system (Fig. 2) were presented in Sect. 2.1. The focus now is on the generation of unconditional realizations. The training image and model setup are the same as described in Sect. 2.1. A set of SGeMS unconditional realizations (Fig. 7) is generated and used to construct the PCA representation (Eq. 2) with reduced dimension $\ell$. For the OPCA representation (Eq. 3), the weighting factor from Vo and Durlofsky (2014) is used. Note that the same set of SGeMS realizations is used for PCA, OPCA, and CNNPCA model construction.
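The construction of the PCA representation (Eq. 2) from an ensemble of realizations can be sketched as follows. This is a generic numpy illustration; the exact scaling conventions (centering by the ensemble mean, energy-scaled basis) are assumptions consistent with standard PCA parameterizations, not a restatement of the paper's implementation.

```python
import numpy as np

def build_pca(realizations, ell):
    # realizations: (n_cells, n_real) matrix, one realization per column.
    # Returns mean m_bar and basis Phi_ell such that
    # m_pca = m_bar + Phi_ell @ xi, with xi ~ N(0, I_ell).
    n_real = realizations.shape[1]
    m_bar = realizations.mean(axis=1, keepdims=True)
    Y = (realizations - m_bar) / np.sqrt(n_real - 1)  # centered, scaled data matrix
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    return m_bar, U[:, :ell] * s[:ell]  # energy-scaled basis

def sample_pca(m_bar, phi, rng):
    # generate one random PCA model by sampling xi from N(0, I)
    xi = rng.normal(size=(phi.shape[1], 1))
    return m_bar + phi @ xi
```

New PCA models generated this way honor the two-point statistics of the ensemble; the CNNPCA postprocessing then restores the higher-order structure.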
Figure 8 displays two random PCA model realizations and the corresponding OPCA and CNNPCA models. It is evident that the OPCA models (Fig. 8b, e) display sharper features in comparison to the corresponding PCA models (Fig. 8a, d). However, the channels in the OPCA models lack the degree of connectivity evident in the training image (Fig. 1) and in the SGeMS realization (Fig. 7). In addition, the channel width in the training image is very consistent, while in the OPCA realizations large variations are evident in Fig. 8b, e. These behaviors were also observed by Vo and Durlofsky (2014).
CNNPCA realizations are generated following Algorithm 1. The first step is to train the model transform net. The training set consists of random PCA realizations (Eq. 2). The value of the style weight $\gamma_s$ was investigated over a range from 0.05 to 3. Based on visual inspection, an intermediate value of $\gamma_s$ was found to provide CNNPCA models with the best balance between resemblance to the PCA models and consistency with the training image (additional values of $\gamma_s$ are considered below). The model transform net was then trained with the ADAM algorithm for a total of 2250 iterations, with specified batch size and learning rate. The values of these parameters were obtained through numerical experiments. The training process requires around 3 minutes with one GPU (NVIDIA Tesla K80) for this case. With the trained model transform net, new CNNPCA models are generated following Steps 8 and 9 in Algorithm 1.
Figure 8c, f shows the CNNPCA realizations corresponding to the PCA realizations in Fig. 8a, d. It is clear that the binary features and the channel width and connectivity show significant improvement relative to that in the PCA and OPCA realizations. In addition, the relative locations of sand and mud in the CNNPCA realizations are consistent with the trends in the corresponding PCA realizations. These comparison results demonstrate that CNNPCA realizations can better preserve the geological features displayed in the training image (Fig. 1). This suggests that the CNNPCA formulation described in Sect. 2.4 is indeed able to capture higherorder spatial statistics.
The impact of the style weight $\gamma_s$ on CNNPCA models is now illustrated. In general, small $\gamma_s$ leads to models that closely resemble the corresponding PCA models but display incorrect spatial correlation, while large values have the opposite effect. In addition, the variability of CNNPCA realizations collapses (i.e., all the CNNPCA realizations become almost identical) when using overly large $\gamma_s$. These observations are apparent in Fig. 9. Realizations constructed using a small style weight, shown in Fig. 9a, b, are basically blockwise mappings of the underlying PCA models (Fig. 8a, d). They display relatively poor channel continuity and inconsistent channel width. With a large style weight (Fig. 9c, d), the CNNPCA realizations do not honor the general sand and mud locations indicated in the corresponding PCA models. They also show less variability in terms of channel geometry. The intermediate value of $\gamma_s$ used in this study is seen to provide realizations (Fig. 8c, f) that represent an appropriate balance between the two loss terms.
3.2 Conditional Realizations
The generation of CNNPCA realizations conditioned to hard data at well locations (i.e., the facies type at well locations) is now considered. This setup was considered earlier in Sect. 2.1. The training image used in Sect. 2.1 is again used here, and it is also assumed that hard data are available at 16 well locations, as in Vo and Durlofsky (2014). The hard data locations are indicated by the white points in Fig. 10. Three wells are located in mud and the other 13 wells are located in sand. The procedures for generating PCA, OPCA and CNNPCA models are as described in the previous section, except for two aspects. First, the SGeMS realizations are now conditional realizations that honor the hard data. Second, when training the model transform net, the hard-data loss (Eqs. 13 and 14) is included. The weighting factor $\gamma_h$ is set to 16 (based on numerical experimentation and the criterion given in Sect. 2.5).
A total of 200 PCA realizations and corresponding OPCA and CNNPCA realizations are generated. As noted in Sect. 2.5, PCA and OPCA models always honor hard data. In this case, each of the 200 CNNPCA models also honors all of the hard data. This demonstrates that the CNNPCA procedure is able to generate conditional realizations for this binary system, and that the value of $\gamma_h$ is sufficiently large. Figure 10 displays two PCA realizations and the corresponding OPCA and CNNPCA realizations. It is evident that the CNNPCA realizations do indeed honor facies type at all well locations. Additionally, the CNNPCA realizations continue to display channel sinuosity and width consistent with the training image. Note finally that the OPCA realizations in this case display better channel continuity than do those for the unconditioned case (compare Fig. 10b, e to Fig. 8b, e), though the channel geometry in the CNNPCA realizations is still closer to that in the training image.
4 Flow Simulation with Random CNNPCA Realizations
In this section, the flow statistics (e.g., P10, P50, P90 responses) associated with an ensemble of CNNPCA and OPCA models are compared to those for reference SGeMS models. Comparisons of flow responses between OPCA and SGeMS models were performed by Vo and Durlofsky (2014). In that study, OPCA was found to provide flow statistics in reasonable agreement with SGeMS for models with a relatively large amount of conditioning data. However, in the absence of conditioning data, or when such data are sparse, the flow responses for OPCA models are less accurate. Here a conditional system with a small amount of conditioning data is considered. This is a particularly challenging case since production wells are far from injection wells, so maintaining channel continuity in the geomodel strongly impacts flow response.
The reservoir model is defined on a 2D areal grid, with grid blocks of size 50 m in the $x$ and $y$ directions, and 10 m in the $z$ direction. The binary facies model represents a conditional channelized system, with hard data at two injection wells (I1 and I2) and two production wells (P1 and P2), as shown in Fig. 11. All four wells are located in sand. The same setup as in Sect. 3.2 is used to construct the PCA, OPCA and CNNPCA models, except that the hard-data weight $\gamma_h$ for CNNPCA is 10 in this case. A smaller $\gamma_h$ may be required here than in the earlier example because there are fewer hard data to honor. A total of 200 random PCA realizations and corresponding OPCA and CNNPCA realizations are generated. For the SGeMS models, 200 of the 1000 realizations used to construct the PCA representation are selected randomly.
The binary facies model $\mathbf{m}$ denotes the grid-block facies type, with $m_i = 1$ indicating sand and $m_i = 0$ indicating mud in block $i$. The grid-block porosity is specified as $\phi_i = m_i\, \phi_{\mathrm{sand}} + (1 - m_i)\, \phi_{\mathrm{mud}}$, where $\phi_{\mathrm{sand}}$ is the sand porosity and $\phi_{\mathrm{mud}}$ is the mud porosity. Grid-block permeability (taken to be isotropic) is specified as $k_i = a\, e^{b m_i}$, where the two constants are $a = 2$ md and $b = \ln 1000$. This results in permeability values of 2000 md for sand and 2 md for mud, which corresponds to a strong contrast between facies. Note that this contrast is 10 times larger than that in Vo and Durlofsky (2014), where permeabilities of 2000 md for sand and 20 md for mud were used.
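The facies-to-property mapping described above can be sketched as follows. The exponential form and constants reproduce the stated permeability values (2000 md for sand, 2 md for mud); the linear porosity rule and its parameter values are assumptions, since the porosity values are not restated in this excerpt.

```python
import numpy as np

# facies-to-property mapping: m_i = 1 (sand), m_i = 0 (mud)
a, b = 2.0, np.log(1000.0)  # gives k = 2 md for mud and 2000 md for sand

def permeability(m):
    # isotropic grid-block permeability (md) from the binary facies model
    return a * np.exp(b * m)

def porosity(m, phi_sand, phi_mud):
    # grid-block porosity; the sand/mud porosity values are left as
    # parameters because they are not given in this excerpt
    return m * phi_sand + (1.0 - m) * phi_mud
```

Because the mapping is applied block by block, any broken channel in the facies model translates directly into a three-orders-of-magnitude permeability barrier, which is why channel connectivity dominates the flow response.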
Figure 11 shows one random realization from each method. The SGeMS and CNNPCA realizations display continuous channels connecting the producers and injectors. In the OPCA realization (Fig. 11
b), however, the channels are not as continuous, and I1 is not connected to P1. In the overall set of 200 realizations, the probability of producers being connected to injectors is higher for CNNPCA and SGeMS models than for OPCA models. This will be seen to have a strong impact on the flow statistics.
Twophase oilwater flow is considered in all of the simulations presented in this study. The relative permeability curves are shown in Fig. 12. Initial oil and water saturations are 0.9 and 0.1, respectively. Water viscosity is constant at 0.31 cp. Oil viscosity is 0.29 cp at the initial reservoir pressure of 325 bar. The two water injectors and the two producers operate at constant bottomhole pressures (BHPs) of 335 bar and 315 bar. The simulation time frame is taken to be 5000 days. This period is sufficiently long such that the producers that are connected to injectors through sand will experience water breakthrough. Note that the use of BHP control for all wells, rather than the specification of injection rates as in Vo and Durlofsky (2014), poses a more stringent test of the models. This is because the prediction quantities of interest here are phase production rates, and these rates can show large errors if channel connectivity is not maintained.
Figure 13 displays the P10, P50 and P90 responses for oil and water production rates obtained from the SGeMS, OPCA, and CNNPCA models. The P90 result for a particular quantity (and similarly for P10 and P50 results) at a particular time corresponds to the 90th percentile in the response at that time. At different times, the P90 response is associated with different models. The P50 results are shown in Fig. 13 as solid curves, while the P10 and P90 results are shown as dashed curves.
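The percentile responses described above can be computed directly across the ensemble at each time step, as in this short numpy sketch (the function name and ensemble layout are illustrative assumptions).

```python
import numpy as np

def ensemble_percentiles(rates, qs=(10, 50, 90)):
    # rates: (n_models, n_timesteps) ensemble of simulated responses;
    # returns the P10/P50/P90 response at every time step. Note that at
    # different times a given percentile may come from different models.
    return {q: np.percentile(rates, q, axis=0) for q in qs}
```

At each time step the percentile is taken over the 200 models independently, so the resulting P10/P50/P90 curves are envelopes rather than the responses of any single model.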
Figure 13a, c compares OPCA results (blue curves) to reference SGeMS results (red curves) for field oil and water rates. Significant differences are evident for this challenging case, and OPCA results are seen to consistently underpredict oil and water rates. These discrepancies are due to the lack of channel connectivity in many of the OPCA realizations, and the fact that the wells in these simulations are under BHP control.
CNNPCA flow responses, shown in Fig. 13b, d (black curves), are much closer to the reference SGeMS results. Although a slight overprediction in water rate is evident in Fig. 13d, these results are overall very accurate. With reference to the P50 water rate curves, it is interesting to note that the SGeMS results display two periods of sharper increase in rate. The first such period is at around 1000 days, when the initial breakthrough occurs, while the second is at around 3000 days, corresponding to breakthrough at the second well. This behavior is captured accurately by the CNNPCA models, indicating that the impact of channel connectivity is correctly captured. The fact that the CNNPCA P10 and P90 oil and water rate results also agree with the SGeMS results suggests that the variability in the ensemble of 200 models does indeed correspond to that in the SGeMS models. This is an important observation, since it is possible to generate models that are channelized but do not display the appropriate degree of variability (i.e., by specifying a large value for $\gamma_s$, as in Fig. 9c, d).
Figure 14 presents the cumulative distribution functions (CDFs) for cumulative oil and water produced over the entire simulation time frame (5000 days) for SGeMS, OPCA and CNNPCA models. In general the CNNPCA and SGeMS results match closely, though some discrepancy is evident in the water CDFs. The OPCA models significantly underestimate cumulative production, and about 25% of the models do not predict any water production at the end of the simulation time frame (Fig. 14b). This is again due to the lack of channel connectivity in the OPCA realizations.

We reiterate that this problem setup represents a much more difficult case than those considered in Vo and Durlofsky (2014) for several reasons. Specifically, (1) very little conditioning data are used here, (2) the permeability contrast is an order of magnitude larger in the current setup, and (3) all wells are under BHP control in this example. As noted earlier, OPCA performs well for less extreme cases. For example, with more hard data, channels are better resolved by the underlying PCA representation, leading to more accurate connectivity in OPCA models. In addition, under rate control, errors in phase flow rates are typically smaller (though BHP errors may then be significant). It is noteworthy, however, that CNNPCA performs well even for this more extreme case.
5 History Matching Using CNNPCA
Results in the previous section showed that random CNNPCA models, generated by sampling the low-dimensional variable $\xi$ from the standard normal distribution, provide flow statistics in close agreement with those from SGeMS models. While SGeMS models are generated from a nonparametric algorithm, the CNNPCA models are associated with uncorrelated low-dimensional variables that can be conveniently varied to generate models that match observed production data. The use of CNNPCA for history matching and posterior uncertainty quantification is now described.
5.1 History Matching Procedure
After applying a geological parameterization that maps the model parameters $\mathbf{m}$ to low-dimensional variables $\xi$, the history matching process involves finding the values of $\xi$ that minimize the mismatch between simulation results and observed data. The randomized maximum likelihood (RML) method has been shown to be effective at providing reasonably accurate estimates of posterior uncertainty. In this approach, multiple posterior models are obtained by performing multiple minimizations with perturbed objective functions (Kitanidis, 1986; Oliver, 1996).
Vo and Durlofsky (2015) provided a detailed description of RML-based subspace history matching for generating multiple posterior samples of $\xi$ (and thus $\mathbf{m}$). The minimization problem is expressed as
(15) $\xi_{\mathrm{rml}} = \operatorname*{arg\,min}_{\xi} \left[ \left(\mathbf{d}_{\mathrm{sim}}(\xi) - \mathbf{d}_{\mathrm{obs}}^{*}\right)^{\mathrm{T}} \mathbf{C}_{\mathrm{D}}^{-1} \left(\mathbf{d}_{\mathrm{sim}}(\xi) - \mathbf{d}_{\mathrm{obs}}^{*}\right) + \left(\xi - \xi^{*}\right)^{\mathrm{T}} \left(\xi - \xi^{*}\right) \right]$
This objective function consists of two terms. The first term, referred to as the data mismatch, is a weighted squared mismatch that quantifies the difference between the simulated data $\mathbf{d}_{\mathrm{sim}}(\xi)$ and the (perturbed) observation data $\mathbf{d}_{\mathrm{obs}}^{*}$. Evaluating $\mathbf{d}_{\mathrm{sim}}(\xi)$ involves mapping $\xi$ to the model parameters $\mathbf{m}$, which are then used to calculate reservoir properties. Simulated flow responses (e.g., well rates and BHPs) at selected time points are then extracted to form $\mathbf{d}_{\mathrm{sim}}(\xi)$. The measurement errors for the observed data are assumed to be independent and to follow a multivariate Gaussian distribution with zero mean and diagonal covariance matrix $\mathbf{C}_{\mathrm{D}}$. In RML, the observed data are perturbed with random noise sampled from $N(\mathbf{0}, \mathbf{C}_{\mathrm{D}})$.

The second term, referred to as the model mismatch, is a regularization term that acts to constrain $\xi$ to be close to a random prior realization $\xi^{*}$ sampled from the standard normal distribution. Multiple minimizations are performed, each with different random realizations of $\xi^{*}$ and $\mathbf{d}_{\mathrm{obs}}^{*}$, to obtain multiple posterior models. Simulation results obtained from these posterior models are used to estimate the posterior uncertainty of the production forecast.
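The two-term RML objective can be sketched as follows, assuming (as stated above) a diagonal data-error covariance; the function and variable names are illustrative, and the simulator call that produces the simulated data is outside this sketch.

```python
import numpy as np

def rml_objective(xi, d_sim, d_obs_perturbed, cd_diag, xi_star):
    # Subspace RML objective (cf. Eq. 15): data mismatch weighted by the
    # inverse (diagonal) data-error covariance C_D, plus a model-mismatch
    # term keeping xi close to the prior sample xi_star
    r = d_sim - d_obs_perturbed
    data_mismatch = float(r @ (r / cd_diag))
    model_mismatch = float((xi - xi_star) @ (xi - xi_star))
    return data_mismatch + model_mismatch
```

Each of the multiple RML minimizations uses its own draws of `xi_star` and of the perturbed observations, which is what produces the spread in the resulting posterior models.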
5.2 Problem Setup
The setup used for history matching is very similar to that considered for flow assessment in Sect. 4. The reservoir model is again a 2D channelized system defined on the same grid. Here, however, the two injection wells are located near the center of the model, as shown in Fig. 15, and the simulation time frame is only 2000 days. Injection and production well BHPs are set to 335 bar and 315 bar, respectively. In a later stage, new production wells will be introduced in the upper-right corner of the model. Predictions using both CNNPCA posterior models and OPCA posterior models, for both existing wells and new wells, will be presented.
The setup for constructing conditional OPCA and CNNPCA representations is the same as described previously. A set of conditional SGeMS realizations is generated to construct the PCA representation, with reduced dimension $\ell = 70$. The SGeMS realizations are conditioned to hard data at the four wells, all of which are in sand. One SGeMS realization is randomly selected as the ‘true’ facies model (Fig. 15a). This realization is not included in the models used to construct the PCA basis. In constructing CNNPCA, random PCA realizations are used to train the model transform net. The weighting factors $\gamma_s$ and $\gamma_h$ in Eq. 13 are determined through numerical experimentation. A lower $\gamma_h$ value may be required here than in Sect. 4 since the wells are closer together (and thus the hard data tend to be honored more easily).
Parameters for the flow simulations are as described in Sect. 4. The history-match parameters (components of $\xi$) provide the facies model $\mathbf{m}$, from which grid-block porosity and permeability are computed as in Sect. 4. Observed data include water injection rate and oil and water production rates over the first 1000 days, recorded every 100 days. Observed data are generated using simulation results for the true model (Fig. 15a), perturbed with independent random noise following a Gaussian distribution with zero mean. The standard deviation for the random noise is set to be 10% of the corresponding true data, with a minimum value of 2 m$^3$/day. A total of 30 posterior (history matched) models are generated to quantify uncertainty using the subspace RML procedure described in Eq. 15.

A derivative-free optimization method is applied here to solve the minimization problem defined in Eq. 15. This type of approach is nonintrusive with respect to the simulator, so it can be readily used with any subsurface flow simulator. As discussed in Sect. 2.4, the CNNPCA representation is differentiable when the final hard-thresholding step is replaced with a differentiable transformation. The algorithm can then be applied in conjunction with efficient adjoint-gradient-based methods, as was done with OPCA in Vo and Durlofsky (2015).
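The observed-data perturbation described in this section (noise standard deviation equal to 10% of the true value, with a 2 m³/day floor) can be sketched as follows; the function name and return convention are illustrative.

```python
import numpy as np

def perturb_observations(d_true, rng, frac=0.10, floor=2.0):
    # Gaussian measurement noise with standard deviation set to 10% of the
    # true data value, subject to a minimum of 2 m^3/day; returns the
    # perturbed data d_obs* and the standard deviations used
    sigma = np.maximum(frac * np.abs(d_true), floor)
    return d_true + rng.normal(size=d_true.shape) * sigma, sigma
```

The floor keeps the noise (and the corresponding entries of the data-error covariance) meaningful for near-zero rates, such as water rates before breakthrough.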
The derivative-free optimizer used here is the hybrid PSO–MADS algorithm developed by Isebor et al. (2014). This approach combines particle swarm optimization (PSO), which is a global stochastic search algorithm, with mesh adaptive direct search (MADS), a local pattern-search method. By alternating between the two methods, PSO–MADS provides some amount of global exploration along with convergence to a local minimum. This has been found to be useful in field development optimization problems. Standalone PSO and standalone MADS have been shown to be effective in history matching with PCA
(Echeverría Ciaurri et al., 2009; Rwechungura et al., 2011) and OPCA (Liu, 2017) parameterizations. The application of PSO–MADS for history matching, which was found to be effective for the cases considered in this study, does not appear to have been reported previously.

In the example considered here, the dimension of the search space ($\ell$, the dimension of $\xi$) is 70. A PSO swarm size of 50 is used, which means that 50 flow simulations are required at each PSO iteration. Each MADS iteration involves evaluation of a set of stencil-based trial points located around the current-best solution. This entails $2\ell = 140$ flow simulations for the MADS version applied here. Both PSO and MADS parallelize naturally, so if 140 compute nodes are available, the algorithm can be run fully parallelized. The initial guess for the minimization in Eq. 15 is set to be a random prior realization $\xi^{*}$. PSO–MADS is run in all cases for 24 iterations, which was found to provide an appropriate level of reduction in the mismatch defined in Eq. 15. Detailed optimizer settings and treatments were not explored here, since the intent is to evaluate the general performance of CNNPCA for history matching. Future work should assess both alternative PSO–MADS specifications and the use of adjoint-gradient-based minimization.
5.3 History Matching Results
Examples of prior and posterior (history-matched) OPCA and CNNPCA models are presented in Fig. 15, along with the true model. The posterior models correspond to the prior models, meaning the vector $\xi$ associated with the prior model is used as the initial guess in the optimization. It is evident that, in the true model (Fig. 15a), high-permeability channels connect injector–producer pairs I1–P1 and I2–P2. These connectivities are not fully captured in the prior models (Fig. 15b, d), but both the OPCA and CNNPCA posterior models (Fig. 15c, e) resolve these aspects of the system.
Fig. 16 Prior and posterior predictions for P1 water production rates. a OPCA prior predictions, b OPCA posterior predictions, c CNNPCA prior predictions, d CNNPCA posterior predictions. Vertical dashed line at 1000 days indicates end of history matching period.

Figure 16 displays water production results for producer P1. The vertical dashed line at 1000 days indicates the end of the history matching period. Prior OPCA and CNNPCA results are shown in Fig. 16a, c, and posterior results in Fig. 16b, d. It is evident that, even though the OPCA prior results suggest that channel connectivity is insufficient in many of the realizations (this inference derives from the fact that water rate is quite low in some cases), posterior predictions from OPCA are close to the true data. With CNNPCA, the prior predictions suggest better channel connectivity in the prior models. Posterior CNNPCA predictions, like those of OPCA, are close to the true data. Both methods result in much smaller spreads in posterior predictions compared to the prior results. Similar behaviors are observed for well P2. From this assessment it is evident that, although CNNPCA provides higher-quality posterior models than OPCA in terms of geological realism, there is not much difference in posterior flow predictions.
A more challenging type of posterior prediction is now considered. In this assessment, the performance of posterior models in predicting production for new wells located away from existing wells is evaluated. Figure 17 displays the problem setup. The two new wells, P3 and P4, are in the upperright region of the model. Based on the true facies model, wells P3 and P4 are connected to I1 and I2 through sand channels. Both new wells begin production at the end of the history matching period (1000 days), with BHPs specified to be 315 bar. Again, the posterior models are simulated to 2000 days.
Production forecasting for new wells is highly uncertain as the prior models are not conditioned to hard data at the new well locations, and the production data observed in the first 1000 days are not sensitive to model properties in the vicinity of the new wells. Figure 18 shows OPCA and CNNPCA posterior models together with new wells P3 and P4. For both sets of posterior models, relatively large uncertainty is observed for facies type at the new well locations, and in the connectivity between the new wells and the existing injectors. The CNNPCA posterior models, however, display better channel connectivity than the OPCA models, so they would appear to be better suited to describe the uncertain range of performance associated with wells P3 and P4.
Water production rate predictions for new well P4 are shown in Fig. 19a, c. It is evident that only a relatively small fraction of OPCA posterior models (5 out of 30) predict water breakthrough during the simulation period. By contrast, more than half of the CNNPCA posterior models (17 out of 30) predict the relatively early water breakthrough that occurs for the true model. Analogous results are observed for P3 water production. Also of interest are predictions for field oil rate (Fig. 19b, d). In the true model, oil contained in the channels connecting I1 and P3, and I2 and P4, is produced when the new wells begin operating. Thus the true predictions entail a large increase in oil production rate at 1000 days. For OPCA, half of the posterior models predict little or no increase in oil rate at this time (Fig. 19b). In addition, portions of the true response lie outside the OPCA posterior predictions. In comparison, most of the CNNPCA posterior models (24 out of 30) predict a large increase in oil production at 1000 days, and the true results fall within the CNNPCA posterior predictions.
Although the results in Figs. 18 and 19 suggest qualitatively that CNNPCA is performing well for this problem, such a conclusion cannot be definitively made until CNNPCA results are compared to those from a rigorous posterior sampling strategy such as rejection sampling. In rejection sampling, a large number of prior SGeMS realizations are generated and simulated. Realizations are rejected or accepted based on the degree of mismatch between simulation results and observed data. Predictions from the accepted realizations provide an accurate assessment of posterior uncertainty. Rejection sampling is, however, extremely expensive and is thus not performed in this study.
6 CNNPCA for Bimodal Deltaic Fan Systems
In this section, the CNNPCA procedure is extended for application to a more complex bimodal deltaic fan system. The development is limited to the generation of realizations – no flow results will be presented. The model here corresponds to a nonstationary random field with global trends in channel direction and channel width. Thus, the spatial correlation structure of the deltaic fan model is more difficult to characterize compared to the channelized case. In addition, this system is bimodal rather than binary, meaning the properties within each of the facies are not constant but instead follow prescribed (Gaussian) distributions.
In this case, the model is defined on a 2D grid, and the components of $\mathbf{m}$ represent log-permeability ($\log_e k$) in each grid block. The procedure for generating realizations for the deltaic fan system using SGeMS is described in Vo and Durlofsky (2016). The first step is to generate a binary facies model using the ‘snesim’ algorithm with local affinity transformation and rotation. Figure 20 shows the training image and the local regions for affinity transformation and rotation. The training image represents a binary channelized system, with the main channels oriented in a single direction. The scaling factors for the four affinity transformation regions are 1.5, 1, 0.5 and 0.3. The rotation angles for the five rotation regions are 45, 30, 0, −30 and −45 degrees. In the next step, Gaussian realizations for $\log_e k$ within each facies (with mean and variance of 8 and 0.8 in sand, and 3 and 0.4 in mud) are simulated independently using the ‘sgsim’ algorithm in SGeMS. The log-permeability for each block is then assigned according to the binary facies model using the cookie-cutter approach.
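The cookie-cutter assembly described above can be sketched as follows; the two Gaussian fields are stand-ins for ‘sgsim’ output, generated here as uncorrelated draws purely for illustration.

```python
import numpy as np

def cookie_cutter(facies, logk_sand, logk_mud):
    # cookie-cutter assembly: log-permeability is taken from the sand field
    # where facies = 1 and from the mud field where facies = 0
    return np.where(facies == 1, logk_sand, logk_mud)

rng = np.random.default_rng(0)
facies = (rng.random((4, 4)) < 0.5).astype(int)          # binary facies model
logk_sand = rng.normal(8.0, np.sqrt(0.8), size=(4, 4))   # sand: mean 8, var 0.8
logk_mud = rng.normal(3.0, np.sqrt(0.4), size=(4, 4))    # mud: mean 3, var 0.4
logk = cookie_cutter(facies, logk_sand, logk_mud)
```

Because the two intra-facies fields are simulated independently and then combined through the facies model, the resulting log-permeability field is bimodal rather than binary.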
A set of realizations, conditioned to hard data at 20 well locations, is generated. Figure 21 shows two SGeMS realizations and the corresponding histograms. The six wells toward the lower-left and lower-right portions of the model are in mud, while the other 14 wells are in sand. A PCA representation with reduced dimension $\ell$ is constructed from the SGeMS realizations. The procedure for constructing the OPCA representation for bimodal systems is described in detail in Vo and Durlofsky (2015). The basic idea is to postprocess PCA models with the minimization
(16) $\mathbf{m} = \operatorname*{arg\,min}_{\mathbf{x}} \left[ \left\| \mathbf{x} - \mathbf{m}_{\mathrm{pca}} \right\|_2^2 + \gamma\, R(\mathbf{x}) \right]$
where $\gamma$ is a weighting factor and the regularization term $R(\mathbf{x})$ is defined as
(17) $R(\mathbf{x}) = \sum_{i=1}^{N_c} \min\left\{ \dfrac{(x_i - \mu_1)^2}{\sigma_1^2},\ \dfrac{(x_i - \mu_2)^2}{\sigma_2^2} \right\}$
This regularization leads to a histogram transformation to match a bimodal distribution, with the two modes at $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$. The values for the parameters in the regularization term, together with the weighting factor $\gamma$, are determined through a preliminary optimization such that the average histogram evaluated over a set of OPCA realizations closely matches the target average histogram computed from the SGeMS realizations. The optimal values of $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\gamma$ are obtained in this manner.
The first step in constructing the CNN-PCA models is to select a reference model. In this case, there is no deltaic fan training image that depicts the nonstationary spatial correlation structure. In addition, not all SGeMS realizations display the expected geological connectivity. This is the case for the realization shown in Fig. 21c, where the main channels are ‘broken’ at the bottom. Therefore, one particular SGeMS realization (Fig. 21a), which displays the expected continuity, is taken as the reference model. As before, a set of random PCA realizations is used to train the model transform net. In this case, the weighting factors in Eq. 13 are both obtained through numerical experimentation.
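The hard-data and style terms entering the weighted training loss (Eq. 13) can be sketched as follows; the function names and signatures are illustrative, not those of the paper's implementation.

```python
import numpy as np

def hard_data_loss(model, well_idx, well_vals):
    """Mean squared mismatch between a generated model and the hard
    (conditioning) data at well locations, given as flat indices."""
    return np.mean((model.ravel()[well_idx] - well_vals) ** 2)

def total_loss(style_term, hard_term, w_style, w_hard):
    """Weighted combination of style and hard-data loss terms,
    in the spirit of Eq. 13; the weights are tuned numerically."""
    return w_style * style_term + w_hard * hard_term
```

A model that exactly honors the conditioning data incurs zero hard-data loss, so that term acts purely as a penalty on conditioning violations.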
Training for 3 epochs (2250 iterations) requires approximately 10 minutes for this case. This is significantly faster than the deep-learning-based geological parameterizations discussed in Laloy et al. (2017a, b), where training takes around 3 to 5 hours for 2D binary facies models of similar size on one GPU (NVIDIA Tesla K40). The improved performance here derives mainly from the smaller number of epochs (3 epochs, compared to over 50 epochs in Laloy et al. (2017a, b)) needed for the model transform net to converge. The faster convergence of the training may result from the fact that the CNNs here take the PCA model as input, rather than random noise as in Laloy et al. (2017a, b). The PCA model preserves the target spatial structure up to two-point correlations, so transforming PCA models into the final post-processed models is a simpler task than transforming from random noise. In addition, in the previous studies either an encoder network or a discriminator network had to be trained to capture the multiple-point correlations of the models, while in this study an off-the-shelf network (the VGG net) is used directly to characterize the multi-point statistics. After training, it was found that a discrepancy remained between the histogram of the resulting models and the target histogram from the SGeMS models. Therefore, a final histogram transformation step is performed by applying OPCA to post-process the transform net output. The final CNN-PCA model for the bimodal system is given by (analogous to Eq. 16)
m_cnnpca = argmin_x ‖x − m̂‖²₂ + γ R(x),    (18)

where m̂ denotes the output of the model transform net and the regularization term R(x) is defined in Eq. 17. A separate preliminary optimization was performed to obtain the values of γ and of the parameters in R(x) for this step. Note that, with this OPCA post-processing as the final step, the CNN-PCA procedure remains differentiable and can thus be used with gradient-based history matching.
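The multi-point (texture) statistics discussed above are summarized, in neural style transfer, by Gram matrices of convolutional feature maps. A minimal NumPy sketch follows; in the actual method the feature maps come from the pre-trained VGG net, whereas here random arrays stand in.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, H, W) feature map, normalized by size.
    This is the statistic used in neural style transfer to summarize
    texture (multi-point) correlations between feature channels."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(feat_a, feat_b):
    """Squared Frobenius distance between the Gram matrices of two
    feature maps; zero when the summarized statistics match exactly."""
    g_a = gram_matrix(feat_a)
    g_b = gram_matrix(feat_b)
    return float(np.sum((g_a - g_b) ** 2))
```

In the full procedure, such losses are evaluated at several VGG layers and summed, so that correlations at multiple spatial scales are matched.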
Figure 22 displays a PCA realization and the corresponding OPCA and CNN-PCA realizations, along with the histograms for the three models. The nonstationary features, including the decrease in channel width with distance from the fan apex and the fan-shaped channel structure, are honored to varying extents in the three realizations. Both the OPCA and CNN-PCA models have bimodal histograms, while the PCA model has an essentially Gaussian histogram. In terms of channel continuity, the CNN-PCA realization appears more consistent with the reference SGeMS model (in Fig. 21) than the PCA and OPCA realizations, for both the large channels near the bottom of the domain and the smaller channels toward the top. In addition, the spurious channels present in the lower-left and lower-right portions of the PCA and OPCA models do not appear in the CNN-PCA model. This demonstrates that CNN-PCA can be applied to the bimodal nonstationary deltaic fan system. Flow results and history matching for this and other complicated geological systems will be considered in future work.
7 Concluding Remarks
In this paper, a deep-learning-based approach, referred to as CNN-PCA, was developed for the low-dimensional parameterization of geological models characterized by multipoint spatial statistics. CNN-PCA represents a generalization of the OPCA method, as both procedures are based on the post-processing of PCA models to match the style of a reference model. CNN-PCA utilizes a set of metrics based on a pre-trained convolutional neural network to characterize multiple-point correlations, which enables the method to preserve the complex geological patterns present in the reference model (e.g., training image). A hard data loss term was introduced to generate conditional CNN-PCA realizations. For the 2D systems considered in this study, CNN-PCA is essentially a direct application of the fast neural style transfer algorithm, originally developed in computer vision for processing images. In CNN-PCA, the PCA models are post-processed quickly, by feeding them through a convolutional neural network called the model transform net. The computational cost of CNN-PCA is mainly associated with the offline training of this model transform net. The training process is significantly faster with CNN-PCA than with existing deep-learning-based geological parameterization techniques.
Model realizations generated using CNN-PCA were presented for a 2D binary channelized system. Even in the absence of hard data, unconditional CNN-PCA realizations provided a high level of channel continuity and uniform channel width, consistent with the reference training image. Unconditional OPCA realizations, by contrast, do not provide the same degree of geological realism. For conditional systems, CNN-PCA models were shown to honor all hard data for the cases considered. Application of CNN-PCA to a nonstationary bimodal deltaic fan system was also presented. For this system, OPCA was used as a final post-processing step to ensure that the histogram of the final model matches the reference histogram. The CNN-PCA realizations for the bimodal deltaic fan system were shown to preserve the global trends in channel direction and channel width.
The consistency between CNN-PCA models and reference SGeMS models was demonstrated by evaluating flow-response statistics for an oil-water two-phase flow problem. P10, P50 and P90 results and CDFs for injection and production quantities, obtained from random SGeMS, OPCA, and CNN-PCA realizations, were compared. CNN-PCA models were shown to provide flow statistics in close agreement with those obtained from reference SGeMS models for a challenging case involving a small amount of conditioning data (OPCA is not as well suited for cases that lack hard data). History matching for a conditional binary channelized system was also considered. A derivative-free optimization method, PSO-MADS, was applied for the required minimization, and multiple posterior models were generated using a subspace RML procedure. Both CNN-PCA and OPCA provided significant uncertainty reduction for production forecasts involving existing wells, but CNN-PCA was shown to provide more realistic results than OPCA for forecasts involving new wells.
In this study, CNN-PCA was applied to 2D systems. In future work, extensions to treat 3D systems should be developed. For 3D cases, the direct use of the off-the-shelf convolutional neural network (the VGG-16 net), pre-trained with 2D images, is no longer viable. However, the general approach of using convolutional neural networks to characterize complex (multipoint) 3D geological models should still be valid. In fact, when correlations are weak in the vertical direction, straightforward extensions of the methodology presented in this paper may be sufficient. Other areas for future research include the evaluation of different approaches for history matching with CNN-PCA models. Additional minimization algorithms, such as adjoint-gradient-based methods, as well as ensemble-based techniques, should be assessed. Multilevel history matching treatments, along the lines of those presented in Liu (2017), should also be considered for use with CNN-PCA. Such methods enable history matching to be accomplished more efficiently in stages, using sequential sets of PCA basis components, and were shown to be effective when applied with OPCA.
Acknowledgements
We thank the industrial affiliates of the Stanford Smart Fields Consortium for financial support. We are grateful to Hai Vo for providing OPCA code and geological models, and to Hang Zhang and Abhishek Kadian for their open-source PyTorch implementations of the neural style transfer and fast neural style transfer algorithms. We also thank the teaching staff of the Stanford CS231N course for offering helpful guidance and providing computing resources on Google Cloud.
Appendix: Model Transform Net Architecture
The architecture of the model transform net is summarized in Table 1. It is the same as that in Johnson (2015), except for the first and last layers. In the table, ‘Conv’ denotes a convolutional layer immediately followed by spatial batch normalization and a ReLU nonlinear activation. The last ‘Conv’ layer is an exception, as it contains only a convolutional layer. ‘Residual block’ contains a stack of two convolutional layers, each with 128 filters and stride 1. Within each residual block, the first convolutional layer is followed by spatial batch normalization and a ReLU nonlinear activation. The second convolutional layer is followed only by spatial batch normalization. The final output of the residual block is the sum of the input to the first convolutional layer and the output of the second convolutional layer.
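A toy, single-channel version of such a residual block (omitting batch normalization and the 128-filter width of the actual network) might look like the following; the naive convolution helper is purely illustrative.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive correlation-style 'same' 2-D convolution, single channel,
    stride 1, zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, k1, k2):
    """Toy residual block: conv -> ReLU -> conv, plus the skip connection.
    The actual block uses 128 filters per layer and spatial batch
    normalization after each convolution."""
    y = relu(conv2d_same(x, k1))  # first conv + activation
    y = conv2d_same(y, k2)        # second conv (no activation)
    return x + y                  # sum of block input and conv output
```

Because the block output is input plus a learned correction, a block with zero-valued kernels reduces to the identity, which is part of what makes deep stacks of such blocks easy to train.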
Layer  Output size
Input  (·, ·, ·)
Conv, 32 filters of size ·, stride 1  (·, ·, ·)
Conv, 64 filters of size ·, stride 2  (·, ·, ·)
Conv, 128 filters of size ·, stride 2  (·, ·, ·)
Residual block, 128 filters  (·, ·, ·)
Residual block, 128 filters  (·, ·, ·)
Residual block, 128 filters  (·, ·, ·)
Residual block, 128 filters  (·, ·, ·)
Residual block, 128 filters  (·, ·, ·)
Conv, 64 filters of size ·, stride ·  (·, ·, ·)
Conv, 32 filters of size ·, stride ·  (·, ·, ·)
Conv, 1 filter of size ·, stride ·  (·, ·, ·)
References
 Astrakova and Oliver (2015) Astrakova A, Oliver DS (2015) Conditioning truncated pluri-Gaussian models to facies observations in ensemble-Kalman-based data assimilation. Mathematical Geosciences 47(47):345–367
 Canchumuni et al. (2017) Canchumuni SA, Emerick AA, Pacheco MA (2017) Integration of ensemble data assimilation and deep learning for history matching facies models. Paper OTC-28015-MS, presented at the OTC Brasil, Rio de Janeiro, Brazil, 24–26 October
 Chang et al. (2010) Chang H, Zhang D, Lu Z (2010) History matching of facies distribution with the EnKF and level set parameterization. Journal of Computational Physics 229(20):8011–8030
 Chen et al. (2016) Chen C, Gao G, Ramirez BA, Vink JC, Girardi AM (2016) Assisted history matching of channelized models by use of pluri-principal-component analysis. SPE Journal 21(05):1793–1812
 Deng et al. (2009) Deng J, Dong W, Socher R, Li LJ, Li K, Li FF (2009) ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 248–255
 Dimitrakopoulos et al. (2010) Dimitrakopoulos R, Mustapha H, Gloaguen E (2010) High-order statistics of spatial random fields: exploring spatial cumulants for modeling complex non-Gaussian and nonlinear phenomena. Mathematical Geosciences 42(1):65–99
 Echeverría Ciaurri et al. (2009) Echeverría Ciaurri D, Mukerji T, Santos ET (2009) Robust scheme for inversion of seismic and production data for reservoir facies modeling. Paper SEG-2009-2432, presented at the SEG Annual Meeting, Houston, Texas, 25–30 October
 Emerick (2016) Emerick AA (2016) Investigation on principal component analysis parameterizations for history matching channelized facies models with ensemble-based data assimilation. Mathematical Geosciences 49(1):85–120
 Gatys et al. (2015) Gatys LA, Ecker AS, Bethge M (2015) Texture synthesis using convolutional neural networks. Advances in Neural Information Processing Systems pp 262–270
 Gatys et al. (2016) Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp 2414–2423
 Goodfellow et al. (2016) Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press, Cambridge
 Hakim-Elahi and Jafarpour (2017) Hakim-Elahi S, Jafarpour B (2017) A distance transform for continuous parameterization of discrete geologic facies for subsurface flow model calibration. Water Resources Research 53(10):8226–8249
 Insuasty et al. (2017) Insuasty E, Van den Hof PMJ, Weiland S, Jansen JD (2017) Low-dimensional tensor representations for the estimation of petrophysical reservoir parameters. Paper SPE-182707-MS, presented at the SPE Reservoir Simulation Conference, Montgomery, Texas, 20–22 February
 Isebor et al. (2014) Isebor OJ, Echeverría Ciaurri D, Durlofsky LJ (2014) Generalized field-development optimization with derivative-free procedures. SPE Journal 19(05):891–908
 Jafarpour et al. (2010) Jafarpour B, Goyal VK, McLaughlin DB, Freeman WT (2010) Compressed history matching: exploiting transform-domain sparsity for regularization of nonlinear dynamic data integration problems. Mathematical Geosciences 42(1):1–27
 Johnson (2015) Johnson J (2015) Neural-style. https://github.com/jcjohnson/neuralstyle
 Johnson et al. (2016) Johnson J, Alahi A, Li FF (2016) Perceptual losses for real-time style transfer and super-resolution. European Conference on Computer Vision pp 694–711
 Kadian (2018) Kadian A (2018) PyTorch implementation of an algorithm for artistic style transfer. https://github.com/abhiskk/fastneuralstyle/commits/master
 Kingma and Ba (2014) Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
 Kitanidis (1986) Kitanidis PK (1986) Parameter uncertainty in estimation of spatial functions: Bayesian analysis. Water Resources Research 22(4):499–507
 Laloy et al. (2017a) Laloy E, Hérault R, Jacques D, Linde N (2017a) Efficient training-image based geostatistical simulation and inversion using a spatial generative adversarial neural network. arXiv preprint arXiv:1708.04975
 Laloy et al. (2017b) Laloy E, Hérault R, Lee J, Jacques D, Linde N (2017b) Inversion using a new low-dimensional representation of complex binary geological media based on a deep neural network. Advances in Water Resources 110:387–405
 Liu (2017) Liu Y (2017) Multilevel strategy for OPCA-based history matching using mesh adaptive direct search. Master’s thesis, Stanford University
 Lu and Horne (2000) Lu P, Horne RN (2000) A multiresolution approach to reservoir parameter estimation using wavelet analysis. Paper SPE-62985-MS, presented at the SPE Annual Technical Conference and Exhibition, Dallas, Texas, 1–4 October
 Mosser et al. (2017) Mosser L, Dubrule O, Blunt MJ (2017) Reconstruction of three-dimensional porous media using generative adversarial neural networks. Physical Review E 96(4):043309
 Oliver (1996) Oliver DS (1996) Multiple realizations of the permeability field from well test data. SPE Journal 1(2):145–154
 Paszke et al. (2017) Paszke A, Gross S, Chintala S, Chanan G, Yang E (2017) Automatic differentiation in PyTorch. NIPS 2017 workshop
 Ping and Zhang (2013) Ping J, Zhang D (2013) History matching of fracture distributions by ensemble Kalman filter combined with vector based level set parameterization. Journal of Petroleum Science and Engineering 108:288–303
 Remy et al. (2009) Remy N, Boucher A, Wu J (2009) Applied Geostatistics with SGeMS: A User’s Guide. Cambridge University Press, Cambridge, UK
 Reynolds et al. (1996) Reynolds AC, He N, Chu L, Oliver DS (1996) Reparameterization techniques for generating reservoir descriptions conditioned to variograms and well-test pressure data. SPE Journal 1(4):413–426
 Rwechungura et al. (2011) Rwechungura R, Dadashpour M, Kleppe J (2011) Application of particle swarm optimization for parameter estimation integrating production and time lapse seismic data. Paper SPE-146199-MS, presented at the SPE Offshore Europe Conference and Exhibition, Aberdeen, UK, 6–8 September
 Sarma et al. (2006) Sarma P, Durlofsky LJ, Aziz K, Chen WH (2006) Efficient real-time reservoir management using adjoint-based optimal control and model updating. Computational Geosciences 10(1):3–36
 Sarma et al. (2008) Sarma P, Durlofsky LJ, Aziz K (2008) Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Mathematical Geosciences 40(1):3–32
 Simonyan and Zisserman (2015) Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, pp 1–14
 Strebelle (2002) Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Mathematical Geology 34(1):1–21
 Tavakoli and Reynolds (2010) Tavakoli R, Reynolds AC (2010) History matching with parametrization based on the SVD of a dimensionless sensitivity matrix. SPE Journal 15(02):495–508
 Vo and Durlofsky (2014) Vo HX, Durlofsky LJ (2014) A new differentiable parameterization based on principal component analysis for the low-dimensional representation of complex geological models. Mathematical Geosciences 46(7):775–813
 Vo and Durlofsky (2015) Vo HX, Durlofsky LJ (2015) Data assimilation and uncertainty assessment for complex geological models using a new PCA-based parameterization. Computational Geosciences 19(4):747–767
 Vo and Durlofsky (2016) Vo HX, Durlofsky LJ (2016) Regularized kernel PCA for the efficient parameterization of complex geological models. Journal of Computational Physics 322:859–881
 Zhang et al. (2006) Zhang T, Switzer P, Journel A (2006) Filter-based classification of training image patterns for spatial simulation. Mathematical Geology 38(1):63–80