1 Introduction
Generation and parametrization of geological models are fundamental tasks in subsurface reservoir management. Due to inherent difficulties in obtaining a complete and accurate description of the subsurface properties – such as the porosity and the permeability –, engineers must contemplate a set of possible realizations of the unknown and uncertain parameters while honoring the sparse information available (e.g. information from well logs). Standard algorithms used to generate geological realizations rely on the variogram and twopoint geostatistics, however these algorithms are not suitable for generating complex geological models that are often encountered in practice, such as those containing channels and faults. To address this limitation, a number of multipoint geostatistical simulators (MPS) were introduced [strebelle2001reservoir, strebelle2002conditional, caers2004multiple]. MPS methods infer properties from a conceptual model: Basically, a conceptual model is an image designed under expert knowledge that conveys expected patterns of the subsurface. Then MPS methods generate images that resemble the conceptual model while honoring any other information available.
One limitation of MPS algorithms is that they do not provide a useful parametrization of the output realizations. Concretely, for efficient reservoir management it is of interest to be able to characterize the geological realizations in terms of a few number of parameters. Moreover, it is often desirable that such parametrization be differentiable. This is of practical importance because the generated realizations are often coupled with other parts of the simulation pipeline, e.g. for history matching or uncertainty quantification tasks. In these procedures, the differentiability allows for efficient inversion procedures as well as gradientbased optimization techniques.
The most standard approach in parametrization is principal component analysis (PCA) (also called KarhunenLoeve expansion), performed using a limited set of realizations. However, PCA is based on Gaussian assumptions and has limited capacity to parametrize complex statistics such as those found in complex geological structures with channelized features. In the field of machine learning, where parametrization of high dimensional data is ubiquitous, kernel principal component analysis (kPCA)
[scholkopf1998nonlinear] was introduced as an extension of PCA. In essence, kPCA relies on an initial transformation of the input data to enable more general features, followed by standard PCA application on the transformed data. This is done indirectly through the “kernel trick” to avoid the evident computational implications of transformations in very high dimensions. The application of this machine learning tool in geostatistics was first studied in [sarma2008kernel], and further improved in [ma2011kernel, vo2016regularized]. Later, other PCAbased methods were proposed such as in [vo2014new, vo2015data] where PCA is formulated as an optimization problem, allowing the deliberate introduction of regularization terms for conditioning and imposing spatial statistics. This was then extended to kPCA methods in [vo2016regularized].In recent years, a novel parametrization method called generative adversarial networks (GAN) [goodfellow2014generative]
was introduced in the machine learning community. The formulation of GANs takes a distinctive approach based on game theory, improving upon previous generative models by avoiding intractable likelihood functions and Markov chains. The sampling from the resulting generator is performed simply using a forward function evaluation.
In this work, we study the application of Wasserstein GAN [arjovsky2017wasserstein]
(a particular implementation of GAN) for the parametrization of complex geological models. The main motivation in the study of GAN for the parametrization of complex geology comes from the wellknown capability of neural networks to learn very complex nonlinear features. In fact, in combination with convolutional networks
[lecun1989generalization, krizhevsky2012imagenet], GAN models have shown stateoftheart performances in computer vision, specifically in challenging tasks such as reproducing natural images (human faces, handwritten digits, etc. [radford2015unsupervised]). On a more related topic, GAN was recently employed in the reconstruction of porous media [mosser2017reconstruction]. All this suggests that GAN can be effective in the parametrization of very general geological models that exhibit complex structures.The rest of this paper is organized as follows: In section 2, we briefly describe parametrization of geological models using standard principal component analysis. In section 3, we describe generative adversarial networks and the Wasserstein formulation. In section 4, we compare the generated realizations and present several numerical results in subsurface flow problems. Finally, we draw our conclusions in section 5.
2 Parametrization of geological models
In a generic modeling setting, a spatial discretization grid is defined over the physical domain of interest. The underlying random field (e.g. porosity, permeability, etc.) then takes the form of a random vector
where is the value at grid block , and is the total number of grid blocks. The goal of parametrization is to obtain a functional relationship , where is a random vector with assumed prior distribution , of size much smaller than (i.e. ). Additionally, it is desirable that be differentiable. New realizations of the random vector can then be generated by sampling the random vector from the prior .The traditional approach in parameterizing is to perform a principal component analysis (PCA). Given a set of centered realizations (i.e. zero mean) , let be the matrix whose columns are the realization vectors. The covariance matrix of the random vector can be approximated by
(1) 
PCA assumes the following parametrization of :
(2)  
where
is the matrix whose columns are the eigenvectors of the covariance matrix
,is a diagonal matrix containing the respective eigenvalues
, and is the parameterizing random vector of uncorrelated components. Assumming , the dimensionality reduction is obtained by keeping only the first components, that is, where , .It can be easily verified that realizations generated by Equation 2 reproduce the covariance matrix
. However, these new realizations do not in general reproduce higher order moments of the original realizations.
3 Generative adversarial networks
Generative adversarial networks (GAN) [goodfellow2014generative] consist of two competing functions (typically neural networks), namely a generator and a discriminator . The generator maps from input random noise vector to output random vector representing the “synthetic” images. A prior distribution is assumed for
, usually the standard normal distribution or an uniform distribution. Given a dataset of original realizations
that are assumed to come from an unknown distribution , the goal for is to sample from this distribution – informally, to generate samples that resemble samples from the dataset. Given that is prefixed, induces a distribution that depends only on (the weights of) . The objective is to optimize so that . On the other hand, the discriminatoris a classifier function whose goal is to determine whether a realization
is real (i.e. coming from the dataset) or synthetic (i.e. coming from the generator). The output ofis a scalar representing the probability of
coming from the dataset.In short, the goal of is to correctly discern between synthetic and real samples, while the goal of is to fool . This interplay between and is formulated as a twoplayer minmax game:
(3) 
In practice, the solution to this game can be approached by iteratively optimizing and in an alternating manner and independently. Given infinite capacity, it is shown in [goodfellow2014generative] that the game minimizes the JensenShannon divergence between and , with global optimum when everywhere in its support. That is, the generator learns to replicate the data generating process, therefore the discriminator fails to distinguish real samples from synthetic ones, resulting in a “coin toss” situation. Once trained provides the achieved parametrization and the reduced representation. We note that the differentiability of with respect to is a direct consequence of choosing a family of differentiable functions for , which is normally the case when neural networks are employed.
3.1 Wasserstein GAN
Finding the equilibrium of the game defined by (3) is notoriously hard in practice. In particular, if the discriminator is “too well trained”, it becomes saturated and provides no useful information for improvement of . Another issue is mode collapse, where the generator generates only a single image that always accepts. Solving these issues has been the focus of many recent works, as summarized in [salimans2016improved, arjovsky2017towards].
One major development in improving GANs is Wasserstein GAN (WGAN) [arjovsky2017wasserstein]. In WGAN, the focus is in directly minimizing the Wasserstein distance instead of the JensenShannon divergence. The Wasserstein distance is defined as:
(4) 
where
denotes the set of all joint distributions
with marginals and , correspondingly. The Wasserstein distance as expressed in Equation 4 is intractable in practice, therefore the following expression, due to the KantorovichRubinstein duality, is used instead:(5) 
where indicates that the supremum is over the set of realvalued Lipschitz functions, for some (irrelevant) constant . In effect, the function in Section 3.1, also called the “critic”, replaces the discriminator in standard GAN. Instead of a classifier function outputing a probability, the critic outputs a general realvalued score to the samples.
Even though the theoretical derivation of WGAN is very different from standard GAN, the only difference in the resulting formulation is a small change in the objective function, plus the Lipschitz constraint in the search space of the adversarial function (the critic), although relaxing the requirement to be a classifier function. However, the practical implications of WGAN are significant: it is much more stable and it provides a meaningful loss metric during training that correlates with the quality of the generated samples. This is very important for assessing the training progress, as well as providing a criteria for convergence. For more detailed discussion, see [arjovsky2017wasserstein].
3.1.1 Training WGAN
Typically, both the critic function and the generator are neural networks with specified architectures. Let and represent the weights of the networks for and , respectively, and we use the symbols and to explicitly express this dependency. To enforce the Lipschitz condition of , it is enough to assume a compact search space for the weights . In practice, this can be done by clipping the values of within certain range after each update of the training iteration.
The training procedure consists of alternating steps where and are optimized independently: In step A, we fix the generator weights and optimize the critic network (ideally to optimality) following the maximization problem:
(6) 
Once the critic is fitted, we are provided with an approximation of the Wasserstein distance (up to a multiplicative constant). In step B, this distance is differentiated with respect to to obtain the gradient required to optimize :
(7) 
Steps A and B are iteratively repeated until convergence is reached.
The quality of the approximation of the Wasserstein distance, and therefore of its gradient, will depend of course on how well the critic has been trained. Hence, it is desirable to train the critic relatively well in each iteration. In practice, it is common to fully optimize every so often by setting a high number of inner iterations in step A after a number of full AB loops. This is in contrast to standard GAN where the training of the discriminator and the generator has to be carefully balanced. Moreover, the readily available approximated Wasserstein distance provides an useful metric that correlates well with the quality of generated samples. We note the lack of a similar a metric in the standard GAN formulation.
4 Numerical Experiments
In this section, we demonstrate the use of Wasserstein GAN ( referred simply as GAN in the rest of this section) for the parametrization of two permeability models. We assess the generated realizations in subsurface flow problems. An uncertainty propagation study is performed to estimate densities for several quantities of interest. Results are compared with the true results derived from the original realizations. We also include results based on standard PCA parametrization.
4.1 The dataset
We evaluate the effectiveness of GANs in parameterizing two permeability models. The first model is characterized by channelized structures showing semistraight patterns. The second model consists of channelized structures exhibiting heavy meandering patterns. Conceptual images^{3}^{3}3Conceptual images or conceptual models are also called training images in the geology literature. Unluckily, the same term is also employed in machine learning to refer to the input training images (the realizations). Hence, we do not employ such term to avoid confusion. for each permeability type are shown in Figures 0(b) and 0(a). These are binary images representing logpermeability values, where (black) designates the high permeability channels and (white) designates the low permeability background. In practice, these could be channels of sand embedded in a background of clay. The conceptual images used are of size .
We consider realizations of the permeability of size . Typically, these are obtained by passing the large conceptual image to a multipoint geostatistical simulator such as snesim [strebelle2001reservoir] which will generate a dataset of realizations. Instead, here we obtained our dataset by simply cropping out patches of size from the conceptual images. A dataset of “realizations” can be quickly obtained in this way from each model, although with significant overlap between samples. Some samples are shown in Figure 0(c).
4.2 The setup
For each pattern considered, a dataset of realizations is used to train a GAN model and to perform PCA for comparison. For the design of our GAN architecture, we followed the guidelines provided in [radford2015unsupervised]. We found that the inclusion of fully connected layers worked best in our experiments. More specifically, we included one fully connected layer after the input of , and one before the output of . The rest of the architecture design follows the suggestions from the guidelines. The network is trained with the Wasserstein formulation using the default settings provided in [arjovsky2017wasserstein]. For the size of the noise vector , we looked at sizes and , and we assumed standard normal distribution for the prior . We used as the output activation of the generator and we preprocessed the binary images before training by shifting and scaling, from the binary range to the range . Using as the output activation automatically bounds the output of the generator. After the generator is trained and new realizations are generated, we can simply transform the outputs back to the binary range^{4}^{4}4
Alternatively, one can use a sigmoid function, which is already bounded in
, as the output activation of . However, offers better properties during training, such as already providing zerocentered inputs for the discriminator..Regarding PCA, we followed the procedure in published literature [sarma2008kernel, ma2011kernel] of retaining of the total variance. This corresponded to retaining eigencomponents in the semistraight pattern, and eigencomponents in the meandering pattern. A screeplot of the eigenvalues is shown in Figure 2 for both permeability patterns.
4.3 Comparing realizations
We denote by GAN20 and GAN40 the GAN models trained with input of sizes and , respectively. Figures 4 and 3 show realizations generated with the GAN models and PCA, along with original realizations. These realizations are fair draws (not cherrypicked). It is evident from these images that the GAN models outperform PCA in capturing the channelized patterns of the data. In the semistraight case, PCA slightly manages to capture the horizontal correlation of the channels, but it is clear that the generated samples are not channelized. In the meandering case, PCA seems to completely fail in capturing any pattern of the data. Additional samples are displayed in Appendix A.
Visually, we did not see much difference in samples generated by GAN20 and GAN40. Overall, both models generated realizations that are highly plausible, with samples that may trick the human eye. There are, however, some features that expose them such as isolated pixels, although this could be removed by applying a median filter. Perhaps of more importance in regards to reservoir simulation are the “hanging channels”, i.e. channels with endpoints inside the domain, which are not present in the original realizations. In principle, this could be addressed by further tuning the network architecture.
Following the visual comparison, we plot the histogram of the permeability values at the center of the domain. For this, we generated a number of realizations using the GAN models and PCA for each of the permeability patterns. The results presented in Figure 5 show that both GAN20 and GAN40 learn the correct type of distribution (binary) with the vast majority of the values toward the extremes ( and ), and the obtained frequencies are very close to data (original realizations). In contrast, the results from PCA follow a normal distribution, as expected.
4.4 Uncertainty propagation study
In this subsection, we study the effectiveness of the parametrization for an uncertainty propagation task. Since no significant differences were found in the visual comparison between GAN20 and GAN40, we only consider the results for GAN20 to simplify the presentation. We denote the results of GAN20 simply as GAN in the rest of this subsection.
We consider a subsurface flow problem where water is injected in an oilfilled reservoir. The physical domain under consideration is the unit square discretized by a grid and with isotropic permeability field as prescribed by the generated realizations. To simplify the physical equations, we assume that both water and oil have the same fluid properties (singlephase flow). Under this assumption, the system of equations to be propagated is
(8)  
(9) 
where denotes the fluid pressure, denotes (total) fluid sources, and are the water and oil sources, respectively, denotes the permeability, denotes the saturation of water, denotes the porosity, and denotes the Darcy velocity. Note that where denotes the discretized version (recall that the values of are the logpermeability values).
The pressure equation 8 and the saturation equation 9 are coupled through the Darcy velocity . In practice, the pressure is first solved to obtain , then the saturation is solved iteratively in time.
We consider two flow problems with different injection and production conditions:
 Quarterfive spot problem:

In this problem, injection and production points are located at and of the unit square, respectively. Noflow boundary conditions are imposed on all sides of the square. We assume unit injection/production rates, i.e. and .
 Uniform flow problem:

In this problem, uniformly distributed inflow and outflow conditions are imposed on the left and right sides of the unit square, respectively. Noflow boundary conditions are imposed on the remaining top and bottom sides. A total inflow/outflow rate of is assumed. For the unit square, this means and on the left and right sides, respectively, where denotes the outwardpointing unit normal to the boundary.
In both conditions, a pressure value of is imposed at the center of the square to close the problem, the reservoir only contains oil at the beginning (i.e. ), and the porosity is homogeneous and constant, with value . The simulated time is from until . In reservoir engineering, it is common practice to express changes in terms of pore volume injected (PVI), defined as , where is the water injection rate and is the pore volume of the reservoir. For a constant injection rate, this is simply a scaling factor on the time.
For each flow problem and for each permeability pattern, Equations 9 and 8 were propagated for each of the three sets of realizations: one set corresponding to data ( randomly selected from the original realizations), and the other two sets corresponding to realizations generated by PCA and GAN.
After propagation, we computed the first four statistical moments (mean, variance, skewness and kurtosis) of the saturation at time
, based on the runs. The results are shown in Figures 9, 8, 7 and 6 for each test case. In the mean and variance maps, both PCA and GAN are very close to the true maps. The good performance of PCA is expected since this method aims to preserve the first two statistical moments. For the higher order moments (skewness and kurtosis), estimations by GAN are overall better than PCA, as seen from the map plots. These observations are general in all the test cases. To further compare the statistics of the saturation solutions, we plot the saturation histogram () at the point of maximum variance, shown in Figures 11 and 10. In accordance to the map plot results, we see that PCA fails to replicate the higher order characteristics of the saturation distribution. On the other hand, the distributions estimated from GAN are surprisingly close to the true distributions, suggesting that it has generated samples that preserve the original flow properties.




Next, we look at the watercut curves, which measure the water content in the produced fluid over time. We computed the mean and variance of the watercut curves from the runs The results are shown in Figures 13 and 12 for each test case. We observe that the mean curves are hardly distinguishable in all cases. Regarding the variance curves, two cases favored PCA (quarterfive spot in the semistraight pattern, and uniform flow in the meandering pattern) while the other two cases favored GAN. We emphasize again that PCA is expected to excel in the loworder statistics, yet the results of GAN are comparable.
Finally, we look at the histogram of the water breakthrough time (measured in PVI), defined as the time at which watercut reaches 1% of the produced fluids. Figures 15 and 14 show the histogram and estimated densities of the water breakthrough time for the quarterfive spot problem and the uniform flow problem, respectively. The estimated densities were obtained using Scott’s rule. In all cases, we found that the obtained densities from GAN are in good agreement with the reference densities, clearly outperforming PCA.
5 Conclusions
We presented one of the first applications of generative adversarial networks (GAN) for the parametrization of geological models. The results of our study show the potential of GANs as a parametrization tool to capture complex structures in geological models. The method generated geological realizations that were visually plausible, reproducing the structures seen in the reference models. More importantly, the flow statistics induced by the generated realizations were in close agreement with the reference. This was verified for two different permeability models for two test cases in subsurface flow. In particular, we performed uncertainty propagation to estimate distributions of several quantities of interest, and found that GAN was very effective in approximating the true densities even when the distributions were nontrivial. In contrast, PCA was not suitable for distributions that are far removed from the normal distribution. Furthermore, we note that GAN showed superior performance compared to PCA despite only employing coefficients, in contrast to PCA where and coefficients were used (retaining 75% of the variance) for the semistraight and meandering patterns, respectively.
We conclude that GANs present a promising method for the parametrization of geological models. In our future work, we will look into the practicality of GANs for inverse modeling where the differentiability of the generator could be exploited to build an efficient gradientbased inverse method.
Comments
There are no comments yet.