The large scale nature of geological models makes reservoir simulations an expensive task, prompting numerous works that aim for a reduced representation of the geological properties that can preserve the heterogeneous characteristics required for accurate flow modeling. Traditional methods include zonation [jacquard1965, jahns1966]
and principal component analysis. More recent methods include enhanced PCA-based methods[sarma2008kernel, ma2011kernel, vo2016regularized], SVD methods [shirangi2016improved, tavakoli2011monte], discrete cosine transform [jafarpour2009, jafarpour2010], level set methods [moreno2007stochastic, dorn2008, chang20108011], and dictionary learning [khaninezhad2012sparse1, khaninezhad2012sparse2]
. Very recently, a new method from the machine learning community calledgenerative adversarial networks [goodfellow2014generative] has been investigated [mosser2017reconstruction, mosser2017stochastic, chan2017parametrization, laloy2017efficient, dupont2018generating, mosser2018conditioning] for the purpose of parametrization, reconstruction, and synthesis of geological properties; showing very promising results. This adds to the recent trend in applying machine learning techniques to leverage the increasing availability of data as well as rapid advances in the field [marccais2017prospective, kani2017dr, klie2015physics, stanev2018identification, sun2017new, chan2018machine, zhu2018bayesian, valera2017machine].
Generative adversarial networks is a novel technique for training a neural network to sample from a distribution that is unknown and intractable, by only using samples from this distribution. The result is a generator network that is capable of generating realizations from the target distribution –in our case, geological realizations– using a very reduced number of parameters. This is possible thanks to the high representational power of neural networks. In particular, the method has shown to preserve visual realism as well as flow statistics of the training data in experiments parametrizing geological properties.
Recent works [dupont2018generating, mosser2018conditioning] focused on the problem of post-hoc conditioning of the generator network: given a generator trained on unconditional realizations, the task is to generate realizations conditioned on new spatial observations (hard data). Current approaches are based on a recent inpainting technique introduced in [yeh2016semantic] that requires solving an optimization problem for each conditional realization, which can be expensive if several realizations are required, e.g. for history matching or uncertainty quantification. Moreover, the parametrization of the generation process is sacrificed.
In this work, we propose a method for obtaining a conditional generator to directly sample conditional realizations by coupling the unconditional generator with an inference network. For this, we first formulate the problem in the Bayesian framework as modeling the posterior distribution of the latent vector conditioned on observations. A comparison of this formulation to the recent inpainting technique in [yeh2016semantic]
is discussed. We then train an inference network to sample from the posterior distribution by minimizing the Kullback-Leibler divergence between the inference network’s distribution and this posterior. Finally, the conditional generator is obtained by coupling the inference network to the original generator, as illustrated inFigure 1. Sampling new conditional realizations can be done very efficiently and the parametrization of the generation process is maintained. The inference network is usually small and relatively easy to train, taking a few seconds in our experiments using a Nvidia GeForce GTX Titan X GPU. During deployment, the conditional generator can generate realizations of size at the rate of approximately 5500 realizations per second using this GPU.
The rest of this manuscript is organized as follows: In Section 2, we briefly describe generative adversarial networks and the Bayesian framework. In Section 3, we introduce a method to train an inference neural network to sample from the posterior distribution. In Section 4, we show results for geological realizations conditioned on several test cases. Finally, in Section 5 we discuss alternatives to the current work and possible directions.
We briefly describe generative adversarial networks (GAN) and the Bayesian framework for conditioning of geological realizations. Although not central to the method presented here, GAN was used to obtain the unconditional geomodel generator.
2.1 Generative adversarial networks
We represent the uncertain subsurface property of interest as a random vector where is very large (e.g. permeability discretized by the simulation grid). This random vector follows a distribution that is unknown and intractable (e.g. distribution of permeability with channels), and instead we are given a set of realizations of the random vector (e.g. a set of permeability models deemed representative of the area under study). Using this training set, the hope is to find a representation of in terms of a reduced number of free parameters. The approach taken here and in recent works is to consider a latent random vector with and where
is manually chosen to be easy to sample from (e.g. a multivariate normal or uniform distribution); and a deterministic neural network, called a generator, parametrized by weights to be determined. Given fixed, induces a distribution which is now unknown and possibly intractable (since is a neural network with many nonlinearities). On the other hand, sampling from this distribution is easy since it only requires sampling and forward-passing through . The goal is to optimize so that .
A difficulty in this problem is that both and are unknown and intractable. Nevertheless, sampling from these distributions is easy (for , one draws a batch of realizations from the training set, assuming the set is big enough). Following this observation, the seminal work in [goodfellow2014generative]
introduces the idea of using a classifier function, called a discriminator, to assess whether a generated realization “looks real”, i.e. is similar to realizations from the training set. The discriminator is also typically a neural network with weight parameters to be determined. The discriminator is trained to solve a binary classification problem, maximizing the following loss
which is in essence a binary classification score. The approximation is done by taking a batch of realizations from the training set for the first term, and sampling realizations from for the second term.
The generator on the other hand is trained to minimize the same loss, thus an adversarial game is created where and optimize the loss in opposite directions,
In practice, this optimization is performed alternately using gradient-based methods, where the gradients with respect to and are obtained using automatic differentiation algorithms. The equilibrium is reached when effectively learns to approximate and is in the support of (coin toss scenario). It is shown in [goodfellow2014generative] that in the limit, this process minimizes the Jensen-Shannon divergence between and .
Variations of GAN
Stability issues with the original formulation of GAN has led to numerous works to improve stability and generalize the method (e.g. see [radford2015unsupervised, salimans2016improved, arjovsky2017towards, arora2017generalization]
and references therein). One line of research generalizes GAN in the framework of integral probability metrics[muller1997integral]. Given two distributions and , and a set of real valued functions , an integral probability metric measures the discrepancy between and as follows,
Note the slight similarity with Equation 1. The choice of set is important and leads to several formulations of GAN. When is a ball in a Reproducing Kernel Hilbert Space, is the Maximun Mean Discrepancy (MMD GAN) [gretton2007kernel, dziugaite2015training]. When is a set of 1-Lipschitz functions, is the Wasserstein distance (WGAN) [arjovsky2017wasserstein, gulrajani2017improved]. When is a Lebesgue ball, we obtain Fisher GAN [mroueh2017fisher], and when is a Sobolev ball, we obtain Sobolev GAN [mroueh2017sobolev]. See [mroueh2017mcgan, mroueh2017sobolev] for an in-depth discussion. Our unconditional geomodel generator was trained using the Wasserstein formulation (see our related work in [chan2017parametrization]).
2.2 Conditioning on observations
Given a pre-trained generator , one possible use case is to obtain realizations conditioned on new spatial observations (hard data), that is, we need to find such that honors the observations. Let denote the observations and the values at the observed locations given . Under the probabilistic framework, the problem is to find
that maximizes its posterior probability given observations,
From Bayes’ rule and applying logarithms,
For the prior , a natural choice is
for which the generator has been trained. In most applications (and in ours), this is the multivariate standard normal distribution. For the likelihood, we take the general assumption of i.i.d. Gaussian measurement noise, where
is the measurement standard deviation. Then the optimization inEquation 5 can be written as
where we multiplied everything by . One way to draw different conditional realizations is to optimize Equation 8 using a local optimizer and different initial guesses for .
Comparison to GAN-based inpainting techniques
In image processing, image inpaiting is used to fill incomplete images or replace a subregion of an image (e.g. a face with eyes covered). The recent GAN-based inpainting technique by Yeh et al. [yeh2016semantic] and employed in [dupont2018generating, mosser2018conditioning] uses an optimization procedure with the following loss
The second term in this equation is referred as the perceptual loss and is the same second term in the GAN loss in Equation 1, which is the classification score on synthetic realizations. We can expect the perceptual loss to act as a regularization that drives towards a region of high density, or at least towards the support of , assuming that and have been trained to convergence, since then is at an optima for any realization of for . We should then expect the perceptual loss to have the same effect as the Bayesian prior . For example, let and . Then an optimal generator is and an optimal discriminator is for and otherwise. Then for , and otherwise, which is precisely the density function of scaled by . Nevertheless, the perceptual loss can be very useful in practice when and are not exactly optimal and there exist realizations of bad quality. In that case, the perceptual loss can help the optimization to find good quality solutions. In our work, we found the Bayesian prior to be sufficient while removing a layer of complexity in the optimization.
Finally, we also note that both L1 and L2 norms are explored in [yeh2016semantic] for the likelihood term, with L1 corresponding to the likelihood .
3 Conditional generator for geological realizations
using a local optimizer with different initial guesses. This approach, however, can be expensive and may not capture the full solution space. A better approach could be to use Markov chain Monte Carlo methods, given the latent vector is of moderate size, to better capture the full posterior distribution. Neither approach, however, maintains the parametrization of the sampling process.
We propose constructing a neural network that learns to sample from the posterior distribution. This inference network is yet another generator network that maps from realizations of a random vector with chosen (we naturally chose and ) to realizations of . Let be the distribution density induced by . This distribution is now unknown and intractable, but is easy to sample from since it only requires sampling and forward-passing through . The Kullback-Leibler divergence from to gives us
The first term is the expected loss under the induced distribution , with the loss defined in Equation 9. It can be approximated as
by sampling realizations from . The second term, however, is more difficult to evaluate since we lack the analytic expression of . The second term is also called the (negative) entropy of , usually denoted . On the other hand, it is easy to obtain realizations
. We therefore use a sample entropy estimator such as the Kozachenko-Leonenko estimator[kozachenko1987sample, goria2005new],
where is the distance between and its nearest neighbor. A good rule of thumb is as reported in [goria2005new]. Thus, the entropy estimator measures how spread the sample points are.
To train the inference network , we minimize , where both the estimator and the expected loss can be differentiated with respect to using automatic differentiation algorithms. Once the inference network is trained, the conditional generator is the new neural network , i.e. the composition of the unconditional generator and the inference network, as shown in Figure 1. Sampling conditional realizations can then done very efficiently by directly sampling and forward-passing through , and the parametrization of the generation process is maintained. We summarize the training steps of the inference network in Algorithm 1. Note that we show a simple gradient descent update (line 7), however it is more common to use dedicated update schemes for neural networks such as Adam [kingma2014adam]
Note that since is small in general, the inference network is also small and the network is easy to train relative to the generator. This also means that the relative increase in evaluation cost of the coupling is not significant. We find this to be the case in our experiments.
4 Numerical experiments
We first assess the method for simple test cases where the target distribution is a 1D or 2D mixture of Gaussians. We then present our main results for conditioning a generator previously trained to generate unconditional realizations of size
. All our numerical experiments are implemented using PyTorch111https://pytorch.org/ [paszke2017automatic], a python package for automatic differentiation. The source code of our implementation is available in our repository222https://github.com/chanshing/geocondition
. We use the same network architecture for the inference network (except input and output sizes) in all our test cases, consisting of a fully connected network with 3 hidden layers of size 512, and leaky ReLU activation. More details are described inAppendix A.
Mixture of Gaussians
We train a neural network to sample simple 1D and 2D mixture of Gaussians. Results are summarized in Figure 2, with in the 1D case, and in the 2D case. The source distribution is the standard normal in both cases.
The first example, Figure 1(a), is a mixture of three 1D Gaussians, with centers , and , and standard deviations , respectively; indicated with blue lines. The orange bars are the normalized histogram of 1000 sample points generated by the neural network. The second example, Figure 1(b), is a mixture of three 2D Gaussians, with centers , and , and covariances , , and , respectively. We plot the contour lines of the mixture of 2D Gaussians, and also scatter plot 4000 sample points generated by the inference network. In both test cases, we can see that the neural network effectively learns to transport points from the standard normal distribution to the mixture of Gaussians.
Conditional geological realizations
Our unconditional generator is a neural network previously trained using the method of generative adversarial networks to generate unconditional realizations of 2D channelized permeability of size . The input latent vector is of size with standard normal distribution. Details of the implementation is described in Appendix A and is similar to our related work in [chan2017parametrization]. Examples of unconditional realizations from the pre-trained generator is shown in Figure 3. Note that the conditioning method can be applied to any pre-trained generator network.
We formulate the conditional sampling problem in the Bayesian framework as described in Section 2.2, and train an inference network to sample the posterior . We assume in all our test cases. We use and (i.e. , so that if no conditioning were present, should learn the identity function).
We experiment with several conditioning test cases, conditioning on the presence of channel (high permeability) or background material (low permeability) at locations in the domain. We train an inference network for each test case and then generate conditional realizations using the coupled network
. Here we use the same hyperparameters to train the inference network in all test cases, although one could fine-tune the optimization for each test case to improve the results.
We show samples of the resulting conditional generator for two conditioning cases in Figure 4. We see that the generated realizations honor the conditioning points while maintaining the quality of the original generator. In Figure 3(b), we deliberately enforce a conditioning setting to obtain a specific channel passing through the domain, and see that the generator is capable of generating multiple realizations reproducing this enforced channel while providing enough variability in the rest of the domain. This could be useful in practice when we know the presence of specific structures in the area. Additional test cases are shown in Appendix B. Although not performed here, a straightforward improvement could be to adopt a safe margin by conditioning a neighborhood of the observed points.
In our experiments, the inference network takes a few seconds to train for each test case using a Nvidia GeForce GTX Titan X GPU. During deployment, can generate conditional realizations at the rate of about realizations per second. We did not find noticeable increase in computational time between and . In fact, the bottleneck in the GPU was due to memory operations.
5 Discussion and conclusion
We presented a method to address the conditioning of geological realizations without sacrificing the parametrization of the sampling process. The method is based on minimizing a Kullback-Leibler divergence to the posterior distribution of the latent vector, and involves a sample entropy estimation. The sample entropy estimator based on the nearest neighbor ( in Equation 17) was first applied in [ulyanov2017improved] to improve diversity in the context of neural style. In the same context, [li2017diversified] used a similar estimator but based on random neighbors. Finally, in the context of generative modeling, [kim2016deep]
used a closed-form expression of the entropy term when using batch normalization[ioffe2015batch]. The estimator used in this work is the generalization of the entropy estimator using nearest neighbors introduced in [goria2005new]. Other alternatives to train neural samplers include normalizing flow [rezende2015variational], autoregressive flow [kingma2016improved], and Stein discrepancy [wang2016learning]. These are all alternatives worth exploring in future work. Also related to our work include [nguyen2016plug, engel2017latent] where the authors optimize the latent space to condition on labels/classes.
Appendix A Implementation details
We use the same architecture for the inference network in all our experiments, namely, a fully connected network with 3 hidden layers of size , and component-wise leaky ReLU activation if , otherwise. More specifically, , where , and , , . The weights are optimized using the gradient descent scheme Adam with learning rate and default optimizer parameters (, see [kingma2014adam]). We use a batch size of sample points in approximating the expectations in the geological conditioning problem. For the mixture of Gaussian problems, we use a batch size of . In all test cases, the inference network converges in between and iterations.
Regarding the pre-trained generator, the implementation is similar to our recent work in [chan2017parametrization], but we train the generator using a set of realizations of size , obtained using the snesim algorithm [strebelle2001reservoir, remy2004sgems]
. The architecture is a fully convolutional neural network introduced in[radford2015unsupervised], that empirically shows good performance in generative adversarial networks. The generator is also trained with the Adam scheme and default parameters as mentioned above, and with a batch size of . The Wasserstein formulation of GAN is used, with the discriminator trained iterations per each iteration of the generator, and weight clipping to enforce the Lipschitz condition (see [arjovsky2017wasserstein]). Convergence of the generator is achieved in about generator iterations, taking approximately minutes using a Nvidia GeForce GTX Titan X GPU. During deployment, both the conditional and unconditional generators generate approximately realizations per second of size using the GPU (we do not find significant increase in compute time from to ).