Abstract
History matching is a jargon used to refer to the data assimilation problem in oil and gas reservoirs. The literature about history matching is vast and despite the impressive number of methods proposed and the significant progresses reported in the last decade, conditioning reservoir models to dynamic data is still a challenging task. Ensemblebased methods are among the most successful and efficient techniques currently available for history matching. These methods are usually able to achieve reasonable data matches, especially if an iterative formulation is employed. However, they sometimes fail to preserve the geological realism of the model, which is particularly evident in reservoir with complex facies distributions. This occurs mainly because of the Gaussian assumptions inherent in these methods. This fact has encouraged an intense research activity to develop parameterizations for facies history matching. Despite the large number of publications, the development of robust parameterizations for facies remains an open problem.
Deep learning techniques have been delivering impressive results in a number of different areas and the first applications in data assimilation in geoscience have started to appear in literature. The present paper reports the current results of our investigations on the use of deep neural networks towards the construction of a continuous parameterization of facies which can be used for data assimilation with ensemble methods. Specifically, we use a convolutional variational autoencoder and the ensemble smoother with multiple data assimilation. We tested the parameterization in three synthetic historymatching problems with channelized facies. We focus on this type of facies because they are among the most challenging to preserve after the assimilation of data. The parameterization showed promising results outperforming previous methods and generating welldefined channelized facies. However, more research is still required before deploying these methods for operational use.
1 Introduction
Ensemblebased methods have been applied with remarkable success for data assimilation in geosciences. However, these methods employ Gaussian assumptions in their formulation, which make them better suited for covariancebased (twopoint statistics) models (Guardiano and Srivastava, 1993). This fact lead several researches to propose a variate of parameterizations to adapt these methods for models with nonGaussian priors, such as models generated with objectbased (Deutsch and Journel, 1998) and multiplepoint geostatistics (Mariethoz and Caers, 2014). Among these parameterizations, we can cite, for example, truncated plurigaussian simulation (Liu and Oliver, 2005; Agbalaka and Oliver, 2008; Sebacher et al., 2013; Zhao et al., 2008); levelset functions (Moreno et al., 2008; Chang et al., 2010; Moreno and Aanonsen, 2011; Lorentzen et al., 2012; Ping and Zhang, 2014); discrete cosine transform (Jafarpour and McLaughlin, 2008; Zhao et al., 2016; Jung et al., 2017); Wavelet transforms (Jafarpour, 2010)
; Ksingular value decomposition
(Sana et al., 2016; Kim et al., 2018); kernel principal component analysis (KPCA)
(Sarma et al., 2008; Sarma and Chen, 2009); PCA with thresholds defined to honor the prior cumulative density function (Chen et al., 2014, 2015; Gao et al., 2015; Honorio et al., 2015) and optimizationbased PCA (OPCA) (Vo and Durlofsky, 2014; Emerick, 2017). There are also works based on updating probability maps followed by resampling steps with geostatistical algorithms
(Tavakoli et al., 2014; Chang et al., 2015; Jafarpour and Khodabakhshi, 2011; Le et al., 2015; Sebacher et al., 2015). However, despite the significant number of works, the development of robust parameterizations for facies data assimilation remains an open problem. One clear indication that facies parameterization is an unsolved issue is the fact that the large majority of the publications consider only small 2D problems.Deep learning became the most popular research topic in machine learning with revolutionary results in areas such as computer vision, natural language processing, voice recognition and image captioning, just to mention a few. The success of deep learning in different areas has inspired applications in inverse modeling for geosciences. Despite the fact that the first investigations in this direction are very recent, the number of publications grew very fast in the last two years. For example,
Dubrule and Blunt (2017) used a generative adversarial network (GAN) (Goodfellow et al., 2014) to generate threedimensional images of porous media. Laloy et al. (2017) used a variational autoencoder (VAE) (Kingma and Welling, 2013)to construct a lowdimensional parameterization of binary facies models for data assimilation with Markov chain Monte Carlo. Later in
(Laloy et al., 2018), the same authors extended the original work using spatial GANs. Canchumuni et al. (2017) used an autoencoder to parameterize binary facies values in terms of continuous variables for history matching with an ensemble smoother. Later, Canchumuni et al. (2018)extended the same parameterization using deep belief networks (DBN)
(Hinton et al., 2006; Hinton and Salakhutdinov, 2006). Chan and Elsheikh (2017) used a Wasserstein GAN (Arjovsky et al., 2017) for generating binary channelized facies realizations. In (Chan and Elsheikh, 2018), the same authors coupled an inference network to a previously trained GAN to generate facies realizations conditioned to facies observations (hard data). Dupont et al. (2018)also addressed the problem of conditioning facies to hard data. They used a semantic inpainting with GAN
(Yeh et al., 2016). Liu et al. (2018) used the fast neural style transfer algorithm (Johnson et al., 2016) as a generalization of OPCA to generate conditional facies realizations using randomized maximum likelihood (Oliver et al., 1996).The present work is a continuation of the investigation reported in (Canchumuni et al., 2017, 2018) in the sense that it is also based on using an autoencodertype of network to construct a continuous parameterization for facies. However, the present work addresses the fact that our previous results were limited to small problems due to difficulties to train the neural networks and the fact that the resulting facies realizations did not preserve the desired geological realism. Here, we investigate the use of convolutional VAE (CVAE) to construct the parameterization. Note that Laloy et al. (2017) also used a CVAE to parameterize facies. Unlike Laloy et al. (2017), we consider the use of this parameterization in conjunct with an ensemble smoother for assimilation of hard data and dynamic (production) data. A similar approach was recently applied to parameterize seismic data for history matching with an ensemble smoother by (Liu and Grana, 2018).
The rest of the paper is organized as follows. In the next section, we briefly review generative models. In this section, we describe autoencoders, VAE and convolutional layers. After that, we describe the proposed parameterization for data assimilation applied to petroleum reservoirs using the method ensemble smoother with multiple data assimilation (ESMDA) (Emerick and Reynolds, 2013). Then, we present three test problems with increasing level of complexity followed by comments on potential issues in the parameterization. The last section of the paper summarizes the conclusions. All data and codes used in this paper are available for download at https://github.com/smith31t/GeoFacies_DL.
2 Generative Models
Generative models are machine learning methods designed to generate samples from complex (and often with unknown closed form) probability distributions in highdimensional spaces. These methods use unsupervised and semisupervised techniques to learn the structure of the input data so it can be used to generate new instances.
Let
denote a vector in the space
containing the facies values of a reservoir model and assume that each realizations ofare distributed according to some probability density function (PDF)
. Our goal is to construct a generative model that can create new random realizations of facies that are (hopefully) indistinguishable from samples of . For concreteness, consider a deterministic function, which receives as argument a random vector with known and easy to sample PDF . Here, we refer to as latent vector which belongs to a feature space . Moreover, let be parameterized by deterministic vector . Even though is a function with deterministic parameters, is a random vector in because is random. We want to replace by a deep neural network which can be trained (determine the parameters ) such that we can sample and generate samples which are likely to resemble samples from. There are several generative models described in the machine learning literature such as restricted Boltzmann machines, DBNs, GANs, VAEs among others. Here, we focus our attention to a specific model based on convolutional neural network
(LeCun, 1989) and VAE (Kingma and Welling, 2013). An overview about generative models in the context of deep learning methods is presented in (Goodfellow et al., 2016, Chap. 20). Before we introduce the proposed method, we briefly review the concepts of autoencoders, VAE and convolutional layers.2.1 Autoencoders
Autoencoder is an unsupervised neural network trained to learn complex data representations. The typical applications of autoencoders include data compression and noise removal. However, especially in the last decade, autoencoders become widely used as building blocks of deep generative models (Goodfellow et al., 2016). Figure 1 illustrates a standard deep autoencoder network composed by six fullyconnected layers. The first three layers (encoder) are responsible for mapping the input space to a feature space, . The last three layers (decoder) correspond to the inverse mapping
. The central layer is called code. The training process consists of minimizing a loss function that measures the dissimilarity between
and , for example, the mean square error. After training, the autoencoder is able to represent (encode) the most important features of in . When the decoder function is linear and the loss function is the mean square error, the autoencoder learns to span the same subspace of PCA (Goodfellow et al., 2016). Hence, autoecoders with nonlinear encoder and decoder functions may be interpreted as nonlinear generalizations of PCA (Deng et al., 2017).2.2 Variational Autoencoders
A VAE is similar to a standard autoencoder in the sense that it is composed by an encoder and a decoder network. However, unlike standard autoencoders, a VAE has an extra layer responsible for sampling the latent vector and an extra term in the loss function that forces to generate the latent vector with approximately a specified distribution, , usually assumed a standard Gaussian, . This extra term corresponds to the KullbackLiebler divergence which measures how closely the distribution of the encoded latent vectors is from the desired distribution , i.e.,
(1) 
where is the total loss function. is the reconstruction error. Here, we use the binary crossentropy function given by
(2) 
where assumes values 0 or 1 and assumes continuous values in . The term in Eq. 1 is a weight factor (for the test cases presented in this paper we use ).
is the Kullback–Leibler divergence from
to . This term can be interpreted as a regularization imposed in the feature space. However, the term in Eq. 1 has a more theoretical basis and it is derived from a variational Bayesian framework (Kingma and Welling, 2013). For the case where and the KullbackLeibler divergence becomes(3) 
where and are the
th components of the mean and standard deviation vectors. During training, instead of generating the latent vector
, the encoder generates vectors of means,, and logvariance,
. Then, the vector is drawn from and rescaled to generate the latent vector , which goes in the decoder to generate a reconstructed vector . Note that the minimization of the loss function imposes to be as close as possible to the input vector while the term pushes and towards the zero and the unity vectors, respectively. After training, the decoder can be used to generate new realizations by sampling . Conceptually, we are generating samples from a distribution , which is a Gaussian with mean given by a trained decoder with parameters and covariance equals to the identity multiplied by a scaling parameter (Doersch, 2016). Figure 2 shows a VAE illustrating the main components. The encoder corresponds to an inference network and the decoder corresponds to a generative model. A detailed discussion about the principles behind VAE is presented in (Doersch, 2016).2.3 Convolutional Layers
The neural networks illustrated in Figs. 1 and 2
are based on fullyconnected layers, i.e., each neuron is connected to all neurons in the previous layer. Unfortunately, fullyconnected networks do not scale well, i.e., the number of training parameters (weights and bias terms) increase dramatically when the size of the input space is large, which is the case of facies realizations where the number of gridblocks can be easily on the order of hundreds of thousands. This is one of the main limitations observed in our previous work with DBN
(Canchumuni et al., 2018). For this reason, in this work we resort to convolutional neural networks (LeCun, 1989)to construct the encoder and decoder of our VAE network. These networks gained significant attention in the deep learning area after the very successful application in the ImageNet image classification challenge
(Krizhevsky et al., 2012). Convolution neural networks are specialized in data with grid structure such as images and time series (Goodfellow et al., 2016). Usually each layer of a convolutional network consists of a sequence of convolutional operations, followed by the application of activation functions (detection stage) and pooling operations, which modify the size of the outputs for the next layer and reduce the number parameters and processing time in the next layers. The convolutional operations consist of a series of trainable filters (kernels) which are convolved with the input image to generate activation maps. These convolutions are essentially dot products between the entries of the kernel and the input at any position. Because the size of the kernels is much smaller than the dimension of the input data, the use of the convolutional layers reduces vastly the number of training parameters allowing deeper architectures. The activation functions are applied over the activation maps generated by the convolutional operations. The most common is the rectified linear units (ReLU) function. The pooling operation replaces the output by some statistic of the nearby outputs, typically the maximum output within a rectangular neighborhood (maxpooling). There are also hyperparameters which include the size and the number of kernels and the level of overlapping in the kernel (stride). For a detailed discussion about convolution networks we recommend
(Goodfellow et al., 2016, Chap. 9) and (Dumoulin and Visin, 2018).3 EsMdaCvae
Figure 3 illustrates the final CVAE architecture with convolutional and fullyconnected layers. We implemented the CVAE using Keras
(Chollet et al., 2015) with TensorFlow
(Abadi et al., 2015) as backend engine. This network is trained using a large number of prior facies realizations, on the order of realizations. Note that no reservoir simulations are required in this process. After training, the CVAE is conceptually equipped to generate new realizations by simply sampling the random vector and passing it to the decoder. At this point, the decoder works as a substitute model for the geostatistical algorithm used to construct initial realizations.
The data assimilation is done combining the trained decoder with the method ESMDA. Essentially, we use ESMDA to update an ensemble of realizations of the latent vector to account for reservoir data and use the decoder to reconstruct the corresponding facies models. Here, we refer to this procedure as ESMDACVAE. Figure 4 illustrates this workflow. The data assimilation stars with a set of prior realizations of the latent vector, denoted as in Fig. 4, where is the number of ensemble members. These prior latent vectors can be generated by sampling or being the result of the encoder for a set of prior facies realizations generated with geostatistics, which is the option adopted in the cases presented in this paper. The ensemble of latent vectors is used in the decoder to generate an ensemble of facies which goes in the reservoir simulator to compute an ensemble of predicted data . The ESMDA updating equation is used to update and the process continue until the number of data assimilation iterations is achieved. Because process requires reservoir simulations to computed the vectors of predicted data, which can be very time consuming depending on the size of the model, we limite on the order of realizations.
The resulting ESMDA updating equation can be written as
(4) 
where and are matrices containing the crosscovariances between and predicted data and autocovariances of
, respectively. Both matrices are estimated using the current ensemble.
is the dataerror covariance matrix. is the vector containing the observations and is a random vector sampled from , where is the datainflation factor. In a standard implementation of ESMDA, Eq. 4 is applied a predefined number of times, and the values of should be selected such that (Emerick and Reynolds, 2013). Here, we wrote Eq. 4 in terms of only the latent vector for simplicity. However, we can easily introduce more uncertainty parameters of the reservoir in the data assimilation by updating an augmented vector.4 Test Cases
4.1 Test Case 1
The first test case corresponds to the same case used in (Canchumuni et al., 2018). This is a channelized facies model generated using the algorithm snesim
(Strebelle, 2002). Figure 5 shows the reference (true) permeability field. The model has two facies: channels with constant permeability of 5,000 mD and background with permeability of 500 mD. The size of the model is gridblocks, all gridblocks with and constant thickness of 50 ft.
4.1.1 CVAE architecture and training
The training set consists of 24,000 facies realizations generated using snesim
with the same training image of the reference model. We also use 6,000 additional realizations for validation. The architecture of the network is described in Table 1 in the Appendix. The source code is available at https://github.com/smith31t/GeoFacies_DL. The input data of the CVAE are preprocessed facies images where each facies type corresponds to an color channel with the value one at the corresponding facies. This process is analogous to the preprocessing applied to color pictures where the image is divided in three color channels (red, green and blue). Essentially the encoder is composed of three convolutional layers followed by three fullyconnected layers and one dropout layer (Srivastava et al., 2014) to avoid overfitting. In the initial steps of the research, we tested different setups of the network, especially the dimension of the feature space. Because the encoder uses fullyconnected layers to compute the latent vector, it is desirable to keep the size of this vector, , as small as possible to reduce the computational requirements for training. Unfortunately, fullyconnected layers are not efficiently parallelizable even using GPU. Our limited set of tests indicated that for the problems presented in this paper, we did not observe significant improvements for . Hence, we selected . The decoder has a mirrored architecture of the encoder with transposedconvolutional layers (often referred to as deconvolutional layers (Dumoulin and Visin, 2018)
). Before the last layer of the decoder, we introduced an upsampling layer with bilinear interpolation to resize the output for the same size of the final model. Note that only the last layer has sigmoid activation function, which is used for classification of the facies type in each gridblock of the model.
The training required approximately 13 minutes in a cluster with four GPUs (NVIDIA TESLA P100) with 3584 cuda cores each. The final reconstruction accuracy for the validation set was 96.7%. Figure 6 shows the first five realizations of the validation set before and after reconstruction. The results in this figure show that the designed CVAE was able to successfully reconstruct the facies. This figure also shows the corresponding histograms of the latent vectors showing nearly Gaussian marginal distributions.
4.1.2 Conditioning to facies data
The facies realizations of the training and validation sets were generated without any hard data (facies type at well locations). However, in reallife applications, geological models are always constructed constrained to hard data. Our tests indicate that if we train the network with realizations conditioned to hard data, most of the reconstructed facies honor these data, but there is no guarantee. In fact, Laloy et al. (2017) reported that in one of their tests only 68% of the realizations honor all nine hard data points imposed in the training set. For this reason, here we investigate the ability of the proposed ESMDACVAE to condition the prior realizations to facies data. For this test, we used an ensemble of prior realizations and MDA iterations. We assumed a small value for the dataerror variance of . Figure 7 shows the first 20 prior realizations. Figures 9 and 11 show the corresponding realizations conditioned to seven (Fig. 8) and 20 (Fig. 10) hard data points, respectively. The results in these figures show that ESMDACVAE was able to honor the facies type for all data points. The posterior realizations show welldefined channels, although we observe some “broken” channels. Nevertheless, the final realizations preserve reasonably well the main geological characteristics of the prior ones.
4.1.3 Conditioning to production data
We tested the proposed ESMDACVAE to assimilate production data. We considered four oil producing and three water injection wells as shown in Fig. 5. All producing wells operate at constant bottomhole pressure of 3,000 psi. The water injection wells operate at 4,000 psi. The synthetic measurements correspond to oil and water rate data corrupted with Gaussian random noise with standard deviation of 5% of the data predicted by the reference model. We use a prior ensemble with realizations and iterations. We did not include any facies data (hard data) in order to make the problem more challenging for assimilation of production data. Figure 12 shows the first five prior and posterior realizations obtained with ESMDACVAE. Clearly all posterior realizations are able to reproduce the main features of the reference model (Fig. 5). Figure 13 shows the observed and predicted water rate for four wells showing a good data match. In (Canchumuni et al., 2018), we used the same problem to test the standard ESMDA and parameterizations with OPCA and DBN. Figure 14 shows the first realization obtained with each method. The results in this figure clearly show the superior performance of ESMDACVAE.
4.2 Test Case 2
The second test case is the same used in (Emerick, 2017). Figure 15 shows the reference permeability field and the corresponding histogram. The model has gridblocks with uniform size of 75 meters and constant thickness of 20 meters. Similarly to the first test case, this case has two facies (channel and background sand) generated with the snesim
algorithm. However, in this case we update the facies type and the permeability within each facies simultaneously. The permeability values within each facies were obtained with sequential Gaussian simulation. More details about the construction of this problem can be found in (Emerick, 2017).
For this test case, we used a CVAE network with architecture similar to the previous case, with few changes only to accommodate the fact that the size of the models are different. We used a training set with 32,000 realizations and 8,000 for validation. The training required 42 minutes in a cluster with four GPUs (NVIDIA TESLA P100). The final reconstruction accuracy for the validation set was 93.3%. Figure 16 shows the first five realizations of the validation set before and after reconstruction and the corresponding histograms of the latent vectors. Again the CVAE was able to achieve a reasonable reconstruction of the channels.
For this test problem, we assimilated water cut data at five oil producing wells and water rate at two water injection wells. The position of the wells is indicated in Fig. 15. The synthetic measurements were corrupted with Gaussian noise with standard deviation of 5% to the data predicted by the reference model. We use and . Our tests showed that for this problem we needed more MDA iterations than usual, possibly because the parameterization makes the problem more nonlinear. All prior realizations do not include facies data at well locations. During the data assimilation, we update the latent vectors and the permeability values within each facies. Figure 17 shows the first five prior and posterior realizations indicating that ESMDACVAE was able to generate plausible facies distributions, i.e., facies with similar features of the prior ones. Figure 18 shows the water cut data for four wells indicating reasonable data matches. Figure 19 shows the first realization obtained with ESMDA, ESMDAOPCA, ESMDADBN and ESMDACVAE. The first three results were extracted from (Canchumuni et al., 2018). This figure shows that the standard ESMDA was no able to preserve welldefined boundaries for the channels. ESMDAOPCA and ESMDADBN resulted in better models, however with some discontinuous branches of channels which are not present in the prior models. Again, ESMDACVAE obtained a realization with better representation of the channels.
4.3 Test Case 3
The last test case is a 3D model with fluvial channels generated with objectbased simulation. The model has three facies: channel, levee and background sand. Figure 20 shows the permeability and the facies distribution of the reference case. We applied a transparency to the background sand in Fig. 20b to allow the visualization of the geometry of the channels. We assumed a constant permeability for each facies: 2,000 mD in the channels, 1,000 mD in the levees and 100 mD in the background. This model has 100 100 gridblocks, all gridblocks with 50 m 50 m 2 m. This reservoir produces with six wells placed near the borders of the model and operated by a constant bottomhole pressure of 10,000 kPa. There are also two water injection wells placed at the center of the model operating with a fixed bottomhole pressure of 50,000 kPa.
The 3D geometry of the channel makes this problem particulary challenging because standard convolutional layers are designed for 2D images. One possible approach is to consider each layer of the reservoir model separately. This is the approach used in (Laloy et al., 2017). However, this procedure do not account for the geometry of the facies in the vertical direction as the convolutional operations are performed in 2D. Instead, we used the 3D convolutional layers available in TensorFlow
. Even though the extension of convolutional operations to three dimension is conceptually simple, its training becomes computationally challenging. In fact, Geoffrey Hinton described the used of 3D convolutional networks as a “nightmare” (Hinton et al., 2012). The architecture of the network is described Table 2
in the Appendix. In this network, we introduced batch normalization layers
(Ioffe and Szegedy, 2018) to improve stability and reduce training times. This procedure removed the need of the dropout layers used in the previous networks. We considered a training set of 40,000 realization and 10,000 for validation. The training took 49 hours in a cluster with four GPUs (NVIDIA TESLA P100) and the reconstruction accuracy was 89.1%.We applied ESMDACVAE with an ensemble of realizations and MDA iterations with constant inflation factors. The observations corresponded to oil and water rate predicted by the reference case and corrupted with random noise of 5%. Figures 21 and 22 show four realizations of the prior and posterior ensembles, respectively. Overall, the ESMDACVAE was able to preserve several channels with the desired characteristics. Figure 23
shows all ten layers of the reference model and the first realization before and after data assimilation. This figure shows that the posterior realization present some facies with welldefined channellevee sequences. However, the posterior model is clearly distinguishable from the prior and reference models with some discontinuous channels and some oddlyshaped facies; see, e.g., the bottom layer of the posterior realization (Fig.
23c). Ideally, we would like the posterior realizations to be visually indistinguishable from prior realizations generated with the objectbased algorithm. Nevertheless, these results are very encouraging and far superior to what would be obtained with standard ESMDA or even with a OPCA parameterization (this case is no computationally feasible with our previous DBN implementation). In fact, it is important mentioning that this type of model is extremely difficult to history match with the current methods available. Figure 24 shows the oil rate at four wells indicating significant improvements in the predictions, although there are still some realizations with poor data matches; for example, there are some models predicting zero oil rates.5 Comments
One important limitation of CVAE parameterization is the fact that we cannot apply distancebased localization (Houtekamer and Mitchell, 2001) to update because this vector is in a different space. Hence, it does not make sense to compute the Euclidian distance between a component of
and the spatial position of a well. Yet, localization is important to mitigate the negative effects of sampling errors and limited degrees of freedom in ensemble data assimilation. In
(Canchumuni et al., 2018), we tried to work around this issue by using the number of neurons in the code layer equals to the number of reservoir gridblocks. However, this procedure does not ensure the existence of a direct relation between the entry of and the corresponding spatial location of the gridblock in the reservoir model. In fact, because the convolutional layers share parameters, it is conceivable that each component of may be associated with the reconstruction of the facies in different regions in the reservoir (the same weights of the convolutional kernels are applied to multiple locations of the input data). Moreover, using the size of the code layer equals to the size of reservoir grid increases significantly the number of training parameters because of the fullyconnected layers. In practice, this may make the application unfeasible for larger reservoir models. There are localization procedures which are not formulated in terms of spatial distances that could be applied in this case; see, e.g., (Lacerda et al., 2018) and references therein. Unfortunately, in our experience, these procedures are less effective than distancedependent approaches. We did not use any type of localization in any of the test cases described in this paper. Nevertheless, this is definitely an issue that needs further investigation.Another practical problem for the application of the method investigated in this paper is the need for a large number of prior realizations to train the CVAE. In the tests cases considered in this paper, we used values between 30,000 and 50,000 realizations. However, in practice, we may need larger numbers for more complex models. Unfortunately, generating several realizations of the geological model with standard geostatistical algorithms may be very challenging. One possible solution we intend to investigate in the future is to use data augmentation (Yaeger et al., 1997; Taylor and Nitschke, 2017)
and transfer learning techniques
(HooChang et al., 2018; Cheng and Malhi, 2017). Data augmentation consist of a series of affine transformations applied to the input data to increase the training set. Typical augmentation strategies include mirroring, cropping and rotating images. Transfer learning is a strategy to use previously trained networks either as initialization or fixed parts of the implemented network. For example, in a preliminary test we applied the parameters of the network trained for the first test case as an initialization for the network in the second test case. This process resulted in a reduction of 50% in the training time. Finally, it is necessary to investigate procedures to reduce the computational requirements for training the networks. Note that our last test case has 100,000 gridblocks, which is relatively small compared to the size of the models employed operationally. Yet, the training required approximately two days in a cluster with four GPUs.6 Conclusions
In this paper, we investigated the use of a CVAE to parameterize facies in geological models and used ESMDA to condition these models to observed data. We tested the procedure in three synthetic reservoir historymatching problems with channelized features and increasing level of complexity. The first two test problems corresponded 2D cases. The proposed procedure outperformed previous results obtained with standard ESMDA, ESMDA with OPCA and DBN parameterizations. The third test problem considered 3D channels and three facies. This case required the use of 3D convolutional layers in the network increasing significantly the training time. There is also a noticeable decrease in the reconstruction accuracy for this case and the conditional realizations exhibit some features not present in the prior geological description of the model. Nevertheless, the overall performance of the method is very encouraging and indicates that the use of deeplearningbased parameterizations is a research direction worth pursuing. In the continuation of this research, we intend to use our trained CVAEs as the generative models in GANs. The objective is to improve the reconstruction accuracy, especially for the third test case.
Acknowledgement
The authors thank Petrobras for the financial support.
References
 Abadi et al. (2015) Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. TensorFlow: Largescale machine learning on heterogeneous systems, 2015. URL www.tensorflow.org.
 Agbalaka and Oliver (2008) Agbalaka, C. C. and Oliver, D. S. Application of the EnKF and localization to automatic history matching of facies distribution and production data. Mathematical Geosciences, 40(4):353–374, 2008. doi: 10.1007/s1100400891557.
 Arjovsky et al. (2017) Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. arXiv:1701.07875v3 [stat.ML], 2017. URL https://arxiv.org/abs/1701.07875.
 Canchumuni et al. (2017) Canchumuni, S. A., Emerick, A. A., and Pacheco, M. A. Integration of ensemble data assimilation and deep learning for history matching facies models. In Proceedings of the Offshore Technology Conference, Rio de Janeiro, Brazil, 24–26 October, number OTC28015MS, 2017. doi: 10.4043/28015MS.
 Canchumuni et al. (2018) Canchumuni, S. A., Emerick, A. A., and Pacheco, M. A. History matching channelized facies models using ensemble smoother with a deep learning parameterization. In Proceedings of the 16th European Conference on the Mathematics of Oil Recovery (ECMOR XVI), Barcelona, Spain, 3–6 September, 2018. doi: 10.3997/22144609.201802277.
 Chan and Elsheikh (2017) Chan, S. and Elsheikh, A. H. Parametrization and generation of geological models with generative adversarial networks. arXiv:1708.01810v1 [stat.ML], 2017. URL https://arxiv.org/abs/1708.01810.
 Chan and Elsheikh (2018) Chan, S. and Elsheikh, A. H. Parametric generation of conditional geological realizations using generative neural networks. arXiv:1807.05207v1 [stat.ML], 2018. URL https://arxiv.org/abs/1807.05207.
 Chang et al. (2010) Chang, H., Zhang, D., and Lu, Z. History matching of facies distributions with the EnKF and level set parameterization. Journal of Computational Physics, 229:8011–8030, 2010. doi: 10.1016/j.jcp.2010.07.005.
 Chang et al. (2015) Chang, Y., Stordal, A. S., and Valestran, R. Facies parameterization and estimation for complex reservoirs – the Brugge field. In Proceedings of the SPE Bergen One Day Seminar, Bergen, Norway, 22 April, number SPE173872MS, 2015. doi: 10.2118/173872MS.
 Chen et al. (2014) Chen, C., Gao, G., Honorio, J., Gelderblom, P., Jimenez, E., and Jaakkola, T. Integration of principalcomponentanalysis and streamline information for the history matching of channelized reservoirs. In Proceedings of the SPE Annual Technical Conference and Exhibition, Amsterdam, The Netherlands, 27–29 October, number SPE170636MS, 2014. doi: 10.2118/170636MS.
 Chen et al. (2015) Chen, C., Gao, G., Ramirez, B. A., Vink, J. C., and Girardi, A. M. Assisted history matching of channelized models using pluriprincipal component analysis. In Proceedings of the SPE Reservoir Simulation Symposium, Houston, Texas, USA, 23–25 February, number SPE173192MS, 2015. doi: 10.2118/173192MS.
 Cheng and Malhi (2017) Cheng, P. and Malhi, H. Transfer learning with convolutional neural networks for classification of abdominal ultrasound images. Journal of Digital Imaging, 30(2):234–243, 2017. doi: 10.1007/s1027801699292.
 Chollet et al. (2015) Chollet, F. et al. Keras, 2015. URL https://keras.io.
 Deng et al. (2017) Deng, X., Tian, X., Chen, S., and Harris, C. J. Deep learning based nonlinear principal component analysis for industrial process fault detection. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), 2017. doi: 10.1109/IJCNN.2017.7965994.
 Deutsch and Journel (1998) Deutsch, C. V. and Journel, A. G. GSLIB: Geostatistical software library and user’s guide. Oxford University Press, New York, 2nd edition, 1998.
 Doersch (2016) Doersch, C. Tutorial on variational autoencoders. arXiv:1606.05908v2 [stat.ML], 2016. URL https://arxiv.org/abs/1606.05908.
 Dubrule and Blunt (2017) Dubrule, L. M. O. and Blunt, M. J. Reconstruction of threedimensional porous media using generative adversarial neural networks. Physical Review E, 96(043309):1–16, 2017. doi: 10.1103/PhysRevE.96.043309.
 Dumoulin and Visin (2018) Dumoulin, V. and Visin, F. A guide to convolution arithmetic for deep learning. arXiv:1603.07285v2 [stat.ML], 2018. URL https://arxiv.org/abs/1603.07285.
 Dupont et al. (2018) Dupont, E., Zhang, T., Tilke, P., Liang, L., and Bailey, W. Generating realistic geology conditioned on physical measurements with generative adversarial networks. arXiv:1802.03065v3 [stat.ML], 2018. URL https://arxiv.org/abs/1802.03065.
 Emerick (2017) Emerick, A. A. Investigation on principal component analysis parameterizations for history matching channelized facies models with ensemblebased data assimilation. Mathematical Geosciences, 49(1):85–120, 2017. doi: 10.1007/s1100401696595.
 Emerick and Reynolds (2013) Emerick, A. A. and Reynolds, A. C. Ensemble smoother with multiple data assimilation. Computers & Geosciences, 55:3–15, 2013. doi: 10.1016/j.cageo.2012.03.011.
 Gao et al. (2015) Gao, G., Vink, J. C., Chen, C., Alpak, F. O., and Du, K. Enhanced reparameterization and dataintegration algorithms for robust and efficient history matching of geologically complex reservoirs. In SPE Annual Technical Conference and Exhibition, Houston, Texas, USA, 28–30 September, number SPE175039MS, 2015. doi: 10.2118/175039MS.
 Goodfellow et al. (2014) Goodfellow, I., PougetAbadie, J., Mirza, M., Xu, B., WardeFarley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014. URL http://papers.nips.cc/paper/5423generativeadversarialnets.pdf.
 Goodfellow et al. (2016) Goodfellow, I., Bengio, Y., and Courville, A. Deep learning. MIT Press, 2016. URL http://www.deeplearningbook.org/.

Guardiano and Srivastava (1993)
Guardiano, F. B. and Srivastava, R. M.
Multivariate geostatistics: Beyond bivariate moments.
In Soares, A., editor, Geostatistics Tróia ’92, volume 5 of Quantitative Geology and Geostatistics, pages 133–144. Springer Netherlands, 1993.  Hinton et al. (2012) Hinton, G., Krizhevsky, A., Jaitly, N., Tieleman, T., and Tang, Y. Does the brain do inverse graphics? Presentation at Brain and Cognitive Sciences Fall Colloquium, 2012. URL http://helper.ipam.ucla.edu/publications/gss2012/gss2012_10754.pdf.
 Hinton and Salakhutdinov (2006) Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313:504–507, 2006. doi: 10.1126/science.1127647.
 Hinton et al. (2006) Hinton, G. E., Osindero, S., and Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006. doi: 10.1162/neco.2006.18.7.1527.
 Honorio et al. (2015) Honorio, J., Chen, C., Gao, G., Du, K., and Jaakkola, T. Integration of PCA with a novel machine learning method for reparameterization and assisted history matching geologically complex reservoirs. In Proceedings of the SPE Annual Technical Conference and Exhibition, Houston, Texas, USA, 28–30 September, number SPE175038MS, 2015. doi: 10.2118/175038MS.
 HooChang et al. (2018) HooChang, S., Roth, H., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., and Summers, R. Deep convolutional neural networks for computeraided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5):1285–1298, 2018. doi: 10.1109/TMI.2016.2528162.

Houtekamer and Mitchell (2001)
Houtekamer, P. L. and Mitchell, H. L.
A sequential ensemble Kalman filter for atmospheric data assimilation.
Monthly Weather Review, 129(1):123–137, 2001. doi: 10.1175/15200493(2001)129¡0123:ASEKFF¿2.0.CO;2.  Ioffe and Szegedy (2018) Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167v3 [cs.LG], 2018. URL https://arxiv.org/abs/1502.03167.
 Jafarpour (2010) Jafarpour, B. Wavelet reconstruction of geologic facies from nonlinear dynamic flow measurements. IEEE Transactions on Geoscience and Remote Sensing, 49(5):1520–1535, 2010. doi: 10.1109/TGRS.2010.2089464.
 Jafarpour and Khodabakhshi (2011) Jafarpour, B. and Khodabakhshi, M. A probability conditioning method (PCM) for nonlinear flow data integration into multipoint statistical facies simulation. Mathematical Geosciences, 43(2):133–164, 2011. doi: 10.1007/s110040119316y.
 Jafarpour and McLaughlin (2008) Jafarpour, B. and McLaughlin, D. B. History matching with an ensemble Kalman filter and discrete cosine parameterization. Computational Geosciences, 12(2):227–244, 2008. doi: 10.1007/s1059600890803.

Johnson et al. (2016)
Johnson, J., Alahi, A., and FeiFei, L.
Perceptual losses for realtime style transfer and superresolution.
In Proceedings of the European Conference on Computer Vision, 2016. doi: 10.1007/9783319464756˙43.  Jung et al. (2017) Jung, H., Jo, H., Kim, S., Lee, K., and Choe, J. Recursive update of channel information for reliable history matching of channel reservoirs using EnKF with DCT. Journal of Petroleum Science and Engineering, 154:19–37, 2017. doi: 10.1016/j.petrol.2017.04.016.
 Kim et al. (2018) Kim, S., Min, B., Lee, K., and Jeong, H. Integration of an iterative update of sparse geologic dictionaries with ESMDA for history matching of channelized reservoirs. Geofluids, 2018. doi: 10.1155/2018/1532868.
 Kingma and Welling (2013) Kingma, D. P. and Welling, M. Autoencoding variational Bayes. arXiv:1312.6114 [stat.ML], 2013. URL https://arxiv.org/abs/1312.6114.
 Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012. doi: 10.1145/3065386.
 Lacerda et al. (2018) Lacerda, J. M., Emerick, A. A., and Pires, A. P. Methods to mitigate loss of variance due to sampling errors in ensemble data assimilation with nonlocal model parameters. Journal of Petroleum Science and Engineering, In Press, 2018. doi: 10.1016/j.petrol.2018.08.056.
 Laloy et al. (2017) Laloy, E., Hérault, R., Jacques, D., and Linde, N. Inversion using a new lowdimensional representation of complex binary geological media based on a deep neural network. Advances in Water Resources, 110:387–405, 2017. doi: 10.1016/j.advwatres.2017.09.029.
 Laloy et al. (2018) Laloy, E., Hérault, R., Jacques, D., and Linde, N. Trainingimage based geostatistical inversion using a spatial generative adversarial neural network. Water Resources Research, 54(1):381–406, 2018. doi: 10.1002/2017WR022148.
 Le et al. (2015) Le, D. H., Younis, R., and Reynolds, A. C. A history matching procedure for nonGaussian facies based on ESMDA. In Proceedings of the SPE Reservoir Simulation Symposium, Houston, Texas, USA, 23–25 February, number SPE173233MS, 2015. doi: 10.2118/173233MS.
 LeCun (1989) LeCun, Y. Generalization and network design strategies. Technical report, University of Toronto, 1989. URL http://yann.lecun.com/exdb/publis/pdf/lecun89.pdf.
 Liu and Grana (2018) Liu, M. and Grana, D. Ensemblebased seismic history matching with data reparameterization using convolutional autoencoder. In Proceedings of the SEG International Exposition and 88th Annual Meeting, 2018. doi: 10.1190/segam20182997988.1.
 Liu and Oliver (2005) Liu, N. and Oliver, D. S. Ensemble Kalman filter for automatic history matching of geologic facies. Journal of Petroleum Science and Engineering, 47(3–4):147–161, 2005. doi: 10.1016/j.petrol.2005.03.006.
 Liu et al. (2018) Liu, Y., Sun, W., and Durlofsky, L. J. A deeplearningbased geological parameterization for history matching complex models. arXiv:1807.02716v1 [stat.ML], 2018. URL https://arxiv.org/abs/1807.02716.
 Lorentzen et al. (2012) Lorentzen, R. J., Flornes, K., and Nævdal, G. History channelized reservoirs using the ensemble Kalman filter. SPE Journal, 17(1):137–151, 2012. doi: 10.2118/143188PA.
 Mariethoz and Caers (2014) Mariethoz, G. and Caers, J. Multiplepoint geostatistics – Stochastic modeling with training images. John Wiley & Sons, Ltd., 2014.
 Moreno and Aanonsen (2011) Moreno, D. L. and Aanonsen, S. I. Continuous facies updating using the ensemble Kalman filter and the level set method. Mathematical Geosciences, 43(8):951–970, 2011. doi: 10.1007/s1100401193474.
 Moreno et al. (2008) Moreno, D. L., Aanonsen, S. I., Evensen, G., and Skjervheim, J.A. Channel facies estimation based on Gaussian perturbations in the EnKF. In Proceedings of the 11th European Conference on the Mathematics of Oil Recovery (ECMOR XI), 2008. doi: 10.3997/22144609.20146403.
 Oliver et al. (1996) Oliver, D. S., He, N., and Reynolds, A. C. Conditioning permeability fields to pressure data. In Proceedings of the 5th European Conference on the Mathematics of Oil Recovery (ECMOR V), 03 September, 1996. doi: 10.3997/22144609.201406884.
 Ping and Zhang (2014) Ping, J. and Zhang, D. History matching of channelized reservoirs with vectorbased levelset parameterization. SPE Journal, 19(3):514–529, 2014. doi: 10.2118/169898PA.
 Sana et al. (2016) Sana, F., Katterbauer, K., AlNaffouri, T. Y., and Hoteit, I. Orthogonal matching pursuit for enhanced recovery of sparse geological structures with the ensemble Kalman filter. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(4):1710–1724, 2016. doi: 10.1109/JSTARS.2016.2518119.
 Sarma and Chen (2009) Sarma, P. and Chen, W. H. Generalization of the ensemble Kalman filter using kernel for non Gaussian random fields. In Proceedings of the SPE Reservoir Simulation Symposium, The Woodlands, Texas, 24 February, number SPE119177MS, 2009. doi: 10.2118/119177MS.
 Sarma et al. (2008) Sarma, P., Durlofsky, L. J., and Aziz, K. Kernel principal component analysis for efficient differentiable parameterization of multipoint geostatistics. Mathematical Geosciences, 40(1):3–32, 2008. doi: 10.1007/s1100400791317.
 Sebacher et al. (2013) Sebacher, B. M., Hanea, R., and Heemink, A. A probabilistic parametrization for geological uncertainty estimation using the ensemble Kalman filter (EnKF). Computational Geosciences, 17(5):813–832, 2013. doi: 10.1007/s105960139357z.
 Sebacher et al. (2015) Sebacher, B. M., Stordal, A. S., and Hanea, R. Bridging multipoint statistics and truncated Gaussian fields for improved estimation of channelized reservoirs with ensemble methods. Computational Geosciences, 19(2):341–369, 2015. doi: 10.1007/s1059601494663.
 Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.
 Strebelle (2002) Strebelle, S. Conditional simulation of complex geological structures using multiplepoint statistics. Mathematical Geology, 34(1):1–21, 2002. doi: 10.1023/A:1014009426274.
 Tavakoli et al. (2014) Tavakoli, R., Srinivasan, S., and Wheeler, M. F. Rapid updating of stochastic models by use of an ensemblefilter approach. SPE Journal, 19(3):500–513, 2014. doi: 10.2118/163673PA.
 Taylor and Nitschke (2017) Taylor, L. and Nitschke, G. Improving deep learning using generic data augmentation. arXiv:1708.06020v1 [cs.LG], 2017. URL https://arxiv.org/abs/1708.06020.
 Vo and Durlofsky (2014) Vo, H. X. and Durlofsky, L. J. A new differentiable parameterization based on principal component analysis for the lowdimensional representation of complex geological models. Mathematical Geosciences, 46(7):775–813, 2014. doi: 10.1007/s1100401495412.

Yaeger et al. (1997)
Yaeger, L. S., Lyon, R. F., and Webb, B. J.
Effective training of a neural network character classifier for word recognition.
In Mozer, M. C., Jordan, M. I., and Petsche, T., editors, Advances in Neural Information Processing Systems 9, pages 807–816. MIT Press, 1997. URL http://papers.nips.cc/paper/1250effectivetrainingofaneuralnetworkcharacterclassifierforwordrecognition.pdf.  Yeh et al. (2016) Yeh, R. A., Chen, C., Lim, T. Y., Schwing, A. G., HasegawaJohnson, M., and Do, M. N. Semantic image inpainting with deep generative models. arXiv:1607.07539v3 [cs.CV], 2016. URL https://arxiv.org/abs/1607.07539.
 Zhao et al. (2008) Zhao, Y., Reynolds, A. C., and Li, G. Generating facies maps by assimilating production data and seismic data with the ensemble Kalman filter. In Proceedings of the SPE Improved Oil Recovery Symposium, Tulsa, Oklahoma, 20–23 April, number SPE113990MS, 2008. doi: 10.2118/113990MS.
 Zhao et al. (2016) Zhao, Y., Forouzanfar, F., and Reynolds, A. C. History matching of multifacies channelized reservoirs using ESMDA with common basis DCT. Computational Geosciences, 21(5–6):1343–1364, 2016. doi: 10.1007/s1059601696041.
Appendix: Architecture of the networks
Layer  Configuration  Comment 
Encoder  
Input  Shape = (45, 45, 2)  Two facies 
2D convolution 1  Kernels = 32, size = (2, 2), stride = (2, 2), activation = ReLU  – 
2D convolution 2  Kernels = 32, size = (3, 3), stride = (2, 2), activation = ReLU  – 
2D convolution 3  Kernels = 16, size = (3, 3), stride = (1, 1), activation = ReLU  – 
Flatten  –  Setup for the fullyconnected layer 
Fullyconnected 1  Neurons = 1024, activation = ReLU  – 
Dropout  10%  Strategy to avoid overfitting 
Fullyconnected 2  Neurons = 100, activation = linear  Mean of the VAE 
Fullyconnected 3  Neurons = 100, activation = linear  Logvariance of the VAE 
Code  
Lambda  –  Sampling 
Decoder  
Fullyconnected 4  Neurons = 1024, activation = ReLU  – 
Dropout  10%  Strategy to avoid overfitting 
Fullyconnected 5  Neurons = 2034, activation = ReLU  – 
Reshape  Output size = (12, 12, 16)  Setup for the transpose convolution 
2D transposed convolution 1  Kernels = 16, size = (3, 3), stride = (1, 1), activation = ReLU  – 
2D transposed convolution 2  Kernels = 32, size = (3, 3), stride = (2, 2), activation = ReLU  – 
2D transposed convolution 3  Kernels = 32, size = (2, 2), stride = (1, 2), activation = ReLU  – 
Bilinear upsampling  Output size = (45, 45, 32)  Resize output dimension 
2D convolution 4  Kernels = 2, size = (3, 3), stride = (1, 1), activation = sigmoid  Output image 
Layer  Configuration  Comment 
Encoder  
Input  Shape = (100, 100, 10, 3)  Three facies 
3D convolution 1  Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU  – 
Maxpooling 1  Pool = (2, 2, 1)  Dimension reduction 
3D convolution 2  Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU  – 
3D convolution 3  Kernels = 32, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU  – 
3D convolution 4  Kernels = 16, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU  – 
Maxpooling 2  Pool = (2, 2, 1)  Dimension reduction 
Flatten  –  Setup for the fullyconnected layer 
Fullyconnected 1  Neurons = 2000, activation = linear  – 
Batch normalization  –  Regularization 
Activation  ReLU  – 
Fullyconnected 2  Neurons = 100, activation = linear  Mean of the VAE 
Fullyconnected 3  Neurons = 100, activation = linear  Logvariance of the VAE 
Code  
Lambda  –  Sampling 
Decoder  
Fullyconnected 4  Neurons = 2000, activation = linear  – 
Batch normalization  –  Regularization 
Activation  ReLU  – 
Fullyconnected 5  Neurons = 5000, activation = ReLU  – 
Reshape  Output size = (25, 25, 5, 16)  Setup for the transposed convolution 
3D transposed convolution 1  Kernels = 16, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU  – 
3D transposed convolution 2  Kernels = 32, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU  – 
3D transposed convolution 3  Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU  – 
3D transposed convolution 4  Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU  – 
3D transposed convolution 5  Kernels = 3, size = (3, 3, 3), stride = (1, 1, 2), activation = sigmoid  Output 