Deep Neural Networks Predicting Oil Movement in a Development Unit

by   Pavel Temirchev, et al.

We present a novel technique for assessing the dynamics of multiphase fluid flow in the oil reservoir. We demonstrate an efficient workflow for handling the 3D reservoir simulation data in a way which is orders of magnitude faster than the conventional routine. The workflow (we call it "Metamodel") is based on a projection of the system dynamics into a latent variable space, using Variational Autoencoder model, where Recurrent Neural Network predicts the dynamics. We show that being trained on multiple results of the conventional reservoir modelling, the Metamodel does not compromise the accuracy of the reservoir dynamics reconstruction in a significant way. It allows forecasting not only the flow rates from the wells, but also the dynamics of pressure and fluid saturations within the reservoir. The results open a new perspective in the optimization of oilfield development as the scenario screening could be accelerated sufficiently.



There are no comments yet.


page 6

page 10


End-to-end neural network approach to 3D reservoir simulation and adaptation

Reservoir simulation and adaptation (also known as history matching) are...

A Deep Learning-Accelerated Data Assimilation and Forecasting Workflow for Commercial-Scale Geologic Carbon Storage

Fast assimilation of monitoring data to update forecasts of pressure bui...

Auto-Encoded Reservoir Computing for Turbulence Learning

We present an Auto-Encoded Reservoir-Computing (AE-RC) approach to learn...

Oil reservoir recovery factor assessment using Bayesian networks based on advanced approaches to analogues clustering

The work focuses on the modelling and imputation of oil and gas reservoi...

Computational Intelligence for Deepwater Reservoir Depositional Environments Interpretation

Predicting oil recovery efficiency of a deepwater reservoir is a challen...

A Bayesian approach to deconvolution in well test analysis

In petroleum well test analysis, deconvolution is used to obtain informa...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 List of Abbreviations

FDHS Finite-Difference Hydrodynamical Simulator
ROM Reduced Order Modelling
CRM Capacitance-Resistance Model
PDE Partial Differential Equation
GRU Gated Recurrent Unit
(n)MDP (non-)Markovian Decision Process
ML Machine Learning
PCA Principal Components Analysis
SVD Singular Value Decomposition
VAE (Variational) Autoencoder
ELBO Evidence Lower Bound Objective

Leaky Rectified Linear Unit

LR Linear Regression
FCNN Fully-Connected Neural Network
RNN Recurrent Neural Network
NN Neural Network
DDF Data-Driven Forecasting

2 Introduction

Computer 3D modelling of oil flows through a porous medium is the most frequently used tool for an oil field development optimization and prediction of unknown reservoir properties by history-matching procedure history-matching-1990 . It is essential to have an accurate and fast enough model of oil flows to solve both problems.

The common modelling approach is to use a finite-difference hydrodynamical simulation (FDHS) ertekin-bars-2001 . FDHS uses computations on a discrete computational grid, and its accuracy is strongly dependent on a resolution of the grid. Being accurate, computations on fine grids may consume weeks to be performed for a big oil field. It is almost impossible to run iterative optimisation and history-matching methods using this kind of models in sufficient time.

There are many techniques aimed to speed up the simulation process: starting from simple downscaling and local grid refinement and ending with data-driven Reduced Order Modelling (ROM) approaches. Capacitance-Resistance Models (CRM) connectivitycrm-2005 ; kovalcrm-2015 ; fullycoupledcrm-2014

, for example, tries to fit nonlinear autoregressive model able to forecast fluid production rates. While CRM retains the physicality of the process, other models train autoregressive production model in a fully data-driven manner 

insim-2015 ; rpm-2016 ; shm-2layer-2015 . On the other hand, Galerkin projection methods are aimed to find the reduced approximation of Partial Differential Equations (PDE) solved by FDHS drrnn-2018 .

Petroleum society requires a fast and accurate model for three-phase flow simulation on a 3-dimensional computational grid. And all the discussed approaches cannot solve the task properly: either they are not able to produce solutions for spatial fields (pore pressure, saturation, etc.) or they can work only under nonrealistic greatly simplified settings.

In this work, we provided a methodology for a fast data-driven 3D reservoir modelling, based on machine learning techniques and aimed to approximate FDHS solutions on a subset of possible simulation options. We use “Metamodel” terminology GTApprox2016 to describe a purely data-driven 3D reservoir model approximating the solution given by a base model (e.g. FDHS), and we provide a detailed definition in Section 3.

Proposed metamodelling approach is based on the idea of forecasting in latent variable space. The same idea was used in e2c-2015 ; rce-2017

to approximate the dynamics of an environment with high-dimensional state space for model-based Reinforcement Learning. Authors propose to use Variational Autoencoders 

vae-2013 for the model order reduction and we try it in our framework.

Our major contributions are:

  • The metamodelling framework based on forecasting in latent variable space.

  • Metamodel of a field development unit composed of Convolutional Variational Autoencoder for dimensionality reduction and GRU Recurrent Neural Network gru-2014 for the latent dynamics prediction. The metamodel is able to predict both pressure-saturation distributions (Section 5) and production rates from wells (Section 6).

  • Comparative analysis of different metamodels in the sense of FDHS approximation accuracy, see Section 8.

3 Metamodelling of Reservoir Flows

We mentioned that our approach is developed to work in a subset of possible simulation options. The overall operational envelope of the presented model has two major restrictions:

  1. Geometry of the computational domain is restricted to a field development unit,

  2. Reservoirs are homogeneous in horizontal dimensions.

(a) Staggered line drive
(b) Direct line drive
(c) Without injection
Figure 1: Investigated well allocation schemes with circles used to denote injection wells and triangles to denote production wells. Dashed lines segregate 1/4-th of development units.

It may be noticed that if wells are allocated on a regular grid and the reservoir is homogeneous in horizontal dimensions, then the dynamics will be the same for all elements of the grid. In petroleum engineering, such an element is usually referred to as a development unit. We investigate three common well allocation schemes (fig. 1):

  • Staggered line drive (fig. (a)a),

  • Direct line drive (fig. (b)b),

  • Without injection (fig. (c)c).

A model that forecasts dynamics for a development unit (actually for -th of it) may be directly applied for symmetric reservoirs. Though real reservoirs frequently are neither strictly symmetric nor horizontally homogeneous, they may be discretised into development units of required form. The ensembling of units produced by such a discretisation was left for the future work. Throughout the rest of the paper, we consider forecasting dynamics and production rates for 1/4-th of a development unit with one production well and, possibly, one injection well.

We will think of a reservoir dynamics as a non-Markovian Decision Process (nMDP) with a time discretization step . The nMPD is defined using 5-tuple , where:

  • is a set of reservoir states representing distributions of pore volume pressure and oil, water and gas saturations

    within 3D coordinate grid at a fixed moment of time

    . Since we may drop one of the saturations from the reservoir state and compute it explicitly when needed (e.g. gas saturation).

  • is a set of control variables representing production bottomhole pressure and water injection rate (if injection is presented).

  • is a set of variables defining reservoir initial conditions, rock and fluids properties. This variable is invariant of time and is fixed for a concrete development unit. We refer to these variables as a metadata of a development unit.

  • is a transition function that maps applied control sequence , all previous states and metadata of development unit into a following reservoir state , such that

  • is a production rates function, such that , where

    is a vector of oil, water and gas production rates at time

    . Production rates may be thought as a kind of analogy to reward in classical MPDs.

There are models able to restore the transition function , and production rates function by solving corresponding partial differential equations (PDEs). We call them Base Models. Commonly, Base Models solve PDEs using finite-difference or finite-volume schemes on a discretised grid and differ from each other in used PDE’s modifications and numerical integration schemes. Further, we will concentrate on so-called finite-difference hydrodynamical simulators (FDHS), which solve three-phase fluid flow equations with gas liberation/dissolving from/into the oil options. Though, our approach can be straightforwardly generalised for other types of Base Models.

Thinking of FDHS as of an ideal flow model (in the sense of accuracy) we may want to speed up it by substituting it with a metamodel. To proceed, we need a formal definition of a reservoir metamodel (see also GTApprox2016 ):

Definition 1

Reservoir metamodel is a data-driven reservoir model aimed to approximate the transition function and production rates function represented by a Base Model.

Here and further we will assume the approximation quality to be measured in the sense of L2-norm. If we denote approximated transition and production functions by and respectively then the approximation problem may be formulated as follows:

where the sequence of approximated reservoir states is obtained from the approximated transition function , and the initial states are computed as follows:


In the following sections, we will discuss the training procedure of a data-driven reservoir metamodel starting from a description of the training set.

4 Training Data

In order to construct a metamodel we use supervised machine learning techniques, requiring a training set of examples of the Base Model dynamics. Training set is composed of scenarios (episodes), where each scenario represents one run of FDHS and contains:

  1. Input variables:

    • Scenario metadata in compressed vector form: (for more details see A). Metadata is fixed throughout the scenario.

    • Sequence of control variables , consists of production bottomhole pressure and water injection rate at time .

  2. Output variables (FDHS solutions):

    • Sequence of reservoir states . Each reservoir state

      is a tensor containing values of pore pressure, oil and water saturation for all cells of computational grid at time

      , where , , are numbers of cells in corresponding dimensions.

    • Sequence of oil, water and gas production rates , .

The output variables (2) were obtained from multiple FDHS runs on randomly generated input variables (1). The generative model of input variables is based on the analysis of real laboratory data and aimed to produce realistic samples (see B for details).

Each scenario has its own time horizon . FDHS turns off if the oil production rate falls below the limit of 3 stb/day, with an upper bound: years.

5 Latent Variable Reservoir Dynamics

Direct application of basic ML techniques to the approximation of functions and

is complicated by the so-called curse of dimensionality. Even for simplified Markovian version of the task (when

), the basic linear regression model will require learning of more then parameters, which will lead to the overfitting of the metamodel for any realistic number of available training examples — the metamodel will not be able to reproduce training accuracy on scenarios not included in the training set. Moreover, such a model will not be efficient both in terms of computational speed and memory consumption.

One way to reduce the problem complexity is to find a bijective mapping (dimensionality reduction) , where is a low-dimensional latent variable space, and to define the dynamics in this latent variable space:

Figure 2: Graphical representation of the latent variable reservoir dynamics workflow, cropped to one timestep.

The graphical representation of this latent variable reservoir dynamics model is shown in Figure 2.

Full-order reservoir states might be recovered as . It is not required to recover at each timestep – all the dynamics can be computed in the latent variable space with the need to use inverse mapping only for the timesteps of interest.

In the following subsections, we show some practical models for both dimensionality reduction function and latent dynamics function . Different combinations of these models will result in different reservoir metamodels.

5.1 Search for a mapping for the dimensionality reduction

From an ML point of view, dimensionality reduction problem may be solved by fitting the so-called autoencoding model. We will use autoencoders to represent the true bijective mapping and its inverse with parametrized approximations (encoder) and (decoder).

Autoencoder passes the input through a low-dimensional bottleneck via subsequent application of encoding and decoding functions, and is trained to minimise some deviation between autoencoder’s input and output:

Throughout this work, we analysed two distinct autoencoding models:

  • Principal Component Analysis,

  • Convolutional Variational Autoencoder.

Principal Components Analysis (PCA) is a linear dimensionality reduction model and is based on Singular Value Decomposition (SVD) of the matrix composed of all tensors flattened into vectors:

most significant singular values are retained to form truncated matrix of right singular vectors . is now can be used as an encoder and – as a decoder.

PCA provides the best approximation accuracy throughout all linear models with fixed latent space dimensionality in the sense of Frobenius norm .

However, PCA has two significant disadvantages: it is linear, and it ignores the spatial structure of the tensor due to the flattening procedure and fully-connected structure.

Convolutional Variational Autoencoder combines the application of Variational Autoencoding framework (VAE) vae-2013

with convolutional neural network modules.

Variational Autoencoder is a probabilistic model that tries to approximate not just a point estimate of

, but rather the posterior distribution over it with probability density function

. The task is solved by maximization of Evidence Lower Bound Objective (ELBO) proposed in the original paper.

The parametric form of the conditional distribution can be chosen to be Gaussian, and in this case, the ELBO maximization is closely related to the minimization of L2-norm of deviation in (5.1), so we stick with the Gaussian approach.

Due to the deterministic structure of latent dynamics models described below, we must eliminate the stochasticity of VAE. We use the following expressions for encoder and decoder respectively:

where is a likelihood function.

Convolutional neural networks colah-conv-2015 ; alexnet-2012 ; vgg-2014 are used to represent both likelihood function and posterior distribution . The use of convolutional layers allows us to overcome both linearity and indifference to the spatial structure of tensors . We use 3D convolutional layers with elementwise nonlinearity between them (Leaky Rectified Linear Unit – LReLU leakyrelu-2015

). Dimensionality reduction comes from the stride of the convolution window.

Figure 3:

Graphical representation of convolutional Variational Autoencoder. Decoder network is composed of convolutional layers with stride, and encoder is composed of convolutional and upscaling (trilinear interpolation) layers.

Convolutions exploit spatial invariance property that is also presented in FDHS engine: time derivatives of a property in a computational cell are dependent only on the parameters in the neighboring cells. Spatial invariance allows using less number of trainable parameters which helps us to overcome overfitting problem.

Graphical representation of convolutional VAE is provided at Figure 3.

5.2 Approximation of the latent dynamics

Given a set of latent variables we can now approximate the dynamics with using standard ML techniques.

Most of ML models assume the input variable of a fixed dimensionality, which restricts the direct applicability of these models to the approximation of non-Markovian dynamics since its inputs and are of variable length. For this kind of models we may allow the approximation to be Markovian instead: .

Different possible choices of the latent dynamics model are provided below. We denote the set of model’s parameters by and will fit them to minimize the average L2 loss in the latent space.

Linear Regression (LR) model requires Markovian simplification and follows the equation:

where parameters are obtained analytically.

Fully-Connected Neural Network (FCNN) is another Markovian model able to approximate a much broader class of functions than LR. FCNN is composed of consecutive fully-connected layers with outputs :

where is an elementwise nonlinear function. Parameters might be trained via one of gradient-based optimization procedures.

Recurrent Neural Network with Gated Recurrent Units (GRU RNN) gru-2014 compared to FCNN and LR models is a non-Markovian one. GRU RNN is a modification of a simple Recurrent Neural Network, and its internal structure is out of the scope of this work. For the sake of brevity, we provide here the equations for a simple RNN model. For :

where is a hidden layer vector, and are nonlinear functions; parameters are obtained from gradient-based optimization procedure.

5.3 Initial latent state approximation

Each of described above latent dynamics models requires an initial latent state to start the computations for a scenario.

From the equation (1) we know, that initial reservoir state is a function of just metadata vector : . Hence, may be computed as follows:

where is a known physics-based procedure aimed to calculate equilibristic initial reservoir state.

However, this approach requires explicit calculation of the high-dimensional tensor . Instead we may want to find the direct linear approximation for :

6 Calculation of Well Production Rates

Besides the calculations of the dynamics, the metamodel is required to approximate the production rates function .

To calculate the production rates from the well we use the physics-based model without trainable parameters. We exploit the fact that there are no additional sources or sinks in the development unit except for production and injection wells. Hence, the amount of oil, water, and gas produced might be calculated from the mass balance as the difference between subsequent reservoir states.

The volumetric amount of a fluid at time can be calculated from the corresponding saturation field . Amounts of fluid from each computational cell should be brought to the same pressure and then summed:

where pressure and saturation distributions , are taken from the approximated reservoir state , is a bulk volume of computational cell, is porosity, and are rock and fluid compressibilities, and the index denotes the number of a computational cell.

The reference pressure should be chosen from the interval where there is no mass transfer involved between the oil and the gas phases. For example, it might be selected to be equal to the initial pressure at the upper part of the reservoir.

Thus we can calculate the liquid’s production rate at the reference pressure:

where is the injected amount of fluid , which is equal to zero for all fluids except for water.

We can now obtain the production rates at surface conditions as follows:

where is a formation volume factor of the fluid .

To take into account the influence of gas, liberated from the oil, we use the following modification of (6):

where is the gas content.

7 Summary

The overall structure of the metamodel consists of two distinct parts: dynamics and production rates models. The dynamics model itself decomposes into the dimensionality reduction model and the latent dynamics model.

Input: Generative Model () for metadata and control,
number of training scenarios ,
Base Model ()
1 Generate for scenarios;
2 Compute ;
3 Form dataset ;
4 Train Autoencoder () on ;
5 Encode ;
6 ;
7 Train Latent Dynamics on ;
return ()
Algorithm 1 Generic training procedure for metamodels with the latent variable engine.

In the experimental section (Section 8) we show the superiority of the dynamics model composed of VAE for dimensionality reduction and GRU RNN for latent dynamics forecasting against other investigated choices for these components of the dynamics model.

Algorithm 1 shows the overall training procedure for the metamodel and is suitable for different choices of metamodel’s components.

8 Results

To analyse the performance of proposed metamodelling approach, we generated a dataset (using the generative model described in B) and divided it into training and validation subsets, containing more than 5000 and 100 scenarios respectively. The dimensionality of the computational grid is chosen to be and time discretization step: days.

The dataset contains scenarios of three types of possible development units, with either horizontal or vertical wells, different rock, and fluid properties, smooth or sharp applied control sequences .

3D ground truth solutions and production rates were obtained from Rock Flow Dynamics tNavigator simulator tnav used as the Base Model. We turned on the options of the simulator responsible for gas liberation and oil evaporation. “No flow” boundary conditions and equilibristic initial conditions were used throughout the simulations.

Following the Algorithm 1, we performed the training for two variations of the dimensionality reduction model: VAE and PCA, and three variations of latent dynamics model: LR, FCNN, GRU RNN. Two dimensionalities of the latent space were tested: and .

We used ADAM adam-2014

optimizer to perform gradient-based optimization when needed. All the Neural Networks were implemented using standard NN modules from the PyTorch library 

pytorch for Python programming language. We used LReLU nonlinearity for all of the NNs, except the GRU RNN module where we use nonlinearities proposed in the original work gru-2014

. Batch normalization 

batchnorm-2015 was used in all non-recurrent parts of NNs and layer normalization layrnorm-2016 in recurrent ones.

Mean relative error for

in % and its standard deviation

Dynamics Encoding Error % S.T.D.
Linear PCA 16.52 59.54 38.51 10.76 15.41 7.68
Linear VAE 12.97 26.69 14.68 8.54 15.00 7.67
Linear VAE 11.08 22.71 12.77 8.65 12.38 6.52
GRU PCA 14.17 52.27 33.92 5.61 9.85 4.22
GRU VAE 8.72 12.65 7.37 4.39 9.00 4.82
GRU VAE 9.51 14.74 8.63 6.54 8.51 4.19
Table 1: Mean relative error and its standard deviation between predicted and ground truth reservoir states for different metamodels. All the values are averaged throughout the validation dataset.
(a) pore pressure
(b) oil saturation
(c) water saturation
Figure 4: Mean values of spatial parameters calculated for a single scenario and averaged across all computational cells versus time: target and predicted by metamodels.
Figure 5: Horizontal slices of the oil saturation tensor at the medium layer at time years: target and predicted by metamodels.

Dynamics part of the metamodel was evaluated in the sence of relative deviation between metamodel’s forecast and the ground truth for all scenarios from the validation dataset . The results of the analysis are provided in the Table 1 for different choices of metamodel’s components. Bold values denotes the best obtained approximation accuracy.

The FCNN model is not provided in the table due to its inability to output long predictions. The training scheme for FCNN is aimed to minimize the error of one-step prediction. This leads FCNN to the extreme instability when FCNN gets its outputs as inputs. Interestingly, that this effect does not appear in the LR model, which is structured similarly to FCNN model, due to the simplicity of linear regression.

Visual analysis of the dynamics forecasting performance for the 3D flow problem was made using the plots of mean pressure and fluid saturations versus time (Figure 4) and with the horizontal slices of pressure and saturation tensors at a fixed moment of time (Figure 5). Figures 4 and 5 show the results for the randomly chosen validation scenario with sharp applied control, staggered line drive allocation scheme, and vertical wells.

(a) allocation scheme
(b) applied control
(c) well orientation
Figure 6: The distribution of production rates prediction error between different groups of validation scenarios. VAE + GRU RNN, metamodel was used to make predictions.
(a) water production rate
(b) oil production rate
(c) gas production rate
Figure 7: Ground truth and predicted production rates for the randomly chosen scenario. VAE + GRU RNN, metamodel was used to make predictions.

The accuracy of the procedure for computing production rates was tested on the best dynamics metamodel (VAE + GRU RNN, ) since it is strongly dependent on the dynamics accuracy. Mean relative deviation between predicted and ground truth rates for this model is provided on the Figure 6 where different groups of scenarios were analyzed separately. Visual check of the performance can be made on the plots of predicted daily production rates for randomly chosen validation scenario (Figure 7) – here we used the same scenario that was used in the dynamics visualization.

On average, the total on-wall time consumed by tNavigator to produce solutions for 30 years of simulation is 120 seconds. Computations were divided between 30 CPU cores. On the other hand, the metamodel consumes 10 seconds for the same simulation, if the GPU is used. Even in the case when the CPU is used, it consumes only 70 seconds. Though, convolutional layers are not adapted to be used with CPU. So, the metamodel is about x12 faster than the Base Model.

9 Discussion

We have developed the novel approach for prediction of reservoir performance enabling a new way of assessing the possible reservoir development scenario. Being much faster than a conventional reservoir modeling tool, the metamodel allows overlooking many more scenarios over a given time interval.

The approach is a pioneering one regarding its ability to output not only the flow rates at the wellheads, but also the actual dynamics of 3D fields of pressure, and fluid saturations over the computational cells of a conventional hydrodynamic model of a reservoir.

The approach itself is also somewhat unique because of its high-level structure containing methods of reduced order modelling (ROM) and the data-driven forecasting (DDF) tools. This high-level structure presents a very general framework for further enhancements as both ROM and the DDF methods are being developed these days very rapidly. We have tried just several ROM and DDF approaches and defined the ones demonstrating the best results.

There is no doubt that both building blocks of our workflow will progress in terms of accuracy and computational efficiency in training and especially prediction modes.

10 Further Work

We plan to continue our research in three principal directions. First is further experimenting with DDF and ROM architectures to increase the accuracy of the forecast and ensure the higher computational efficiency, as well as to try to combine data, generated by FDHS with different fidelities, when training a metamodel (see also MFGP2015 ; MFGP2017 ). Second is about going a scale up and try to adapt the metamodels for the full-scale reservoirs and oilfields containing many development units. Later will lead to more useful tools for applied reservoir engineering, including field development optimization and history matching. The third is the expansion of the metamodel’s operational envelope to the more complex fluid systems and their PVT representations and the low permeable reservoirs. The low permeable direction is likely to be supported by a training set generated with an approach presented in bezyan2019novel .


Appendix A Scenario metadata in compressed vector form

Variable , defined in Section 3 and mentioned as metadata of the development unit, represents physical properties of the rock and fluids, and geometrical characteristics of the development unit and corresponding wells.

We use the metadata as an input of ML models, and most of them require inputs in a vectorised form with fixed dimensionality. Moreover, this dimensionality should be small to prevent the overfitting.

Some of the physical properties can be represented as scalars and used as is, being stacked into a vector. These are rock and water compressibilities, fluids densities, water viscosity, development unit size in each of three dimensions, initial water-oil and gas-oil contact depths, the depth of the top of the reservoir, initial pore pressure, saturation pressure and gas content.

Categorical properties, namely well allocation scheme and spatial orientation of wells, can be presented via one-hot encoding. The exact position of either vertical or horizontal well can be described by two values: the depth of the beginning of the perforation interval, the length of the perforation interval. We assume wells to have just one perforation interval.

The complicated part is to represent properties defined for each computational cell (porosity, absolute permeability) and table-defined functions (PVT tables, tables of relative permeabilities).

We exploit the restrictions imposed on the development unit to compress large fields such as porosity and permeability fields. We assume homogeneous rock in the sense of porosity, and layerwise-nonhomogeneous rock in the sense of permeability. So the porosity can be presented as a scalar, and permeability field forms the vector of length with values for each layer of the reservoir.

The problem with table-defined functions is that they have a non-fixed number of points in their tables. The problem can be resolved by interpolating these functions onto a predefined domain. For example, any table-defined relative phase permeability function can be represented using the set of water saturation values starting from to with small step size.

Such representation fixes the dimensionality but is rather high-dimensional. We compress the dimensionality using the PCA method. Projection coefficients are obtained as follows:

  • Interpolate table-defined functions onto a fixed domain,

  • Flatten the tables of function values into a single vector,

  • Collect a large matrix of flattened tables obtained randomly from the generative model (B) and apply PCA.

Actually, the restrictions on the homogeneity of the development unit might be overcome using the same dimensionality reduction idea (possible with the neural network instead of PCA). Though, this approach requires construction of the generative model for the whole heterogeneous permeability field, which is a complicated task by itself.

Appendix B Generative model of realistic metadata and control variables

Generation of synthetic but realistic oil-fields is a very complex task by itself. We restricted ourselves on the generation of just pieces of an oil-field (development units) with the layered structure.

But still, the detailed description of the process is too large to be shown in the paper. The generative model should provide us with the randomised samples of the reservoir geometry, porosity and absolute permeability fields, initial conditions, PVT tables for oil and gas phases, tables of relative permeabilities and capillary pressures, exact positions of the wells, water and rock properties and so on.

Here we present the overall ideas used to obtain the generative model:

  • Collect a large set of values of the interest from a variety of real oil-fields.

  • Divide all the properties into interindependent groups.

  • Visualize joint empirical distributions for each group. Assume the parametric form of a distribution able to approximate empirical one efficiently.

  • For each group, fit parameters of chosen parametric distribution to match the empirical data using the max-likelihood method. We used following parametric families of distributions: Gaussian, log-Gaussian, Uniform.

Figure 8: Joint distribution over log-permeability and porosity values. Plot shows points from the dataset of the real values (blue) and points drawed from the fitted Gaussian distribution (black).

For example, we grouped values of porosity and absolute permeability and modelled them jointly. The form of the empirical distribution is similar to two dimensional Gaussian if the permeabilities are logarithmized. So we fitted the mean and covariance matrix of multivariate Gaussian distribution to match the pair: porosity and log-permeability. Samples from the resulting distribution are shown in Figure 8 together with the points from the dataset.

Table-defined functions are generated from the point estimates paired with standard correlation functions, such as Corey, Al-Marhoun, Khan et al., Vasquez and Beggs correlations [27].

(a) exponentional
(b) sharp
Figure 9: Samples of two types of applied control variables: exponentional decay and sharp changes. Time-series of bottomhole pressure for injection and production wells.

For the time-series (bottomhole pressure for injection and production wells) we used three possible forms, see Fig. 9:

  • Constant value,

  • Exponentional decay,

  • Sharp and frequent. changes

The form is sampled randomly, and actual values are obtained from a simple randomised procedure based on the Uniform distribution.