1 Introduction and Justification
“Digital Rock Physics” is an innovative approach for computing the properties of rocks. The paradigm of Digital Rock Physics is “imageandcompute”: the rock sample is imaged to obtain a 3D representation of the mineral phase and pore space, and this 3D representation is then used to simulate physical processes in the sample. chauhan2016processing ; koroteev2017method ; andra2013digital_1 ; blunt2013pore .
Recent methods of 3D imaging of pore topology include microscale xray computed tomography (µxCT), which images rock samples with resolution down to tens of nanometers (voxel size) or hundreds of nanometers (physical resolution). µxCT enables the internal structure of finestructured samples to be imaged accurately and nondestructively. After removal of µxCT scanning artifacts and segmentation chauhan2016processing ; koroteev2017method , the scan is processed to retain the sample’s petrophysical properties.
Applications of Digital Rock technologies include:

the calculation of transport properties such as absolute permeability and relative permeability andra2013digital_2 ; koroteev2014direct ; berg2017industrial ; blunt2013pore ;

the calculation of electric, elastic, geomechanical properties and NMR response andra2013digital_2 ; blunt2013pore ; evseev2015coupling ;

screening enhanced oil recovery methods koroteev2013application ;

assessing the potential efficiency of chemical treatment for well stimulation klemin2015digital .
Recent advances in highresolution imaging, high performance computing and Machine Learning will lead to new and more effective computation.
The present work applies advances in Deep Learning image processing to Digital Rock Physics. Our goal is to build fast approximation models, or socalled surrogate models, to predict permeability based on the results of physical modeling (an example of such modeling can be found in belyaev2016gtapprox ). Such models are an acknowledged method for solving various industrial engineering problems grihon2013surrogate .
Using the VGG16 DNN simonyan2014very network, we recover a set of descriptors for the 2D layers, which compose the 3D image, and utilize their lowdimensional representation to compute sample permeability. The results outperform the frequently used technique of using Minkowski functionals as input features for a machine learning algorithm to predict logarithmic permeability.
We also assess the applicability of conventional Deep Learning models — convolutional neural networks (CNNs), which are frequently used in the analysis of multidimensional data — to µxCT voxel rock scans. We apply CNNs in an endtoend fashion to simultaneously extract features and carry out regression for permeability prediction. The advantage of this approach is that it does not require manual feature engineering, but provides equally accurate results.
2 Data Acquisition
We used a sample from the Berea Sandstone Petroleum Cores (Ohio, USA) for model evaluation. The 3D image already had its artifacts removed and its segmentation computed by Imperial College London. The segmented sample makes no distinction between different rock phases, denoting every rock voxel as 0, and every pore voxel as 1. The initial sample consisted of elements with voxel size of 5.345 µm.
To generate a dataset for machine learning algorithms, the sample was cut into intersecting voxel cubes with shift of 15 voxels and same step size of 5.345 µm, giving a dataset of 9261 samples in total. Each of these smaller samples can be examined as an independent rock image, retaining some geometrical properties of the parent Berea sample.
To compute initial permeability values for voxel cubes in the dataset, we used a porescale network modelling code (courtesy of Taha Sochi) sochi2010pore , paired with OpenPNM framework putz2013introducing . The network model is a simplified representation of rock geometry, consisting of spherical pores connected by cylindrical throats, usually stored in Statoil format. The network representation was then used to compute the rock permeability of each cut rock sample, making use of Darcy’s law and taking the flow type to be Stokes flow. The result was a dataset of 9261 voxel cubes and corresponding permeability values, measured in millidarcies. The byproduct of the calculations is a set of 9261 network models.
3 Regression on Generated Features
3.1 Feature generation
We considered three different approaches to feature generation for regression. First, we tried to explicitly use characteristics of network models. Second, we computed a wellknown set of geometrical descriptors, Minkowski Functionals, for use as an input for the predictor. Finally, we used a set of 2D image descriptors with reduced dimensionality, acquired using VGG16 DNN and Principal Component Analysis (PCA), as a feature set.
Network characteristics
We considered a number of network model characteristics, which could influence the permeability value for a given sample. These characteristics are: median pore radius, mean pore radius, median throat radius, mean throat radius, median throat length, mean throat length, median pore connectivity number, mean pore connectivity number, and total pore count. As in Stokes flow, we considered advective inertial forces to be small, compared to viscous forces. The permeability of the sample is then proportional to the area of the phase transition surface, which, in turn, is proportional to certain included characteristics. However, this approach proved inferior to the other two approaches.
Minkowski functionals
Minkowski functionals (also known as intrinsic volumes or quermass integrals) are additive morphological measures, initially defined for convex objects in the field of integral geometry. It has been shown that every measure on the finite union of compact convex sets in can be expressed as a linear combination of the four Minkowski functionals. For these Minkowski functionals are volume, area, mean breadth and the EulerPoincaré characteristic. In recent years, these functionals have found applications in astronomy, material science, medicine and biology blasquez2003efficient ; guderlei2007algorithms ; berchtold2007modelling ; vogel2010quantification , as well as voxelbased surface recognition yarotsky2017geometric .
The additive property of Minkowski functional for two convex sets A and B can be expressed as .This property, as well as discrete structure of the 3D voxel image, considerably simplifies the computation of Minkowski functionals for a rock sample dataset, since, for such objects, the procedure is reduced to enumeration of open voxels, faces, edges and vertices blasquez2003efficient .
For efficient computation of the functionals, we utilized the method, described in blasquez2003efficient , which uses binary decision diagrams. This method takes advantage of the local configuration around each added voxel.
Minkowski functionals for rescaled samples
Another way to evaluate rock permeability using its voxel image is to include not only Minkowski functionals for the sample itself, but also functionals for a rescaled sample as an input to the machine learning algorithm srisutthiyakorn2016deep . The intuition behind this technique is that, while Minkowski functionals retain some fine information about geometrical structure of the sample, calculating them for a rescaled sample could provide insights on the geometrical structure on a larger scale, offering a better mathematical description of the sample and its viscous flow properties.
A rescaled sample of magnitude is a voxel cube, the dimensions of which are times smaller. A given voxel is set to 1 (pore phase) if the average of voxels in the corresponding range in the initial sample is no less than a specified threshold. In this work, a threshold of 0.5, and magnitudes of 2, 5, 10 and 25 were used.
VGGPCA descriptors
VGG is the name for a family of deep convolutional neural networks (DCNNs) for 2D image recognition. They were introduced on ImageNet Challenge 2014
simonyan2014very , and they marked the advent of the Deep Learning era. Not only could these networks provide accurate image classification and generalize well, but a pretrained network could be finetuned to a given problem, quickly achieving satisfactory performance in terms of some metric without additional increase of the dataset size and training time.A common approach to finetuning is to remove several bottom layers from a pretrained DNN or CNN, exposing one of the dense layers, which are typically denoted as FC1000 or FC4096 (see the VGG architecture in simonyan2014very ). A number of layers is then added to the network, as appropriate to the specifics of the problem, and they are trained on the new data.
In our approach, we simply extract features from the FC4096 layer for each 2D slice of the scan, represented as an image. After additional processing with PCA to reduce their dimensionality, these features are then used as inputs for the regression.
One important property of VGG network dense layers is that they retain a significant amount of image structure, and their output alone is frequently enough to correctly classify a given image or to process it in some other way. Although they are not interpretable, this set of values can provide much insight into composition, pattern distribution and other aspects of the image.
For the purposes of rock permeability prediction, we recovered descriptors of 2D layers, or, essentially, voxel rectangles, which compose a given sample in the dataset. To process the binary image, we converted all rock voxels to vectors in RGB code, and pore voxels to . Accordingly, the voxel layers had to be resized to . The output of the second fully connected layer of size 4096 was used to recover the descriptors.
The feature vector of length 409600, obtained by concatenation of 100 layer descriptors, can then be interpreted as a descriptor of the sample as a whole, retaining information about the structures of individual layers and their position in the voxel cube through the placement of individual layers.
The enormous size of the vector makes it unsuitable for direct use as an input for the regression model. Instead, we used PCA to reduce dimensionality of the input vectors, making it possible to use conventional models without significant modifications. A principal component size of 1350 was used.
3.2 Regression methods
We used two regression methods to evaluate the predictive power of generated features: gradient regression trees (XgBoost) and deep neural networks (DNNs)
XgBoost
XgBoost is a gradient boosting library xgboost . It provides a powerful prediction model, consisting of numerous weak prediction models chen2016xgboost
. It is much used in Machine Learning due to its computation speed and interpretability of the results. We first found model hyperparameters, which yielded better results in terms of the
metric (this metric is described in detail in section 5), by a grid search, i.e., by training the model with different hyperparameters and evaluating its performance on the holdout validation subset.The parameters used for all feature groups are described in Table 1.
Deep neural networks
We used several deep multilayer perceptron (MLP) architectures to assess predictive power of generated features with neural networks. Final architecture and results are described in section 5.
learning_rate=0.05 
n_estimators=400 
max_depth=5 

min_child_weight=6 
gamma=0.1 
subsample=1 
colsample_bytree=1 
reg_alpha=0.5 
reg_lambda=1 
4 EndtoEnd Regression
We assessed the use of an endtoend convolutional neural networks (CNN) modeling technique, which is commonly applied in image processing. We used a 2D CNN to carry out regression on individual 2D slices of a sample and a 3D CNN to process the samples as a whole.
2D convolutional neural networks
CNNs are a class of deep feedforward artificial neural networks, which are most commonly applied to analyze visual imagery.
Just like multilayer perceptrons, CNNs were inspired by biological structures, specifically, by the organization of the visual cortex of animals. Specific cortical neurons respond to corresponding stimuli only in a restricted region of the visual field, which is called the “receptive field” of those neurons
matsugu2003subject .In artificial neural networks, this approach is realized by convolutional and pooling layers. A convolutional layer consists of a number of filters, each of which is trained to respond to a specific pattern. Filters are iterated over the input tensor, and the Hadamard product of filter weights and corresponding input values is calculated for each visited position. The sum of the elements in the resulting matrix is then passed on. The subset of input values, analyzed by a filter at a given step, is called the receptive field of the filter.
This generalization of a biological approach offers a number of advantages, such as shiftinvariance and parameter sharing lecun2015deep
. But the most important point in the context of the task at hand is the emphasis given to the spatial structure of the data. As porosity depends greatly on the number and shapes of pores, the ability to treat input voxels which are close to each other differently from voxels which are far away is invaluable for estimating both local and global spatial structure of the rock sample.
CNNs are often used to analyze images, most commonly represented as 3D tensors, where the first two dimensions correspond to directions in the image, and the third dimension corresponds to color channels (red, green, blue) for a given pixel. In our work, we used 2D convolutions with 3D rock samples the same way as for images. But instead of vectors of color channels, we used vectors of voxel values on other layers: the three color channels are replaced by 100 values, each corresponding to a layer of the sample. After a series of other convolutions, maxpooling operations and fully connected layers, the predicted value for sample permeability is calculated.
However, this approach has the significant disadvantage that each filter works with tensors, disregarding information from neighboring layers. This nuance is mitigated for 3D CNNs, where each filter examines the local region across all three dimensions of a 3D rock sample.
3D convolutional neural networks
3D CNNs are based on the same idea as 2D CNNs, but 3D filters are used for convolutional layers. For purposes of permeability prediction, this allows a given filter to receive the local information for a given voxel not just from the same layer, but also from neighboring layers. As rock pores are threedimensional, such an approach provides more practical information to each network unit.
5 Model Evaluation and Results
We compared the predictive power of feature sets and we also compared different prediction methods. The selection of methods used the criteria of interpretability and relatively straightforward modus operandi.
All prediction methods were compared with each other. For better interpretability of results, we used a special metric, denoted as
Here, denotes the true permeability value for sample , is a predicted permeability value for sample , and is the th percentile of a true permeability histogram for a given cube. This metric is more informative than mean squared error. Conventional error does permit comparison of algorithmic approaches with each other, but provides zero information about how large the error is compared with variability of the data. Our approach takes account of such difference.
XgBoost
The results for selected feature types and feature type combinations using the XgBoost approach are presented below. The last row of each table corresponds to the feature group combination, which yields the best result for the given method. Only error values below the 90th percentile were used to produce the charts, as each method generates strong outliers in terms of permeability.
Here and below, VGGPCA corresponds to introduced VGGPCA descriptors, NET corresponds to rock sample network features, and MX corresponds to Minkowski functionals for an Xrescaled cube.
Feature Groups Used  Validation 

VGGPCA  0.0451 
NET  0.0417 
M1  0.0421 
M1 + M2 + M5 + M10 + M25  0.0406 
M2 + M5 + M10 + NET  0.0368 
To further improve the results, training and prediction were carried out with logarithms of permeability, and ABSq was computed for the exponent of prediction. This empirical technique has proved to give better results in some cases.
Feature Groups Used  Validation 

VGGPCA  0.0367 
NET  0.0372 
M1  0.0391 
M1 + M2 + M5 + M10 + M25  0.0370 
VGGPCA + M1 + M5 + M25 + NET  0.0338 
The VGGPCA features perform much better when used with logarithmic permeability. Despite not being able to provide the best results individually, we found that the VGGPCA features are among the 25 topscoring feature type combinations. It is interesting that the result using VGGPCA features is much worse for usual permeability. However, examining the cause for that would require an excursion into VGG16 architecture specifics, which goes beyond the scope of the present article. These results should be regarded with a degree of caution, as, strictly speaking, they only correspond to the predictive power of these feature groups for a given sandstone sample, and only for the XgBoost model.
Deep neural networks
Dense (units=2048, activation=”relu”) 

Dense (units=2048, activation=”relu”) 
Dense (units=1024, activation=”relu”) 
Dense (units=512, activation=”relu”) 
Dense (units=256, activation=”relu”) 
Dense (units=1, activation=None) 
To evaluate the predictive power of feature groups paired with neural networks, we used three feature group combinations: VGGPCA, M1 + M2 + M5 + M10 + M25 and VGGPCA + M1 + M2 + M5 + M10 + M25. The number of considered feature group combinations was restricted in order to reduce time spent on training the models, since neural networks are much more computationally demanding.
We evaluated several straightforward multilayer perceptron (MLP) architectures for each feature group combination, and the best one was selected for comparison. It is presented in Table 4.
Feature Groups Used  Validation 

VGGPCA  0.0287 
M1 + M2 + M5 + M10 + M25  0.0441 
VGGPCA + M1 + M2 + M5 + M10 + M25  0.0384 
The only difference between the best architectures is that Minkowski functionals seem to provide better results when a batch normalization layer is added before the output unit. In the following table we present
value for all considered feature group combinations.The addition of VGGPCA descriptors to Minkowski functionals reduces prediction error. However, the best result is achieved when they are used separately from the other features. This is because the network was not given enough training time to nullify excess information coming from the Minkowski functionals, which introduced additional error.
The number of training epochs was limited to 50 for all tested architectures, batch size was set to 8, and the Adam optimizer with a learning rate of 0.001 was used. All remaining hyperparameters were set to default. Early stopping was used in order to determine the best validation score.
Convolutional neural networks
Below we present the best performing 2D and 3D CNN architectures. Similarly to other approaches, logarithmic permeability was used as an output value.
Used 2D CNN architecture is inspired by the VGG16 network lecun2015deep , which was used to compute VGGPCA descriptors. We consider each sample to have 100 channels, each corresponding to an individual layer. The model was trained using the Adam optimizer with default parameters, for 20 epochs and a batch size of 32.
2D Convolutional (filters=64, kernel_size=3, activation=”relu”, padding=”same”) 

2D Convolutional (filters=64, kernel_size=3, activation=“relu”, padding=“same”) 
2D Convolutional (filters=64, kernel_size=3, activation=“relu”, padding=“same”) 
Max Pooling (pool_size=(2, 2), strides=(2, 2)) 
2D Convolutional (filters=128, kernel_size=3, activation=“relu”, padding=“same”) 
2D Convolutional (filters=128, kernel_size=3, activation=“relu”, padding=“same”) 
2D Convolutional (filters=128, kernel_size=3, activation=“relu”, padding=“same”) 
Max Pooling (pool_size=(2, 2), strides=(2, 2)) 
2D Convolutional (filters=256, kernel_size=3, activation=“relu”, padding=“same”) 
2D Convolutional (filters=256, kernel_size=3, activation=“relu”, padding=“same”) 
2D Convolutional (filters=256, kernel_size=3, activation=“relu”, padding=“same”) 
Max Pooling (pool_size=(2, 2), strides=(2, 2)) 
2D Convolutional (filters=512, kernel_size=3, activation=“relu”, padding=“same”) 
2D Convolutional (filters=512, kernel_size=3, activation=“relu”, padding=“same”) 
2D Convolutional (filters=512, kernel_size=3, activation=“relu”, padding=“same”) 
Max Pooling (pool_size=(2, 2), strides=(2, 2)) 
Dense (1024, activation=“relu”) 
Dropout(0.5) 
Dense (512, activation=“relu”) 
Dropout(0.5) 
Dense (1, activation=None) 
Used 3D CNN architecture was inspired by VoxNet maturana2015voxnet , which was initially used for object recognition. Compared with 2D convolutions, the application of 3D CNN is straightforward. The model was also trained using the Adam optimizer with default parameters for 20 epochs and a batch size of 32. Valid padding was used for all convolutional layers.
3D CNNs of similar structure have proven to be an efficient way of addressing the task of 3D shape retrieval notchenko2017large , since they can learn efficient descriptors for 3D objects, which are bound to be effective for regression.
3D Convolutional (filters=32, kernel_size=5, strides=2, activation=”relu”) 

3D Convolutional (filters=32, kernel_size=5, strides=2, activation=”relu”) 
Max Pooling 3D (pool_size=(2, 2), strides=(1, 1)) 
3D Convolutional (filters=32, kernel_size=3, strides=1, activation=”relu”) 
3D Convolutional (filters=32, kernel_size=3, strides=1, activation=”relu”) 
Max Pooling 3D (pool_size=(2, 2), strides=(1, 1)) 
Dense (128, activation=“relu”) 
Dense (64, activation=“relu”) 
Dense (1, activation=None) 
Below we present the boxplot of errors for neural network approaches for permeability prediction. Once, again, strong outliers lying above the 90th percentile of errors were not used.
Overall evaluation of the methods
Below we present a comparison of the best results for each considered method. Overall, 3D convolutional neural networks proved to be superior both in terms of the metric and of error distribution. We would therefore recommend focusing on 3D convolutional neural networks for the development of datadriven permeability prediction models.
Model and Approach  

M2+M5+M10+NET XGB  0.0368 
VGGPCA+M1+M5+M25+NET LOG XGB  0.0338 
VGGPCA MLP  0.0287 
3D CNN  0.0284 
6 Conclusions and Discussion
The results of this pilot study clearly demonstrate the significant potential of machine learning for imagebased permeability prediction. It can be seen that the datadriven approach is a true game changer for Digital Rock technology because it is extremely fast and scalable. Moreover, the approach appears to be applicable not only to singlephase permeability prediction, but to prediction of more complex properties relevant to petrophysics, structural geology and field development. Such properties may include relative phase permeabilities, formation factor and resistivity, dielectric permittivity, elastic and geomechanical properties, NMR response and others. There are opportunities for enriching input data with information on mineral distribution, wettability and intergrain contacts to obtain the highest possible predictive power. Clearly, there is no shortage of themes for future study and it should also be noted that the recent developments in Deep Learning are likely to enable prediction of the dynamics of fluid displacement, and not merely of the static characteristics of digitized rock samples of a single type. Assisted by a feature set containing information about pore fluids, this will represent a promising direction for applications of imagebased Digital Rock in enhanced oil recovery work. We would emphasize that this direction will need to be developed together with physicsdriven pore scale modeling koroteev2014direct , since, without the physicsbased models, there will be no actual data, on which to carry out training.
In future studies we plan to evaluate the use of more efficient DNN models and methods. These include modern approaches to constructing ensembles of regression models burnaev2013method and special methods for the initialization of DNN parameters burnaev2016influence .
We are also considering the adaptation of implemented models to other core types using multifidelity regression modeling methods. These are examined in zaytsev2017minimax ; zaytsev2017large ; burnaev2015surrogate . Successful applications of such a technique include an application in aerodynamics, examined in belyaev2014building
Another approach worth considering is to apply adaptive design of experiments, devised for industrial engineering problems, to both increase the efficiency of sensitivity analysis and improve utilization of the computational budget when generating a training sample burnaev2017efficient ; burnaev2015adaptive .
References
References
 (1) S. Chauhan, W. Rühaak, F. Khan, F. Enzmann, P. Mielke, M. Kersten, I. Sass, Processing of rock core microtomography images: Using seven different machine learning algorithms, Computers & Geosciences 86 (2016) 120–128.
 (2) D. A. Koroteev, A. N. Nadeev, I. V. Yakimchuk, I. A. Varfolomeev, Method for building a 3d model of a rock sample, uS Patent 9,558,588 (Jan. 31 2017).
 (3) H. Andrä, N. Combaret, J. Dvorkin, E. Glatt, J. Han, M. Kabel, Y. Keehm, F. Krzikalla, M. Lee, C. Madonna, et al., Digital rock physics benchmarks—part i: Imaging and segmentation, Computers & Geosciences 50 (2013) 25–32.
 (4) M. J. Blunt, B. Bijeljic, H. Dong, O. Gharbi, S. Iglauer, P. Mostaghimi, A. Paluszny, C. Pentland, Porescale imaging and modelling, Advances in Water Resources 51 (2013) 197–216.
 (5) H. Andrä, N. Combaret, J. Dvorkin, E. Glatt, J. Han, M. Kabel, Y. Keehm, F. Krzikalla, M. Lee, C. Madonna, et al., Digital rock physics benchmarks—part ii: Computing effective properties, Computers & Geosciences 50 (2013) 33–43.
 (6) D. Koroteev, O. Dinariev, N. Evseev, D. Klemin, A. Nadeev, S. Safonov, O. Gurpinar, S. Berg, C. van Kruijsdijk, R. Armstrong, et al., Direct hydrodynamic simulation of multiphase flow in porous rock, Petrophysics 55 (04) (2014) 294–303.
 (7) C. F. Berg, O. Lopez, H. Berland, Industrial applications of digital rock technology, Journal of Petroleum Science and Engineering 157 (2017) 131–147.
 (8) N. Evseev, O. Dinariev, M. Hurlimann, S. Safonov, et al., Coupling multiphase hydrodynamic and nmr porescale modeling for advanced characterization of saturated rocks, Petrophysics 56 (01) (2015) 32–44.
 (9) D. A. Koroteev, O. Dinariev, N. Evseev, D. V. Klemin, S. Safonov, O. M. Gurpinar, S. Berg, C. vanKruijsdijk, M. Myers, L. A. Hathon, et al., Application of digital rock technology for chemical eor screening, in: SPE Enhanced Oil Recovery Conference, Society of Petroleum Engineers, 2013.
 (10) D. Klemin, A. Nadeev, M. Ziauddin, et al., Digital rock technology for quantitative prediction of acid stimulation efficiency in carbonates, in: SPE Annual Technical Conference and Exhibition, Society of Petroleum Engineers, 2015.
 (11) M. Belyaev, E. Burnaev, E. Kapushev, M. Panov, P. Prikhodko, D. Vetrov, D. Yarotsky, Gtapprox: Surrogate modeling for industrial design, Advances in Engineering Software 102 (2016) 29–39.
 (12) S. Grihon, E. Burnaev, M. Belyaev, P. Prikhodko, Surrogate modeling of stability constraints for optimization of composite structures, in: SurrogateBased Modeling and Optimization, Springer, 2013, pp. 359–391.
 (13) K. Simonyan, A. Zisserman, Very deep convolutional networks for largescale image recognition, arXiv preprint arXiv:1409.1556.
 (14) T. Sochi, Porescale modeling of nonnewtonian flow in porous media, arXiv preprint arXiv:1011.0760.
 (15) A. Putz, J. Hinebaugh, M. Aghighi, H. Day, A. Bazylak, J. T. Gostick, Introducing openpnm: an open source pore network modeling software package, ECS Transactions 58 (1) (2013) 79–86.
 (16) I. Blasquez, J.F. Poiraudeau, Efficient processing of minkowski functionals on a 3d binary image using binary decision diagrams.
 (17) R. Guderlei, S. Klenk, J. Mayer, V. Schmidt, E. Spodarev, Algorithms for the computation of the minkowski functionals of deterministic and random polyconvex sets, Image and Vision Computing 25 (4) (2007) 464–474.
 (18) M. Berchtold, Modelling of random porous media using minkowskifunctionals, Ph.D. thesis (2007).
 (19) H.J. Vogel, U. Weller, S. Schlüter, Quantification of soil structure based on minkowski functions, Computers & Geosciences 36 (10) (2010) 1236–1245.
 (20) D. Yarotsky, Geometric features for voxelbased surface recognition, arXiv preprint arXiv:1701.04249.
 (21) N. Srisutthiyakorn*, Deeplearning methods for predicting permeability from 2d/3d binarysegmented images, in: SEG Technical Program Expanded Abstracts 2016, Society of Exploration Geophysicists, 2016, pp. 3042–3046.
 (22) dmlc/xgboost: Scalable, portable and distributed gradient boosting (gbdt, gbrt or gbm) library, https://github.com/dmlc/xgboost.
 (23) T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, ACM, 2016, pp. 785–794.

(24)
M. Matsugu, K. Mori, Y. Mitari, Y. Kaneda, Subject independent facial expression recognition with robust face detection using a convolutional neural network, Neural Networks 16 (56) (2003) 555–559.
 (25) Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
 (26) D. Maturana, S. Scherer, Voxnet: A 3d convolutional neural network for realtime object recognition, in: Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, IEEE, 2015, pp. 922–928.
 (27) A. Notchenko, Y. Kapushev, E. Burnaev, Largescale shape retrieval with sparse 3d convolutional neural networks, in: International Conference on Analysis of Images, Social Networks and Texts, Springer, 2017, pp. 245–254.
 (28) E. Burnaev, P. Prikhodko, On a method for constructing ensembles of regression models, Automation and Remote Control 74 (10) (2013) 1630–1644.
 (29) E. Burnaev, P. Erofeev, The influence of parameter initialization on the training time and accuracy of a nonlinear regression model, Journal of Communications Technology and Electronics 61 (6) (2016) 646–660.

(30)
A. Zaytsev, E. Burnaev, Minimax approach to variable fidelity data interpolation, in: Artificial Intelligence and Statistics, 2017, pp. 652–661.
 (31) A. Zaytsev, E. Burnaev, Large scale variable fidelity surrogate modeling, Annals of Mathematics and Artificial Intelligence 81 (12) (2017) 167–186.
 (32) E. Burnaev, A. Zaytsev, Surrogate modeling of multifidelity data for large samples, Journal of Communications Technology and Electronics 60 (12) (2015) 1348–1355.
 (33) M. Belyaev, E. Burnaev, E. Kapushev, S. Alestra, M. Dormieux, A. Cavailles, D. Chaillot, E. Ferreira, Building data fusion surrogate models for spacecraft aerodynamic problems with incomplete factorial design of experiments, in: Advanced Materials Research, Vol. 1016, Trans Tech Publ, 2014, pp. 405–412.
 (34) E. Burnaev, I. Panin, B. Sudret, Efficient design of experiments for sensitivity analysis based on polynomial chaos expansions, Annals of Mathematics and Artificial Intelligence 81 (12) (2017) 187–207.

(35)
E. Burnaev, M. Panov, Adaptive design of experiments based on gaussian processes, in: International Symposium on Statistical Learning and Data Sciences, Springer, 2015, pp. 116–125.