1 Introduction and Motivation
This research is motivated by a practical problem faced by companies operating oilfields: the need for adequate forecasting of the efficiency of hydraulic fracturing (HF) jobs citation01 ; citation02 ; citation03 ; citation04 . An accurate prediction of the extra oil production allows a reliable estimate of the efficiency of investment in HF programs. Planning an HF program typically includes two major parts: first, selecting candidate wells for the HF jobs, and second, detailed planning of the jobs for each selected wellbore or set of wellbores. Recent practice also includes planning HF for newly drilled directional wells, where multistage HF is commonly considered an essential part of well completion. This paper covers a novel approach to the second part of HF program planning.
It is well known that uncertainty within the geological model of a hydrocarbon reservoir is a key source of risk in decision making at all levels of field development planning workflows. Planning of a particular well stimulation job is no exception citation06 . This operation is typically based on a combination of physics-driven modelling of the geomechanics and hydrodynamics of fracturing citation07 ; citation08 and subsequent reservoir modeling with updated transport properties of the near-wellbore computational cells. The update of transport properties is typically driven by experience-based workflows formalized in internal documents of operating companies. There are engineering approaches combining reservoir modeling and fracture modeling (see citation09 for example) and approaches that already include machine learning for reservoir modeling citation10 .
A real example of the workflow described above is shown in Figure 1. The scatter plot covers 270 wells with production history from one of the West Siberian oilfields, fractured between 2013 and 2016. One can immediately see that the correlation between forecast and reality is very weak, and errors in the forecast lead to significant inaccuracy in the forecast of the financial efficiency of the well stimulation program. Also, predictions below 5 tons are discarded even though many wells have an actual average production rate below 5 tons. Moreover, there is a dense cluster of data points around predicted values of 17 to 18 tons/day. This concentration reflects a bias of the experience-based forecasting, which is likely related to a threshold of planned marginal profit from the fracturing job.
In many cases machine learning can handle forecasting problems where the accuracy of physics-driven or empirical models is limited by uncertainties in their input parameters, and can provide fast approximations, the so-called surrogate models, that estimate selected properties from the results of real measurements (for details, see (GTApprox2016)). Surrogate models are a well-known way of solving various industrial engineering problems, including those of the oil industry (grihon2013surrogate; SMspacecraft2016; Grihon2014; GrihonFactorial2014; drilling2019; oil2019; core2019).
This study shows the potential of machine learning in the case when the exact values characterizing the geological and physical state of the formation targeted for hydraulic fracturing are unknown, but a history of fracturing jobs conducted in wells of the same oilfield is available.
2 Methods
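The main purpose was to obtain an approximation $\hat{f}(x)$ of the function $f(x)$ mapping $x$ to $y$, using the training set $\{(x_i, y_i)\}_{i=1}^{n}$. Here $x$ is a vector of $d$ parameters which describes the fracturing job and the geology of a well, and $y$ is the oil rate per day averaged over three months after the corresponding HF. Given a loss function $L(y, f(x))$, the goal is to minimize the expected value of the loss over the joint distribution of all $(x, y)$ values. In practice we use the empirical distribution, i.e. we would like to construct
$$\hat{f} = \arg\min_{f} \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i)).$$
Gradient boosting citation11 is one of the machine learning algorithms for regression, and it has proved to be robust, reliable and sufficiently accurate for engineering applications. Gradient boosting combines weak estimators $h_m(x)$ into a single strong estimator in an iterative way:
$$\hat{f}_M(x) = \sum_{m=1}^{M} \gamma_m h_m(x).$$
At iteration $m$ the gradient boosting algorithm improves on $\hat{f}_{m-1}$ by constructing a new weak estimator $h_m$ and adding it to the general model with an appropriate coefficient $\gamma_m$. The idea is to apply gradient descent and fit the new weak estimator to the negative gradient of the loss function:
$$h_m(x_i) \approx -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = \hat{f}_{m-1}}, \quad i = 1, \dots, n.$$
Finally we find $\gamma_m$ with a simple one-dimensional optimization:
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i, \hat{f}_{m-1}(x_i) + \gamma h_m(x_i)\big).$$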
Trees of small depth are used as weak estimators. We tuned the number of iterations and the maximal tree depth using cross-validation. Gradient boosting over decision trees is known to produce relatively accurate forecasts even on datasets with a significant amount of missing data, which is the case for this study.
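
The boosting loop described above can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes a squared loss (so the negative gradient equals the residual), uses depth-1 trees (stumps) as weak estimators, starts from a constant prediction, and runs on synthetic data. The names `fit_stump` and `gradient_boost` are hypothetical.

```python
import random


def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to targets r by least squares."""
    n, d = len(X), len(X[0])
    total = sum(r)
    mean_r = total / n
    # (score, feature, threshold, left value, right value); lower score is better.
    best = (float("inf"), 0, float("-inf"), mean_r, mean_r)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no valid threshold between equal feature values
            left_mean = left_sum / k
            right_mean = (total - left_sum) / (n - k)
            # Minimizing the squared error of the split is equivalent to
            # minimizing this negated weighted sum of squared leaf means.
            score = -(k * left_mean ** 2 + (n - k) * right_mean ** 2)
            if score < best[0]:
                best = (score, j, (lo + hi) / 2.0, left_mean, right_mean)
    _, j, t, left_value, right_value = best
    return lambda x: left_value if x[j] <= t else right_value


def gradient_boost(X, y, n_iter=50):
    """Gradient boosting for squared loss (an assumption of this sketch)."""
    f0 = sum(y) / len(y)  # constant initial model
    pred = [f0] * len(y)
    stages = []
    for _ in range(n_iter):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f is y - f.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(X, residuals)
        hx = [h(x) for x in X]
        denom = sum(v * v for v in hx)
        if denom == 0.0:
            break  # nothing left to fit
        # One-dimensional line search; closed form for squared loss.
        gamma = sum(ri * vi for ri, vi in zip(residuals, hx)) / denom
        stages.append((gamma, h))
        pred = [pi + gamma * vi for pi, vi in zip(pred, hx)]
    return lambda x: f0 + sum(g * h(x) for g, h in stages)


# Synthetic data, for illustration only.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [10.0 * x[0] + (5.0 if x[1] > 0.5 else 0.0) for x in X]
model = gradient_boost(X, y, n_iter=100)
```

With squared loss and least-squares stumps the line search returns a coefficient of one, so the step is kept mainly to mirror the general scheme; for other losses the one-dimensional optimization is not trivial. In practice the number of iterations and the tree depth would be chosen by cross-validation, as stated above.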
We divided input parameters for our machine learning model into five groups:

General Information: well number, fracturing job date, time when treatment started, zone, contractor (company), supervisor name (from contractor side), supervisor name (from client side), fracturing status (initial fracturing or refracturing), well status (new drill or old well), completion type (cemented or open hole), number of stages.

Job Parameters: flow rate, pad volume, total volume of gelled fluid pumped, maximal concentration of proppant, proppant manufacturers (at the 1st to 4th stages of the frac job), proppant mesh sizes and volumes, datafrac pumped (Yes/No), fracture closure gradient (kPa/m), instantaneous shut-in pressure gradient, fracture net pressure, maximal wellhead treating pressure, average wellhead treating pressure, actual minus planned flush (0 if equal, 1 if positive, -1 if negative), screen-out (Yes/No).

Fluid Parameters: gel type (Guar, HPG, …), gel loading (kg/m³), types and amounts of breakers, types and amounts of X-linkers used at different stages.

Calculated HF parameters: Estimated fracture height (m), Estimated fracture length (m), Estimated fracture width (mm).

Geological data: Clay factor (relative units), porosity (%), thickness of gas saturated part (m), reservoir thickness (m), thickness of the target interval (m), length of horizontal part of the wellbore (m), oil saturation (%), thickness of oilsaturated interval (m), sand content (%), permeability (mD), reservoir compartmentalization index, reservoir bottom depth (m).
The target value is the oil rate per day averaged over three months after the corresponding HF. Histograms of selected parameters are shown in the Appendix (Fig. 5).
The dataset contains many categorical parameters. To transform these parameters into numeric form we used the one-hot encoding approach, which transforms a categorical parameter into a boolean vector. Each position in that vector is associated with one unique category, so a categorical value is encoded as a vector with a one at the corresponding position and zeros elsewhere. After that we obtain the final input vector $x$ as a concatenation of the vector of all available numeric parameters and all vectors which encode categorical features. To avoid a dimensionality explosion, this approach makes sense only if each categorical parameter has a small number of unique values, so that the dimension of $x$ stays moderate.
All samples are split into the training set (80%) and the test set (20%). We fit the model on the training set and calculate the average error on the test set. For a more accurate estimate of the generalization ability, one can randomly divide the samples into train/test subsets several times and average the error over these divisions as well. In this paper we report the average error estimated over 50 random splits. As error measures we considered the mean absolute error (MAE) and the Pearson correlation coefficient:
$$\mathrm{MAE} = \frac{1}{n_{\mathrm{test}}} \sum_{i=1}^{n_{\mathrm{test}}} |y_i - \hat{y}_i|, \qquad r = \frac{\sum_i (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_i (y_i - \bar{y})^2}\,\sqrt{\sum_i (\hat{y}_i - \bar{\hat{y}})^2}},$$
where $y_i$ is the actual value and $\hat{y}_i$ is the model prediction for the $i$-th test sample, and $\bar{y}$, $\bar{\hat{y}}$ are the test-set means of the actual and predicted values.
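
The one-hot encoding step described above can be sketched as follows. This is an illustration under assumptions, not the preprocessing code of the study: the function name `one_hot_encode`, the field names `flow_rate`, `gel_type` and `contractor`, and all numeric values are hypothetical.

```python
def one_hot_encode(rows, categorical_keys, numeric_keys):
    """Concatenate numeric features with one-hot encoded categorical features."""
    # Unique categories per feature, in a fixed (sorted) order.
    categories = {k: sorted({row[k] for row in rows}) for k in categorical_keys}
    encoded = []
    for row in rows:
        vec = [float(row[k]) for k in numeric_keys]
        for k in categorical_keys:
            # One at the position of the row's category, zeros elsewhere.
            vec.extend(1.0 if row[k] == c else 0.0 for c in categories[k])
        encoded.append(vec)
    return encoded, categories


# Hypothetical records; values are made up for illustration.
rows = [
    {"flow_rate": 3.5, "gel_type": "Guar", "contractor": "A"},
    {"flow_rate": 4.0, "gel_type": "HPG", "contractor": "C"},
    {"flow_rate": 3.8, "gel_type": "Guar", "contractor": "B"},
]
encoded, categories = one_hot_encode(rows, ["gel_type", "contractor"], ["flow_rate"])
```

Each categorical feature adds as many columns as it has unique values, which is why features with many distinct values (for example, supervisor names) can inflate the input dimension quickly.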
3 Results
Table 1
Measure | Existing model | Gradient Boosting
MAE | 12.23 | 9.68
Pearson corr. coeff. | 0.47 | 0.63
Predictions of the existing empirical model for 20% of the cases are shown in the left plot of Fig. 2. The right plot shows predictions of the gradient boosting model, also for 20% of the samples; the other 80% were used for model training. The machine learning model has a noticeably higher forecasting ability: compared with the existing model, its MAE is lower (9.68 versus 12.23) and the correlation of its predictions with the actual values is higher (0.63 versus 0.47), see Table 1.
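
The two error measures and the averaging over repeated random 80/20 splits can be sketched as below. This is only an illustration: the helper names (`mae`, `pearson`, `repeated_split_scores`, `fit_line`) are hypothetical, a one-feature least-squares line stands in for the actual model, and the data are synthetic.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    st = math.sqrt(sum((a - mt) ** 2 for a in y_true))
    sp = math.sqrt(sum((b - mp) ** 2 for b in y_pred))
    return cov / (st * sp)


def repeated_split_scores(X, y, fit, n_splits=50, test_share=0.2, seed=0):
    """Average test MAE and Pearson correlation over random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(1, round(test_share * len(y)))
    maes, corrs = [], []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        truth = [y[i] for i in test]
        pred = [model(X[i]) for i in test]
        maes.append(mae(truth, pred))
        corrs.append(pearson(truth, pred))
    return sum(maes) / n_splits, sum(corrs) / n_splits


def fit_line(X, y):
    """Stand-in model (assumption): least-squares line on the first feature."""
    xs = [x[0] for x in X]
    n = len(xs)
    mx, my = sum(xs) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    return lambda x: my + slope * (x[0] - mx)


# Synthetic data, for illustration only.
data_rng = random.Random(1)
X = [[data_rng.random()] for _ in range(100)]
y = [2.0 * x[0] + data_rng.gauss(0.0, 0.1) for x in X]
avg_mae, avg_corr = repeated_split_scores(X, y, fit_line)
```

Any fitting routine that takes training inputs and targets and returns a prediction function could be passed as `fit`, so the same loop could be used to compare two models on identical splits.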
Purely statistical observations can also give valuable insights about the data. For example, Fig. 3 and Fig. 4 show the distribution of categorical features among the best 20% and the worst 20% of fracs (in terms of extra oil after the fracturing job). The left plot of Fig. 3 shows that fracs with many stages are more successful than others. Interestingly, seven-stage fracturing looks optimal in terms of gaining the maximal amount of extra oil, and with ten-stage fracturing one can expect no poorly producing wells. The right plot shows, for example, that the WGXL8.2 and DGXL10.1 X-linkers appear only among the worst fracs, so perhaps they should be excluded.
The left plot of Fig. 4 shows that contractor C provides the most successful fracturing services and contractor A the least successful ones in terms of relative quality. However, the actual differences are relatively small, which suggests that success does not strongly depend on the contractor. The right plot shows which proppant manufacturer performs better. The names of proppant manufacturers and contractors were changed for reasons of confidentiality.
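
The kind of comparison shown in Fig. 3 and Fig. 4 can be sketched as follows: rank the jobs by extra oil, take the best or worst 20%, and compute the share of each category within that group. The function name `category_shares`, the field names `stages` and `extra_oil`, and all values are hypothetical and serve only as an illustration.

```python
from collections import Counter


def category_shares(records, key, target, share=0.2, best=True):
    """Share of each category of `key` among the best (or worst) jobs by `target`."""
    ranked = sorted(records, key=lambda r: r[target], reverse=best)
    k = max(1, round(share * len(ranked)))
    counts = Counter(r[key] for r in ranked[:k])
    return {category: count / k for category, count in counts.items()}


# Made-up records: (number of stages, extra oil), for illustration only.
records = [
    {"stages": s, "extra_oil": o}
    for s, o in [(1, 2.0), (1, 3.0), (1, 1.0), (7, 30.0), (7, 25.0),
                 (1, 4.0), (5, 12.0), (1, 0.5), (5, 9.0), (10, 20.0)]
]
best_shares = category_shares(records, "stages", "extra_oil", best=True)
worst_shares = category_shares(records, "stages", "extra_oil", best=False)
```

Such shares describe association only; with few jobs per category they can be dominated by a handful of wells, which is worth keeping in mind when reading the plots.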
4 Conclusions and Discussion
One can see (Fig. 2) that the experience-based model tends to overestimate production rates over the whole range of actual flow rates, while gradient boosting underestimates at high flow rates. Gradient boosting behaves like this because it directly minimizes the average error and there are very few samples of high flow rate cases in the training and validation sets. From the economic standpoint, the authors believe that such behavior of the gradient boosting model is rather safe, as it allows managing the expectations of the outcome of an HF job in a conservative manner.
Overall, the paper demonstrates the usability and very high potential of machine learning as a tool for predicting hydraulic fracturing efficiency. The authors believe that this is just an initial step in bringing modern big data techniques to well stimulation optimization. There are multiple ways to improve the existing models. They include accurate assessment of the prediction error, e.g. using nonparametric confidence measures VovkConformal2014 ; ConformalKRR2016 , careful selection of the objective function for optimization of the algorithms, comparison of different training routines, smart algorithms for filling the gaps in the initial data and assessing data quality, feature engineering, and detailed comparison with other machine learning methods. Further improvement of data-driven forecasting algorithms and data collection systems is expected to make machine learning a true game-changer for upstream technologies, resulting in significant optimization of the technological and economic sides of the processes generating real data.
Appendix
Fig. 5 shows histograms for several selected variables. One can see that, in the historical data, the HF jobs were performed in several zones and by different contractors. Most of the hydraulic fracturing jobs are single-stage. The remaining parameters are distributed without any significant anomalies.
References
(1) Agarwal, R.G., Carter, R.D., Pollock, C.B.: Evaluation and performance prediction of low-permeability gas wells stimulated by massive hydraulic fracturing. Journal of Petroleum Technology 31(03), 362–372 (1979)
(2) Alestra, S., Kapushev, E., Belyaev, M., Burnaev, E., Dormieux, M., Cavailles, A., Chaillot, D., Ferreira, E.: Surrogate models for spacecraft aerodynamic problems. In: Proceedings of the joint WCCM - ECCM - ECFD 2014 Congress, 20-25 July, Barcelona, Spain (2014)
(3) Belyaev, M., Burnaev, E., Kapushev, E., Alestra, S., Dormieux, M., Cavailles, A., Chaillot, D., Ferreira, E.: Building data fusion surrogate models for spacecraft aerodynamic problems with incomplete factorial design of experiments. Advanced Materials Research 1016, 405–412 (2014)
(4) Belyaev, M., Burnaev, E., Kapushev, E., Panov, M., Prikhodko, P., Vetrov, D., Yarotsky, D.: GTApprox: Surrogate modeling for industrial design. Advances in Engineering Software 102, 29–39 (2016). DOI https://doi.org/10.1016/j.advengsoft.2016.09.001. URL http://www.sciencedirect.com/science/article/pii/S0965997816303696
(5) Burnaev, E., Nazarov, I.: Conformalized kernel ridge regression. In: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 45–52 (2016). DOI 10.1109/ICMLA.2016.0017
(6) Burnaev, E., Vovk, V.: Efficiency of conformalized ridge regression. In: M.F. Balcan, V. Feldman, C. Szepesvári (eds.) Proceedings of The 27th Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 35, pp. 605–622. PMLR, Barcelona, Spain (2014). URL http://proceedings.mlr.press/v35/burnaev14.html
(7) Friedman, J.H.: Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics 29(5), 1189–1232 (2001)
(8) Fu, P., Johnson, S.M., Carrigan, C.R.: An explicitly coupled hydro-geomechanical model for simulating hydraulic fracturing in arbitrary discrete fracture networks. International Journal for Numerical and Analytical Methods in Geomechanics 37(14), 2278–2300 (2013)
(9) Grihon, S., Burnaev, E., Belyaev, M., Prikhodko, P.: Surrogate modeling of stability constraints for optimization of composite structures. In: Surrogate-Based Modeling and Optimization, pp. 359–391. Springer (2013)
(10) Guo, G., Evans, R.D.: Inflow performance and production forecasting of horizontal wells with multiple hydraulic fractures in low-permeability gas reservoirs. In: SPE Gas Technology Symposium. Society of Petroleum Engineers (1993)
(11) Ji, L., Settari, A., Sullivan, R.B.: A novel hydraulic fracturing model fully coupled with geomechanics and reservoir simulation. SPE Journal 14(03), 423–430 (2009)
(12) Kissinger, A., et al.: Hydraulic fracturing in unconventional gas reservoirs: risks in the geological system, part 2. Environmental Earth Sciences 70(8), 3855–3873 (2013)
(13) Klyuchnikov, N., Zaytsev, A., Gruzdev, A., Ovchinnikov, G., Antipova, K., Ismailova, L., Muravleva, E., Burnaev, E., Semenikhin, A., Cherepanov, A., Koryabkin, V., Simon, I., Tsurgan, A., Krasnov, F., Koroteev, D.: Data-driven model for the identification of the rock type at a drilling bit. arXiv e-prints arXiv:1806.03218 (2018)
(14) Meng, H.Z., Brown, K.E.: Coupling of production forecasting, fracture geometry requirements and treatment scheduling in the optimum hydraulic fracture design. In: Low Permeability Reservoirs Symposium. Society of Petroleum Engineers (1987)
(15) Mohaghegh, S.D.: Reservoir simulation and modeling based on artificial intelligence and data mining (AI&DM). Journal of Natural Gas Science and Engineering 3(6), 697–705 (2011)
(16) Osiptsov, A.A.: Fluid mechanics of hydraulic fracturing: A review. Journal of Petroleum Science and Engineering (2017)
(17) Sterling, G., Prikhodko, P., Burnaev, E., Belyaev, M., Grihon, S.: On approximation of reserve factors dependency on loads for composite stiffened panels. Advanced Materials Research 1016, 85–89 (2014)
(18) Sudakov, O., Burnaev, E., Koroteev, D.: Driving Digital Rock towards Machine Learning: predicting permeability with Gradient Boosting and Deep Neural Networks. arXiv e-prints arXiv:1803.00758 (2018)
(19) Sun, J., Schechter, D., Huang, C.K.: Sensitivity analysis of unstructured meshing parameters on production forecast of hydraulically fractured horizontal wells. In: Abu Dhabi International Petroleum Exhibition and Conference. Society of Petroleum Engineers (2015)
(20) Temirchev, P., Simonov, M., Kostoev, R., Burnaev, E., Oseledets, I., Akhmetov, A., Margarit, A., Sitnikov, A., Koroteev, D.: Deep Neural Networks Predicting Oil Movement in a Development Unit. arXiv e-prints arXiv:1901.02549 (2019)