1 Introduction
Load or electricity demand forecasting is an essential tool for power system operation and planning. Mid-term electrical load forecasting (MTLF) involves forecasting the daily peak load for future months as well as monthly electricity demand. MTLF is necessary for maintenance scheduling, fuel reserve planning, hydrothermal coordination, electrical energy import/export planning, and security assessment. Deregulated power systems need MTLF to negotiate forward contracts. Forecast accuracy therefore translates directly into the financial performance of energy market participants.
Methods of MTLF can be divided into a conditional modeling approach and an autonomous modeling approach [1]. The former focuses on economic analysis and long-term planning of energy policy and uses input variables describing socioeconomic conditions, population migrations, and power system and network infrastructure. The latter uses input variables including only historical loads or, additionally, weather factors [2], [3].
For MTLF, classical statistical/econometric tools are used as well as machine learning tools [4]. The former include ARIMA, exponential smoothing (ETS), and linear regression [5]. Problems with the adaptability and nonlinear modeling capabilities of the statistical methods have increased researchers' interest in machine learning and AI tools [6]. The most popular representatives of these tools are neural networks (NNs), which offer attractive features such as learning capabilities, the universal approximation property, nonlinear modeling, and massive parallelism [7]. Among the other machine learning models for MTLF, the following can be mentioned: long short-term memory networks
[8], weighted evolving fuzzy NNs [2], [9], and pattern similarity-based models [10].

In recent years, ensemble learning has been widely used in machine learning. Ensemble learning systems are composed of many base models, each of which provides an estimate of a target function. These estimates are combined in some fashion to produce a common response, ideally improving accuracy and stability compared to a single learner. The base models can be of the same type (single-model or homogeneous ensemble) or of different types (multi-model or heterogeneous ensemble). The key issue in ensemble learning is ensuring the diversity of learners
[11]. A good trade-off between performance and diversity underlies the success of ensemble learning. In the heterogeneous case, the source of diversity is the different nature of the base learners. Some experimental results show that heterogeneous ensembles can improve accuracy compared to homogeneous ones [12], because the error terms of models of different types are less correlated than the errors of models of the same type. Generating diverse learners with uncorrelated errors in a homogeneous ensemble is a challenging problem. Diversity can be achieved through several strategies. One of the most popular is learning on different subsets of the training set or on different subsets of features. Other common approaches include using different values of the learners' hyperparameters and parameters. In the field of forecasting, it has been shown that combining forecasts enhances the robustness of the model, mitigating model and parameter uncertainty
[13].

In this work, we build heterogeneous and homogeneous ensembles for MTLF using pattern similarity-based forecasting models (PSFMs) [10] as base learners. PSFMs have turned out to be very effective models (accurate and simple) for both mid- and short-term load forecasting [14], [15]. In this study, we investigate what gain can be achieved by ensembling the forecasts generated by PSFMs. For the heterogeneous ensemble, we employ a nearest neighbor model, a fuzzy neighborhood model, a kernel regression model, and a general regression neural network. For the homogeneous ensemble, we employ a fuzzy neighborhood model and generate its diversity using five strategies.
The remainder of this paper is structured as follows. In Section 2, we present the pattern representation of time series, the framework of pattern similarity-based forecasting, and the PSFMs. In Section 3, we describe heterogeneous and homogeneous ensemble forecasting using PSFMs. Section 4 presents the setup and results of the empirical experiments. Finally, Section 5 concludes the paper.
2 Pattern Similarity-based Forecasting
2.1 Pattern Representation of Time Series
Monthly electricity demand time series exhibit a trend, yearly cycles, and a random component. To deal with the seasonal cycles and trend, in our earlier work we proposed similarity-based models operating on patterns of the time series sequences [10], [14]. The patterns filter out the trend and those seasonal cycles longer than the basic one, and even out variance. They also ensure the unification of input and output variables. Consequently, pattern representation simplifies the forecasting problem and allows us to use models based on pattern similarity.
The input pattern is a vector of predictors representing a sequence of successive time series elements (monthly electricity demands) preceding the forecasted period. The x-pattern $\mathbf{x}_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]^T$ represents the sequence $X_i = \{E_{i,1}, E_{i,2}, \ldots, E_{i,n}\}$ of $n$ successive monthly demands. In this study we use the following definition of the x-pattern components:

$$x_{i,t} = \frac{E_{i,t} - \bar{E}_i}{D_i} \qquad (1)$$

where $t = 1, \ldots, n$, $\bar{E}_i$ is the mean of sequence $X_i$, and $D_i = \sqrt{\sum_{t=1}^{n}(E_{i,t} - \bar{E}_i)^2}$ is a measure of its dispersion.
The x-pattern defined by (1) is a normalized vector composed of the elements of sequence $X_i$. Note that the original time series sequences, which have different means and dispersions, are unified: they are represented by x-patterns which all have zero mean, the same variance, and unit length.
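As a concrete illustration, the following minimal numpy sketch (our code, not part of the original study) computes an x-pattern according to (1); the demand values and function name are made up:

```python
import numpy as np

def x_pattern(seq: np.ndarray) -> np.ndarray:
    """Encode a sequence of n monthly demands as an x-pattern, Eq. (1)."""
    mean = seq.mean()                                # mean of the sequence
    dispersion = np.sqrt(np.sum((seq - mean) ** 2))  # dispersion D_i
    return (seq - mean) / dispersion

# Made-up 12 months of demand; any level/dispersion maps to a unified pattern.
demand = np.array([12.1, 11.0, 10.5, 9.8, 9.2, 9.0,
                   9.1, 9.3, 9.9, 10.6, 11.4, 12.3])
x = x_pattern(demand)
print(x.mean(), np.linalg.norm(x))  # ~0.0 and 1.0, as stated above
```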
The output pattern $\mathbf{y}_i = [y_{i,1}, y_{i,2}, \ldots, y_{i,m}]^T$ represents a forecasted sequence of length $m$: $Y_i = \{E_{i+\tau,1}, E_{i+\tau,2}, \ldots, E_{i+\tau,m}\}$, where $\tau$ is the forecast horizon. The output pattern is defined similarly to the input one:

$$y_{i,t} = \frac{E_{i+\tau,t} - \bar{E}_i^*}{D_i^*} \qquad (2)$$

where $t = 1, \ldots, m$, and $\bar{E}_i^*$ and $D_i^*$ are coding variables described below.
Two variants of the output patterns are considered. In the first one, denoted V1, the coding variables $\bar{E}_i^*$ and $D_i^*$ are the mean and dispersion, respectively, of the forecasted sequence $Y_i$. But in this case, when the forecasted sequence of monthly electricity demands is calculated from the forecasted y-pattern $\hat{\mathbf{y}}_i$ using the transformed equation (2):

$$\hat{E}_{i+\tau,t} = \hat{y}_{i,t} D_i^* + \bar{E}_i^* \qquad (3)$$

the coding variables are not known, because they are the mean and dispersion of the future sequence $Y_i$, which has just been forecasted. In this case, the coding variables are predicted from their historical values. In the experimental part of the work, the coding variables are predicted using ARIMA and ETS.
To avoid forecasting the coding variables, we use another approach. Instead of the mean and dispersion of the forecasted sequence, we introduce in (2) and (3) as coding variables the mean and dispersion of the historical sequence $X_i$, i.e. $\bar{E}_i^* = \bar{E}_i$ and $D_i^* = D_i$. When the PSFM generates the forecasted y-pattern, the forecasts of the monthly demands are calculated from (3) using the known coding variables of the historical sequence $X_i$. This variant of the y-pattern definition is denoted V2.
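A minimal sketch (ours) of the V2 encode/decode round trip, continuing the example above; the sequences are made up:

```python
import numpy as np

def encode_y(seq_y: np.ndarray, mean: float, disp: float) -> np.ndarray:
    """y-pattern, Eq. (2), given the coding variables (mean, disp)."""
    return (seq_y - mean) / disp

def decode_y(y_hat: np.ndarray, mean: float, disp: float) -> np.ndarray:
    """Inverse transform, Eq. (3): forecasted demands from a y-pattern."""
    return y_hat * disp + mean

hist = np.array([10.0, 9.5, 9.0, 9.2, 9.8, 10.4])       # sequence X_i (made up)
future = np.array([10.6, 10.1, 9.6, 9.7, 10.3, 11.0])   # sequence Y_i (made up)

# Variant V2: the coding variables come from the known history X_i,
# so nothing about the future needs to be predicted in order to decode.
mean_v2 = hist.mean()
disp_v2 = np.sqrt(np.sum((hist - mean_v2) ** 2))
y = encode_y(future, mean_v2, disp_v2)
assert np.allclose(decode_y(y, mean_v2, disp_v2), future)  # exact round trip
```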
2.2 Forecasting Models
The pattern similarity-based forecasting procedure can be summarized in the following steps [10]:

1. Mapping the original time series sequences into x- and y-patterns.
2. Selection of the training x-patterns similar to the query pattern $\mathbf{x}$.
3. Aggregation of the y-patterns paired with the similar x-patterns to obtain the forecasted pattern $\hat{\mathbf{y}}$.
4. Decoding pattern $\hat{\mathbf{y}}$ to get the forecasted time series sequence $\hat{Y}$.
In step 3, the y-patterns are aggregated using weights that depend on the similarity between the query pattern and the training x-patterns. The regression model mapping x-patterns into y-patterns is of the form:

$$m(\mathbf{x}) = \sum_{l=1}^{N} w(\mathbf{x}, \mathbf{x}_l)\,\mathbf{y}_l \qquad (4)$$

where $w(\mathbf{x}, \mathbf{x}_l)$ is a weighting function and $N$ is the number of training patterns in the training set $\Phi = \{(\mathbf{x}_l, \mathbf{y}_l)\}_{l=1}^{N}$.

Model (4) is nonlinear if $w(\mathbf{x}, \mathbf{x}_l)$ maps $\mathbf{x}$ nonlinearly. Different definitions of the weighting function are presented below, where the PSFMs are specified.
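The whole PSFM family can thus be written as one function parameterized by the weighting function. The sketch below uses our naming; the concrete weighting functions are sketched in the subsections that follow:

```python
import numpy as np
from typing import Callable

def psfm_forecast(query_x: np.ndarray,
                  train_x: np.ndarray,   # shape (N, n): training x-patterns
                  train_y: np.ndarray,   # shape (N, m): paired y-patterns
                  weight_fn: Callable[[np.ndarray, np.ndarray], np.ndarray]
                  ) -> np.ndarray:
    """Eq. (4): forecast a y-pattern as a weighted average of training
    y-patterns. Each PSFM (k-NNw, FNM, NWE, GRNN) differs only in
    weight_fn, which returns N nonnegative weights summing to one."""
    w = weight_fn(query_x, train_x)
    return w @ train_y                   # forecasted y-pattern
```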
2.2.1 Nearest Neighbor Model
The nearest neighbor model estimates $m(\mathbf{x})$ as the weighted average of the y-patterns in a varying neighborhood of query pattern $\mathbf{x}$ (this model is denoted k-NNw). The neighborhood is defined as the set of $k$ nearest neighbors of $\mathbf{x}$ in the training set $\Phi$. The regression function is as follows:

$$m(\mathbf{x}) = \sum_{l \in \Xi(\mathbf{x})} w(\mathbf{x}, \mathbf{x}_l)\,\mathbf{y}_l \qquad (5)$$

where $\Xi(\mathbf{x})$ is the set of indices of the $k$ nearest neighbors of $\mathbf{x}$ in $\Phi$ and the weighting function is of the form [15]:

$$w(\mathbf{x}, \mathbf{x}_l) = \frac{v(\mathbf{x}, \mathbf{x}_l)}{\sum_{j \in \Xi(\mathbf{x})} v(\mathbf{x}, \mathbf{x}_j)} \qquad (6)$$

$$v(\mathbf{x}, \mathbf{x}_l) = \gamma \left(1 - \left(\frac{d(\mathbf{x}, \mathbf{x}_l)}{d(\mathbf{x}, \mathbf{x}_{(k)})}\right)^{\kappa}\right) + 1 - \gamma \qquad (7)$$

where $\mathbf{x}_{(k)}$ is the $k$-th nearest neighbor of $\mathbf{x}$ in $\Phi$, $d(\mathbf{x}, \mathbf{x}_{(k)})$ is the Euclidean distance between $\mathbf{x}$ and its $k$-th nearest neighbor, $\gamma \in [0, 1]$ is a parameter deciding about the differentiation of weights, and $\kappa > 0$ is a parameter deciding about the convexity of the weighting function.
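A sketch of the k-NNw weighting under the reconstructed form of (6)-(7); treat the exact formula for $v$ as our reading of the verbal description above, and the defaults as arbitrary:

```python
import numpy as np

def knnw_weights(query_x, train_x, k=6, gamma=1.0, kappa=1.0):
    """k-NNw weights, Eqs. (5)-(7): nonzero only for the k nearest
    x-patterns. gamma in [0, 1] differentiates the weights (gamma = 0
    gives a plain average over the neighbors); kappa bends the weighting
    function (kappa = 1 gives the linear variant)."""
    d = np.linalg.norm(train_x - query_x, axis=1)  # Euclidean distances
    nn = np.argsort(d)[:k]                         # k nearest, ascending
    d_k = d[nn][-1]                                # distance to k-th neighbor
    v = gamma * (1.0 - (d[nn] / d_k) ** kappa) + (1.0 - gamma)
    w = np.zeros(len(train_x))
    w[nn] = v / v.sum()                            # assumes distinct distances
    return w
```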
2.2.2 Fuzzy Neighborhood Model
The fuzzy neighborhood model (FNM) takes into account all training patterns when constructing the regression surface [16]. In this case, all training patterns belong to the neighborhood of the query pattern, with different membership degrees. The membership function depends on the distance between the query pattern $\mathbf{x}$ and the training pattern $\mathbf{x}_l$ as follows:

$$\mu_l(\mathbf{x}) = \exp\left(-\left(\frac{d(\mathbf{x}, \mathbf{x}_l)}{\sigma}\right)^{\alpha}\right) \qquad (8)$$

where $\sigma$ and $\alpha$ are parameters deciding about the membership function shape.
The weighting function in FNM is as follows:

$$w(\mathbf{x}, \mathbf{x}_l) = \frac{\mu_l(\mathbf{x})}{\sum_{j=1}^{N} \mu_j(\mathbf{x})} \qquad (9)$$
Membership function (8) is a Gaussian-type function. The model parameters, $\sigma$ and $\alpha$, shape the membership function and thus control the properties of the estimator.
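A minimal FNM weighting sketch following (8)-(9); the parameter defaults are arbitrary:

```python
import numpy as np

def fnm_weights(query_x, train_x, sigma=0.3, alpha=2.0):
    """FNM weights, Eqs. (8)-(9): every training pattern belongs to the
    neighborhood of the query with a Gaussian-type membership degree;
    sigma is the width, alpha shapes the decay."""
    d = np.linalg.norm(train_x - query_x, axis=1)
    mu = np.exp(-((d / sigma) ** alpha))  # membership degrees, Eq. (8)
    return mu / mu.sum()                  # normalized weights, Eq. (9)
```

With the generic sketch from Section 2.2, a forecast is then psfm_forecast(x, X, Y, lambda q, T: fnm_weights(q, T, sigma=0.3)).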
2.2.3 Nadaraya-Watson Estimator
The Nadaraya-Watson estimator (NWE) estimates the regression function as a locally weighted average, using in (4) a kernel as the weighting function:

$$w(\mathbf{x}, \mathbf{x}_l) = \frac{K\!\left(\frac{\mathbf{x} - \mathbf{x}_l}{h}\right)}{\sum_{j=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}_j}{h}\right)} \qquad (10)$$

When the input variable is multidimensional, the kernel has a product form. In such a case, for the normal kernel, which is often used in practice, the weighting function is defined as [15], [14]:

$$w(\mathbf{x}, \mathbf{x}_l) = \frac{\prod_{t=1}^{n} \exp\!\left(-\frac{(x_t - x_{l,t})^2}{2h_t^2}\right)}{\sum_{j=1}^{N} \prod_{t=1}^{n} \exp\!\left(-\frac{(x_t - x_{j,t})^2}{2h_t^2}\right)} \qquad (11)$$

where $h_t$ is the bandwidth for the $t$-th dimension.
The bandwidths govern the bias-variance trade-off of the estimator.
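A sketch of (11); note that the product of the one-dimensional normal kernels collapses to the exponential of a sum:

```python
import numpy as np

def nwe_weights(query_x, train_x, h):
    """Nadaraya-Watson weights with a product normal kernel, Eq. (11);
    h is a length-n vector of per-dimension bandwidths h_t."""
    z = (train_x - query_x) / h                   # scaled differences
    kern = np.exp(-0.5 * np.sum(z ** 2, axis=1))  # product of 1-D kernels
    return kern / kern.sum()
```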
2.2.4 General Regression Neural Network Model
The general regression neural network (GRNN) is composed of four layers: input, pattern (radial basis) layer, summation layer, and output layer [17]. The pattern layer transforms inputs nonlinearly using Gaussian activation functions of the form:

$$G_l(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}_l\|^2}{\sigma_l^2}\right) \qquad (12)$$

where $\|\cdot\|$ is the Euclidean norm and $\sigma_l$ is the bandwidth for the $l$-th pattern.

The Gaussian functions are centered at the training patterns $\mathbf{x}_l$. The output of the $l$-th neuron expresses the similarity between the query pattern and the $l$-th training pattern. This output is treated as the weight of the $l$-th y-pattern. So the pattern layer maps the $n$-dimensional input space into an $N$-dimensional space of similarity, where $N$ is the number of training patterns. The weighting function implemented in GRNN is defined as:

$$w(\mathbf{x}, \mathbf{x}_l) = \frac{G_l(\mathbf{x})}{\sum_{j=1}^{N} G_j(\mathbf{x})} \qquad (13)$$
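A sketch of (12)-(13); the main difference from the FNM and NWE weightings is the per-pattern bandwidth (a scalar sigma also works here via broadcasting):

```python
import numpy as np

def grnn_weights(query_x, train_x, sigma):
    """GRNN weights, Eqs. (12)-(13); sigma is a length-N vector of
    per-pattern bandwidths sigma_l."""
    d2 = np.sum((train_x - query_x) ** 2, axis=1)  # squared Euclidean norms
    g = np.exp(-d2 / sigma ** 2)                   # pattern layer, Eq. (12)
    return g / g.sum()                             # summation layer, Eq. (13)
```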
The performance of PSFMs is related to the weighting function parameters governing the smoothness of the regression function (4). For a wider weighting function, the model tends to increase bias and decrease variance. Thus, too wide a weighting function leads to oversmoothing, while too narrow a weighting function leads to undersmoothing. The PSFM parameters should therefore be adjusted to the target function.
3 Ensemble Forecasting using PSFMs
Two approaches to ensemble forecasting are used: heterogeneous and homogeneous. The former combines base models of different types, while the latter combines base models of a single type. In the heterogeneous approach, we use the PSFMs described above as base models. The diversity of learners, which is the key property governing ensemble performance, in this case results from the different types of learners.
To control the diversity in the homogeneous approach, we use the following strategies [18] (a code sketch follows the list):

- Learning on different subsets of the training data. For each ensemble member, a random sample of size $N' < N$ is drawn without replacement from the training set $\Phi$.
- Learning on different subsets of features. For each ensemble member, $n' < n$ features (x-pattern components) are randomly sampled without replacement. In this case, the optimal model parameters may need correction for the ensemble members due to the reduction in Euclidean distance between x-patterns in $n'$-dimensional space relative to $n$-dimensional space.
- Random disturbance of the model parameters. For FNM, the width $\sigma$ is randomly perturbed for the $j$-th member by Gaussian noise: $\sigma_j = \sigma + \xi_j$, where $\xi_j \sim N(0, \sigma_\sigma)$.
- Random disturbance of x-patterns. The components of the x-patterns are perturbed for the $j$-th member by Gaussian noise: $x_{l,t}^{(j)} = x_{l,t} + \xi_{l,t}^{(j)}$, where $\xi_{l,t}^{(j)} \sim N(0, \sigma_x)$.
- Random disturbance of y-patterns. The components of the y-patterns are perturbed for the $j$-th member by Gaussian noise: $y_{l,t}^{(j)} = y_{l,t} + \xi_{l,t}^{(j)}$, where $\xi_{l,t}^{(j)} \sim N(0, \sigma_y)$.
The standard deviations of the noise signals, $\sigma_\sigma$, $\sigma_x$ and $\sigma_y$, control the noise level and are selected for each forecasting task, as are $N'$ and $n'$.
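The sketch below (our illustrative code; parameter names and defaults are ours) generates one diversified FNM member per strategy; a member's forecast then uses the returned data, width, and feature subset:

```python
import numpy as np

rng = np.random.default_rng(0)

def diversify(train_x, train_y, sigma, strategy, n_prime=None, f_prime=None,
              sd_sigma=0.05, sd_x=0.02, sd_y=0.02):
    """Return (train_x, train_y, sigma, feature_idx) for one FNM member.
    Query x-patterns must later be sliced with feature_idx as well."""
    N, n = train_x.shape
    feat = np.arange(n)
    if strategy == "data_subset":        # N' < N patterns, w/o replacement
        idx = rng.choice(N, size=n_prime, replace=False)
        train_x, train_y = train_x[idx], train_y[idx]
    elif strategy == "feature_subset":   # n' < n features, w/o replacement
        feat = rng.choice(n, size=f_prime, replace=False)
        train_x = train_x[:, feat]
        sigma = sigma * np.sqrt(f_prime / n)  # width correction, see Sec. 4
    elif strategy == "perturb_sigma":    # Gaussian noise on the FNM width
        sigma = sigma + rng.normal(0.0, sd_sigma)
    elif strategy == "perturb_x":        # Gaussian noise on x-patterns
        train_x = train_x + rng.normal(0.0, sd_x, size=train_x.shape)
    elif strategy == "perturb_y":        # Gaussian noise on y-patterns
        train_y = train_y + rng.normal(0.0, sd_y, size=train_y.shape)
    return train_x, train_y, sigma, feat
```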
The first strategy controlling diversity is similar to bagging [19], where the predictors are built on bootstrapped versions of the original data. In bagging, unlike in our approach, the sample size is $N$ and the random sample is drawn with replacement. The second strategy is inspired by the random subspace method [20], which is successfully used to construct random forests, very effective tree-based classification and regression models. Note that the diversity of learners has various sources. These include data uncertainty (learning on different subsets of the training set, learning on different subsets of x-pattern features, learning on disturbed input and output variables) and parameter uncertainty.
The forecasts of the y-patterns generated by the $M$ base models, $\hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_M$, are aggregated using simple averaging to obtain the ensemble forecast:

$$\hat{\mathbf{y}} = \frac{1}{M} \sum_{j=1}^{M} \hat{\mathbf{y}}_j \qquad (14)$$
In this study we use the mean for aggregation, but other functions, such as the median, mode, or trimmed mean, could also be used. As shown in [21], a simple average of forecasts often outperforms forecasts from single models, and a more complicated weighting scheme does not always perform better than a simple average.
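A small sketch of Eq. (14) together with the alternatives mentioned above; scipy's trim_mean handles the trimmed mean:

```python
import numpy as np
from scipy import stats

def aggregate(member_forecasts: np.ndarray, how: str = "mean") -> np.ndarray:
    """Combine forecasts of a y-pattern; rows are ensemble members.
    Eq. (14) is the simple mean."""
    if how == "mean":
        return member_forecasts.mean(axis=0)
    if how == "median":
        return np.median(member_forecasts, axis=0)
    if how == "trimmed":                 # trim 10% of members per tail
        return stats.trim_mean(member_forecasts, 0.1, axis=0)
    raise ValueError(f"unknown aggregation: {how}")
```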
4 Simulation Study
In this section, we apply the proposed ensemble forecasting models to mid-term load forecasting using real-world data: monthly electricity demand time series for 35 European countries. The data are taken from the publicly available ENTSO-E repository (www.entsoe.eu). The time series differ in their levels, trends, variations, and yearly shapes. They also differ in length, covering 24 years for 11 countries, 17 years for 6 countries, 12 years for 4 countries, 8 years for 2 countries, and 5 years for 12 countries. The models forecast the twelve months of 2014 (the last year of data), using data from the preceding period for training.
We built four heterogeneous ensembles:

- Ensemble1, composed of the PSFMs described in Section 2, i.e. k-NNw, FNM, NWE and GRNN, trained on the y-patterns defined with the coding variables determined from the historical sequence $X_i$ (y-pattern definition V2).
- Ensemble2, composed of PSFMs trained on the y-patterns defined with the coding variables predicted for the forecasted sequence using ARIMA (y-pattern definition V1). The base models in this case are denoted k-NNw+ARIMA, FNM+ARIMA, NWE+ARIMA and GRNN+ARIMA.
- Ensemble3, composed of PSFMs trained on y-patterns defined in the same way as for Ensemble2, but with the coding variables predicted using ETS. The base models in this case are denoted k-NNw+ETS, FNM+ETS, NWE+ETS and GRNN+ETS.
- Ensemble4, composed of all variants of the PSFM models mentioned above for Ensemble1, Ensemble2 and Ensemble3, i.e. twelve models.
To predict the coding variables, we used the ARIMA and ETS implementations available in the R statistical software environment: the auto.arima and ets functions from the forecast package. These functions implement automatic ARIMA and ETS modeling, respectively, identifying optimal model structures using the corrected Akaike information criterion (AICc) [22].
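For readers working in Python rather than R, the following is a rough analogue of the auto.arima step (pmdarima is an independent Python port of R's forecast::auto.arima; this is not the code used in the study, and the coding-variable values are made up):

```python
import numpy as np
import pmdarima as pm  # Python port of R's forecast::auto.arima

# Historical values of one coding variable (e.g., yearly means), made up.
coding_var = np.array([8.9, 9.4, 9.8, 10.1, 10.6, 11.0, 11.2, 11.7])

# Automatic order selection, analogous to auto.arima in R.
model = pm.auto_arima(coding_var, seasonal=False, suppress_warnings=True)
next_value = np.asarray(model.predict(n_periods=1))[0]  # V1 coding variable
```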
The optimal values of the hyperparameters for each PSFM were selected individually for each of the 35 time series in a grid search procedure using cross-validation. These hyperparameters include: the length of the x-patterns $n$, the number of nearest neighbors $k$ in k-NNw (a linear weighting function was assumed), the width parameter $\sigma$ in FNM (the shape parameter $\alpha$ was fixed), the bandwidth parameters $h_t$ in NWE, and the bandwidth $\sigma_l$ in GRNN.
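A simplified sketch of such a grid search for the FNM width (our code; the paper tunes every hyperparameter of every model per country in this fashion):

```python
import numpy as np

def cv_select_sigma(train_x, train_y, sigma_grid, n_folds=5, seed=0):
    """K-fold cross-validated grid search for the FNM width sigma,
    scored by mean absolute error on held-out y-patterns (alpha = 2)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(train_x)), n_folds)
    best = (np.inf, None)
    for sigma in sigma_grid:
        errs = []
        for fold in folds:
            mask = np.ones(len(train_x), dtype=bool)
            mask[fold] = False                     # hold the fold out
            for i in fold:
                d = np.linalg.norm(train_x[mask] - train_x[i], axis=1)
                mu = np.exp(-((d / sigma) ** 2))   # FNM memberships
                pred = (mu / mu.sum()) @ train_y[mask]
                errs.append(np.mean(np.abs(pred - train_y[i])))
        best = min(best, (np.mean(errs), sigma))
    return best[1]
```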
The forecasting errors on the test sets (mean absolute percentage error, MAPE) for each model and each country are shown in Fig. 1, and their averaged values are given in Table 1. Table 1 also reports the median of the absolute percentage error (APE), the interquartile range (IQR) of APE, and the root mean square error (RMSE), averaged over all countries. The forecasting accuracy depends heavily on the variant of coding variable determination. The most accurate variant on average is V1+ETS, and the least accurate is V2, where the coding variables are determined from history.
Fig. 2 shows the ranking of the models based on MAPE. The rank is calculated as the average rank of the model in the rankings performed individually for each country. As can be seen from this figure, the Ensemble4 and Ensemble3 models were the most accurate for the largest number of countries. Model NWE took the third position. Note that an ensemble combining a group of PSFMs (V1+ARIMA, V1+ETS or V2) occupies a higher position in the ranking than the individual members of that group. The exception is NWE, which achieves better results than Ensemble1. A similar conclusion can be drawn from the ranking based on RMSE.
Table 1. Forecasting results averaged over all countries: median APE (%), MAPE (%), IQR of APE (%), and RMSE.

Model        | Median APE | MAPE | IQR  | RMSE
-------------|------------|------|------|-------
k-NNw        | 2.89       | 4.99 | 4.06 | 368.79
FNM          | 2.88       | 4.88 | 4.43 | 354.33
NWE          | 2.84       | 5.00 | 4.14 | 352.01
GRNN         | 2.87       | 5.01 | 4.30 | 350.61
Ensemble1    | 2.88       | 4.90 | 4.13 | 351.89
k-NNw+ARIMA  | 2.89       | 4.65 | 4.02 | 346.58
FNM+ARIMA    | 2.87       | 4.61 | 3.83 | 341.41
NWE+ARIMA    | 2.85       | 4.59 | 3.74 | 340.26
GRNN+ARIMA   | 2.81       | 4.60 | 3.77 | 345.46
Ensemble2    | 2.90       | 4.60 | 3.84 | 342.43
k-NNw+ETS    | 2.71       | 4.47 | 3.43 | 327.94
FNM+ETS      | 2.64       | 4.40 | 3.34 | 321.98
NWE+ETS      | 2.68       | 4.37 | 3.20 | 320.51
GRNN+ETS     | 2.64       | 4.38 | 3.35 | 324.91
Ensemble3    | 2.64       | 4.38 | 3.40 | 322.80
Ensemble4    | 2.70       | 4.31 | 3.49 | 327.61
The homogeneous ensembles were built using FNM in variant V2 as the base model. The five strategies of diversity generation described in Section 3 were applied. Ensembles constructed in this way are denoted FNMe1, FNMe2, ..., FNMe5. In the FNMe2 case, where the diversity is obtained by sampling x-pattern components, the optimal width parameter $\sigma$ (selected for a single FNM) is corrected for the ensemble members by the factor $\sqrt{n'/n}$. This is due to the reduction in Euclidean distance between x-patterns in $n'$-dimensional space relative to the original $n$-dimensional space.
The forecasts were generated independently by each ensemble member and then combined using (14). The following parameters of the ensembles were selected on the training set using grid search:

- the size $N'$ of the random sample of training patterns in FNMe1,
- the size $n'$ of the random sample of features in FNMe2,
- the standard deviation $\sigma_\sigma$ of the disturbance of the width parameter in FNMe3,
- the standard deviation $\sigma_x$ of the disturbance of x-patterns in FNMe4,
- the standard deviation $\sigma_y$ of the disturbance of y-patterns in FNMe5.
Table 2 shows the results for the FNM ensembles. As can be seen, the errors for the different sources of diversity are similar; it is hard to indicate the best strategy for member diversification. Comparing the FNM ensembles with the single base model FNM (see Table 1), we observe a slightly lower MAPE for the ensembles, with the exception of FNMe5, whose MAPE is the same as for FNM. However, RMSE is higher for the ensemble versions of FNM than for the single FNM.
Fig. 3 shows the ranking of the FNM ensembles based on MAPE. The ensembles FNMe1-FNMe4 are more accurate than FNM for most countries. Ensemble FNMe5 turned out to be less accurate than FNM. A similar conclusion can be drawn from the ranking based on RMSE.
Table 2. Forecasting results for the homogeneous FNM ensembles, averaged over all countries: median APE (%), MAPE (%), IQR of APE (%), and RMSE.

Model  | Median APE | MAPE | IQR  | RMSE
-------|------------|------|------|-------
FNMe1  | 2.88       | 4.84 | 4.18 | 370.52
FNMe2  | 2.85       | 4.84 | 4.06 | 366.35
FNMe3  | 2.80       | 4.83 | 4.23 | 371.94
FNMe4  | 2.90       | 4.86 | 4.10 | 373.73
FNMe5  | 2.97       | 4.88 | 4.18 | 375.84
5 Conclusion
Ensemble forecasting is widely used to improve forecast accuracy over individual models. In this work, we investigated single-model and multi-model ensembles based on pattern similarity-based forecasting models for mid-term electricity demand forecasting. The key issue in ensemble learning is ensuring the diversity of learners. The advantage of heterogeneous ensembles is that the errors of the base models are expected to be weakly correlated because of the different nature of the models. In our case, however, the PSFMs are similar in nature, so some error correlation can be expected. The simulation results do not show a spectacular improvement in accuracy for the ensembles compared to their members. However, the ranking shown in Fig. 2 generally confirms better results for the ensembles than for their members.
In the homogeneous ensembles, we can control the diversity level of the members. We proposed five strategies for this, including strategies manipulating the training data and the model parameters. Among them, the strategies based on learning on different subsets of the training data and on different subsets of features turned out to be the most effective.
References

[1] Ghiassi, M., Zimbra, D. K., Saidane, H.: Medium term system load forecasting with a dynamic artificial neural network model. Electric Power Systems Research 76, 302–316 (2006)
[2] Chang, P.-C., Fan, C.-Y., Lin, J.-J.: Monthly electricity demand forecasting based on a weighted evolving fuzzy neural network approach. Electrical Power and Energy Systems 33, 17–27 (2011)
[3] Pełka, P., Dudek, G.: Pattern-based forecasting monthly electricity demand using multilayer perceptron. In: Proc. Conf. Artificial Intelligence and Soft Computing ICAISC 2019, LNAI 11508, pp. 663–672. Springer, Cham (2019)
[4] Suganthi, L., Samuel, A. A.: Energy models for demand forecasting – A review. Renewable and Sustainable Energy Reviews 16(2), 1223–1240 (2012)
[5] Barakat, E. H.: Modeling of nonstationary time-series data. Part II. Dynamic periodic trends. Electrical Power and Energy Systems 23, 63–68 (2001)
[6] González-Romera, E., Jaramillo-Morán, M. A., Carmona-Fernández, D.: Monthly electric energy demand forecasting with neural networks and Fourier series. Energy Conversion and Management 49, 3135–3142 (2008)
[7] Chen, J. F., Lo, S. K., Do, Q. H.: Forecasting monthly electricity demands: An application of neural networks trained by heuristic algorithms. Information 8(1), 31 (2017)
[8] Bedi, J., Toshniwal, D.: Empirical mode decomposition based deep learning for electricity demand forecasting. IEEE Access 6, 49144–49156 (2018)
[9] Zhao, W., Wang, F., Niu, D.: The application of support vector machine in load forecasting. Journal of Computers 7(7), 1615–1622 (2012)
[10] Dudek, G.: Pattern similarity-based methods for short-term load forecasting – Part 1: Principles. Applied Soft Computing 37, 277–287 (2015)
[11] Brown, G., Wyatt, J. L., Tino, P.: Managing diversity in regression ensembles. Journal of Machine Learning Research 6, 1621–1650 (2005)
[12] Wichard, J., Merkwirth, C., Ogorzałek, M.: Building ensembles with heterogeneous models. In: Course of the International School on Neural Nets (2003)
[13] Petropoulos, F., Hyndman, R. J., Bergmeir, C.: Exploring the sources of uncertainty: Why does bagging for time series forecasting work? European Journal of Operational Research 268(2), 545–554 (2018)
[14] Dudek, G., Pełka, P.: Medium-term electric energy demand forecasting using Nadaraya-Watson estimator. In: Proc. IEEE 18th Int. Conf. Electric Power Engineering EPE'17, pp. 1–6 (2017)
[15] Dudek, G.: Pattern similarity-based methods for short-term load forecasting – Part 2: Models. Applied Soft Computing 36, 422–441 (2015)
[16] Pełka, P., Dudek, G.: Prediction of monthly electric energy consumption using pattern-based fuzzy nearest neighbour regression. In: Proc. Conf. Computational Methods in Engineering Science CMES'17, ITM Web of Conferences, vol. 15 (2017)
[17] Pełka, P., Dudek, G.: Medium-term electric energy demand forecasting using generalized regression neural network. In: Proc. 39th Conf. Information Systems Architecture and Technology ISAT'18, AISC 853, pp. 218–227. Springer, Cham (2019)
[18] Dudek, G.: Ensembles of general regression neural networks for short-term electricity demand forecasting. In: Proc. IEEE 18th Int. Conf. Electric Power Engineering EPE'17, pp. 1–5 (2017)
[19] Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
[20] Ho, T. K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
[21] Chan, F., Pauwels, L. L.: Some theoretical results on forecast combinations. International Journal of Forecasting 34(1), 64–74 (2018)
[22] Hyndman, R. J., Athanasopoulos, G.: Forecasting: Principles and Practice. 2nd edn. OTexts, Melbourne, Australia (2018). OTexts.com/fpp2, accessed 4 October 2019