1 Introduction
Compared to other buildings, supermarkets consume proportionately more energy [33, 25]. This is mainly due to the refrigeration needed to slow down the deterioration of food by keeping it at a predetermined temperature [25]. Electricity costs associated with refrigeration account for a large part of the operating costs because these machines consume energy continuously, day and night. As a result, costs associated with refrigeration equipment can represent more than 50% of the total energy costs [13, 12, 29, 33]. Retailers operate in an industry that is characterized as competitive and low-margin [13], so becoming more energy efficient can make them more competitive. This underlines the importance of operating the system at its optimum performance level so that the associated energy costs can be reduced.
Energy baselining makes it possible to analyze the energy consumption by comparing it to a reference behavior [24]. Furthermore, it can be used to measure the effectiveness of energy efficiency policies by monitoring energy usage over time. Changes in energy policies, such as retrofitting the equipment, can require high investments. This makes it important for a retailer to know whether the investments are truly effective in reducing energy consumption. To estimate energy savings with reasonable accuracy, the energy baselines need to be accurate. However, it can be challenging to assess the quality of these baselines. One way is to run the old policies in parallel with the new ones, which is often impossible. A reliable way to determine the quality of these baselines can therefore be of significant value to supermarkets.
The objective of this work is to develop energy baselines using off-the-shelf data science technologies. Different technologies are applied to data obtained from several supermarkets to test their performance. Five supermarkets in Portugal are analyzed as a case study with a methodology based on energy baselining.
2 Background
The characteristics of the food-retail industry, such as fierce competition and low margins, make retailers continually search for ways to operate more efficiently [13]. Since energy costs are the second highest costs for a retailer [9], a sound energy management process is vital for improving efficiency [31].
Energy Management (EM) has been the subject of numerous studies throughout the years, and, because the field of EM is wide, it can be described in many different ways [31]. A purpose of EM is to search for improved strategies to consume energy in a more efficient way. From a business point of view, greater energy efficiency is of importance because it provides a number of direct, and indirect, economic benefits [38].
Several reasons can keep companies from investing in energy efficiency measures [14]. For example, when inadequate information is available about the results of these investments, companies can be reluctant to make them [14]. Energy management can focus on addressing these factors to enable businesses to invest. In order to evaluate the effectiveness of an energy efficiency measure, the observed energy consumption of the store or system must be compared to a reference behavior [24]. One way to create this reference behavior is energy baselining, in which the reference behavior is defined as the previous, historically best, or ideal theoretical performance of the given store [12]. Energy baselines are usually created from the analysis of historical data [24] and can be developed using traditional data mining techniques.
Time-series prediction is a method of forecasting future values based on historical data [8]. In time series forecasting, forecasts are made on the basis of data comprising one or more time series [7]. Time series data are defined as data captured over a period of time [15] (Eq. 1).
$$ X = \{x_1, x_2, \ldots, x_n\} \tag{1} $$
Where $x_t$ is the value measured at time $t$. Creating energy forecasts is an important aspect of the energy management of buildings [35]. Finally, making forecasts can also help in model evaluation when testing different time series algorithms [7].
We want to be able to use domain-specific knowledge to engineer new features; therefore, we decided to follow a regression approach. Regression is not a time-series-specific forecasting algorithm, but it can be applied to make time series forecasts. In multiple regression models, we forecast the dependent variable using a linear combination of the independent variables. Based on this relationship, the algorithm is able to predict a value for the dependent variable.
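As a hedged illustration of the regression approach, the baseline prediction for day $t$ can be written as a linear combination of $k$ engineered features. The symbols $\beta_j$, $x_{j,t}$, and $k$ are notation introduced here for illustration only and do not appear in the original text.

```latex
\hat{y}_t = \beta_0 + \sum_{j=1}^{k} \beta_j \, x_{j,t}
```

Here $\hat{y}_t$ stands for the predicted value of the dependent variable on day $t$, $x_{j,t}$ for the value of the $j$-th independent variable on that day, and $\beta_0, \ldots, \beta_k$ for the coefficients estimated from the training data. Non-linear learners replace the linear form with a more flexible function of the same features.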
We selected off-the-shelf machine learning algorithms, namely Multiple Linear Regression (MLR), Random Forests (RF), and Artificial Neural Networks (ANN), to perform the regression. One way to test the accuracy of the algorithms is to compare the predicted values with the actual observed values.
Nowadays, Machine Learning models and methods are applied in various areas and are used to make important decisions that can have far-reaching consequences [2]. Therefore, it is important to evaluate their performance. Currently, Cross-Validation (CV) is the most widely accepted and most used evaluation technique in data analysis and machine learning [19, 2]. However, Cross-Validation does not work well for evaluating the predictive performance of time series models [19]. One way to validate the prediction performance of a time series model is to use a Sliding Window design [17] (Figure 1). In this method, the algorithm is trained and tested on different periods of time.
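The sliding-window idea can be sketched as follows. This is a minimal illustration, not the exact configuration shown in Figure 1; the function name `sliding_windows` and the parameters `train_size`, `test_size`, and `step` are assumptions introduced here.

```python
from typing import Iterator, List, Tuple


def sliding_windows(
    n_obs: int, train_size: int, test_size: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that move forward through time.

    Each test window starts right after its training window, so a model
    is never evaluated on observations that precede its training data.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += step  # slide both windows forward


# Hypothetical sizes, chosen only for the example.
splits = list(sliding_windows(n_obs=100, train_size=30, test_size=10, step=20))
for train_idx, test_idx in splits:
    print(train_idx[0], train_idx[-1], test_idx[0], test_idx[-1])
```

Unlike random cross-validation folds, every split keeps the temporal order intact, which is the property that matters for time series.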
To evaluate the prediction performance of the algorithms, we used the Mean Absolute Error (MAE) as the error metric because the MAE is the most natural measure of the average prediction error [36, 37]. The following formula shows how the Mean Absolute Error is calculated:
$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{2} $$
Here $\hat{y}_i$ is the predicted value, $y_i$ is the observed value, and $n$ is the number of predictions.
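A direct implementation of the MAE is short enough to write by hand. The sketch below uses only the standard library; the function name `mean_absolute_error` and the sample values are illustrative and are not data from this study.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between predictions and observations."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


# Made-up values: absolute errors are 1.0, 0.5 and 0.0.
mae = mean_absolute_error([10.0, 12.0, 9.0], [11.0, 11.5, 9.0])
print(mae)
```

Because the MAE is expressed in the same unit as the predicted quantity, it can be read directly as the typical size of a prediction error.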
Numerous studies have focused on energy prediction because forecasting the energy consumption is an important component of any energy management system [26]. In New Zealand [30], researchers used MLR to calculate the optimal energy usage level for office buildings, based on monthly outside temperatures and numbers of full-time employees. With this knowledge, they could build an energy monitoring and auditing system for the optimization and reduction of energy consumption. In the UK [23], researchers used an MLR to forecast the expected effect of climate change on the energy consumption of a specific supermarket. They estimated that, until 2040, gas consumption will increase by 28%, considerably more than electricity usage, which will increase by 5.5%.
In the UK, most supermarkets negotiate energy prices and, when they exceed their predicted demand, they have to pay a penalty. Therefore, the ability to accurately predict energy consumption facilitates their negotiations on electricity tariffs with suppliers. One supermarket in the UK used ANNs to analyze the store’s total electricity consumption as well as that of its individual systems, such as refrigeration and lighting [12]. For each of these systems, a model was developed to provide an energy baseline. This baseline is used for performance monitoring, which is vital to ensure that systems perform adequately and that operating costs and energy use are kept to a minimum. Finally, ANNs have been used for energy prediction with the final goal of estimating a supermarket’s future CO2 emissions [6].
A recent paper [35] provides a detailed literature review of state-of-the-art developments in Artificial Intelligence (AI) based models for building energy use prediction. It provides insight into ensemble learning, which combines multiple AI-based models to improve prediction accuracy. The paper concludes that ensemble methods have the best prediction accuracy but that a high level of technical knowledge and computational resources is required to develop them. Consequently, this has hindered their application in practice. An advantage of high prediction accuracy is that it can allow early detection of equipment faults that could disrupt store operations [12].
These studies show that predicting energy consumption is possible with data mining techniques and that they can predict energy usage within acceptable errors. Compared to other engineering methods, ensemble methods require less detailed information about the physical building parameters [35]. This saves money and time in conducting predictions compared to simulation tools. Hence, they could replace simulation tools in the future. Because studies use different types and volumes of input data, there is no unified input data format. Therefore, knowledge of the methods and a variety of data are needed to create meaningful and accurate predictions.
3 Defining baselines with Machine Learning Algorithms
Every forecast of an observed value has a forecast error, which describes the deviation between the two. These deviations can result from poor prediction performance or from energy savings or losses. Forecasting a numeric value exactly is very hard, so some deviation, larger or smaller, will always remain. Thus, to provide good estimates of the effect of changes in energy management policies, it is important to have a learning model that creates energy baselines that are as accurate as possible.
The objective of this study is to assess the reliability of the learning model in different aspects. First, we want to determine which model is best at creating a reliable baseline with the smallest number of training days. This can be beneficial in two specific situations: when a retailer opens a new store, or when it implements new energy policies. When a new store is opened, no data has yet been collected about the energy performance of this specific store. To create a baseline as soon as possible, it is essential to know how many days it takes to collect sufficient data. Therefore, we study the minimum number of days needed to create a reliable baseline. This information is also useful for updating the baseline when the configuration of the store changes, e.g., due to upgrades of the refrigeration equipment.
When we know this setup, we want to discover the lifespan of this prediction, i.e., how long does this energy baseline remain reliable after being learned. It is important to determine how reliable the baseline is and if it needs updating, because we expect that the prediction error will grow over time. As a result, the prediction error will behave differently for short and long term predictions. With this information, the lifecycle of a model can be determined, which defines how often the model needs to be updated.
When a new energy saving policy is implemented, the Retailer wants to estimate how much energy is saved. Therefore, a model has to be developed which is able to make long term predictions based on the old configuration of the store. With this baseline, the Retailer can see what the estimated energy consumption would be if they did not change the layout. By comparing this baseline with the observed energy consumption or the new baseline, the difference can be estimated. We will examine the behavior of the model for long term predictions because the Retailer needs to know for how long he can estimate, with a reasonable accuracy, the energy gains from a certain energy policy.
3.1 Approach
We obtained time series data from five supermarkets across Portugal, consisting of measurements of the Refrigeration Energy Consumption, the Outside temperature, and the Timestamp. The original time series data were provided at 15-minute intervals that were sometimes irregular. After restructuring the data into regular intervals, we converted them into hourly values and eventually transformed them into daily values. The energy consumption is measured in kilowatt-hours (kWh) by the Retailer’s energy monitoring system. The weather data consist of the outside temperature, measured in degrees Celsius (°C) by a sensor placed on the roof of the store.
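The aggregation from (possibly irregular) 15-minute readings to hourly and then daily values could look like the sketch below. It assumes that each reading is the energy consumed during its interval, so that summing is appropriate; temperature readings would be averaged instead. The function names `to_hourly` and `to_daily` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def to_hourly(readings: Iterable[Tuple[datetime, float]]) -> Dict[datetime, float]:
    """Sum interval energy readings (kWh) into the hour they fall in."""
    hourly: Dict[datetime, float] = defaultdict(float)
    for stamp, kwh in readings:
        # Truncating to the hour also absorbs irregular timestamps.
        hourly[stamp.replace(minute=0, second=0, microsecond=0)] += kwh
    return dict(hourly)


def to_daily(hourly: Dict[datetime, float]) -> Dict[date, float]:
    """Sum hourly energy values (kWh) into calendar days."""
    daily: Dict[date, float] = defaultdict(float)
    for stamp, kwh in hourly.items():
        daily[stamp.date()] += kwh
    return dict(daily)


# Made-up readings, including one irregular timestamp.
readings = [
    (datetime(2016, 3, 1, 0, 0), 1.0),
    (datetime(2016, 3, 1, 0, 14), 1.5),
    (datetime(2016, 3, 1, 1, 30), 2.0),
    (datetime(2016, 3, 2, 9, 45), 3.0),
]
hourly = to_hourly(readings)
daily = to_daily(hourly)
print(daily)
```

A production pipeline would also need a policy for missing intervals, which this sketch does not address.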
In order to apply a similar approach to the data of each store, we decided to work with separate datasets that share a similar structure. We use domain knowledge to create features for the datasets. The process of designing new features based on domain knowledge is called feature engineering [22]. Before creating these datasets, we first identified the dependent and the independent variables. In this study, an energy baseline is created that reflects the estimated refrigeration energy consumption. Consequently, this is the dependent variable, and the independent variables are the ones influencing this consumption. Only factors that are measured by all stores can be used as independent variables.
3.2 Estimating Reliability
For a retailer, it is important to estimate, with reasonable accuracy, the energy savings resulting from energy policies. If we train an algorithm with data from before an energy policy change, we can create an energy baseline that shows what the energy consumption would have been if this policy had not been changed. By comparing this energy baseline with the observed consumption after the policy change, we can estimate the energy savings.
The first objective of this study is to define the minimal set of training examples needed to build a reliable energy baseline. To do this, we train the machine learning algorithms with different numbers of training days. In each iteration, we increase the number of training examples and evaluate the models’ prediction accuracy. When all iterations have been completed, we plot the error metrics as learning curves. Because this approach is replicated for the three algorithms, it also reveals which one performs best.
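The learning-curve procedure can be sketched as a loop over increasing training sizes. The example below is a simplification under stated assumptions: it uses a single-feature least-squares line as a stand-in for the learners, always trains on the first days of the series, and runs on synthetic data. The helper names `fit_line` and `learning_curve` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def learning_curve(
    x: Sequence[float], y: Sequence[float], train_sizes: Iterable[int], test_size: int
) -> List[Tuple[int, float]]:
    """Return (training size, test MAE) pairs for growing training sets."""
    curve = []
    for size in train_sizes:
        if size + test_size > len(x):
            break  # not enough data left for a full test window
        slope, intercept = fit_line(x[:size], y[:size])
        test_x, test_y = x[size:size + test_size], y[size:size + test_size]
        errors = [abs(slope * xi + intercept - yi) for xi, yi in zip(test_x, test_y)]
        curve.append((size, sum(errors) / len(errors)))
    return curve


# Synthetic, perfectly linear data (not from the study).
temps = [10.0 + (d % 7) for d in range(60)]
energy = [100.0 + 2.0 * t for t in temps]
curve = learning_curve(temps, energy, train_sizes=range(10, 41, 10), test_size=10)
print(curve)
```

Plotting the second element of each pair against the first gives the learning curve; repeating the loop per algorithm allows the curves to be compared.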
After selecting the learning model that is able to create the baseline with the least amount of data, we define the update frequency of this setup. We expect the prediction error to grow over time, so the energy baseline will become unreliable at the point where the prediction error becomes too high. To find the point at which we recommend updating, we use the previously defined setup to make predictions for the remainder of the dataset. Once the predictions are made, we compute an MAE for each block of 10 consecutive predictions. We then plot these errors to see how the prediction error develops over time. This enables us to analyze how the prediction accuracy develops along the prediction horizon and to define the update frequency.
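Computing one MAE per block of 10 consecutive predictions can be sketched as below. The function name `block_mae` and the sample series are illustrative; the treatment of a shorter final block is an assumption, since the text does not specify it.

```python
from typing import List, Sequence


def block_mae(
    observed: Sequence[float], predicted: Sequence[float], block_size: int = 10
) -> List[float]:
    """MAE of each consecutive block of `block_size` predictions.

    A shorter final block is averaged over the points it contains.
    """
    if len(observed) != len(predicted):
        raise ValueError("inputs must have equal length")
    maes = []
    for start in range(0, len(observed), block_size):
        obs = observed[start:start + block_size]
        pred = predicted[start:start + block_size]
        maes.append(sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs))
    return maes


# Made-up series: the error is 1.0 for the first 10 days and 2.0 afterwards.
observed = [float(i) for i in range(25)]
predicted = [o + (1.0 if i < 10 else 2.0) for i, o in enumerate(observed)]
errors = block_mae(observed, predicted)
print(errors)
```

Plotting the returned list against the block index shows how the error develops along the prediction horizon.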
Finally, the third part of this research is to analyze the long term prediction performance. We did this by training each model with training sets of various sizes and letting it predict for the remainder of the dataset. After the predictions were made, we calculated an MAE for every 10 consecutive predictions. Plotting these error metrics allowed us to study the models’ performance over time.
4 Experimental Setup
In order to study the three objectives described before, we designed an approach based on Learning curves in combination with Sliding windows. Our experimental setup is a variation of the Time series approach used by [4, 34]. The method we propose is visualized in Figure 3. We decided to use this particular method because we want to train machine learning models with different sizes of historical training data. The learning curves enable us to visualize and evaluate their performance.
4.1 Data
The studied datasets are mainly based on the energy consumption and weather data for the whole year of 2016 and the first half of 2017 (Table 1). The data for each store is available from the moment the store opened or started to collect the data. Hence, for each store, the maximum amount of data is available.
Store                  First day    Last day     Observations
---------------------  -----------  -----------  ------------
Aveiro                 04/12/2015   26/04/2017   510 days
Fatima                 07/01/2016   26/04/2017   476 days
Macedo de Cavaleiros   13/11/2015   26/04/2017   531 days
Mangualde              16/05/2016   16/05/2017   366 days
Regua                  16/05/2016   16/05/2017   366 days
Based on the two available variables, Timestamp and Outside temperature, we created new features with additional information that the algorithm can use. Designing appropriate features is one of the most important steps in creating good predictions because they can strongly influence the results achieved with the learning model [32]. To determine which features to create, knowledge about the behavior of the store is important [12]. The domain knowledge required for this process was acquired through conversations with experts, by reviewing similar studies [12, 23, 30, 6, 20, 18, 27], and by using descriptive data mining techniques, e.g., Subgroup Discovery (SD). SD is a method to identify unusual behaviors between dependent and independent variables in the data [1, 16]. In this study, SD is used to improve our understanding of the behavior of the energy consumption. Table 2 gives an overview of the created features.
Name                       Type               Description                      Derived from
-------------------------  -----------------  -------------------------------  ------------
Weekday                    Categorical (1-7)  Day of the week                  Timestamp
Week of the Month          Categorical (1-4)  Week of the Month                Timestamp
Workday                    Binary (0-1)       Workday or Weekend               Timestamp
Max Temperature            Numerical          Max Temperature of the Day       Temperature
Mean Temperature           Numerical          Mean Temperature of the Day      Temperature
Min Temperature            Numerical          Min Temperature of the Day       Temperature
Temperature Amplitude      Numerical          Absolute Difference Min and Max  Temperature
Max Temperature Y..        Numerical          Max Temperature of Yesterday     Temperature
Mean Temperature Y..       Numerical          Mean Temperature of Yesterday    Temperature
Min Temperature Y..        Numerical          Min Temperature of Yesterday     Temperature
Temperature Amplitude Y..  Numerical          Absolute Difference Min and Max  Temperature
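The features in Table 2 could be derived along the following lines. This is a sketch under assumptions: the function and key names are hypothetical, the week-of-month rule (seven-day blocks capped at 4) is a guess at the definition, and the workday flag ignores public holidays.

```python
from datetime import date
from typing import Dict, List


def calendar_features(day: date) -> Dict[str, float]:
    """Features derived from the timestamp."""
    weekday = day.isoweekday()  # 1 = Monday ... 7 = Sunday
    return {
        "weekday": weekday,
        # Assumption: seven-day blocks, with days 29-31 counted in week 4.
        "week_of_month": min((day.day - 1) // 7 + 1, 4),
        # Assumption: Monday to Friday are workdays; holidays are ignored.
        "workday": 1 if weekday <= 5 else 0,
    }


def temperature_features(temps: List[float], suffix: str = "") -> Dict[str, float]:
    """Daily summary statistics of the outside temperature readings."""
    return {
        f"max_temperature{suffix}": max(temps),
        f"mean_temperature{suffix}": sum(temps) / len(temps),
        f"min_temperature{suffix}": min(temps),
        f"temperature_amplitude{suffix}": abs(max(temps) - min(temps)),
    }


def daily_features(
    day: date, temps_today: List[float], temps_yesterday: List[float]
) -> Dict[str, float]:
    """Combine calendar features with today's and yesterday's temperatures."""
    features: Dict[str, float] = {}
    features.update(calendar_features(day))
    features.update(temperature_features(temps_today))
    features.update(temperature_features(temps_yesterday, suffix="_yesterday"))
    return features


# Made-up temperature readings for a Saturday.
row = daily_features(date(2016, 3, 5), [8.0, 12.0, 16.0], [7.0, 10.0, 13.0])
print(row)
```

Each call produces one row of the regression dataset; the dependent variable (daily refrigeration energy consumption) would be joined on the same date.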
4.2 Algorithms
We selected off-the-shelf machine learning algorithms, namely Multiple Linear Regression (MLR), Random Forests (RF), and Artificial Neural Networks (ANN), to perform the regression.
Linear regression is a simple and widely used statistical technique for predictive modeling [23]. It has been used before to predict the future energy consumption of a supermarket in the UK [23]. The RF is considered to be one of the most accurate general-purpose learning techniques available and is popular because of its good off-the-shelf performance [10, 3]. Finally, Artificial Neural Networks have been used successfully in recent studies to predict energy consumption [12, 6, 35, 20, 26, 11].
4.3 Performance Estimation
In Machine Learning, learning curves are used to reflect the predictive performance as a function of the number of training examples [28]. Figure 2 reveals the developing learning ability of a model when the number of training examples increases. The curve indicates how much better the model gets in predicting when more training examples are used. The general idea is to find out how good the model can become in predicting and what the subsequent number of training examples is [28]. Since we are searching for the minimum number of training days to create a baseline, we can use the learning curves to identify this number.
To test the learning ability of a model, one can create several training sets and evaluate the model’s performance on a test set [21]. These training sets can differ in, e.g., volume. It is preferred that the data for these sets are randomly selected from the available data [21]. The purpose is to train the model multiple times and to test its performance after every training. The results of these tests can be plotted as a learning curve that shows how the performance of the model evolves. These curves are especially informative when the performance of multiple models is compared. Besides supporting model selection, they also show how the performance of a single model relates to the number of training examples used [28]. Such a learning curve tells how the model behaves when it is constructed with varying volumes of training data.
5 Results
5.1 Reliability of baselines
In Figure 4, we see how the error evolves as we train the model with more data points, i.e., days. This plot displays the learning curves obtained for each of the trained models: MLR, ANN, and RF. The number of training examples ranged from 10 up to 180 days, in steps of 10, and each model was tested on a period of 50 days. Each line represents the mean of 18 iterations: for each of the three stores, Aveiro, Fatima, and Macedo Cavaleiros, we performed six iterations following the method visualized in Figure 3.
In Figure 4, we observe that the MLR is the most reliable at 30 training days, with an MAE of 0.25. We also observe that, for the MLR, the MAE increases as we expand the training set beyond this point. The other two learning models behave differently. The performance of the RF stabilizes from 70 training examples up to 180. The ANN, in contrast, exhibits a continuous reduction in the MAE as more training examples, up to 180, are added to the training set.
The learning curves in Figure 4 reveal that each of the learning models is affected differently by the change in the training set size. We notice that the MLR outperforms the other two methods at making a reliable baseline using the smallest number of days. Furthermore, we see that the performance of the MLR worsens when we increase the number of training examples. This can be explained by the non-stationary nature of the datasets. This non-stationarity is a problem for the MLR since it has difficulties with non-linear relationships. Because the MLR works well with a smaller number of training examples, we assume that the dataset contains periods of local stationarity. One study [5] shows that non-stationary time series can appear stationary when examined close up. In such a local period, the statistical properties change slowly over time. As a consequence, the data that lies close to the forecast period is more likely to be predictive for this forecast period.
For the ANN and RF, stationarity is less of a concern since they are able to handle more complex, non-linear relations. We see evidence for this in our results: the associated learning curves show a promising development as training data is added. We believe that, with more diverse data, the ANN could predict a baseline with fewer training days than the MLR. Unfortunately, we were not able to investigate this further.
As shown in Figure 4, we are able to create a reliable model with the MLR trained on 30 days. Therefore, we trained the MLR for each of the stores on the same period of the year, March 2016, and estimated the energy consumption for the following period of approximately one year, from April 2016 until February 2017.
Figure 5 shows the evolution of the MAE throughout this period. We observe that during the first 30 days of predictions, the MAE remains quite low, under 0.5. Next, we see that between 50 and 180 days, the MAE is higher for all the stores. This period corresponds to the months June, July, August, and September. Table 3 shows that, throughout these months, temperatures reach higher values than in March, the period that was used for training the model. This explains why the MAE is higher. To avoid this problem, we could train a different model for each of the two energy profiles. Because our dataset is limited, we were not able to test this in practice.
We observe in Figure 5 that in Aveiro the influence of seasonality is less evident than for the supermarkets in Fatima and Macedo Cavaleiros. Since all stores are trained and tested with the same model and in the same period of time, the most plausible explanation lies in the variables related to temperature. The average temperatures of the three stores follow a similar pattern, higher in the summer and lower in the winter. However, if we focus on the amplitudes of the average temperatures per month (Table 3), we observe that Aveiro registered the smallest amplitude, with a difference of 9 °C. The other stores, Fatima and Macedo Cavaleiros, show amplitudes of 13 °C and 18 °C, respectively. This seems to explain why the model trained for the store of Aveiro is less affected by seasonality.
In Figure 5, we notice that after 220 days the accuracy of the model increases again. When we look at Table 3, we see that the temperature values from November on, are comparable to the ones in March. Nevertheless, the error is still higher than in the period of the first 30 days. We applied this method in different periods of time, and we perceived similar behavior.
In conclusion, we base our decision on the average prediction error. Figure 5 shows that the average prediction error remains stable for up to 30 days; therefore, we recommend updating the model at least every 30 days.
Store    Jan  Feb  Mar  Apr  May  June  July  Aug  Sep  Oct  Nov  Dec
-------  ---  ---  ---  ---  ---  ----  ----  ---  ---  ---  ---  ---
Aveiro    12   13   14   16   17    20    21   21   19   18   14   14
Fatima     9   11   11   14   15    19    22   22   20   17   12   10
M. Cav.    8   10   12   15   16    22    26   25   22   16   10    8
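The temperature amplitude of each store, i.e., the difference between its warmest and coolest monthly average, follows directly from Table 3. The short sketch below recomputes it from the tabulated values; the variable names are illustrative.

```python
# Monthly average temperatures copied from Table 3 (January to December).
monthly_mean_temp = {
    "Aveiro": [12, 13, 14, 16, 17, 20, 21, 21, 19, 18, 14, 14],
    "Fatima": [9, 11, 11, 14, 15, 19, 22, 22, 20, 17, 12, 10],
    "M. Cav.": [8, 10, 12, 15, 16, 22, 26, 25, 22, 16, 10, 8],
}

# Amplitude = warmest monthly average minus coolest monthly average.
amplitude = {store: max(temps) - min(temps) for store, temps in monthly_mean_temp.items()}
print(amplitude)  # {'Aveiro': 9, 'Fatima': 13, 'M. Cav.': 18}
```

Aveiro has the smallest amplitude of the three stores, which is consistent with its baseline being the least affected by seasonality.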
5.2 Estimated energy savings
Each store has a different number of observations, and they are also collected in different periods of time. We will train the MLR, RF, and ANN with the first 180 and 360 days of data, and test for the remaining days. We will do this for the stores located in Aveiro, Fatima, and Macedo Cavaleiros. Therefore, we train each store in different periods, and not within the same period.
In Figure 5, we noticed that 30 training days were not enough to make accurate long term predictions. Therefore, we decide to include more training days into our training set. Each of the following plots, in Figures 6, 7, 8, 9, 10, and 11, show how the prediction error evolves over time, per store, per model and number of training days. Each point shows the average error for 10 subsequent predictions.
Figures 6, 7, and 8 show the evolution of the prediction error when the models are trained on the first 180 days of data. We observe, that each store shows a similar behavior as shown in Figure 5. This is more evident when we compare the error of the MLR (red line) with the error in Figure 5. Overall, the MAE is lower for the stores of Fatima and Macedo Cavaleiros, if we use 180 days instead of 30 days. These results also show, that the effect of the different consumption modes is still visible, but less dramatically.
We expect long term predictions to become more accurate when we use 360 days to train the model, because the model is then trained with data from all periods of the year and a wider range of temperature values is included in the training set. Therefore, we decided to train the models, for all stores, on the first 360 days and to study the predictions on the remaining days. Figures 9, 10, and 11 show how the MAE evolves for this period of time. We observe that, for the corresponding period of time, the MAE is slightly lower than for the models trained on 180 days.
In contrast to Figure 4, the MLR has the worst performance, while the RF and ANN perform similarly. The results of this part of the experiments support the general idea that when we train the models with more data, our predictions improve.
When the algorithms are trained with 180 training days, the effect of the different energy consumption modes is still visible. When we use 360 training days, we observe that the predictions become more accurate. Therefore, we advise training the algorithms on 360 training days to create long term predictions.
6 Estimate Energy Savings
The Retailer wants to estimate, with reasonable accuracy, the energy savings resulting from its energy policies. Changes in energy policies, such as retrofitting equipment, require high investments. This makes it important for the Retailer to know whether the investments are truly effective in reducing energy consumption. If we use a baseline trained with data from before a measure is implemented, we can estimate the energy savings by comparing its estimates with the observed consumption.
We selected two stores that have undergone a retrofitting of the equipment. From these stores exactly one year of data is available. Mangualde and Regua had, respectively, 170 and 200 training days available before the Retrofit. Because we have less than a year of data available, we decide to use the MLR, trained on 30 days, which shows the best performance in Figure 4.
Figures 12 and 13 show the observed consumption (orange lines) versus the baseline estimates (blue lines) for these two stores. We trained the MLR for both stores, on 30 training days, between 50 and 20 days before the Retrofit and we predicted for 50 days. This makes it easier to visualize how the baseline compares with the energy consumption before and after the Retrofit.
The deviations, between the baseline and the energy consumption, can result from poor prediction performance or energy savings/losses. We chose a setup that gives us a reliable baseline, therefore, we believe that the deviations are caused by energy savings. In both Figures 12 and 13, we observe that, before the Retrofit, the baseline and the real energy consumption intertwine in several points. This behavior, which was also seen before, shows that the predictions are close to the real consumption. After the Retrofit, however, the observed consumption is always lower than the prediction, which offers strong evidence that the implemented measure was effective.
Hence, if we assume that the baseline is accurate enough, we can estimate the energy savings using the difference between the predicted and observed energy consumption.
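The savings estimate itself is a simple difference between the baseline and the observed consumption over the post-retrofit days. The sketch below illustrates this with made-up daily values in kWh; the function name `estimated_savings` and all numbers are hypothetical and are not results from the two stores.

```python
from typing import Sequence


def estimated_savings(baseline: Sequence[float], observed: Sequence[float]) -> float:
    """Total of (baseline - observed) over the post-retrofit days.

    A positive result means consumption was below the baseline.
    """
    if len(baseline) != len(observed):
        raise ValueError("inputs must have equal length")
    return sum(b - o for b, o in zip(baseline, observed))


# Made-up daily values (kWh) for three days after a retrofit.
baseline_kwh = [500.0, 510.0, 505.0]
observed_kwh = [450.0, 470.0, 455.0]

savings = estimated_savings(baseline_kwh, observed_kwh)
share = savings / sum(baseline_kwh)  # savings relative to the baseline
print(savings, round(share, 3))
```

The estimate is only as good as the baseline: any systematic prediction error is added to, or subtracted from, the computed savings.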
7 Conclusions
Energy efficiency measures can require high investments. This makes it important for the Retailer to know if the investments are truly effective, in reducing energy consumption. Energy baselines can be used to study the effectiveness of energy efficiency measures. The results can simplify decisions to reserve funding for the required investments in other stores.
In this study, we researched if offtheshelf data science technologies can be used to create energy baselines that support improved energy management. Before that, we also performed some exploratory analysis to better understand the data.
Our first goal, was to determine the minimum amount of training days needed to create a reliable baseline, and which model performs best. For that, we studied the prediction accuracy of three machine learning models, ANN, RF, and MLR, based on various datasets. For the experiments, we proposed a sliding window approach in which we systematically expanded the size of the training set with historical data. Our experiments show, that the MLR has a clear advantage over the other two methods for creating a baseline with a minimum amount of days. This model needs 30 training days to estimate a reliable baseline.
The second goal was to determine how often the algorithm needs to be updated when trained with a MLR on 30 training days. We trained our algorithm multiple times, on all stores, and in different time periods. Our analysis shows that the MAE stays low for a period of 30 days, after this the MAE dramatically increases. Moreover, we observed that the energy consumption follows a different profile when average temperatures are higher than 20 degrees. These findings are in line with our insights derived from Subgroup Discovery. Our analysis shows, that the amplitude of the average temperature affects the prediction performance. Hence, we advise updating the model up to 30 days.
Our third goal, was to determine if we can estimate energy savings after implementing an energy efficiency measure. To answer this question, we trained our models with 180 and 360 training days and predicted for the remaining days. Our findings show, that the predictions become the most accurate when trained with 360 training days. Because we use 360 training days, a bigger variation of temperature values is included in the training set. This supports the general idea that when we train the models with more data, our predictions will improve. With a baseline, trained on 360 training days, the Retailer is able to estimate, with reasonable accuracy, the energy savings resulting from its energy policies. Moreover, he can compare the energy savings to the investment made for the measure. This has obvious advantages for the retailer.
In summary, the results of this study show that reliable energy baselines can be created using off-the-shelf data science technologies. Moreover, we found a way to create them from short-term historical data.
Acknowledgments
This work is financed by the ERDF – European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT – Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project 3GEnergy (AE20160286).
References
 [1] M. Atzmueller. Subgroup discovery. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, 5(1):35–49, 2015.
 [2] C. Bergmeir and J. M. Benítez. On the use of cross-validation for time series predictor evaluation. Information Sciences, 191:192–213, 2012. Data Mining for Software Trustworthiness.
 [3] G. Biau. Analysis of a random forests model. Journal of Machine Learning Research, 13:1063–1095, 2012.
 [4] E. Busseti, I. Osb, and S. Wong. Deep learning for time series modeling, 2012.
 [5] A. Cardinali and G. Nason. Costationarity of locally stationary time series using costat. Journal of Statistical Software, Articles, 55(1):1–22, 2013.
 [6] A. Chari and S. Christodoulou. Building energy performance prediction using neural networks. 2017.
 [7] C. Chatfield. Time-series forecasting. CRC Press, 2000.
 [8] J.-S. Chou and N.-T. Ngo. Time series analytics using sliding window metaheuristic optimization-based machine learning system for identifying building energy consumption patterns. Applied Energy, 177:751–770, 2016.
 [9] H. B. Christensen. The low-energy supermarket project. Heating/Piping/Air Conditioning Engineering, 75(12):48–51, 2003.
 [10] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res., 15(1):3133–3181, Jan. 2014.
 [11] A. Foucquier, S. Robert, F. Suard, L. Stéphan, and A. Jay. State of the art in building modelling and energy performances prediction: A review. Renewable and Sustainable Energy Reviews, 23:272–288, 2013.
 [12] N. S. G. Mavromatidis, S.Acha. Diagnostic tools of energy performance for supermarkets using artificial neural network algorithms. Energy and Buildings, 62:304 – 314, 2013.
 [13] Garcia and Coelho. Energy efficiency strategies in refrigeration systems of large supermarkets. International Journal of Energy, Environment and Economics, 4(3):64–70, 2010.
 [14] K. Gillingham and K. Palmer. Bridging the energy efficiency gap: Policy insights from economic theory and empirical evidence. Review of Environmental Economics and Policy, 8(1):18, 2014.
 [15] J. D. Hamilton. Time series analysis, volume 2. Princeton university press Princeton, 1994.
 [16] F. Herrera, C. J. Carmona, P. Gonzalez, and M. J. del Jesus. An overview on subgroup discovery: foundations and applications. Knowledge and Information Systems, 29(3):495–525, Dec 2011.
 [17] N. R. Hoot, L. J. LeBlanc, I. Jones, S. R. Levin, C. Zhou, C. S. Gadd, and D. Aronsky. Forecasting emergency department crowding: A discrete event simulation. Annals of Emergency Medicine, 52(2):116 – 125, 2008.
 [18] D. Jacob, S. Dietz, S. Komhard, C. Neumann, and S. Herkel. Black-box models for fault detection and performance monitoring of buildings. Journal of Building Performance Simulation, 3(1):53–62, 2010.
 [19] G. Jiang and W. Wang. Markov cross-validation for time series model evaluations. Inf. Sci., 375:219–233, 2017.
 [20] S. Karatasou, M. Santamouris, and V. Geros. Modeling and predicting building’s energy use with artificial neural networks: Methods and results. Energy and Buildings, 38(8):949 – 958, 2006.
 [21] P. Langley. Machine learning as an experimental science. Machine Learning, 3(1):5–8, Aug 1988.
 [22] Z. Li, X. Ma, and H. Xin. Feature engineering of machine-learning chemisorption models for catalyst design. Catalysis Today, 280:232–238, 2017. A Decade of Effort in Addressing the Grand Challenges in Catalysis.
 [23] S. B. M. Braun, H. Altan. Using regression analysis to predict the future energy consumption of a supermarket in the UK. WIT Transactions on Ecology and the Environment, 176:3–13, 2013.
 [24] P. S. M. Hrnčár. Performance monitoring strategies for effective running of commercial refrigeration systems. In Proceedings of the 12th WSEAS International Conference on Automatic Control, Modelling & Simulation, ACMOS’10, pages 177–180, Stevens Point, Wisconsin, USA, 2010. World Scientific and Engineering Academy and Society (WSEAS).
 [25] Z. Mylona, M. Kolokotroni, and T. S. A. Frozen food retail: Measuring and modelling energy use and space environmental systems in an operational supermarket. Energy and Buildings, 144:129 – 143, 2017.
 [26] G. E. Nasr, E. A. Badr, and M. R. Younes. Neural networks in forecasting electrical energy consumption. pages 489–492, 2001.
 [27] J. A. Orosa and A. C. Oliveira. A field study on building inertia and its effects on indoor thermal environment. Renewable Energy, 37(1):89 – 96, 2012.
 [28] C. Perlich. Learning Curves in Machine Learning, pages 577–580. Springer US, Boston, MA, 2010.
 [29] S.A. Tassou, Y. Ge, A. Hadawey, D. Marriott. Energy consumption and conservation in food retailing. Elsevier, (31), 2011.
 [30] M. Safa, M. Safa, J. Allen, A. Shahi, and C. T. Haas. Improving sustainable office building operation by using historical data and linear models to predict energy usage. Sustainable Cities and Society, 29:107 – 117, 2017.
 [31] M. Schulze, H. Nehler, M. Ottosson, and P. Thollander. Energy management in industry – a systematic review of previous findings and an integrative conceptual framework. Journal of Cleaner Production, 112:3692 – 3708, 2016.
 [32] L. Silva. A feature engineering approach to wind power forecasting. International Journal of Forecasting, 30(2):395 – 401, 2014.
 [33] L. Timma, R. Skudritis, and D. Blumberga. Benchmarking analysis of energy consumption in supermarkets. Energy Procedia, 95:435 – 438, 2016. International Scientific Conference “Environmental and Climate Technologies”, CONECT 2015.
 [34] J. N. van Rijn, S. M. Abdulrahman, P. Brazdil, and J. Vanschoren. Fast Algorithm Selection Using Learning Curves, pages 298–309. Springer International Publishing, Cham, 2015.
 [35] Z. Wang and R. S. Srinivasan. A review of artificial intelligence based building energy use prediction: Contrasting the capabilities of single and ensemble prediction models. Renewable and Sustainable Energy Reviews, 75:796 – 808, 2017.
 [36] C. Willmott and K. Matsuura. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30:79–82, 2005.
 [37] C. J. Willmott and K. Matsuura. On the use of dimensioned measures of error to evaluate the performance of spatial interpolators. International Journal of Geographical Information Science, 20(1):89–102, 2006.
 [38] E. Worrell, J. A. Laitner, M. Ruth, and H. Finman. Productivity benefits of industrial energy efficiency measures. Energy, 28(11):1081–1098, 2003.