Currently, most of the energy produced worldwide is generated from coal or natural gas. However, much of this energy is wasted. In the United States of America, approximately 58% of the energy produced is wasted Battaglia (2013). Furthermore, 40% of this wasted energy is due to industrial and residential buildings. By reducing energy wastage in the electric power industry, we reduce damage to the environment and our dependence on fossil fuels.
Short-term load forecasting (STLF), i.e., forecasting over horizons from one hour to a few weeks, can help, since predicting the load allows more precise planning, supply estimation and price determination. This leads to lower operating costs, higher profits and a more reliable electricity supply for the customer. Decades of STLF research have produced numerous models for this problem. These include classical approaches such as moving average de Andrade & da Silva (2009) and regression models Hong et al. (2011), as well as machine learning techniques such as regression trees Mori & Kosemura (2001), support vector machines Niu et al. (2006) and Artificial Neural Networks Lee et al. (1992).
In recent years, deep learning methods have achieved state-of-the-art performance in areas such as speech recognition Hinton et al. (2012), computer vision Krizhevsky et al. (2012) and natural language processing Collobert & Weston (2008). This promise has not yet been demonstrated in other areas of computer science, largely because they have not been thoroughly studied. Deep learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level LeCun et al. (2015). With the composition of enough such transformations, very complex functions can be learned.
In this paper, we compare deep learning and traditional methods on our STLF problem and provide a comprehensive analysis of numerous deep learning models. We then show how these methods can assist in pricing electricity, which can lead to less energy wastage. To the best of our knowledge, there is little work on such comparisons for power usage in an electrical grid. Our data consist of one year of smart meter readings collected from residential customers. We apply each of the deep and traditional algorithms to these data and record the corresponding computational runtimes. Because electricity usage differs between weekdays and weekends, we then split the data into two new datasets, weekday data and weekend data, and apply and analyze the algorithms on each. The results show that the deep architectures achieve the lowest error rates, outperforming the traditional methods, but they also have the longest runtimes. Due to space limitations we do not describe the traditional approaches in detail but provide references.
2.1 Data Description
Our dataset consists of 8592 samples of 18 features collected from several households, with readings recorded at hourly intervals throughout the year. The dataset was split into training, validation and test sets comprising 65%, 15% and 20% of the samples, respectively. Twelve of the features are electrical load readings: the load in the previous hour, the previous two hours and the previous three hours; the load on the previous day at the same hour, the previous hour and the previous two hours; the load two days earlier at the same hour, the previous hour and the previous two hours; the load in the previous week at the same hour; the average load over the past 24 hours; and the average load over the past 7 days. The remaining six features are the day of the week, the hour of the day, whether it is a weekend, whether it is a holiday, the temperature and the humidity. These features were selected because they are typically used for STLF. In addition, the total electrical load does not change significantly throughout the year, since the households are located in a tropical country where the temperature remains fairly constant.
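The lagged load features can be derived directly from an hourly load series. The following is a minimal sketch, assuming the series is a Python list indexed by hour and reading each feature as a fixed hour offset (for example, "previous day previous hour" as t - 25). These offsets, the chronological ordering of the 65/15/20 split and all names (`LAG_OFFSETS`, `load_features`, `split_65_15_20`) are hypothetical, since the text does not specify them.

```python
from statistics import mean

# Hypothetical hour offsets for the ten lagged load features in the text.
# The exact offsets are an assumption, e.g. "previous day previous hour" -> t - 25.
LAG_OFFSETS = {
    "prev_1h": 1,
    "prev_2h": 2,
    "prev_3h": 3,
    "prev_day_same_h": 24,
    "prev_day_prev_1h": 25,
    "prev_day_prev_2h": 26,
    "prev_2d_same_h": 48,
    "prev_2d_prev_1h": 49,
    "prev_2d_prev_2h": 50,
    "prev_week_same_h": 168,
}


def load_features(load, t):
    """Return the twelve load-based features for hour index t of an hourly series."""
    if t < 168:
        raise ValueError("at least one week of history is needed before hour t")
    feats = {name: load[t - k] for name, k in LAG_OFFSETS.items()}
    feats["avg_24h"] = mean(load[t - 24:t])   # average of the past 24 hours
    feats["avg_7d"] = mean(load[t - 168:t])   # average of the past 7 days
    return feats


def split_65_15_20(rows):
    """Split rows into 65% train, 15% validation, 20% test, keeping time order (assumed)."""
    n = len(rows)
    n_train = int(0.65 * n)
    n_val = int(0.15 * n)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


if __name__ == "__main__":
    hourly_load = list(range(400))  # toy series, not the paper's data
    print(load_features(hourly_load, 200)["avg_24h"])  # 187.5
    train, val, test = split_65_15_20(list(range(8592)))
    print(len(train), len(val), len(test))  # 5584 1288 1720
```

With 8592 samples this split yields 5584 training, 1288 validation and 1720 test rows; a chronological split avoids leaking future load values into training, but whether the original split was chronological is not stated.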
2.2 Comparison Method
As a preprocessing step, the data are cleaned and scaled to zero mean and unit variance. All traditional methods use cross-validation to select their hyper-parameters, while the hyper-parameters of the deep learning methods were chosen by a random search over a grid of candidate values.
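The preprocessing and hyper-parameter search steps can be sketched as follows. This is a minimal illustration, assuming the scaling statistics are estimated on the training split only and reading "random grid search" as evaluating a random subset of all grid combinations. `random_grid_search` and its `evaluate` callback are hypothetical names, not part of any library.

```python
import itertools
import random
from statistics import mean, pstdev


def fit_standardizer(column):
    """Estimate mean and standard deviation (fit on training data only, an assumption)."""
    mu = mean(column)
    sigma = pstdev(column)
    return mu, (sigma if sigma > 0 else 1.0)  # guard against constant columns


def standardize(column, mu, sigma):
    """Scale values to zero mean and unit variance using the fitted statistics."""
    return [(x - mu) / sigma for x in column]


def random_grid_search(grid, evaluate, n_trials, seed=0):
    """Evaluate a random subset of a hyper-parameter grid; return (best_score, best_params).

    `evaluate` is a hypothetical callback that trains a model with the given
    parameters and returns its validation error (lower is better).
    """
    names = list(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    rng = random.Random(seed)
    best_score, best_params = float("inf"), None
    for combo in rng.sample(combos, min(n_trials, len(combos))):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Sampling from the grid rather than evaluating every combination bounds the number of expensive training runs by `n_trials`, which matters for the slower deep architectures.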
Several baseline algorithms were chosen: the Weighted Moving Average (WMA), whose weighting coefficients were selected by cross-validation; Multiple Linear Regression (MLR) and Multiple Quadratic Regression (MQR); a Regression Tree (RT) with the minimum number of branch nodes set to 8; Support Vector Regression (SVR) with a linear kernel; and a Multilayer Perceptron (MLP) with 100 hidden neurons.
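Since the exact weighting scheme of the WMA is not given in this section, the following is only a minimal sketch under stated assumptions: a forecast formed as a weighted sum of the load 24 and 168 hours earlier, with weights summing to one and chosen by an exhaustive search in steps of 0.05 (the step size reported for WMA in the results). The two lags, the sum-to-one constraint and the function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def wma_forecast(load, t, weights, lags):
    """Weighted sum of past loads: sum_i weights[i] * load[t - lags[i]]."""
    return sum(w * load[t - k] for w, k in zip(weights, lags))


def fit_wma(load, start, end, lags=(24, 168), step=0.05):
    """Try weights (w, 1 - w) for w = 0, step, ..., 1 and keep the lowest-MAPE pair.

    The two lags and the sum-to-one constraint are illustrative assumptions.
    """
    actual = [load[t] for t in range(start, end)]
    best_weights, best_err = None, float("inf")
    for i in range(round(1 / step) + 1):
        w = i * step
        weights = (w, 1.0 - w)
        predicted = [wma_forecast(load, t, weights, lags) for t in range(start, end)]
        err = mape(actual, predicted)
        if err < best_err:
            best_weights, best_err = weights, err
    return best_weights, best_err
```

On a synthetic series that repeats exactly every 168 hours, the search puts all the weight on the 168-hour lag and reaches zero error; on real data the chosen weights would depend on how strongly daily and weekly patterns recur.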
For the deep learning methods, we used a Deep Neural Network without pretraining (DNN-W), a DNN pretrained with Stacked Autoencoders (DNN-SA) Shin et al. (2011), Recurrent Neural Networks (RNN) Hermans & Schrauwen (2013), RNNs with Long Short Term Memory (RNN-LSTM) Gers et al. (2001), Convolutional Neural Networks (CNN) Siripurapu (2015) and CNNs with Long Short Term Memory (CNN-LSTM) Sainath et al. (2015).
To evaluate the goodness of fit of these algorithms we use the Mean Absolute Percentage Error (MAPE), defined as:

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{y_t - \hat{y}_t}{y_t}\right|,$$

where $N$ is the number of data points, $t$ is the particular time step, $y_t$ is the target or actual value and $\hat{y}_t$ is the predicted value.
To determine the direction of the prediction errors (i.e., whether the predictions lie above or below the actual values), we also use the Mean Percentage Error (MPE), defined as:

$$\mathrm{MPE} = \frac{100\%}{N}\sum_{t=1}^{N}\frac{y_t - \hat{y}_t}{y_t}.$$
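Both metrics translate directly into code. The following is a minimal sketch of the MAPE and MPE definitions, with `y` the actual values and `yh` the predictions; the function names are illustrative.

```python
def mape(actual, predicted):
    """MAPE in percent: (100 / N) * sum |(y_t - yhat_t) / y_t|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 / len(actual) * sum(abs((y - yh) / y) for y, yh in zip(actual, predicted))


def mpe(actual, predicted):
    """MPE in percent: (100 / N) * sum (y_t - yhat_t) / y_t.

    Positive values mean the model under-predicts on average,
    negative values mean it over-predicts.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 / len(actual) * sum((y - yh) / y for y, yh in zip(actual, predicted))
```

For example, with actual loads `[100, 200]` and predictions `[90, 220]`, the MAPE is 10% while the MPE is 0%, because the +10% and -10% errors cancel. This is why the MPE alone can be misleading, as discussed for the RNN results.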
2.3 Numerical Results
[Table 2: results of the MLP and deep architectures after 200 and 400 training epochs]
We first look at the baseline methods (with the exception of MLP) in Table 1. MLR performs the worst, with a MAPE of 24.25%, which indicates that the problem is not linear (see Figure 1). The RT algorithm, in contrast, outperforms the other baseline methods by a noticeable margin. This suggests that the problem can be split into discrete segments that each forecast the load accurately. This is consistent with the load in Figure 1, where, depending on the time of day, there is significant overlap in load values across days. Thus, a node in the RT that splits on the time of day would significantly improve accuracy. The runtimes of these algorithms were quite short, with WMA taking the longest because its cross-validation step evaluated all possible coefficient values in steps of 0.05.
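To illustrate the RT argument, the following is a minimal sketch of how a single regression-tree split is chosen: each candidate threshold on a feature (here a hypothetical hour-of-day column) partitions the samples into two leaves that each predict their mean, and the threshold with the smallest summed squared error wins. The toy data in the usage note are invented for illustration and are not the paper's dataset.

```python
def best_split(xs, ys):
    """Find the threshold on feature xs that minimises the summed squared error
    of a two-leaf regression tree (one split, each leaf predicts its mean).

    Returns (cost, threshold); samples with x < threshold go to the left leaf.
    """
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = (float("inf"), None)
    for threshold in sorted(set(xs))[1:]:  # skip the minimum so both leaves are non-empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        cost = sse(left) + sse(right)
        if cost < best[0]:
            best = (cost, threshold)
    return best
```

If the toy load is 1.0 for hours 0 to 11 and 5.0 for hours 12 to 23 on every day, `best_split` picks the threshold 12 with zero error, even though the load values repeat across days; a full regression tree applies this search recursively to each leaf and across all features.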
Due to the typically long running time of DNN architectures, training was restricted to 200 and 400 epochs. Table 2 shows a clear difference between the 200-epoch and 400-epoch MAPE columns: most of the algorithms have a lower MAPE after 400 epochs than after 200 epochs, and some architectures saw especially large drops in MAPE. The MLP was not the worst performer at either epoch budget, but it was always in the lower half in terms of accuracy. This indicates that the shallow network might not find the patterns or structure in the data as quickly as the DNN architectures. However, it outperformed RT at both 200 and 400 epochs, which suggests that the hidden layer captures some of the underlying dynamics that an RT cannot.
Looking at the 200-epoch column, the best-performing architecture achieves a MAPE of 2.64%. The most stable architecture, however, is DNN-SA, with a MAPE consistently below 3%. This robustness shows when training is extended to 400 epochs, where DNN-SA outperforms all the other methods, both baseline and deep. Pretraining gave DNN-SA a boost over the other methods, as it guides learning towards basins of attraction of minima that support better generalization from the training data set Erhan et al. (2010). RNNs, and to an extent LSTMs, have an internal state that gives them the ability to exhibit dynamic temporal behavior. However, they require much longer training, which is evident in Table 2: these methods had trouble capturing the underlying dynamics of the data within such a small number of epochs. CNNs do not maintain an internal state, whereas load forecasting data can be expected to show a fair amount of autocorrelation that requires memory. This could explain their somewhat low but unstable MAPE at 200 and 400 epochs.
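The autocorrelation argument can be checked directly on a load series. The following is a minimal sketch of the standard (biased) sample autocorrelation estimator; on hourly load one might expect peaks at lags of 24 and 168 hours, but the series in the usage note is synthetic, not the paper's data, and the function name is illustrative.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (biased estimator:
    lagged cross-products divided by the total sum of squared deviations)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom
```

For a synthetic series of 240 points that repeats every 24 hours, `autocorrelation(series, 24)` is 0.9 (the biased estimator sums over only 216 lagged pairs), while the half-period lag of 12 gives a negative value. A model without memory has to receive such lagged information through its input features instead.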
Taking both tables into consideration, most of the DNN architectures vastly outperform the traditional approaches, but they require significantly more time to run, so there is a trade-off. STLF is a very dynamic environment in which one cannot wait for a new model to complete a long training stage; this is another reason we limited training to 200 and 400 epochs. Table 2 shows that limiting the epochs did not adversely affect many of the DNN architectures, as most still surpassed the accuracy of the traditional methods, some by a wide margin. When selecting a model, one has to decide whether the gain in accuracy justifies the additional runtime.
2.4 Daily Analysis
People typically have different electricity usage patterns on weekdays and weekends. This difference can be seen in Figure 1, which illustrates usage for a sample home: this household uses more energy during weekdays than on weekends. Other households may have the opposite profile, with a higher load at weekends. In either case, weekday and weekend profiles usually differ.
To see how our models handle weekdays and weekends, we calculated the average MAPE for each day of the week in the test set, using the 400-epoch models for the DNNs. The results are tabulated in Table 3. Most of the DNN algorithms have their lowest MAPE on weekdays. This suggests that weekday patterns are similar to one another, so the models effectively have more data for them. With more data, DNNs are better able to capture the underlying structure and thus predict the electrical load more accurately. Weekend predictions have a higher MAPE, since DNNs need a lot of data to predict accurately and weekend data are limited. WMA and MQR seem to have their best day on Sunday but a very poor MAPE on the other days. This indicates that these models have an internal bias towards Sunday and, as a result, fail to predict the values for other days accurately. Again, the DNNs outperform the traditional methods.
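The per-day evaluation amounts to a MAPE grouped by day of the week. This is a minimal sketch, assuming each test prediction is tagged with its day of the week; the `(day, actual, predicted)` record layout and the function name `mape_by_day` are hypothetical.

```python
from collections import defaultdict


def mape_by_day(records):
    """Return {day: MAPE in percent} over the hourly predictions of each day.

    `records` is an iterable of (day, actual, predicted) tuples; this layout
    is a hypothetical one chosen for the sketch.
    """
    errors = defaultdict(list)
    for day, actual, predicted in records:
        errors[day].append(abs((actual - predicted) / actual))
    return {day: 100.0 * sum(e) / len(e) for day, e in errors.items()}
```

Because weekend days contribute only two of every seven days, their averages rest on fewer samples, which is consistent with the noisier weekend MAPE described above.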
2.5 Mean Percentage Error
In this domain, an electricity provider is also interested in the direction of the forecast error, not only its magnitude, so that it can adjust generation accordingly, mostly because starting up additional plants takes time. This is why the Mean Percentage Error (MPE) is also reported. A positive MPE indicates that a model "under-predicts" the load on average, while a negative MPE indicates that it "over-predicts"; the provider can then adjust its operations accordingly.
Many of the traditional methods, including MLP, predicted more electrical load than the actual load, whereas most of the DNNs under-predicted it. For DNN-SA, the best model in Table 2, the MPE values at 400 epochs are all under 1% and positive, which indicates that it slightly under-predicts the load. However, the MPE should not be used on its own. For example, RNNs have a low positive MPE, yet their MAPE at both epoch budgets is around 5%. This indicates that the RNN's under-predictions slightly outweighed its over-predictions in aggregate, while its overall accuracy is worse than that of the other deep architectures.
2.6 Applications to Energy Efficiency
Using the results from STLF (MAPE and MPE), a company can predict upcoming load more accurately. A power generating company can then match its production to demand more precisely rather than producing excess energy that would be wasted. Since most of these companies use fossil fuels, which are non-renewable, this would conserve them and also reduce the carbon dioxide and other toxic byproducts of fossil fuels released into the atmosphere.
Another benefit of accurate load forecasting is dynamic pricing. Many residential customers pay a fixed rate per kilowatt-hour. Dynamic pricing instead allows the cost of electricity to reflect how expensive it is to produce at a given time. The production cost depends on many factors; in this paper, it is characterized through the STLF algorithms. With a precise forecast of electrical load, companies can identify trends, especially at peak times.
For example, in the summer months many people may want to run their air conditioners, making electricity more expensive to produce, as the company may have to start up additional power generating plants to meet the load. If the algorithms predict this increase in electrical load, it would be reflected in a higher price for consumers. As a result, most people would likely not keep their air conditioners on all the time, as they otherwise might, but use them only when necessary. Extending this example to washing machines, lights and other appliances, a substantial reduction in energy use could be achieved on the consumer side.
3 Related Work
The area of short-term load forecasting (STLF) has been studied for many decades, but deep learning has only recently seen a surge of research into its applications. Significant research has focused on Recurrent Neural Networks (RNNs). In the thesis by Mishra (2008), RNNs were compared with other methods for STLF, including MLPs trained with algorithms such as Particle Swarm Optimization, Genetic Algorithms and Artificial Immune Systems. Two other notable papers that apply DNNs to STLF are Busseti et al. (2012) and Connor et al. (1992). Busseti et al. (2012) compare Deep Feedforward Neural Networks, RNNs and kernelized regression, while Connor et al. (1992) use an RNN to forecast loads and compare the result with a Feedforward Neural Network. However, a thorough comparison of various DNN architectures is lacking, and applications to dynamic pricing or energy efficiency are absent.
In this paper, we focused on energy wastage in the electrical grid. To address it, we first needed an accurate algorithm for STLF. Given the many deep learning algorithms now available, we compared the accuracy of a number of deep learning methods and traditional methods. The results indicate that most DNN architectures achieve greater accuracy than traditional methods, even when the data are split into weekdays and weekends. However, such algorithms have longer runtimes. We also discussed how these algorithms can have a significant impact on conserving energy at both the producer and consumer levels.
- Battaglia (2013) Battaglia, Sarah. US now leads in energy waste, 2013. URL http://www.theenergycollective.com/sbattaglia/193441/us-most-energy-waste.
- Busseti et al. (2012) Busseti, Enzo, Osband, Ian, and Wong, Scott. Deep learning for time series modeling. Technical report, Stanford, 2012.
- Collobert & Weston (2008) Collobert, Ronan and Weston, Jason. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, 2008.
- Connor et al. (1992) Connor, Jerome, Atlas, Les E., and Martin, Douglas R. Recurrent networks and NARMA modeling. In Advances in Neural Information Processing Systems 4. 1992.
- de Andrade & da Silva (2009) de Andrade, L.C.M. and da Silva, I.N. Very short-term load forecasting based on ARIMA model and intelligent systems. In Intelligent System Applications to Power Systems, 2009. ISAP ’09. 15th International Conference on, 2009.
- Erhan et al. (2010) Erhan, Dumitru, Bengio, Yoshua, Courville, Aaron, Manzagol, Pierre-Antoine, Vincent, Pascal, and Bengio, Samy. Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res., 11, March 2010.
- Gers et al. (2001) Gers, Felix A., Eck, Douglas, and Schmidhuber, Jürgen. Artificial Neural Networks — ICANN 2001: International Conference Vienna, Austria, August 21–25, 2001 Proceedings, chapter Applying LSTM to Time Series Predictable through Time-Window Approaches. 2001.
- Hermans & Schrauwen (2013) Hermans, Michiel and Schrauwen, Benjamin. Training and analysing deep recurrent neural networks. In Advances in Neural Information Processing Systems 26, pp. 190–198. 2013.
- Hinton et al. (2012) Hinton, Geoffrey, Deng, Li, Yu, Dong, Dahl, George, Mohamed, Abdel-rahman, Jaitly, Navdeep, Senior, Andrew, Vanhoucke, Vincent, Nguyen, Patrick, Sainath, Tara, and Kingsbury, Brian. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 2012.
- Hong et al. (2011) Hong, Tao, Wang, Pu, and Willis, H.L. A naïve multiple linear regression benchmark for short term load forecasting. In Power and Energy Society General Meeting, 2011 IEEE, 2011.
- Krizhevsky et al. (2012) Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pp. 1097–1105. 2012.
- LeCun et al. (2015) LeCun, Yann, Bengio, Yoshua, and Hinton, Geoffrey. Deep learning. Nature, 2015.
- Lee et al. (1992) Lee, K.Y., Cha, Y.T., and Park, J.H. Short-term load forecasting using an artificial neural network. Power Systems, IEEE Transactions on, 1992.
- Mishra (2008) Mishra, Sanjob. Long short-term memory in recurrent neural networks. Master’s thesis, National Institute Of Technology Rourkela, 2008.
- Mori & Kosemura (2001) Mori, H. and Kosemura, N. Optimal regression tree based rule discovery for short-term load forecasting. In Power Engineering Society Winter Meeting, 2001. IEEE, 2001.
- Niu et al. (2006) Niu, Dong-Xiao, Wang, Qiang, and Li, Jin-Chao. Advances in Machine Learning and Cybernetics: 4th International Conference, chapter Short Term Load Forecasting Model Based on Support Vector Machine. 2006.
- Sainath et al. (2015) Sainath, T. N., Vinyals, O., Senior, A., and Sak, H. Convolutional, long short-term memory, fully connected deep neural networks. In 40th IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
- Shin et al. (2011) Shin, Hoo-Chang, Orton, M., Collins, D.J., Doran, S., and Leach, M.O. Autoencoder in time-series analysis for unsupervised tissues characterisation in a large unlabelled medical image dataset. In Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on, 2011.
- Siripurapu (2015) Siripurapu, Ashwin. Convolutional networks for stock trading. Technical report, Stanford University, 2015.