I Introduction
The modern smart grid is an enhanced electrical grid that takes advantage of sensing and information communication technologies to improve the efficiency, reliability, and security of the traditional power grid. Compared to the traditional power grid, entities in smart grids can obtain timely information about many aspects of grid status. Smart metering, a major improvement brought by smart grids, facilitates real-time metering and reporting of electricity consumption data. One resulting benefit is that accurate, fine-grained power demand forecasting can be carried out on such meter measurements: by predicting the power demand of a future period from the historical data at hand, it informs power generation scheduling and power dispatching for that period.
Demand forecasting is important in demand management for both power companies and electricity customers [1]. Based on the forecasting results, power companies can allocate proper resources to balance supply and demand, or adjust demand response strategies such as dynamic pricing to shape the load, so as to avoid straining infrastructure capacity or incurring the additional cost of starting peaker plants. In addition, they can detect abnormal meter measurements, caused either by unexpected meter failures or by deliberate meter manipulation, by identifying measurements that do not conform to the predicted (expected) values. For electricity customers, demand forecasting provides their expected power consumption and cost for a future period under a dynamic pricing strategy, so that they can adjust their usage schedules to lower their cost.
Although demand forecasting has been studied for years, a challenge in making accurate forecasts is that power demand is subject to various factors that differ in how strongly they influence it. With this challenge in mind, we propose a novel forecasting neural network architecture named PowerNet. We take into account a set of features from three heterogeneous dimensions, i.e., historical consumption data, weather information, and calendar information, all of which are considered to influence electricity customers’ power consumption patterns. In each dimension, a set of features is developed. Then, we introduce our model, PowerNet, which is capable of incorporating all the designed features. The key property of PowerNet is its ability to model both sequential data (i.e., historical consumption data) and non-sequential data (i.e., weather and calendar information) in a unified manner. The underlying idea is to use a recurrent neural network to encode the dependencies in sequential data and a multilayer perceptron network to capture the correlations between non-sequential features and predictions. To evaluate the effectiveness of our model, we compare PowerNet with two state-of-the-art demand forecasting techniques based on Gradient Boosting Tree (GBT) [2] and Support Vector Regression (SVR) [3], respectively. Moreover, we tackle two crucial questions that need to be answered when operating PowerNet in practice: how far into the future the model can forecast with decent accuracy, and how often the forecasting model should be retrained to retain its modeling capability. Last but not least, we discuss a multi-layer data-driven anomaly detection approach based on PowerNet. The contributions of this work are summarized below.

We propose PowerNet, a novel power demand forecasting neural network that captures heterogeneous features in a unified way.

We compare PowerNet with two representative models adopted in recent research works, i.e., GBT and SVR. The results reveal that PowerNet reduces the Mean Square Error (MSE) by 33.3% and 14.3% compared to GBT and SVR, respectively.

We further evaluate the forecasting model under different forecasting durations and retraining frequencies, using publicly available datasets. Our findings include:

PowerNet can serve day-ahead forecasting tasks well. The Mean Absolute Percentage Error (MAPE) of the 24-hour forecast grows over time but stays below 10%.

After a single training process, the model remains effective for 550 hours with a MAPE of around 11%, and for 36 hours with a MAPE of less than 10%.

The rest of this paper is organized as follows. Section II discusses the features incorporated into power demand forecasting. Section III elaborates on the design of PowerNet. Section IV discusses the evaluation results, including a comparison with state-of-the-art techniques and empirical results that answer the aforementioned questions about practical operation, followed by a brief discussion of the application to anomaly detection in Section V. Related work is discussed in Section VI, and we conclude the paper in Section VII.
II Feature Design and Dataset
Power consumption patterns are affected by a variety of factors. Thus, a demand forecasting mechanism should incorporate such factors as features in addition to historical energy consumption data. We focus on weather and calendar data. Below, we elaborate on these features and the dataset used in this paper.
II-A Energy Usage Dataset
We use the publicly available dataset provided by the University of Massachusetts [4]. It includes two parts, the apartment dataset and the weather dataset.
The apartment dataset contains data for 114 single-family apartments located in Western Massachusetts from 2014 to 2016. The dataset records the power of every apartment at a fixed temporal frequency.¹ The metering frequency is once every 15 minutes for 2014 and for 2015 (before December 15), and once every minute for 2016. The data is stored in .csv files, each of which records the power consumption details of one apartment for one year, with the apartment ID as its file name.
¹ Since the metering interval is fixed, power values can represent the power consumption.
Along with the power consumption data, hourly weather information for the record period from 2014 to 2016 is available. The weather dataset includes fourteen meteorological attributes, such as weather summary, temperature, humidity, cloud cover, wind speed, wind bearing, visibility, and pressure. In our experiments, we use the data of the latest year, 2016, because of its finer recording granularity as well as the more recent consumption pattern it may reflect.
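The experiments in Section IV work with hourly values, while the 2016 data is metered once per minute. The paper does not state how the readings were aggregated, so the following is only a minimal sketch assuming hourly averaging of evenly spaced, gap-free readings that start on an hour boundary; `to_hourly_means` is a hypothetical helper. Because the metering interval is fixed, the mean power of an hour in kW equals the energy consumed in that hour in kWh.

```python
def to_hourly_means(readings, per_hour=60):
    """Average evenly spaced power readings into one value per hour.

    Assumes no gaps and that the first reading starts an hour;
    a trailing partial hour is dropped.
    """
    n_hours = len(readings) // per_hour
    return [
        sum(readings[h * per_hour:(h + 1) * per_hour]) / per_hour
        for h in range(n_hours)
    ]


# Two hours of 1-minute readings: 1 kW for an hour, then 2 kW.
hourly = to_hourly_means([1.0] * 60 + [2.0] * 60)
print(hourly)  # [1.0, 2.0]
```

Averaging rather than summing keeps the unit as power; summing 1-minute readings would instead require dividing by 60 to obtain kWh.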
II-B Feature Design
The features used by existing forecasting models fall into three categories in terms of privacy, i.e., publicly available information (e.g., weather and calendar information), household private information (e.g., demography), and quasi-private information (e.g., historical consumption data acquired by power utility companies). Quasi-private information is defined here as privacy-related data that is not publicly available. For example, historical consumption data can be used to infer certain private household characteristics [5], but it is only available to authorized personnel within power utility companies rather than to the public.
Although private household data naturally has a direct influence on household power demand (e.g., more people living in a house leads to more power demand), in this work we limit the predictors to non-private information for the following reasons. First, owing to user privacy concerns, we would like to involve no household-specific data in the forecasting procedure other than power meter readings. Second, while some utility companies may have access to household private data such as locations, it is not common for them to have other private information, e.g., demographic information. Third, a forecasting model that does not rely on house-specific data can easily be applied at larger scales, such as the building or area level.
We develop three categories of features from the dataset, i.e., historical consumption data, weather information, and calendar information. Historical consumption data is the actual observation of the prediction target and directly reflects the consumption pattern; power utility companies obtain it by reading power meters. Weather information influences power demand because some appliances are sensitive to weather conditions; for example, the use of air conditioners depends on temperature and humidity. Calendar information, such as weekday or weekend, shapes user consumption behavior through different living and working styles, and thus captures consumption patterns that follow calendar cycles.
Our features based on the above three categories are summarized in Table I. Of these features, 13 are from weather information, 5 are designed from calendar information, and the rest are lagged values of the historical consumption data. Historical data involves a large number of data points, so it is necessary to find the historical data points that are most correlated with the target forecast value. To this end, we use the AutoCorrelation Function (ACF), which quantifies the correlation between data points at various time lags, to determine how many of the most related lag values to include.
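Table I: Summary of features
Category | Detail
Historical Consumption Data | Consumption data in past time slots
Weather Information | 13 meteorological attributes from the weather dataset
Calendar Information | 5 calendar features (e.g., weekday or weekend)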
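A minimal sketch of ACF-based lag selection in plain Python. The paper does not specify the selection rule; here, as an assumption, a lag is kept when its sample autocorrelation reaches a hypothetical `threshold`. The synthetic series is illustrative only and is not drawn from the dataset.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag; element k is the lag-k value.

    The series must not be constant (the denominator would be zero).
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def select_lags(series, max_lag, threshold):
    """Lags (>= 1) whose autocorrelation is at least `threshold`."""
    values = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if values[k] >= threshold]


# Synthetic hourly load with a 24-hour cycle over ten days.
load = [math.sin(2 * math.pi * t / 24) for t in range(240)]
r = acf(load, 48)
print(round(r[24], 3), round(r[12], 3))  # 0.9 -0.95
```

For a daily cycle, lag 24 (the same hour yesterday) is strongly positively correlated while lag 12 is strongly negatively correlated, which is why a signed threshold keeps the former and discards the latter.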

III PowerNet
III-A Overview
Our approach is to forecast power demand by modeling the relationship between power demand and relevant features. Fig. 1 illustrates the high-level pipeline of our approach, including the feature design discussed in Section II. We propose a unified neural network model, named PowerNet, to jointly exploit the three categories of features developed in the previous section. Fig. 2 shows the architecture of PowerNet. It has two main components. The left component (in blue) models the historical consumption time series. The key is to capture the temporal effects of power consumption, since future consumption may be correlated with consumption in the recent past. Here, we use the Long Short-Term Memory network (LSTM) [6] to encode the correlations between consecutive power consumption values over time. The right component (in orange) is a Multilayer Perceptron (MLP) [7] that models the nonlinearity in the weather and calendar data. Finally, we aggregate the outputs of these two components and make the final power demand predictions through a Prediction Layer. In the following, we dissect each component of PowerNet.
III-B Input Layer
To incorporate both sequential and non-sequential data, the input layer of PowerNet consists of two parts, one for each. The first part is a series of historical power consumption data $\mathbf{x} = (x_1, x_2, \ldots, x_T)$, where each entry $x_t \ge 0$ is a real-valued, non-negative power meter reading at time $t$. The second part consists of the weather and calendar feature vectors, denoted by $\mathbf{x}^{w} \in \mathbb{R}^{n_w}$ and $\mathbf{x}^{c} \in \mathbb{R}^{n_c}$, where $n_w$ and $n_c$ are the numbers of weather and calendar features introduced in Section II.
III-C Power Consumption Encoding Layer
This layer encodes the power consumption time series with an LSTM network, a variant of the recurrent neural network that can learn long-term dependencies. Unlike feedforward neural networks, which can only take a fixed window of previous readings as input, an LSTM in principle lets history information persist over arbitrarily long spans through an internal recurrent loop, while mitigating the vanishing gradient problem [8]. It has therefore been applied successfully in various areas, e.g., continual prediction [9], language modeling [10], and translation [11]. The core of the LSTM is a memory cell that maintains information across time via a gating mechanism. The LSTM cell updates its cell state $\mathbf{c}_t$ based on both the current input $x_t$ and the previous output $\mathbf{h}_{t-1}$ (i.e., the recurrent input), and then decides what information to keep and what to pass on as its output $\mathbf{h}_t$. We do not detail the gating mechanisms here; they can be found in previous literature [6]. We use $\mathrm{LSTM}(\cdot)$ to denote the cell function.
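In PowerNet, we apply a stacked LSTM with $L$ layers to every time step of the power consumption series $\mathbf{x} = (x_1, \ldots, x_T)$:

$$\mathbf{h}_t^{(1)} = \mathrm{LSTM}\big(x_t, \mathbf{h}_{t-1}^{(1)}\big), \qquad \mathbf{h}_t^{(l)} = \mathrm{LSTM}\big(\mathbf{h}_t^{(l-1)}, \mathbf{h}_{t-1}^{(l)}\big), \quad l = 2, \ldots, L. \tag{1}$$

Finally, the output of the top layer at the last time step, $\mathbf{h}_T = \mathbf{h}_T^{(L)} \in \mathbb{R}^{d}$, is used as the final encoding of the entire power consumption series, where $d$ is the LSTM memory size.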
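To make Eq. (1) concrete, a minimal plain-Python sketch of a stacked LSTM encoder follows. The paper does not spell out its gate equations, so the cell below assumes the common formulation with input, forget, and output gates (the forget gate follows [9]); the weight initialization, memory size, and helper names are illustrative, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def init_lstm_layer(input_size, hidden_size, rng):
    """Small random weights and zero biases for gates i, f, o and candidate g."""
    layer = {}
    for gate in "ifog":
        layer["W" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        layer["b" + gate] = [0.0] * hidden_size
    return layer


def lstm_cell(layer, x_t, h_prev, c_prev):
    """One step of a standard LSTM cell; returns (h_t, c_t)."""
    z = list(x_t) + list(h_prev)                                      # [x_t ; h_{t-1}]
    i = [sigmoid(a) for a in affine(layer["Wi"], z, layer["bi"])]    # input gate
    f = [sigmoid(a) for a in affine(layer["Wf"], z, layer["bf"])]    # forget gate
    o = [sigmoid(a) for a in affine(layer["Wo"], z, layer["bo"])]    # output gate
    g = [math.tanh(a) for a in affine(layer["Wg"], z, layer["bg"])]  # candidate
    c_t = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
    h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
    return h_t, c_t


def encode_series(layers, series):
    """Run stacked LSTM layers over a scalar series (Eq. (1)); return the top layer's h_T."""
    states = [([0.0] * len(layer["bi"]), [0.0] * len(layer["bi"])) for layer in layers]
    for x_t in series:
        inp = [x_t]
        for k, layer in enumerate(layers):
            states[k] = lstm_cell(layer, inp, *states[k])
            inp = states[k][0]  # layer k's output is layer k+1's input
    return states[-1][0]


rng = random.Random(0)
d = 4  # memory size (illustrative; Section IV tunes it over {64, ..., 512})
layers = [init_lstm_layer(1, d, rng), init_lstm_layer(d, d, rng)]  # two layers
h_T = encode_series(layers, [0.3, 0.5, 0.4, 0.6])
```

Only the top layer's final hidden state is returned, matching the use of $\mathbf{h}_T$ as the sequence encoding; a real implementation would use a deep learning framework for batching and backpropagation.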
III-D Weather & Calendar Fusion Layer
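In this layer, we handle the weather and calendar features. Specifically, we jointly model the two feature vectors $\mathbf{x}^{w}$ and $\mathbf{x}^{c}$ through a multilayer perceptron network (MLP):

$$\mathbf{e} = \mathrm{ReLU}\big(\mathbf{W}_2\,\mathrm{ReLU}\big(\mathbf{W}_1 [\mathbf{x}^{w} \oplus \mathbf{x}^{c}] + \mathbf{b}_1\big) + \mathbf{b}_2\big), \tag{2}$$

where $\mathbf{W}_1 \in \mathbb{R}^{k_1 \times (n_w + n_c)}$, $\mathbf{b}_1 \in \mathbb{R}^{k_1}$, $\mathbf{W}_2 \in \mathbb{R}^{k_2 \times k_1}$, and $\mathbf{b}_2 \in \mathbb{R}^{k_2}$ are trainable weights, $k_1$ and $k_2$ are the sizes of the hidden units, $\oplus$ denotes vector concatenation, and $\mathbf{e} \in \mathbb{R}^{k_2}$ is the output encoding of this MLP. ReLU [12] is used as the activation function to introduce nonlinearity.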
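A minimal plain-Python sketch of the fusion step in Eq. (2), assuming a two-layer MLP with ReLU after each layer applied to the concatenated weather and calendar vectors; the layer sizes are whatever the weight shapes imply, and the function name is hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def affine(W, x, b):
    """W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def fuse_weather_calendar(x_w, x_c, W1, b1, W2, b2):
    """Two-layer ReLU MLP over the concatenated weather and calendar vectors."""
    z = list(x_w) + list(x_c)             # concatenation of the two feature vectors
    hidden = relu(affine(W1, z, b1))      # first hidden layer
    return relu(affine(W2, hidden, b2))   # output encoding e


# Toy weights: 2 weather features + 1 calendar feature -> 2 hidden units -> 1 output.
e = fuse_weather_calendar(
    [1.0, -2.0], [0.5],
    W1=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], b1=[0.0, 0.0],
    W2=[[2.0, 3.0]], b2=[-0.5],
)
print(e)  # [1.5]
```

Concatenating before the first layer lets the MLP learn interactions between weather and calendar features (e.g., temperature effects that differ on weekends) rather than encoding each vector separately.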
III-E Aggregation & Prediction Layer
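Having encoded both the power consumption history and the weather & calendar information, we aggregate the obtained encodings and make the final prediction. Concretely, we concatenate the two encodings $\mathbf{h}_T$ and $\mathbf{e}$ and feed the result through a final feedforward regression network:

$$\hat{y} = \mathbf{W}_4\,\mathrm{ReLU}\big(\mathbf{W}_3 [\mathbf{h}_T \oplus \mathbf{e}] + \mathbf{b}_3\big) + b_4, \tag{3}$$

where $\mathbf{W}_3 \in \mathbb{R}^{k_3 \times (d + k_2)}$, $\mathbf{b}_3 \in \mathbb{R}^{k_3}$, $\mathbf{W}_4 \in \mathbb{R}^{1 \times k_3}$, and $b_4 \in \mathbb{R}$ are trainable parameters and $k_3$ is the hidden size of the inner layer. Note that $\mathbf{W}_4$ and $b_4$ of the outer layer have only one output unit, which produces the predicted power consumption reading $\hat{y}$.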
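A minimal plain-Python sketch of the prediction layer in Eq. (3), assuming one ReLU inner layer followed by a single linear output unit (no activation on the output, which is the usual choice for regression); names and toy weights are hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def affine(W, x, b):
    """W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def predict_reading(h_T, e, W3, b3, W4, b4):
    """Concatenate the two encodings, apply one ReLU layer, then one linear output unit."""
    z = list(h_T) + list(e)                            # LSTM encoding followed by MLP encoding
    inner = relu(affine(W3, z, b3))                    # inner layer
    return sum(w * v for w, v in zip(W4, inner)) + b4  # scalar predicted reading


y_hat = predict_reading(
    h_T=[0.2, -0.1], e=[1.5],
    W3=[[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], b3=[0.0, 0.0],
    W4=[2.0, 5.0], b4=0.1,
)
print(round(y_hat, 6))  # 3.5
```

Because the output unit is linear, the network can produce any real value; the non-negativity of meter readings is not enforced by this sketch.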
III-F Optimization
For model training, we minimize the mean squared error loss in Eq. (4) with dropout regularization [13]:

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - \hat{y}_i\big)^2, \tag{4}$$

where $N$ is the number of training examples, $y_i$ and $\hat{y}_i$ are the actual and predicted readings of the $i$-th example, and $\theta$ denotes all the aforementioned trainable parameters in our model. In addition, all trainable parameters in the fully connected layers are regularized by the L2 norm. Finally, Adam (Adaptive Moment Estimation) [14] is used as the stochastic gradient-based optimizer.
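A minimal plain-Python sketch of the training objective and update rule: the MSE loss of Eq. (4) plus an L2 penalty with a hypothetical weight `lam`, and the standard Adam update with bias-corrected moment estimates (default hyperparameters as proposed for Adam). Dropout and backpropagation through PowerNet are omitted; gradients are supplied by the caller, and the toy problem is only an illustration.

```python
import math


def mse_l2_loss(y_true, y_pred, weights, lam):
    """Mean squared error (Eq. (4)) plus an L2 penalty lam * sum(w^2)."""
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return mse + lam * sum(w * w for w in weights)


class Adam:
    """Adam update rule with bias-corrected first and second moment estimates."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g      # 1st moment
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g  # 2nd moment
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Toy usage: fit a constant prediction w to targets [2, 4] (optimum w = 3).
y = [2.0, 4.0]
opt = Adam(lr=0.1)
w = [0.0]
for _ in range(500):
    grad = [-2.0 * sum(yi - w[0] for yi in y) / len(y)]  # d/dw of the MSE
    w = opt.step(w, grad)
```

Adam scales each parameter's step by a running estimate of its gradient magnitude, so the first step has size close to `lr` regardless of the raw gradient scale.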
IV Evaluation
This section first compares PowerNet with two representative models used in recent works [2][3] in terms of two quantitative metrics. Then, we evaluate PowerNet under different settings, namely different forecasting periods and the freshness of the model, i.e., its retraining interval.
IV-A Preparation
IV-A1 Baselines
We select two recent works as our baseline models. Technically, one adopts GBT [2] and the other adopts SVR [3]. For a fair comparison, we implement their models ourselves and apply them to the same public dataset described in Section II-A.
GBT is adopted by Bansal et al. [2] to forecast power consumption. GBT is a supervised predictive learning model that can be used for both classification and regression [15][16]. GBT builds the model, i.e., a series of trees, in a stage-wise manner: at each stage, it adds one tree while keeping the existing trees unchanged, choosing the added tree so as to minimize a predefined loss function. Essentially, GBT combines an ensemble of weak prediction models into a stronger model, which is the core idea of boosting.
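To make the stage-wise idea concrete, a minimal sketch of gradient boosting for squared loss on a single input feature, using depth-1 trees (stumps) as the weak learners. This is a generic illustration of the technique in [15][16], not the configuration used in [2]; the function names are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares regression stump (one split) on 1-D inputs x for targets r."""
    xs = sorted(set(x))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant
        m = sum(r) / len(r)
        return lambda xi: m
    _, thr, lm, rm = best
    return lambda xi: lm if xi <= thr else rm


def gradient_boost(x, y, n_stages=100, learning_rate=0.1):
    """Each stage fits a stump to the current residuals; earlier stumps stay fixed."""
    base = sum(y) / len(y)  # stage 0: constant prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_stages):
        # Residuals are the negative gradient of the loss 0.5 * (y - F)^2.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]

    def model(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    return model


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 0.0, 1.0, 1.0]
model = gradient_boost(x, y, n_stages=100, learning_rate=0.1)
```

With a learning rate of 0.1, each stage removes only 10% of the remaining residual in this toy case, which illustrates why many weak stages are combined.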
SVM is used in the work by Yu et al. [3] to forecast power usage. SVM is a supervised machine learning algorithm for both classification and regression problems [17]. For classification, SVM seeks the hyperplane that separates the two classes to the largest extent, i.e., by maximizing the margin. Similarly, regression using SVM, called SVR [18], seeks to optimize the generalization bound by minimizing a predefined error function. The regression can be conducted in either a linear or a nonlinear manner. For nonlinear SVR, the data is mapped (implicitly, through a kernel function) into a higher-dimensional space in which a linear regression function can be fitted.
IV-A2 Evaluation Metrics
We introduce two metrics to evaluate the accuracy of the forecasting model, i.e., Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE). The smaller the error is, the more accurately the model predicts.
MSE measures the average of the squared errors (deviations), as given by Eq. (5), where $n$ is the total number of forecast values, $y_t$ denotes the actual value at time $t$, and $\hat{y}_t$ denotes the forecast value at time $t$. The closer the value is to zero, the better the prediction.

$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \big(y_t - \hat{y}_t\big)^2 \tag{5}$$
Unlike MSE, MAPE measures the error relative to the absolute actual value. It expresses the error as a percentage and can be calculated using Eq. (6).

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{6}$$
MSE is more useful in comparison experiments on identical test data, since it is an absolute squared error whose value depends on the scale of the actual values. Compared to MSE, MAPE is more indicative when comparing results across different data, since it expresses the error as a percentage.
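A minimal plain-Python sketch of Eqs. (5) and (6); the function names are our own. The usage shows the scale argument above: multiplying both series by 10 multiplies MSE by 100 but leaves MAPE unchanged.

```python
def mse(actual, forecast):
    """Mean square error, Eq. (5)."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error in percent, Eq. (6); actual values must be non-zero."""
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)


actual, forecast = [100.0, 200.0], [110.0, 180.0]
scaled_actual, scaled_forecast = [1000.0, 2000.0], [1100.0, 1800.0]
print(mse(actual, forecast), mse(scaled_actual, scaled_forecast))  # 250.0 25000.0
print(round(mape(actual, forecast), 6), round(mape(scaled_actual, scaled_forecast), 6))  # 10.0 10.0
```

MAPE is undefined when an actual reading is zero, which can happen for individual apartments with idle periods; such hours would need to be excluded or handled separately.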
IV-B Comparison with Baselines
In this experiment, we compare our model with two recent works, i.e., those of Bansal et al. [2] and Yu et al. [3], under identical settings with the same training and testing data. In this section, the two works [2] [3] are referred to as “GBT” and “SVR” for short, respectively. PowerNet uses a two-layer LSTM network. The cell memory size of every layer is tuned from the set {64, 128, 256, 512} using grid search. Early stopping is employed when there is no further improvement on the validation set. The parameters of the baseline models are tuned automatically in the same way. For GBT, three parameters are involved, i.e., the number of boosting stages to perform, the maximum depth of the individual regression estimators, and the learning rate. Its parameter grid is constructed using {50, 100, 150, 200, 250, 300, 350, 400, 450, 500} for the number of boosting stages, {1, 2, 3, 4, 5} for the maximum depth, and {0.001, 0.01, 0.1, 1} for the learning rate. For SVR, three parameters are involved. We construct the parameter grid using {0.001, 0.01, 0.1, 1} for the first parameter and {rbf, linear, poly, sigmoid} for the kernel, while the third parameter, the kernel coefficient, is set automatically according to the chosen kernel (the reciprocal of the number of features).
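The tuning procedure above is an exhaustive grid search. A minimal plain-Python sketch follows; the dictionary keys are our own labels for the three GBT parameters, and `validation_error` stands for a hypothetical caller-supplied function that trains a model with one configuration and returns its validation error. The GBT grid contains 10 × 5 × 4 = 200 configurations.

```python
import itertools

# GBT grid from the text; the key names are illustrative labels.
GBT_GRID = {
    "n_stages": [50, 100, 150, 200, 250, 300, 350, 400, 450, 500],
    "max_depth": [1, 2, 3, 4, 5],
    "learning_rate": [0.001, 0.01, 0.1, 1],
}


def grid_search(grid, validation_error):
    """Evaluate every combination in `grid`; return (best_config, best_error)."""
    names = list(grid)
    best_cfg, best_err = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        err = validation_error(cfg)  # train on the training set, score on validation
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

The same function applies to the SVR grid (4 × 4 = 16 searched combinations). Exhaustive search is affordable here because each grid is small; larger spaces would call for random or adaptive search.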
We use the power consumption data of the past 26 days, i.e., 624 hours, as the training set for all three models, and the data of the next 48 hours, i.e., days 27 and 28, as the validation set. Finally, we make predictions on the test data of days 29 and 30. Due to space limits, we only present the results of our model and the two baselines on the data of a randomly chosen apartment (No. 69 in April). In particular, the results are obtained by training on the data from April 1 to April 26 (validating on the data of April 27 and 28) and testing on the data of April 29 and 30. As shown in Fig. 3, our model captures the trend, peaks, and valleys better than both baselines. PowerNet achieves lower MSE and MAPE, as shown in Table II: it reduces MSE by 33.3% and 14.3% compared to GBT [2] and SVR [3], respectively.
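The split used in this experiment is a chronological slice of the 720 hourly readings of April (30 × 24 = 720 = 624 + 48 + 48). A minimal sketch, assuming a gap-free hourly series; the function name is hypothetical.

```python
def chronological_split(hourly, train_days=26, val_days=2, test_days=2, hours_per_day=24):
    """Split an hourly series into consecutive train / validation / test blocks (no shuffling)."""
    a = train_days * hours_per_day
    b = a + val_days * hours_per_day
    c = b + test_days * hours_per_day
    if len(hourly) < c:
        raise ValueError("series too short for the requested split")
    return hourly[:a], hourly[a:b], hourly[b:c]


april = list(range(720))  # stand-in for 720 hourly readings (hour indices)
train, val, test = chronological_split(april)
print(len(train), len(val), len(test))  # 624 48 48
```

Keeping the blocks in time order (rather than shuffling) ensures the model is never validated or tested on hours that precede its training data.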
IV-C Forecasting Period of PowerNet
In general, the accuracy of power demand forecasting deteriorates as the forecasting period becomes longer. Thus, it is crucial for grid operators to know how far ahead PowerNet can predict the demand without a significant drop in accuracy. In this section, we provide empirical results on forecasting accuracy for different forecasting periods using real-world electricity consumption data. With these results, grid operators can evaluate whether PowerNet is applicable to tasks that require different prediction periods, such as bidding in the day-ahead electricity market and day-ahead electricity scheduling, which require forecasting results one day ahead [19].
Some features for predicting the power demand in the far future may not be available at the time of prediction. For example, the power consumption of the previous hour is an important feature for predicting the power demand of the next hour. If we predict more than one hour at once, the actual consumption of every “previous” hour beyond the first is not yet known. Therefore, predictions for the far future rely on the predicted values that precede them, which introduces a risk of error accumulation.
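The error-accumulation mechanism described above is what a recursive multi-step forecaster does. A minimal sketch, assuming a one-step model that maps the most recent `n_lags` values to the next value; `model` here is any hypothetical callable, not PowerNet itself.

```python
def recursive_forecast(model, history, n_lags, horizon):
    """Forecast `horizon` steps by feeding each prediction back in as the newest lag.

    `model` maps the last `n_lags` values (oldest first) to the next value.
    Any error in an early prediction becomes an input to all later ones.
    """
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = model(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest lag, append the estimate
    return forecasts


# A slightly biased one-step model on a series whose true next step is +1.
biased = lambda lags: lags[-1] + 1.1
fc = recursive_forecast(biased, [1.0, 2.0, 3.0], n_lags=2, horizon=3)
errors = [f - t for f, t in zip(fc, [4.0, 5.0, 6.0])]
print([round(e, 6) for e in errors])  # [0.1, 0.2, 0.3]
```

A per-step bias of 0.1 grows into an error of 0.3 after three steps, because every forecast is built on the previous, already-biased estimate.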
In this experiment, we predict the power demand for the next 30 days at once based on the current historical data. We train the model on the aggregated historical data of June and predict the power demand for the following 30 days. The forecasting results are shown in red in Fig. 4. The red line follows the original peaks and valleys well at the beginning. However, starting from around 550 on the x-axis, the red line completely loses track of the original values. To understand the error quantitatively, we plot the MAPE in red in Fig. 5. The MAPE plot shows that the error increases as the forecast goes further into the future. Specifically, before 24 on the x-axis, the MAPE stays below 10%. Then, the MAPE rises to a local peak of 18% at 52 on the x-axis. After that, it declines slightly to 16% and stays at that level until 550 on the x-axis, after which the error increases sharply. Given these results, the model is suitable for forecasting in day-ahead bidding and day-ahead electricity scheduling tasks.
IV-D Model Retraining Interval
For any data-driven model, it is necessary to keep the model up to date by retraining it on fresh data. In particular, power consumption patterns are not stationary, and a trained model becomes obsolete over time, resulting in lower forecasting accuracy. Thus, the retraining timing is a crucial tuning parameter in real-world operation. Retraining usually happens when a degradation in prediction accuracy is noticed. This subsection empirically investigates an appropriate model retraining interval, i.e., how long a trained PowerNet model can be used with acceptable accuracy. It also provides insight into how often PowerNet should be retrained to capture new power demand characteristics as they evolve over time.
This experiment differs from the previous experiment in Section IV-C. The experiment in Section IV-C explores the accuracy changes caused by different forecasting period lengths, and it forecasts the power demand for a whole period at once based on the data at hand at that moment. In contrast, this experiment uses the actual data as input at every step, which eliminates the error accumulation caused by forecasting with estimated feature values. We use the model trained in Section IV-C and test it on the actual data of July.
The results are shown as the blue line in Fig. 4. In general, the prediction based on actual values (the blue line) is better than the prediction based on predicted values (the red line), which is as expected. The same conclusion can be drawn from the MAPE plot (the blue line in Fig. 5). The error increases at the beginning, aligning with the red line before 15 on the x-axis, and keeps increasing to 10% at 36 on the x-axis. Then, the error stays around 11% until 550. At the very end, it reaches its largest value of 13%. In practice, depending on the error tolerance of the prediction task, we can adjust the model by retraining it with new data. For example, we can retrain the model every 36 hours to capture the new characteristics of the data generated during those 36 hours. In general, the model can maintain a MAPE of around 11% for more than three weeks (550 hours is about 23 days).
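One way to turn a MAPE-over-horizon curve into a retraining interval is to find the first hour at which the error exceeds the task's tolerance. The sketch below assumes the curve is the cumulative MAPE over the first h forecast hours (the paper does not state how the plotted MAPE is aggregated); the function names and tolerance values are hypothetical.

```python
def cumulative_mape(actual, forecast):
    """MAPE in percent over the first h forecast hours, for h = 1..len(actual)."""
    curve, total = [], 0.0
    for h, (y, f) in enumerate(zip(actual, forecast), start=1):
        total += abs((y - f) / y)
        curve.append(100.0 * total / h)
    return curve


def usable_hours(actual, forecast, tolerance_pct):
    """Hours before the cumulative MAPE first exceeds `tolerance_pct`."""
    for h, value in enumerate(cumulative_mape(actual, forecast), start=1):
        if value > tolerance_pct:
            return h - 1
    return len(actual)


actual = [100.0] * 4
forecast = [105.0, 105.0, 120.0, 130.0]  # hourly errors of 5%, 5%, 20%, 30%
print(usable_hours(actual, forecast, 12.0))  # 3
```

A stricter tolerance yields a shorter usable window and hence more frequent retraining, mirroring the trade-off between the 36-hour and 550-hour figures above.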
V PowerNet for Anomaly Detection
Anomaly detection aims to identify patterns in data that do not conform to a defined normal behavior [20]. Anomaly detection in smart grids focuses on non-technical loss, i.e., loss not caused by the intrinsic (technical) loss of a power system, such as transmission loss. Electricity theft is one of the most widely studied forms of non-technical loss that cause anomalies. Data-driven anomaly detection can be done by modeling the normal consumption behavior and defining a normal region. Any consumption that does not fall within the normal region is considered an anomaly, potentially indicating a problem in the smart grid. The forecasting results of PowerNet can be interpreted differently depending on the task, e.g., as the power demand at some future time or as the expected normal consumption at that time. In the latter sense, PowerNet can be used to define the normal consumption behavior on which further anomaly detection can be carried out.
Normally, for a consumer $i$, the reported consumption $r_i$ should be equal to the actual consumption $a_i$. However, an attacker may be able to manipulate $r_i$ to reduce the bill by making $r_i < a_i$. We conduct a preliminary experiment to understand the performance of PowerNet when electricity theft happens. We artificially reduce the power consumption in the test data by different theft percentages to simulate different electricity theft scenarios. Fig. 7 shows the forecasting MAPE under different theft percentages, together with a magnified view of the first 30% of the x-axis. The magnified view shows that when the theft percentage is small, the MAPE grows linearly with the theft percentage. Over the full range, however, the MAPE increases in an exponential manner. This means that the more the user steals, the larger and more obvious the deviation between the predicted and reported values becomes. A reasonable threshold for triggering an alarm can be inferred from historical statistics as well as the tolerance for theft.
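The theft simulation can be sketched as scaling the reported readings by (1 − p) for a theft fraction p. Under the idealized assumption that the forecast equals the true consumption exactly, and that MAPE is measured against the reported values, the MAPE equals 100·p/(1 − p): about 100·p for small p, and growing without bound as p approaches 1. This is only an illustration of the mechanism, not a reproduction of the experiment behind Fig. 7; the function names are hypothetical.

```python
def simulate_theft(actual, theft_fraction):
    """Reported readings after under-reporting by `theft_fraction` (0 <= p < 1)."""
    return [a * (1.0 - theft_fraction) for a in actual]


def mape_vs_reported(predicted, reported):
    """MAPE in percent of the forecast, measured against the reported readings."""
    return 100.0 * sum(abs((r - p) / r) for p, r in zip(predicted, reported)) / len(reported)


actual = [100.0, 50.0, 80.0]
predicted = actual  # idealized: the forecast matches true consumption exactly
for p in (0.1, 0.2, 0.5):
    print(p, round(mape_vs_reported(predicted, simulate_theft(actual, p)), 2))
# 0.1 11.11
# 0.2 25.0
# 0.5 100.0
```

Doubling the theft fraction from 0.2 to 0.4 raises this idealized MAPE from 25% to about 67%, so larger thefts stand out disproportionately against an alarm threshold.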
Anomaly detection can be deployed at both the substation layer and the individual consumer layer. We discuss how PowerNet can be used to detect such anomalies in both layers.
Anomaly detection in substation layer. At the substation level, a master meter measures the overall consumption of the whole supply region. Denoting the master meter reading by $M$, we have $M = \sum_{i=1}^{K} a_i + \ell$, where $K$ is the number of consumers in the supply region and $\ell$ is the technical loss. The substation can observe $r_i$, the reported consumption of consumer $i$, and can therefore obtain $D = M - \sum_{i=1}^{K} r_i$. In the normal case where $r_i = a_i$ for all $i$, we have $D = \ell$. To detect the anomaly where $r_i < a_i$, we use PowerNet to model the indirectly observed $D$, which equals the technical loss under normal operation. In the attack case where $r_i < a_i$ for some consumer $i$, $D$ exceeds $\ell$, so a deviation would be observed between the predicted $\hat{D}$ and the observed $D$. Hence, PowerNet can detect anomalies in a substation supply region by constructing one model per substation.
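A minimal sketch of the substation-level check: compute the residual D (master meter reading minus the sum of reported readings) and raise an alarm when it exceeds the forecast of its normal value by more than a tolerance. The forecast would come from a PowerNet model of D; here it is simply passed in, and the function names and tolerance are hypothetical.

```python
def substation_residual(master_reading, reported):
    """D = M - sum of reported readings; equals the technical loss when nobody under-reports."""
    return master_reading - sum(reported)


def substation_alarm(master_reading, reported, predicted_residual, tolerance):
    """Flag the region if D exceeds its forecast by more than `tolerance`.

    Only positive deviations are flagged, since under-reporting (r_i < a_i) raises D.
    """
    return substation_residual(master_reading, reported) - predicted_residual > tolerance


actual = [10.0, 20.0, 30.0]
technical_loss = 2.0
M = sum(actual) + technical_loss  # master meter reading: 62.0
print(substation_alarm(M, actual, predicted_residual=2.0, tolerance=0.5))              # False
print(substation_alarm(M, [10.0, 15.0, 30.0], predicted_residual=2.0, tolerance=0.5))  # True
```

The check tells the operator that someone in the region is under-reporting, but not who; that is the role of the consumer-level check.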
Anomaly detection in individual consumer layer. Anomaly detection at the substation level can detect an anomaly but cannot determine which consumer is suspicious. At the individual consumer level, PowerNet can build a model for consumer $i$ based on her historical reported consumption $r_i$. Once an attacker reduces $r_i$ so that $r_i < a_i$, we notice a deviation between the reported $r_i$ and the value $\hat{r}_i$ predicted by PowerNet. In this sense, anomaly detection at the individual consumer layer complements anomaly detection at the substation layer, since it can locate the consumer who is suspiciously reporting false readings.
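A minimal sketch of the consumer-level check: flag every consumer whose reported reading falls below the per-consumer forecast by more than a relative tolerance. The forecasts stand in for per-consumer PowerNet predictions; the function name and tolerance are hypothetical.

```python
def suspicious_consumers(reported, predicted, rel_tolerance):
    """Indices of consumers whose reading is below the forecast by more than rel_tolerance."""
    return [
        i for i, (r, p) in enumerate(zip(reported, predicted))
        if p > 0 and (p - r) / p > rel_tolerance  # only under-reporting is flagged
    ]


reported = [10.0, 14.0, 30.0]
predicted = [10.5, 20.0, 29.0]
print(suspicious_consumers(reported, predicted, rel_tolerance=0.2))  # [1]
```

Consumer 1 reports 30% less than predicted and is flagged; consumer 0 is within tolerance, and consumer 2 reports more than predicted, which is not treated as theft.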
VI Related Work
The existing works on power demand forecasting can be generally classified into two categories, i.e., classic statistical models and modern machine learning algorithms.
In terms of statistical models, time-series models have been used to capture the time-series characteristics of power demand, e.g., ARMA [21][22] and ARIMA [23][24]. Besides time-series models, Hong et al. [25] adopt multiple linear regression to model hourly energy demand using seasonality (with respect to year, week, and day) and temperature information. Their results indicate that more complex featurization of the same information yields more accurate forecasting. Fan and Hyndman [26] use a semi-parametric additive model to explore the nonlinear relationship between energy usage and variables, i.e., calendar variables, consumption observations, and temperatures, over short-term periods. Their model demonstrates sensitivity to temperature. In addition, conditional kernel density estimation has been applied to power demand forecasting and performs well on data with strong seasonality [27]. However, these models are limited in incorporating heterogeneous features in a unified way. In contrast, PowerNet is designed as a neural network that can encode sequential features and single-value features simultaneously.
Regarding machine-learning models, three models are widely used for demand forecasting tasks, namely Decision Tree (DT) [2][28][29], Support Vector Machine (SVM) [3][30][31][32], and Artificial Neural Network (ANN) [33][34][35]. DT has been used to predict building energy demand levels [29] and to analyze the electricity load level based on hourly observations of electricity load and weather [28]. Later, Bansal et al. [2] use boosted DTs to model and forecast energy consumption so as to create personalized electricity plans for residential consumers based on their usage history. There are also works using SVR, the regression variant of SVM, to forecast power consumption in combination with other techniques, such as fuzzy-rough feature selection [32], particle swarm optimization algorithms [31], and the chaotic artificial bee colony algorithm [30]. SVR-based prediction has demonstrated good results [3]. As for ANN, Gajowniczek and Zabkowski choose ANN because they consider time-series analysis unsuitable for their work, having observed high volatility in the data [33]. Zufferey et al. [34] apply a time delay neural network and find that an individual consumer’s consumption is harder to predict than the aggregate of multiple consumers. Recently, researchers have taken advantage of LSTM to forecast building energy load using historical consumption data [35]. Cheng et al. [36] further feed the concatenation of historical data and influencing features as a sequential input to an LSTM network. Since they use only the LSTM network, all data are treated as sequential data. Despite the extensive research on power demand forecasting, to the best of our knowledge, no existing neural network architecture takes heterogeneous features into account as PowerNet does.
Another stream of related work is anomaly detection in smart grids for non-technical loss such as electricity theft. Bandim et al. [37] introduce an observer meter to monitor the consumption of a set of users, and further identify tampered meters using deterministic and statistical approaches. Later, Krishna et al. [38] discuss the detection capability of such extra meters against different attacks. Beyond these, linear regression [39], cluster-based outlier detection [40][41], and SVM [42][43] have also been used to detect anomalies in smart grids. Furthermore, Mashima et al. [22] evaluate the effectiveness of several anomaly detection models, including the average detector, ARMA-GLR, non-parametric statistics, and the Local Outlier Factor (LOF). In this work, we discuss how PowerNet can be used in multiple anomaly detection layers.
VII Conclusions
In this article, we propose PowerNet, a power demand forecasting model based on a modern recurrent neural network and a multilayer perceptron network, which together incorporate heterogeneous influencing factors in a unified way. It improves prediction accuracy compared to two state-of-the-art approaches. We further evaluate it under different settings on a real-world dataset to better understand the model's capability and two crucial operational considerations in practice, namely the length of the forecasting period and the model retraining interval. Finally, we briefly discuss the potential of adopting PowerNet for anomaly detection in the smart metering process.
Acknowledgement
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under the Energy Programme and administered by the Energy Market Authority (EP Award No. NRF2014EWTEIRP002040 and NRF2017EWTEP003047).
References
 [1] P. Siano, “Demand response and smart grids—a survey,” Renewable and Sustainable Energy Reviews, vol. 30, pp. 461–478, 2014.
 [2] A. Bansal, S. K. Rompikuntla, J. Gopinadhan, A. Kaur, and Z. A. Kazi, “Energy consumption forecasting for smart meters,” arXiv preprint arXiv:1512.05979, 2015.
 [3] W. Yu, D. An, D. Griffith, Q. Yang, and G. Xu, “Towards statistical modeling and machine learning based energy usage forecasting in smart grid,” ACM SIGAPP Applied Computing Review, vol. 15, no. 1, pp. 6–16, 2015.
 [4] “UMass Smart* dataset, 2017 release,” http://traces.cs.umass.edu/index.php/Smart/Smart.
 [5] B. Anderson, S. Lin, A. Newing, A. Bahaj, and P. James, “Electricity consumption and household characteristics: Implications for census-taking in a smart metered future,” Computers, Environment and Urban Systems, 2016.
 [6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 [7] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural networks, vol. 2, no. 5, pp. 359–366, 1989.
 [8] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE transactions on neural networks and learning systems, 2016.
 [9] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural computation, vol. 12, no. 10, pp. 2451–2471, 2000.
 [10] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model.” in Interspeech, vol. 2, 2010, p. 3.
 [11] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.
 [12] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
 [13] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting.” Journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.
 [14] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [15] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
 [16] ——, “Stochastic gradient boosting,” Computational Statistics & Data Analysis, vol. 38, no. 4, pp. 367–378, 2002.
 [17] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, 1992, pp. 144–152.
 [18] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Predicting time series with support vector machines,” in International Conference on Artificial Neural Networks. Springer, 1997, pp. 999–1004.
 [19] A. J. Conejo, M. A. Plazas, R. Espinola, and A. B. Molina, “Day-ahead electricity price forecasting using the wavelet transform and ARIMA models,” IEEE transactions on power systems, vol. 20, no. 2, pp. 1035–1042, 2005.
 [20] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM computing surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
 [21] G. Gross and F. D. Galiana, “Short-term load forecasting,” Proceedings of the IEEE, vol. 75, no. 12, pp. 1558–1573, 1987.
 [22] D. Mashima and A. A. Cárdenas, “Evaluating electricity theft detectors in smart grid networks,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2012, pp. 210–229.
 [23] D. Alberg and M. Last, “Short-term load forecasting in smart meters with sliding window-based ARIMA algorithms,” in Asian Conference on Intelligent Information and Database Systems. Springer, 2017, pp. 299–307.
 [24] M. Cho, J. Hwang, and C. Chen, “Customer short term load forecasting by using ARIMA transfer function model,” in Energy Management and Power Delivery, 1995. Proceedings of EMPD’95., 1995 International Conference on, vol. 1. IEEE, 1995, pp. 317–322.
 [25] T. Hong, M. Gui, M. E. Baran, and H. L. Willis, “Modeling and forecasting hourly electric load by multiple linear regression with interactions,” in IEEE Power and Energy Society General Meeting. IEEE, 2010, pp. 1–8.
 [26] S. Fan and R. J. Hyndman, “Short-term load forecasting based on a semiparametric additive model,” IEEE Transactions on Power Systems, vol. 27, no. 1, pp. 134–141, 2012.
 [27] S. Arora and J. W. Taylor, “Forecasting electricity smart meter data using conditional kernel density estimation,” Omega, vol. 59, pp. 47–59, 2016.
 [28] B. Gładysz and D. Kuchta, “Application of regression trees in the analysis of electricity load,” Badania Operacyjne i Decyzje, no. 4, pp. 19–28, 2008.
 [29] Z. Yu, F. Haghighat, B. C. Fung, and H. Yoshino, “A decision tree method for building energy demand modeling,” Energy and Buildings, vol. 42, no. 10, pp. 1637–1646, 2010.
 [30] W.-C. Hong, “Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm,” Energy, vol. 36, no. 9, pp. 5568–5578, 2011.
 [31] Z. Qiu, “Electricity consumption prediction based on data mining techniques with particle swarm optimization,” International Journal of Database Theory and Application, vol. 6, no. 5, pp. 153–164, 2013.
 [32] H. Son and C. Kim, “Forecasting short-term electricity demand in residential sector based on support vector regression and fuzzy-rough feature selection with particle swarm optimization,” Procedia Engineering, vol. 118, pp. 1162–1168, 2015.
 [33] K. Gajowniczek and T. Ząbkowski, “Short term electricity forecasting using individual smart meter data,” Procedia Computer Science, vol. 35, pp. 589–597, 2014.
 [34] T. Zufferey, A. Ulbig, S. Koch, and G. Hug, “Forecasting of smart meter time series based on neural networks,” in Workshop on Data Analytics for Renewable Energy Integration (DARE), European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Riva del Guarda, 2016, pp. 19–23.
 [35] D. L. Marino, K. Amarasinghe, and M. Manic, “Building energy load forecasting using deep neural networks,” in 42nd Annual Conference of the IEEE Industrial Electronics Society, IECON 2016, 2016, pp. 7046–7051.
 [36] Y. Cheng, C. Xu, D. Mashima, V. L. Thing, and Y. Wu, “PowerLSTM: Power demand forecasting using long short-term memory neural network,” in International Conference on Advanced Data Mining and Applications. Springer, 2017, pp. 727–740.
 [37] C. Bandim, J. Alves, A. Pinto, F. Souza, M. Loureiro, C. Magalhaes, and F. Galvez-Durand, “Identification of energy theft and tampered meters using a central observer meter: a mathematical approach,” in Transmission and Distribution Conference and Exposition, 2003 IEEE PES, vol. 1. IEEE, 2003, pp. 163–168.
 [38] V. B. Krishna, K. Lee, G. A. Weaver, R. K. Iyer, and W. H. Sanders, “F-DETA: A framework for detecting electricity theft attacks in smart grids,” in Dependable Systems and Networks (DSN), 2016 46th Annual IEEE/IFIP International Conference on. IEEE, 2016, pp. 407–418.
 [39] X. Liu and P. S. Nielsen, “Regression-based online anomaly detection for smart grid data,” arXiv preprint arXiv:1606.05781, 2016.
 [40] D. M. Menon and N. Radhika, “Anomaly detection in smart grid traffic data for home area network,” in International Conference on Circuit, Power and Computing Technologies (ICCPCT), 2016. IEEE, 2016, pp. 1–4.
 [41] C. Chen and D. J. Cook, “Energy outlier detection in smart environments.” Artificial Intelligence and Smarter Living, vol. 11, p. 07, 2011.
 [42] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, “Non-technical loss detection for metered customers in power utility using support vector machines,” IEEE transactions on Power Delivery, vol. 25, no. 2, pp. 1162–1171, 2010.
 [43] P. Jokar, N. Arianpoo, and V. C. Leung, “Electricity theft detection in AMI using customers’ consumption patterns,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016.