LoadCNN: An Efficient Green Deep Learning Model for Day-ahead Individual Resident Load Forecasting

Accurate day-ahead individual resident load forecasting is very important to various applications of the smart grid. As a powerful machine learning technology, deep learning has shown great advantages in load forecasting tasks. However, deep learning is computationally hungry: it requires a great deal of training time, which results in considerable energy consumption and CO2 emissions. This aggravates the energy crisis and incurs a substantial cost to the environment. As a result, deep learning methods are difficult to popularize and apply in real smart grid environments. In this paper, to reduce training time, energy consumption, and CO2 emissions, we propose an efficient green model based on a convolutional neural network, namely LoadCNN, for next-day load forecasting of individual residents. The training time, energy consumption, and CO2 emissions of LoadCNN are only approximately 1/70 of the corresponding indicators of other state-of-the-art models. Meanwhile, it achieves state-of-the-art performance in terms of prediction accuracy. LoadCNN is the first load forecasting model that simultaneously considers prediction accuracy, training time, energy efficiency, and environmental costs. It is an efficient green model that can be deployed quickly, cost-effectively, and in an environmentally friendly way in a realistic smart grid environment.






I Introduction

According to the National Bureau of Statistics of the People’s Republic of China, residential electricity consumption was 907.16 billion kWh in 2017 [1]. Residents, as important participants in the smart grid, have great potential to contribute to customer-oriented applications such as demand response (DR), demand side management (DSM), and energy storage systems (ESS) [2]. In these cases, precise day-ahead individual resident load forecasting is essential to balance generation and consumption, minimize operating cost, and decrease reserve capacity, which helps to maintain system security and removes the need for expensive energy storage systems [3, 4].

Although the use of smart meter data to predict individual residential electric load was first reported by Ghofrani et al. in 2011, it is still a rather new area [5, 2]. There are few studies on individual residential electric load forecasting because it is an extremely challenging task: the electricity consumption behavior of residents is highly uncertain and volatile, which is difficult for traditional machine learning methods to handle [6, 7]. Fortunately, deep learning models have recently shown great potential in time series prediction [6, 2, 7, 8, 9, 10, 11]. Compared with traditional machine learning methods, deep learning shows significant superiority in individual residential electric load forecasting, and the task has therefore attracted researchers’ attention again.

Shi et al. made the first attempt to develop a deep learning model for individual resident load forecasting in 2017 [6]. It is an encoder-decoder based model with a novel pooling mechanism to overcome over-fitting in deep learning models. Kong et al. propose a deep learning forecasting framework based on long short-term memory (LSTM) to address the volatility of the electricity consumption behaviors of individual residents [2]. Wang et al. develop a gated recurrent unit (GRU) model, a popular recurrent neural network (RNN) model, to forecast the next-day load of individual residents [12]. Wang et al. propose an LSTM model with a new loss function (pinball loss), week information, and hour information to forecast the load of individual residents [8]. Kong et al. recently propose an LSTM-based framework to handle the high volatility and uncertainty of the load of individual residents [9].

However, although deep learning shows significant superiority in forecasting accuracy over traditional machine learning methods, we notice that the time needed to train a deep learning model is excessive, which results in considerable energy consumption and CO2 emissions. This has also been confirmed in many studies. For example, Strubell et al. investigate the training of the encoder-decoder based model LISA (the most popular model for time series forecasting) [13]. 1) As shown in Table I, the total GPU time required for researching and developing the model is 239,942 hours (about 27 years); in reality, the model was trained on NVIDIA Titan X and M40 GPUs, and the project spanned a period of 6 months [13]. 2) As shown in Table I, the estimated cost of researching and developing the model is $9,870 in terms of electricity and $103k-$350k in terms of Google cloud computing [13]. 3) As shown in Table II, Strubell et al. point out that the CO2 emissions from training the encoder-decoder based model are five times the emissions of a car over its whole lifetime [13]. The huge waste of time and energy and the large CO2 emissions are problems that cannot be ignored in production and application.

To mitigate this situation, we examine its root causes. Currently, the superiority of deep learning depends on its complex network structure (millions of parameters and several to hundreds of layers), which provides a powerful ability to automatically learn the complex nonlinear function relating the input to the prediction [14]. However, training such a complex network even once requires a great deal of training time, which results in considerable energy consumption and CO2 emissions. What is more, developing such a complex network inevitably requires tens to thousands of experiments to adjust the structure and the hyperparameters of the network, for the following reasons.

  • Developing a new deep learning model: Many different network structures and hyperparameters must be considered to obtain an optimal model, which requires a number of experiments.

  • Applying the latest deep learning model to individual residential electric load forecasting: Leveraging technology that people already have in their pockets for a specific task is not as simple as it appears [15]. The deep learning model needs many adjustments to suit individual residential electric load forecasting, which also requires a number of experiments. What is more, because training takes so much time, it is difficult to keep up with the development of deep learning technology, which is updated very quickly as it is the most popular technology in the field of artificial intelligence.

  • Deploying a deep learning model to a specific real environment: Load forecasting models are commonly trained on a specific small dataset, which makes the model depend heavily on specific local customer behavior and local climate. To deploy the model in another environment, it must be adjusted and retrained on the dataset of that environment, which also requires a number of experiments.

These three issues have not been taken into consideration in previous studies. As a result, current state-of-the-art load forecasting models are hardly ever deployed in real smart grid environments. In this paper, we shift the focus from forecasting accuracy to the training efficiency, energy consumption, and environmental costs of training a new model, especially since the improvement in forecasting accuracy in recent research has not been significant.

                              | Hours  | Electricity | Cloud computing cost
Training without tuning       | 120    | $5          | $52-$175
Training with simple tuning   | 2880   | $118        | $1238-$4205
Training with tuning          | 239942 | $9870       | $103k-$350k
TABLE I: The estimated cost for training an encoder-decoder based model LISA [13].

Source                                                     | CO2e (lbs)
Air travel, 1 passenger                                    | 1984
Human life, avg, 1 year                                    | 11023
Car, avg incl. fuel, 1 lifetime                            | 126000
Training without tuning for an encoder-decoder based model |
Training with tuning for an encoder-decoder based model    |
TABLE II: The estimated CO2 emissions [13].

All state-of-the-art deep learning models for individual residential electric load forecasting that directly handle the historical load curve of the smart meter are based on RNNs. It is well known that training RNN models is very time consuming and difficult to parallelize. Thus, in this paper, we propose LoadCNN, a convolutional neural network (CNN) based model with a simple network structure, to reduce training time, energy consumption, and CO2 emissions. The experiments show that our model significantly outperforms current state-of-the-art methods in training efficiency: its training time, energy consumption, and CO2 emissions are only approximately 1/70 of those of other models, while it achieves state-of-the-art performance in forecasting accuracy.

Fig. 1: The structure of LoadCNN. 

The contributions of this paper fall into the following four aspects:

  • New application: Our method is the first to directly apply a CNN to day-ahead individual residential electric load forecasting.

  • New problem: Training efficiency, energy consumption, and environmental costs are considered for the first time in the load forecasting task; these important issues have been ignored in previous research.

  • New model: We propose a novel CNN-based model, LoadCNN, for predicting day-ahead individual resident load. The training time, energy consumption, and CO2 emissions of LoadCNN are only approximately 1/70 of the corresponding indicators of other state-of-the-art models. Meanwhile, it achieves state-of-the-art performance in terms of prediction accuracy.

  • Unlike most previous deep learning based research on individual residential electric load forecasting, which focuses on the next time step only, we focus on day-ahead load forecasting, which is very important to the day-ahead market. In this paper, we give a formal definition of day-ahead load forecasting and transform models that only forecast the value of the next time step into day-ahead load forecasting models for comparison.

The rest of this paper is structured as follows. Section II introduces our approach. Section III describes the methodology of implementation. Section IV presents and discusses the results. Section V concludes the paper.

II Methodology

In this section, we give a formal definition of day-ahead forecasting and propose a novel CNN-based model, LoadCNN, for day-ahead load forecasting.

II-A Day-ahead Individual Resident Load Forecasting

A load curve represents the electricity consumption behavior of an individual resident, which is very important to various customer-oriented applications in the smart grid. We use the historical load curve of an individual resident to predict the future load curve of that resident; a time step divides the load curve into the input and the output of the load forecasting task.

In this work, we focus on day-ahead load forecasting based on the historical load curve of the past 7 days, that is, 336 input points and 48 output points (half-hour interval, 48 data points per day). The predicted load is defined by Equation 1.


The objective of the day-ahead load forecasting task is to minimize the prediction loss defined by Equation 2.
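The input/output split described above can be sketched as a sliding window over one resident's half-hourly series. This is only an illustration under stated assumptions: the function name `make_day_ahead_pairs` is hypothetical, the 7-day history corresponds to the 336 input steps listed in Table III, and windows are advanced one day at a time, which the text does not specify.

```python
import numpy as np

POINTS_PER_DAY = 48   # half-hourly readings, as stated in the text
HISTORY_DAYS = 7      # assumption: 336 input steps / 48 points per day


def make_day_ahead_pairs(load, history_days=HISTORY_DAYS, points_per_day=POINTS_PER_DAY):
    """Split one resident's load series into (history, next-day) pairs."""
    load = np.asarray(load, dtype=float)
    n_in = history_days * points_per_day
    n_out = points_per_day
    inputs, targets = [], []
    # Advance one day at a time so that every target covers one whole day.
    for start in range(0, len(load) - n_in - n_out + 1, points_per_day):
        inputs.append(load[start:start + n_in])
        targets.append(load[start + n_in:start + n_in + n_out])
    return np.array(inputs), np.array(targets)


# Ten synthetic days of data; the values are just their own indices.
series = np.arange(10 * POINTS_PER_DAY, dtype=float)
X, Y = make_day_ahead_pairs(series)
print(X.shape, Y.shape)
```

Each row of `X` holds the past 336 readings and the matching row of `Y` holds the 48 readings of the following day.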


II-B The details of LoadCNN

In this section, we elaborate on our proposed method, LoadCNN. As shown in Figure 1, LoadCNN consists of two parts: data preparation and the load forecasting model. In addition, we introduce the day-ahead individual resident load forecasting algorithm that is based on LoadCNN.

II-B1 Data preparation

Data preprocessing is an essential step of load forecasting. In our paper, five types of data are fed into LoadCNN: the individual resident ID, month M, day D, week W, and historical load curve L. The details are as follows:

  • The customer ID of an individual resident, ID, is represented by several vectors that are one-hot encoded. Since the number of customers is generally large, we use two vectors to uniquely represent a customer, so that each vector is relatively small. Similarly, if necessary, more vectors can be used to represent ID.

  • The month M, day D, and week W are one-hot encoded and refer to the day of the load curve to be predicted.

  • The historical load curve L is the sequence of energy consumption values of the past days.
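The encodings listed above can be sketched as follows. Everything concrete here is an assumption made for illustration: the helper names are hypothetical, the two ID vectors are obtained by splitting the customer index into a quotient and a remainder, and the calendar vectors use the natural sizes 12, 31, and 7.

```python
import math
import numpy as np


def one_hot(index, size):
    """Return a vector of zeros with a single 1 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec


def encode_customer_id(customer_index, num_customers):
    """Represent a customer with two short one-hot vectors instead of one long one."""
    base = math.isqrt(num_customers - 1) + 1      # smallest base with base * base >= num_customers
    high, low = divmod(customer_index, base)      # the pair (high, low) is unique per customer
    return np.concatenate([one_hot(high, base), one_hot(low, base)])


def encode_calendar(month, day, weekday):
    """One-hot month (1-12), day of month (1-31), and weekday (0-6) of the target day."""
    return np.concatenate([one_hot(month - 1, 12), one_hot(day - 1, 31), one_hot(weekday, 7)])


id_vec = encode_customer_id(customer_index=999, num_customers=1000)
cal_vec = encode_calendar(month=12, day=31, weekday=6)
print(id_vec.shape, cal_vec.shape)
```

With 1000 customers, two vectors of length 32 replace a single one-hot vector of length 1000, which is the size reduction the text refers to.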

II-B2 Load forecasting model

RNN-based models are the mainstream models for sequence prediction, and they achieve state-of-the-art performance in individual resident load forecasting tasks [8, 9]. However, due to the complex mechanism of RNNs, training an RNN-based model is time-consuming and requires a large amount of computing resources. In addition, RNN-based models are difficult to parallelize. Compared with RNNs, CNNs have a simpler neural network structure and achieve state-of-the-art performance in image processing [17]. Thus, we seek to develop an energy-saving and efficient green model based on CNNs.

As shown in Figure 1, LoadCNN consists of three parts: input, feature extraction, and forecasting.

  • The input part contains only a concatenation action, which links the preprocessed data into a single vector.

  • The feature extraction part consists of convolution layers and max pooling layers. The convolution layers are one-dimensional (1D) convolutions whose feature maps are activated by the Rectified Linear Unit (ReLU) function. Each convolution layer has its own kernel shape and feature-map depth. There are 4 max pooling layers, and each of them cuts the dimension of the feature map by half.

  • In the forecasting part, the feature map produced by the last convolution layer is simply flattened into one-dimensional data. Then, a fully connected layer transforms the one-dimensional data into the outputs. In addition, dropout is adopted to overcome the overfitting problem in the fully connected layer [18].
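To make the three parts concrete, the following NumPy sketch traces the shapes through a stack of 1D convolution, ReLU, and max pooling steps followed by flattening and a fully connected layer. It is not the LoadCNN architecture itself: the input length (64), kernel width (3), feature-map depth (8), and random weights are placeholders chosen for illustration, dropout is omitted, and one convolution per pooling layer is assumed.

```python
import numpy as np


def conv1d_relu(x, kernels):
    """Zero-padded ('same') 1D convolution followed by ReLU.

    x: (length, in_channels); kernels: (width, in_channels, out_channels).
    """
    width = kernels.shape[0]
    pad_left = (width - 1) // 2
    pad_right = width - 1 - pad_left
    padded = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.stack([
        np.tensordot(padded[i:i + width], kernels, axes=([0, 1], [0, 1]))
        for i in range(x.shape[0])
    ])
    return np.maximum(out, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling; size=2 halves the length."""
    length = (x.shape[0] // size) * size
    return x[:length].reshape(-1, size, x.shape[1]).max(axis=1)


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(64, 1))                 # hypothetical input: 64 steps, 1 channel
for _ in range(4):                                     # 4 max pooling layers, as in the text
    kernels = rng.normal(size=(3, feature_map.shape[1], 8))   # hypothetical kernel width and depth
    feature_map = max_pool1d(conv1d_relu(feature_map, kernels))

dense = rng.normal(size=(feature_map.size, 48))        # fully connected layer to 48 half-hourly outputs
forecast = feature_map.reshape(-1) @ dense             # flatten, then project
print(feature_map.shape, forecast.shape)
```

Because the convolution at each output position does not depend on the previous position, all positions can be computed at once, which is why such a model is easier to parallelize than a recurrent one.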

Input: Load dataset of residents' demand from smart meters.
Output: The predicted load of individual residents and the root mean squared error (RMSE), normalised root mean squared error (NRMSE), and mean absolute error (MAE).
Clean and pre-process the load data to obtain a dataset of (historical load, target) pairs.
Divide the dataset into a training set, a validation set, and a test set.
Initialize all learnable parameters in LoadCNN.
Initialize the best parameters.
Initialize the best validation loss.
for current epoch = 1 to max epoch do
   while any instances in the training set have not been selected in this epoch do
      Select a batch of instances from the training set.
      Update the parameters by minimizing the loss defined by Equation 2 on the batch.
      if the validation condition is met then
         Randomly select a batch of instances from the validation set.
         Calculate the validation loss defined by Equation 2 on this batch.
         if the validation loss is lower than the best validation loss then
            Update the best validation loss and the best parameters.
         end if
      end if
   end while
end for
Forecast the load of the test set by LoadCNN with the best parameters.
Calculate the RMSE, NRMSE, and MAE from the forecast and the actual load of the test set.
Algorithm 1 The algorithm for individual residential electric load forecasting

II-B3 Algorithm

As shown in Algorithm 1, the algorithm consists of three parts: 1) data pre-processing, 2) network training, and 3) evaluation.
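The network-training part of the algorithm, in which the best parameters are kept according to the validation loss, can be sketched with a small stand-in model. The sketch below uses a linear model trained by mini-batch gradient descent in place of LoadCNN, mean squared error in place of the loss of Equation 2, and validation on the full validation set after every batch; all three are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))


def train_with_validation(x_train, y_train, x_val, y_val,
                          max_epoch=20, batch_size=32, lr=0.05, seed=0):
    """Mini-batch training that remembers the parameters with the best validation loss."""
    rng = np.random.default_rng(seed)
    weights = np.zeros((x_train.shape[1], y_train.shape[1]))
    best_weights, best_val_loss = weights.copy(), float("inf")
    for _ in range(max_epoch):
        order = rng.permutation(len(x_train))          # each instance is used once per epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            error = x_train[idx] @ weights - y_train[idx]
            # Gradient step on the batch MSE.
            weights -= lr * 2.0 * x_train[idx].T @ error / error.size
            val_loss = mse(x_val @ weights, y_val)
            if val_loss < best_val_loss:               # keep the best parameters seen so far
                best_val_loss, best_weights = val_loss, weights.copy()
    return best_weights, best_val_loss


# Synthetic, noise-free regression problem used only to exercise the loop.
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(200, 4))
y = x @ data_rng.normal(size=(4, 2))
initial_loss = mse(np.zeros_like(y[160:]), y[160:])
best_weights, best_val_loss = train_with_validation(x[:160], y[:160], x[160:], y[160:])
print(initial_loss, best_val_loss)
```

The returned `best_weights` play the role of the best parameters that the algorithm later uses to forecast the test set.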

III The methodology of implementation

III-A Data description

To evaluate the performance of LoadCNN, we conduct experiments on a large-scale smart meter dataset from the Smart Metering Electricity Customer Behaviour Trials (CBTs) in Ireland [19]. The data were collected from Irish customers between 1st July 2009 and 31st December 2010. The smart meter data are half-hourly sampled electricity consumption (kWh) readings from each customer.

In the CBTs, we select the residential customers with the controlled stimulus and the controlled tariff for two reasons: (1) the selected customers were billed on the existing flat rate without any DSM stimuli; (2) the selected customers are the most representative, since the majority of consumers outside the trial are of this type [6]. These residential customers are used to verify our method.

To verify our method, we divide the dataset into three sets: a training set, a validation set, and a test set. The test set contains all the data of the last days of the period. The validation set contains the data of days that are randomly selected from the remaining days. The training set contains all of the remaining data.
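The day-based split described above can be sketched as follows. The function name and the day counts in the example (100, 10, and 15) are placeholders, since the actual counts are not given here.

```python
import numpy as np


def split_days(num_days, num_test_days, num_val_days, seed=0):
    """Return (train, validation, test) arrays of day indices."""
    all_days = np.arange(num_days)
    test_days = all_days[num_days - num_test_days:]      # the last days of the period
    remaining = all_days[:num_days - num_test_days]
    rng = np.random.default_rng(seed)
    # Validation days are drawn at random, without repetition, from the remaining days.
    val_days = np.sort(rng.choice(remaining, size=num_val_days, replace=False))
    train_days = np.setdiff1d(remaining, val_days)       # everything that is left
    return train_days, val_days, test_days


train_days, val_days, test_days = split_days(num_days=100, num_test_days=10, num_val_days=15)
print(len(train_days), len(val_days), len(test_days))
```

Holding out the last days for testing keeps the evaluation strictly in the future of every training and validation day.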

III-B Experiment Setup

All models for all customers are built on a server with two Intel Xeon E5-2630 v4 processors and four NVIDIA Titan Xp GPUs. The server runs Linux 3.10.0-327.el7.x86_64. All models are implemented with the tensorflow-gpu 1.10.0 library [20] and Python 3.6.7.

All models share the same settings for batch size, maximum number of epochs, number of hidden neurons in the RNN, learning rate, decay rate, and dropout rate. In addition, to facilitate the comparison of training time and energy consumption, each model runs on only one GPU.

III-C Metrics

In this work, three widely used metrics are applied to evaluate the accuracy of LoadCNN: root mean squared error (RMSE), normalised root mean squared error (NRMSE), and mean absolute error (MAE).


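$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2} \quad (3)$$

$$NRMSE = \frac{RMSE}{y_{max} - y_{min}} \quad (4)$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right| \quad (5)$$

Here, $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, $y_{max}$ and $y_{min}$ are the maximum and minimum values of $y$, respectively, and $N$ is the number of points in the test set.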
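The three accuracy metrics can be computed as in the following sketch, which uses their standard definitions and normalises the RMSE by the range of the actual values, as the description above indicates. The function names and the four-point example are illustrative only.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def nrmse(actual, predicted):
    """RMSE divided by the range (max - min) of the actual values."""
    actual = np.asarray(actual, dtype=float)
    return rmse(actual, predicted) / float(actual.max() - actual.min())


def mae(actual, predicted):
    """Mean absolute error."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


actual = [1.0, 2.0, 3.0, 5.0]
predicted = [1.0, 2.0, 3.0, 1.0]      # a single error of 4 on the last point
print(rmse(actual, predicted), nrmse(actual, predicted), mae(actual, predicted))
```

In this example the single large error gives an RMSE of 2, an NRMSE of 0.5, and an MAE of 1, which shows how much more strongly RMSE penalises isolated peaks than MAE does.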

Meanwhile, energy efficiency and training efficiency also need to be measured in our work. Since the GPU consumes most of the energy, energy consumption (EC, in kWh) is defined in Equation 6.

$$EC = \frac{PUE \times P \times TT \times NT}{1000} \quad (6)$$

Here, $P$ is the power (W) drawn by the GPU while training the model, $TT$ is the training time (h) of a single training run, $PUE$ is the power usage effectiveness, which accounts for the additional energy required to support the compute infrastructure (mainly cooling) [13], and $NT$ is the number of times the model is trained. The detailed settings of the parameters are as follows.

  • $P$: As Figure 2 shows, the variation in the power drawn by a GPU while training a model is negligible. Thus, to simplify the problem and minimize the impact of the monitoring procedure on training, we use the average power over a randomly selected 30-minute window during training as $P$.

  • $PUE$: It is set to 1.58 (the global average for data centers) according to [13].

  • $NT$: In general, hyperparameter tuning is a big topic and is essential to obtain the best forecasting performance [9]. In recent work [21], the authors ran 4789 trials to obtain the best performance of an encoder-decoder model [13]. The task of that model, sequence-to-sequence forecasting, is similar to day-ahead individual resident load forecasting. Thus, to simplify the problem, we assume that a fixed number of trials, $NT$, is required to obtain the best performance of each model.

The reasons for the assumptions above are as follows: 1) every model runs on the same server; 2) every model runs on a single NVIDIA Titan Xp GPU; 3) most of the energy consumed in training a model is consumed by the GPU.

According to the U.S. Environmental Protection Agency, the CO2 emissions (in lbs) are given by Equation 7 [13].

$$CO_2 = 0.954 \times EC \quad (7)$$
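The energy and emission estimates can be reproduced numerically. The sketch below assumes that energy is the product of PUE, GPU power, time per training run, and number of trials (converted from Wh to kWh), and that emissions are 0.954 lbs of CO2 per kWh, the conversion factor used in [13]. The value of 1000 trials is an inference: it is the value for which the power and time of the first row of Table III (66.1656 W, 164.42 h) give the energy and emissions reported in that row.

```python
PUE = 1.58                 # power usage effectiveness, from the text
LBS_CO2_PER_KWH = 0.954    # conversion factor used in [13]


def energy_kwh(power_w, hours_per_run, num_trials, pue=PUE):
    """Total energy in kWh for `num_trials` training runs on one GPU."""
    return pue * power_w * hours_per_run * num_trials / 1000.0


def co2_lbs(energy):
    """Estimated CO2 emissions in lbs for a given energy in kWh."""
    return LBS_CO2_PER_KWH * energy


# First row of Table III; num_trials=1000 is inferred, not stated in the text.
ec = energy_kwh(power_w=66.1656, hours_per_run=164.42, num_trials=1000)
co2 = co2_lbs(ec)
print(ec, co2)
```

Up to rounding, the two printed values agree with the 17188.7378 kWh and 16398.0559 lbs listed in Table III, which supports the form of both formulas.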

Fig. 2: The power draw of a GPU during training an encoder-decoder model.

III-D Day-ahead Individual Resident Load Forecasting Methods for Comparison

We use models from four types of popular deep learning methods as benchmarks in the present work: classic RNN-based models, RNN-and-CNN-based models, encoder-decoder-based models, and CNN-based models.

  • LSTM, one of the most popular RNN models for time series prediction, has been commonly used for load forecasting since 2017 [2]. In our paper, we transform the model into a day-ahead load forecasting model to compare it with our model.

  • LSTM-Week is a recently proposed load forecasting model. It uses a new loss function and considers week and hour information [8]. To compare it with our model, we ignore the new loss function and the hour information.

  • LSTM-EID is also a recently proposed load forecasting model. It considers the week, the position of the record point in a day, and holiday information [9]. Since the dataset used in this work does not contain holiday information, the holiday information is ignored. To compare it with our model, we also transform the model into a day-ahead load forecasting model.

  • GRU is another popular RNN model for time series prediction and has been applied to day-ahead load forecasting [12]. The model considers date, weather, and temperature information. Since the dataset used in this work does not contain weather and temperature information, we ignore them.

  • Skip-RNN is an RNN model that is able to capture long-term dependencies and relieve vanishing gradients when the model is trained on long sequences [22]. Since the length of the input in this work is 336, Skip-RNN is considered as a benchmark.

  • LSTM-CNN is a model that mixes a typical LSTM and a CNN, similar to the well-known inception models [23]. Models of the LSTM-CNN type have been used for load forecasting at the area level and for industrial distribution complexes [24, 25]. To compare it with our model, we transform the LSTM-CNN model into a day-ahead load forecasting model at the individual resident level.

  • seq2seq is an LSTM-based encoder-decoder model, the most popular model for sequence-to-sequence forecasting.

  • seq2seq-pooling was recently proposed to relieve overfitting in load forecasting [6]. To compare it with our model, we transform the model into a day-ahead load forecasting model and use dropout to further relieve overfitting.

  • seq2seq-attention is an encoder-decoder model that adds an attention mechanism to handle long sequences [26].

  • Temporal convolutional network (TCN) was recently proposed to handle sequences and achieves state-of-the-art performance in many sequence modeling tasks [27]. It has been used for load forecasting at the individual resident level [28]. To compare it with our model, we also transform the model into a day-ahead load forecasting model.

  • ResNet, a CNN-based model, is the state-of-the-art method in the image recognition task [29].

IV Results and discussion

In this section, we present and discuss the results of the experiments in terms of training efficiency, energy consumption, environmental costs, and prediction accuracy. In addition, we investigate the effect of the number of layers in a deep learning model, since the deeper the model is, the more complex its network structure is and the more training time it needs, which results in more energy consumed and more CO2 emitted.

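Model              | TT without tuning (h) | Power (W) | EC (kWh)   | CO2 (lbs)  | Easy to parallelize | RMSE   | NRMSE  | MAE    | Layers | Steps  | Year of related work
LSTM               | 164.42                | 66.1656   | 17188.7378 | 16398.0559 | No                  | 0.6192 | 0.0473 | 0.3636 | 3      | 336    | 2017
LSTM               | 239.22                | -         | -          | -          | No                  | 0.6157 | 0.0470 | 0.3511 | 5      | 336    | -
LSTM               | 365.65                | -         | -          | -          | No                  | 0.7375 | 0.0563 | 0.4085 | 8      | 336    | -
LSTM-Week          | 164.73                | 68.5650   | 17845.6456 | 17024.7459 | No                  | 0.6246 | 0.0477 | 0.3665 | 3      | 336    | 2019
LSTM-EID           | 161.58                | 68.3967   | 17461.4313 | 16658.2055 | No                  | 0.6153 | 0.0470 | 0.3639 | 3      | 336    | 2019
GRU                | 170.30                | 64.5683   | 17373.6508 | 16574.4629 | No                  | 0.6156 | 0.0470 | 0.3487 | 3      | 336    | 2018
Skip-RNN           | 190.33                | 64.1756   | 19298.9762 | 18411.2233 | No                  | 0.6147 | 0.0469 | 0.3477 | 3      | 336    | -
LSTM-CNN           | 153.2                 | 67.3422   | 16300.5835 | 15550.7567 | No                  | 0.6184 | 0.0472 | 0.3583 | 3-8    | 336-1  | -
seq2seq            | 165.02                | 72.4456   | 18888.8572 | 18019.9698 | No                  | 0.6641 | 0.0507 | 0.4101 | 3-3    | 336-48 | -
seq2seq            | 274.28                | -         | -          | -          | No                  | 0.6771 | 0.0517 | 0.4806 | 5-5    | 336-48 | -
seq2seq            | 389.95                | -         | -          | -          | No                  | 0.6881 | 0.0525 | 0.4941 | 8-8    | 336-48 | -
seq2seq-pooling    | 164.12                | 66.4822   | 17239.4727 | 16446.4570 | No                  | 0.6713 | 0.0513 | 0.3922 | 3-3    | 336-48 | 2017
seq2seq-pooling    | 246.22                | -         | -          | -          | No                  | 0.6581 | 0.0503 | 0.4332 | 5-5    | 336-48 | -
seq2seq-pooling    | 382.95                | -         | -          | -          | No                  | 0.7252 | 0.0554 | 0.5474 | 8-8    | 336-48 | -
seq2seq-attention  | 180.33                | 87.1394   | 24827.8798 | 23685.7973 | No                  | 0.6549 | 0.0500 | 0.4005 | 3-3    | 336-48 | -
TCN                | 20.55                 | 218.3589  | 7089.8951  | 6763.7599  | Yes                 | 0.8770 | 0.0670 | 0.4731 | 8      | 1      | -
ResNet             | 7.15                  | 187.5428  | 2118.6710  | 2021.2121  | Yes                 | 0.6261 | 0.0478 | 0.3673 | 34     | 1      | -
LoadCNN (Our)      | 2.30                  | 69.0600   | 250.6940   | 239.4197   | Yes                 | 0.6104 | 0.0466 | 0.3523 | 8      | 1      | -
TABLE III: Performance comparison.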

IV-A Training efficiency, energy consumption and environmental costs

As shown in Table III, our model not only achieves the lowest RMSE and NRMSE with a competitive MAE, but also obtains superior performance in training time, energy consumption, and CO2 emissions compared with all the other models. Specifically, LoadCNN has the shortest training time, only approximately 1/70 of that of the RNN-based models. What is more, LoadCNN is based on a CNN and is very easy to parallelize. Thus, the training time of LoadCNN can be further reduced by simply adjusting the implementation and adding more GPUs. The energy consumption and CO2 emissions of LoadCNN are likewise only approximately 1/70 of those of the RNN-based models.

The reason for these results is that LoadCNN has a simple network structure that is easy to train. In contrast: 1) the number of steps of an RNN-based model is 336, which is quite large and makes the structure of the unrolled model very complex during training, even though the RNN-based model has only 3 layers; 2) the other CNN-based models are also much more complex than our model.

In addition, the training time, energy consumption, and CO2 emissions of a model will be higher in reality than in this experiment, in which the training set contains only the data of the selected customers over the training days. Even so, the energy consumption and the total training time of the mainstream models are already very large (Table III). Of course, if there are enough GPUs, multiple parameter adjustment experiments can be performed at the same time. However, in a real environment the training set should contain hundreds of thousands of customers or even more, which will significantly increase the time and energy needed to train the model. Therefore, efficient and low-energy models like ours are of great significance.

IV-B The prediction accuracy of different deep learning models

We find that prediction accuracy is hard to improve merely by constructing different deep learning models. Specifically, as shown in Table III, the best-performing classical RNN-based models, RNN-CNN-based models, and CNN-based models differ little in the accuracy metrics RMSE, NRMSE, and MAE. Such a tiny difference is likely to be eliminated by adjusting hyperparameters. Consistent with Table III, Figure 3 shows that, except for the encoder-decoder based models and the TCN model, the prediction performances of the models are not much different. This means that it is difficult to make a major breakthrough in the accuracy of day-ahead individual resident load forecasting with current deep learning technology.

To improve the accuracy of the forecast, we need to pay more attention to obtaining information about personal activities and external factors, because the electricity consumption of a household depends heavily on the randomness of individual human behavior and on external factors. For example, both a business trip of the residents and a change in indoor temperature will cause changes in electricity consumption that are difficult to predict from the historical load curve of the individual resident alone.

In addition, it is worth noting that there is a large gap between the performance of the encoder-decoder based models and that of the state-of-the-art models. This can be explained by the mechanism of the decoder. In the day-ahead individual resident load forecasting task, the decoder predicts the value of the current point based on the value of the previous point and the current state of the model. Unfortunately, the actual value of the previous point is not available at forecasting time, so the predicted value of the previous point is used in its place. Therefore, the forecasting errors accumulate and are amplified.
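The accumulation effect can be illustrated with a deliberately simple toy example that has nothing to do with the actual seq2seq models: the true series decays by a factor of 0.9 per step, while a slightly wrong one-step model uses 0.95. Feeding the model its own previous prediction is compared with feeding it the true previous value. All names and numbers are illustrative assumptions.

```python
def rollout(x0, true_coef, model_coef, steps):
    """Compare one-step errors when the previous input is actual vs. predicted."""
    with_actual_input, with_predicted_input = [], []
    actual_prev, predicted_prev = x0, x0
    for _ in range(steps):
        actual = true_coef * actual_prev
        one_step = model_coef * actual_prev        # uses the true previous value
        multi_step = model_coef * predicted_prev   # uses its own previous prediction
        with_actual_input.append(abs(one_step - actual))
        with_predicted_input.append(abs(multi_step - actual))
        actual_prev, predicted_prev = actual, multi_step
    return with_actual_input, with_predicted_input


err_actual_input, err_predicted_input = rollout(x0=1.0, true_coef=0.9, model_coef=0.95, steps=5)
for step, (a, p) in enumerate(zip(err_actual_input, err_predicted_input), start=1):
    print(step, round(a, 4), round(p, 4))
```

The two errors coincide at the first step, but by the fifth step the error of the rollout that reuses its own predictions is several times larger, which mirrors the behaviour of a decoder over 48 output steps.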

Finally, it is also worth noting that, compared with previous very short-term (such as 15-min-ahead) load forecasting work, which predicts electricity consumption more accurately, day-ahead load forecasting tends to predict the electricity consumption pattern of the customers rather than exact values. For example, as shown in Figure 3, the early peak and the three late peaks of the actual load curve cannot be accurately predicted.

Fig. 3: The forecasting loads by different models and the realistic load. 

IV-C Effect of the number of layers in the model

The recent revival of neural networks has benefited from the development of computer hardware, which has made neural networks deeper and deeper; this depth is the main cause of the high complexity of deep learning models.

In general, the deeper the neural network is, the more precise the prediction is. However, as shown in Table III, the deeper LSTM, seq2seq, seq2seq-pooling, and CNN models do not perform better than the shallow ones. What is more, the 8-layer LSTM, seq2seq, and seq2seq-pooling models perform very poorly. This means that, on the one hand, increasing depth, the most powerful means of deep learning, can no longer help to improve the accuracy of the model. On the other hand, we need to develop new technologies to solve the over-fitting problem.

In conclusion, it is unnecessary to make the model deeper, since doing so leads to a more complex network structure that needs more training time, consumes more energy, and emits more CO2.

V Conclusion

Day-ahead individual resident load forecasting is very important to real applications of the smart grid, such as demand response. Deep learning models have become commonly used methods in load forecasting. However, deep learning models are computationally hungry: they require a great deal of training time, which results in considerable energy consumption and CO2 emissions. All previous load forecasting work focuses only on improving prediction accuracy and ignores training efficiency, energy consumption, and environmental costs.

To save resources and promote the application of deep learning models, we propose and develop an efficient green CNN-based model, LoadCNN. It not only achieves state-of-the-art performance but also has huge advantages in training efficiency, energy consumption, and environmental costs. The experimental results on a public, large-scale dataset show that the training time, energy consumption, and CO2 emissions of LoadCNN are only approximately 1/70 of the corresponding indicators of other state-of-the-art models.

In addition, it is found that it is difficult to improve the accuracy by simply adjusting the hyperparameters or structure of deep learning models. In the future, to improve the accuracy, we must obtain more related information (such as human activities).


We are very grateful to CER Smart Metering Project - Electricity Customer Behaviour Trial, 2009-2010 and ISSDA. This work is supported by the Major Program of National Natural Science Foundation of China (Grant No. 61432006).


  • [1] National Bureau of Statistics of China, “Annual data,” http://data.stats.gov.cn/easyquery.htm?cn=C01, accessed June 12, 2019.
  • [2] W. Kong, Z. Y. Dong, D. J. Hill, F. Luo, and Y. Xu, “Short-term residential load forecasting based on resident behaviour learning,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 1087–1088, 2017.
  • [3] A. Tascikaraoglu and B. M. Sanandaji, “Short-term residential electric load forecasting: A compressive spatio-temporal approach,” Energy and Buildings, vol. 111, pp. 380–392, 2016.
  • [4] K. Chen, K. Chen, Q. Wang, Z. He, J. Hu, and J. He, “Short-term load forecasting with deep residual networks,” IEEE Transactions on Smart Grid, 2018.
  • [5] M. Ghofrani, M. Hassanzadeh, M. Etezadi-Amoli, and M. S. Fadali, “Smart meter based short-term load forecasting for residential customers,” in 2011 North American Power Symposium.   IEEE, 2011, pp. 1–5.
  • [6] H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting—a novel pooling deep rnn,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 5271–5280, 2017.
  • [7] M. H. Alobaidi, F. Chebana, and M. A. Meguid, “Robust ensemble learning framework for day-ahead forecasting of household based energy consumption,” Applied energy, vol. 212, pp. 997–1012, 2018.
  • [8] Y. Wang, D. Gan, M. Sun, N. Zhang, Z. Lu, and C. Kang, “Probabilistic individual load forecasting using pinball loss guided lstm,” Applied Energy, vol. 235, pp. 10–20, 2019.
  • [9] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, “Short-term residential load forecasting based on lstm recurrent neural network,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 841–851, 2019.
  • [10] Y. Peng, Y. Wang, X. Lu, H. Li, D. Shi, Z. Wang, and J. Li, “Short-term load forecasting at different aggregation levels with predictability analysis,” arXiv preprint arXiv:1903.10679, 2019.
  • [11] M. Cai, M. Pipattanasomporn, and S. Rahman, “Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques,” Applied Energy, vol. 236, pp. 1078–1088, 2019.
  • [12] Y. Wang, M. Liu, Z. Bao, and S. Zhang, “Short-term load forecasting with multi-source data using gated recurrent unit neural networks,” Energies, vol. 11, no. 5, p. 1138, 2018.
  • [13] E. Strubell, A. Ganesh, and A. McCallum, “Energy and policy considerations for deep learning in nlp,” arXiv preprint arXiv:1906.02243, 2019.
  • [14] X.-X. Zhou, W.-F. Zeng, H. Chi, C. Luo, C. Liu, J. Zhan, S.-M. He, and Z. Zhang, “pdeep: Predicting ms/ms spectra of peptides with deep learning,” Analytical chemistry, vol. 89, no. 23, pp. 12 690–12 697, 2017.
  • [15] Y. Chen, C. Hu, B. Hu, L. Hu, H. Yu, and C. Miao, “Inferring cognitive wellness from motor patterns,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 12, pp. 2340–2353, 2018.
  • [16] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, “Deep learning for time series classification: a review,” Data Mining and Knowledge Discovery, pp. 1–47, 2019.
  • [17]

    A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, “Deep learning for computer vision: A brief review,”

    Computational intelligence and neuroscience, vol. 2018, 2018.
  • [18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [19] C. E. Smart, “Metering customer behaviour trials (cbt) findings report,” 2011.
  • [20] Google, “Tensorflow,” https://tensorflow.google.cn/versions/r1.10/api_docs/python/tf, accessed June 21, 2019.
  • [21] E. Strubell, P. Verga, D. Andor, D. Weiss, and A. McCallum, “Linguistically-informed self-attention for semantic role labeling,” in

    Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

    , 2018, pp. 5027–5038.
  • [22] V. Campos Camunez, B. Jou, X. Giró Nieto, J. Torres Viñals, and S.-F. Chang, “Skip rnn: learning to skip state updates in recurrent neural networks,” in Sixth International Conference on Learning Representations: Monday April 30-Thursday May 03, 2018, Vancouver Convention Center, Vancouver:[proceedings], 2018, pp. 1–17.
  • [23]

    C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in

    Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • [24] C. Tian, J. Ma, C. Zhang, and P. Zhan, “A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network,” Energies, vol. 11, no. 12, p. 3493, 2018.
  • [25] J. Kim, J. Moon, E. Hwang, and P. Kang, “Recurrent inception convolution neural network for multi short-term load forecasting,” Energy and Buildings, vol. 194, pp. 328–341, 2019.
  • [26] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
  • [27] S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271, 2018.
  • [28] M. Voß, C. Bender-Saebelkampf, and S. Albayrak, “Residential short-term load forecasting using convolutional neural networks,” in 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm).   IEEE, 2018, pp. 1–6.
  • [29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in

    Proceedings of the IEEE conference on computer vision and pattern recognition

    , 2016, pp. 770–778.