I Introduction
According to the National Bureau of Statistics of the People's Republic of China, the electricity consumption of residents was 907.16 billion kWh in 2017 [1]. Residents, as important participants in the smart grid, have great potential to contribute to customer-oriented applications, for example demand response (DR), demand side management (DSM), and energy storage systems (ESS) [2]. In these cases, precise day-ahead individual resident load forecasting is essential to balance generation and consumption, minimize operating cost, and decrease reserve capacity, which helps to maintain system security and reduce the need for expensive energy storage systems [3, 4].
Although utilizing smart meter data to predict individual residential electric load was first reported by Ghofrani et al. in 2011, it is still a rather new area [5, 2]. There are few studies on individual residential electric load forecasting because it is an extremely challenging task: the huge uncertainty and volatility of residents' electricity consumption behavior are difficult for traditional machine learning methods to handle [6, 7]. Fortunately, deep learning models have recently shown great potential in time series prediction [6, 2, 7, 8, 9, 10, 11]. Compared with traditional machine learning methods, deep learning shows significant superiority in individual residential electric load forecasting, and the task has again attracted researchers' attention.
Shi et al. made the first attempt to develop a deep learning model for individual resident load forecasting in 2017 [6]: an encoder-decoder based model with a novel pooling mechanism to overcome overfitting in deep learning models. Kong et al. propose a deep learning forecasting framework based on long short-term memory (LSTM) to address the volatility of the electricity consumption behaviors of individual residents [2]. Wang et al. develop a gated recurrent unit (GRU) model, a popular recurrent neural network (RNN) variant, to forecast the next-day load of an individual resident [12]. Wang et al. propose an LSTM model with a new loss function (pinball loss) plus week and hour information to forecast the load of an individual resident [8]. Kong et al. recently propose an LSTM-based framework to handle the high volatility and uncertainty of individual resident load [9].

However, we notice that although deep learning shows significant superiority in forecasting accuracy over traditional machine learning methods, the time for training a deep learning model is unbearable, which results in considerable energy consumption and plenty of CO2 emission. This has been confirmed in many studies. For example, Strubell et al. investigate the training of the encoder-decoder based model LISA [13]. 1) As shown in Table I, the total GPU time for researching and developing the model is 239,942 hours (about 27 years; in reality, the model was trained on a combination of NVIDIA Titan X and M40 GPUs, and the project spanned a period of 6 months [13]). 2) As shown in Table I, the estimated cost of researching and developing the model is $9,870 in terms of electricity and $103k–$350k in terms of Google cloud computing [13]. 3) As shown in Table II, Strubell et al. point out that the CO2 emission for training the encoder-decoder based model is five times the whole-lifetime emission of a car [13]. The huge waste of time and energy and the plentiful CO2 emissions are problems that cannot be ignored in production and application.

To mitigate the situation above, we uncover the root causes of the phenomenon. Currently, the superiority of deep learning depends on its complex network structure (millions of parameters and several to hundreds of layers), which provides the powerful ability to automatically learn a complex nonlinear function relating the input to the prediction [14]. However, training such a complex network even once requires plenty of training time, which results in considerable energy consumption and plenty of CO2 emission. What's more, it is inevitable that developing such a complex network requires tens to thousands of experiments to adjust the network structure and its hyperparameters, for the following reasons.

Developing a new deep learning model: plenty of different network structures and hyperparameters must be explored to obtain an optimal model, which requires a large number of experiments.

Applying the latest deep learning model to individual residential electric load forecasting: leveraging technology that people already have in their pockets for a specific task is not as simple as it appears [15]. A deep learning model needs many adjustments to suit individual residential electric load forecasting, which also requires a large number of experiments. What's more, due to the large training time cost, it is difficult to keep up with the development of deep learning technology, which evolves very fast as the most popular technology in the field of artificial intelligence [16].
Deploying a deep learning model to a specific real environment: load forecasting models are commonly trained on a specific small data set, which makes the model depend heavily on local customer behavior and local climate. Deploying the model in another environment requires adjusting and retraining it on the corresponding data set, which also requires a large number of experiments.
These three issues are not taken into consideration in previous studies. As a result, current state-of-the-art load forecasting models are hardly deployable in real smart grid environments. In this paper, we shift the focus from forecasting accuracy to the training efficiency, energy consumption, and environmental costs of training a new model, especially since the improvement in forecasting accuracy has not been significant in recent research.
TABLE I: Estimated cost of training the LISA model [13].

Models  | GPU hours | Electricity | Cloud computing cost
1       | 120       | $5          | $52–$175
24      | 2,880     | $118        | $1,238–$4,205
4,789   | 239,942   | $9,870      | $103k–$350k
TABLE II: CO2 emission of training a model compared with familiar consumption [13].

Consumption                                 | CO2 emission (lbs)
Air travel, 1 passenger, NY↔SF              | 1,984
Human life, avg, 1 year                     | 11,023
Car, avg incl. fuel, 1 lifetime             | 126,000
Training one model (GPU)                    | 192
Training w/ tuning and architecture search  | 626,155
All state-of-the-art deep learning models for individual residential electric load forecasting that directly handle the historical load curve from the smart meter are based on RNNs, and it is well known that training RNN models is time-consuming and difficult to parallelize. Thus, in this paper, we propose a convolutional neural network (CNN) based model, LoadCNN, with a simple network structure to reduce training time, energy consumption, and CO2 emissions. The experiments show that our model significantly outperforms current state-of-the-art methods: the training time, energy consumption, and CO2 emissions of our model are only approximately 1/70 of those of other models, while it achieves state-of-the-art performance in forecasting accuracy.
The contributions of this paper cover the following four aspects:

New application: our method is the first to directly apply a CNN to day-ahead individual residential electric load forecasting.

New problem: training efficiency, energy consumption, and environmental costs are considered in the load forecasting task for the first time; these important issues have been ignored in previous research.

New model: we propose a novel CNN-based model, LoadCNN, for predicting day-ahead individual resident load. The training time, energy consumption, and CO2 emissions of LoadCNN are only approximately 1/70 of the corresponding indicators of other state-of-the-art models. Meanwhile, it achieves state-of-the-art performance in terms of prediction accuracy.

New formulation: unlike most previous deep-learning-based individual residential electric load forecasting research, which focuses on the next time step only, we focus on day-ahead load forecasting, which is very important to the day-ahead market. In this paper, we give a formal definition of day-ahead load forecasting and transform models that only forecast the value of the next time step into day-ahead forecasting models for comparison.
The rest of this paper is structured as follows. Section II introduces our approach. Section III describes the methodology of implementation. Section IV presents and discusses the results. Section V concludes the paper.
II Methodology
In this section, we give a formal definition of day-ahead forecasting and propose a novel CNN-based model, LoadCNN, for day-ahead load forecasting.
II-A Day-ahead Individual Resident Load Forecasting
The load curve represents the electricity consumption behavior of an individual resident and is very important to various customer-oriented applications in the smart grid. The load curve of an individual resident is denoted as L = (l_1, l_2, \ldots, l_t, \ldots) in this paper, where l_t is the energy consumption at time step t. We use the historical load curve of an individual resident to predict the future load curve of the same resident; the current time step t divides the load curve into the input and the output of the forecasting task.

In this work, we focus on day-ahead load forecasting based on the historical load curve of the past 7 days, sampled at half-hour intervals (48 data points per day), so the input X = (l_{t-335}, \ldots, l_t) contains 336 points. The predicted load is defined by Equation 1:

\hat{Y} = (\hat{l}_{t+1}, \ldots, \hat{l}_{t+48}) = f(X, ID, M, D, W),    (1)

where ID, M, D, and W are the customer, month, day, and week information defined in Section II-B. The objective of the day-ahead load forecasting task is to minimize the prediction error defined by Equation 2, taken here as the mean squared error over the 48 predicted points:

\min_{f} \frac{1}{48} \sum_{i=1}^{48} (\hat{l}_{t+i} - l_{t+i})^2.    (2)
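To make the formulation concrete, the following sketch builds (X, Y) training pairs from one customer's half-hourly load curve. It is an illustrative snippet that assumes the 7-day (336-point) input window described above; it is not the authors' preprocessing code.

import numpy as np

def make_dayahead_pairs(load, past_days=7, points_per_day=48):
    """Split one customer's half-hourly load curve into
    (past-7-days, next-day) pairs for day-ahead forecasting."""
    n_in = past_days * points_per_day
    xs, ys = [], []
    # slide the window forward one day at a time
    for start in range(0, len(load) - n_in - points_per_day + 1, points_per_day):
        xs.append(load[start:start + n_in])                          # input X
        ys.append(load[start + n_in:start + n_in + points_per_day])  # target Y
    return np.array(xs), np.array(ys)

load = np.random.rand(30 * 48)          # 30 days of synthetic readings
X, Y = make_dayahead_pairs(load)
print(X.shape, Y.shape)                 # (23, 336) (23, 48)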
II-B The Details of LoadCNN
In this section, we elaborate our proposed method, LoadCNN. As shown in Figure 1, LoadCNN consists of two parts: data preparation and the load forecasting model. In addition, we introduce the day-ahead individual resident load forecasting algorithm based on LoadCNN.
II-B1 Data preparation
Data preprocessing is an essential step of load forecasting. In our paper, five types of data are fed into LoadCNN: the individual residential ID, month M, day D, week W, and historical load curve L. The details are as follows:

The customer ID of an individual residence, ID, is represented by several one-hot encoded vectors. Since the number of customers is generally large, we utilize two one-hot vectors to jointly and uniquely represent a customer, which keeps each vector relatively small: two vectors of size k can distinguish k^2 customers, whereas a single one-hot vector of the same capacity would need k^2 entries. Similarly, if necessary, more vectors can be used to represent ID.
The month M, day D, and week W are one-hot encoded and describe the day whose load curve is to be predicted. The sizes of M, D, and W are 12, 31, and 7, respectively.

The historical load curve L is the sequence of energy consumption values of the past 7 days, so the size of L is 336 (see the encoding sketch after this list).
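The encoding above can be sketched as follows. This is an illustrative snippet rather than the authors' preprocessing code, and the base size of 100 for the two ID vectors is an assumed value.

import numpy as np

def one_hot(index, size):
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

def encode_inputs(customer_id, month, day, weekday, history, id_base=100):
    """Encode the five LoadCNN inputs; two base-`id_base` one-hot
    vectors jointly identify up to id_base**2 customers."""
    id_hi = one_hot(customer_id // id_base, id_base)   # high digit of ID
    id_lo = one_hot(customer_id % id_base, id_base)    # low digit of ID
    m = one_hot(month - 1, 12)                         # month of predicted day
    d = one_hot(day - 1, 31)                           # day of month
    w = one_hot(weekday, 7)                            # day of week
    return np.concatenate([id_hi, id_lo, m, d, w, history])

x = encode_inputs(1234, month=7, day=15, weekday=2,
                  history=np.random.rand(7 * 48).astype(np.float32))
print(x.shape)   # (100 + 100 + 12 + 31 + 7 + 336,) = (586,)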
II-B2 Load forecasting model
RNN-based models are the mainstream models for sequence prediction and achieve state-of-the-art performance in individual resident load forecasting tasks [8, 9]. However, due to the complex mechanism of the RNN, training an RNN-based model is time-consuming and requires a large amount of computing resources; moreover, RNN-based models are difficult to parallelize. Compared with the RNN, the CNN has a simpler network structure and achieves state-of-the-art performance in the image processing realm [17]. Thus, we seek to develop an energy-saving and efficient green model based on the CNN.
As shown in Figure 1, LoadCNN consists of three parts: input, feature extraction, and forecasting.

The input part only contains a concatenation operation, which links the preprocessed data (ID, M, D, W, and L) into a single vector.

The feature extraction part consists of convolution layers and max pooling layers. The convolution layers are one-dimensional (1D) convolutions, and the feature maps are activated by the Rectified Linear Unit (ReLU) function; the kernel shapes and the feature-map depths of the successive convolution layers are given in Figure 1. The pooling size of the 4 max pooling layers is 2, and each max pooling layer cuts the dimension of the feature map in half.
For the forecasting part, the feature map constructed by the last convolution layer is simply flattened into one-dimensional data. Then, a fully connected layer transforms the one-dimensional data into the outputs. In addition, dropout is adopted in the fully connected layer to overcome the overfitting problem [18] (a code sketch of this architecture follows this list).
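A minimal Keras sketch of such an architecture is given below. The kernel sizes, feature-map depths, and input length here are illustrative assumptions, not the exact LoadCNN configuration reported in Figure 1.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_loadcnn_like(input_len=586, out_len=48):
    """1D CNN: stacked Conv1D + ReLU with interleaved max pooling,
    then flatten -> dropout -> dense, following the LoadCNN outline."""
    inp = layers.Input(shape=(input_len, 1))
    x = inp
    for depth in (16, 32, 64, 128):                    # assumed depths
        x = layers.Conv1D(depth, kernel_size=3, padding='same',
                          activation='relu')(x)
        x = layers.MaxPooling1D(pool_size=2)(x)        # halves the length
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)                         # overfitting control [18]
    out = layers.Dense(out_len)(x)                     # 48 half-hourly points
    return models.Model(inp, out)

model = build_loadcnn_like()
model.compile(optimizer='adam', loss='mse')            # objective of Eq. 2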
II-B3 Algorithm
The designed algorithm includes three parts, as shown in Algorithm 1: 1) data preprocessing, 2) network training, and 3) evaluation.
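Under the assumptions of the previous snippets, the three stages can be tied together as in the sketch below; the batch size and epoch count are placeholder values, not the paper's settings.

# 1) data preprocessing: build (input, next-day) pairs
X, Y = make_dayahead_pairs(load)          # from the earlier sketch
X = X[..., None]                          # add a channel axis for Conv1D

# 2) network training, with a validation split for model selection
model = build_loadcnn_like(input_len=X.shape[1])
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, batch_size=256, epochs=20, validation_split=0.1)

# 3) evaluation on held-out data
Y_hat = model.predict(X)
print('RMSE:', float(np.sqrt(((Y_hat - Y) ** 2).mean())))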
III The methodology of implementation
III-A Data description
To evaluate the performance of LoadCNN, we conduct experiments on a large-scale smart meter dataset from the Smart Metering Electricity Customer Behaviour Trials (CBTs) in Ireland [19]. The data were collected from more than 5,000 Irish customers between 1st July 2009 and 31st December 2010. The smart meter data are half-hourly sampled electricity consumption (kWh) readings from each customer.
From the CBTs, we selected the residential customers with the controlled stimulus and controlled tariff, for the following two reasons: (1) the selected customers were billed on the existing flat rate without any DSM stimuli; (2) the selected customers are the most representative, as the majority of consumers outside the trial are of this type [6]. The selected residential customers are used to verify our method.
To verify our method, we divide the dataset into three sets: a training set, a validation set, and a test set. The test set contains all the data of the final days of the trial; the validation set contains the data of several days randomly selected from the remaining period; the training set contains all of the rest of the data.
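Such a split can be implemented as below; the 30-day test window and 30 validation days are assumed values rather than the paper's exact split sizes.

import numpy as np

def split_by_days(load, points_per_day=48, test_days=30, val_days=30, seed=0):
    """Chronological test split plus randomly drawn validation days."""
    days = load.reshape(-1, points_per_day)            # one row per day
    test = days[-test_days:]                           # last days -> test set
    rest = days[:-test_days]
    rng = np.random.default_rng(seed)
    val_idx = rng.choice(len(rest), size=val_days, replace=False)
    train_idx = np.setdiff1d(np.arange(len(rest)), val_idx)
    return rest[train_idx], rest[val_idx], test

train, val, test = split_by_days(np.random.rand(500 * 48))
print(len(train), len(val), len(test))                 # 440 30 30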
III-B Experiment Setup
All models for all customers are built on a server with two Intel Xeon E5-2630 v4 processors and four NVIDIA Titan Xp GPUs, running Linux 3.10.0-327.el7.x86_64. All models are implemented with the TensorFlow-GPU 1.10.0 library [20] and Python 3.6.7. The same hyperparameters (batch size, maximum number of epochs, number of RNN hidden neurons, learning rate, decay rate, and dropout rate) are used for all models. In addition, to facilitate the comparison of training time and energy consumption, each model runs on only one GPU.
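For reproducibility, these shared settings can be collected in a single configuration object; the numeric values below are placeholders, since what matters for the comparison is only that they are identical across models.

CONFIG = {
    'batch_size': 256,          # placeholder value, shared by all models
    'max_epoch': 20,
    'rnn_hidden_neurons': 128,
    'learning_rate': 1e-3,
    'decay_rate': 0.9,
    'dropout_rate': 0.5,
    'gpus_per_model': 1,        # one GPU per model for a fair energy comparison
}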
III-C Metrics
In this work, three widely used metrics are applied to evaluate the accuracy of LoadCNN: the root mean squared error (RMSE), the normalised root mean squared error (NRMSE), and the mean absolute error (MAE):

RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 }    (3)

NRMSE = \frac{RMSE}{y_{max} - y_{min}}    (4)

MAE = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i|    (5)

Here, \hat{y}_i is the predicted value, y_i is the actual value, y_{max} and y_{min} are the maximum and minimum values of y respectively, and N is the number of points in the test set.
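Equations 3–5 translate directly into code; the snippet below is a plain NumPy rendering of the definitions above.

import numpy as np

def rmse(y_hat, y):
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))   # Eq. 3

def nrmse(y_hat, y):
    return rmse(y_hat, y) / float(y.max() - y.min())   # Eq. 4

def mae(y_hat, y):
    return float(np.mean(np.abs(y_hat - y)))           # Eq. 5

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(rmse(y_hat, y), nrmse(y_hat, y), mae(y_hat, y))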
Meanwhile, energy efficiency and training efficiency also need to be measured in our work. Since the GPU consumes the largest part of the energy, the energy consumption E (kWh) is defined by Equation 6:

E = PUE \times p_g \times t \times N_T / 1000.    (6)

Here, p_g is the power (W) drawn by the GPU while training the model; t is the training time (h) of the model for one training run; PUE is the power usage effectiveness, which accounts for the additional energy required to support the compute infrastructure (mainly cooling) [13]; and N_T is the number of times a model is trained. The detailed settings of the parameters are as follows.

p_g: as Figure 2 shows, the variation of the power drawn by a GPU while training a model is negligible. Thus, to simplify the problem and to minimize the impact of the monitoring procedure on training, we take the average power over a randomly selected 30-minute window during model training as p_g.

PUE: its coefficient is set to 1.58 (the global average for data centers), following [13].

N_T: in general, hyperparameter tuning is a big topic and essential to obtain the best forecasting performance [9]. In recent work [21], the authors ran 4,789 trials to obtain the best performance of an encoder-decoder model [13], whose sequence-to-sequence task is similar to day-ahead individual resident load forecasting. Thus, to simplify the problem, we assume in this paper that N_T = 1000 trials are required to obtain the best performance of a model.
The assumptions above are reasonable for the following reasons: 1) every model runs on the same server; 2) every model runs on a single NVIDIA Titan Xp GPU; 3) most of the energy consumed in training a model is consumed by the GPU. Following [13], the CO2 emission is then estimated from the energy consumption by Equation 7, where 0.954 is the average CO2 (lbs) emitted per kWh of electricity produced:

CO_2 = 0.954 \times E.    (7)
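Equations 6 and 7 are simple to script; the sketch below also samples the GPU power with nvidia-smi, in the spirit of the measurement procedure above (the sampling interval, the 200 W power figure, and the 2.3 h run time in the usage line are illustrative values).

import subprocess, time

def sample_gpu_power(samples=10, interval_s=1.0):
    """Average power draw (W) of the first GPU via nvidia-smi."""
    readings = []
    for _ in range(samples):
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=power.draw',
             '--format=csv,noheader,nounits'])
        readings.append(float(out.decode().strip().splitlines()[0]))
        time.sleep(interval_s)
    return sum(readings) / len(readings)

def energy_kwh(p_gpu_w, hours_per_run, n_trainings, pue=1.58):
    return pue * p_gpu_w * hours_per_run * n_trainings / 1000.0   # Eq. 6

def co2_lbs(kwh):
    return 0.954 * kwh                                            # Eq. 7

E = energy_kwh(p_gpu_w=200.0, hours_per_run=2.3, n_trainings=1000)
print(E, co2_lbs(E))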
III-D Day-ahead Individual Resident Load Forecasting Methods for Comparison
We use models from four types of popular deep learning methods as benchmarks in the present work: classic RNN-based models, mixed RNN- and CNN-based models, encoder-decoder based models, and CNN-based models.

LSTM, the most popular RNN model for time series prediction, has been commonly used for load forecasting since 2017 [2]. In our paper, we transform the model into a day-ahead load forecasting model to compare with our model.

LSTM-Week is a recently proposed load forecasting model that uses a new loss function and considers week and hour information [8]. In order to compare with our model, we ignore the new loss function and the hour information.

LSTM-EID is also a recently proposed load forecasting model that considers week, record position within a day, and holiday information [9]. Since the dataset used in this work does not contain holiday information, the holiday information is ignored. In order to compare with our model, we also transform the model into a day-ahead load forecasting model.

GRU is another popular RNN model for time series prediction and has been applied to day-ahead load forecasting [12]. The model considers date, weather, and temperature information. Since the dataset used in this work does not contain weather and temperature information, we ignore them.

Skip-RNN is an RNN model that is able to capture long-term dependencies and relieve vanishing gradients when trained on long sequences [22]. Since the input in this work is a long sequence of 336 points, Skip-RNN is considered as a benchmark.

LSTM-CNN is a model that mixes a typical LSTM and a CNN and is similar to the famous Inception models [23]. LSTM-CNN models of this type have been used for load forecasting at the area level and for industrial distribution complexes [24, 25]. In order to compare with our model, we transform the LSTM-CNN model into a day-ahead load forecasting model at the individual resident level.

seq2seq is an LSTM-based encoder-decoder model, the most popular architecture for sequence-to-sequence forecasting.

seq2seq-pooling was recently proposed to relieve overfitting in load forecasting [6]. In order to compare with our model, we transform it into a day-ahead load forecasting model and use dropout to further relieve overfitting.

seq2seq-attention is an encoder-decoder model that combines the attention mechanism to handle long sequences [26].

The temporal convolutional network (TCN) was recently proposed to handle sequences and achieves state-of-the-art performance in many sequence modeling tasks [27]. It has been used for load forecasting at the individual resident level [28]. In order to compare with our model, we also transform it into a day-ahead load forecasting model.

ResNet, a CNN-based model, is the state-of-the-art method for image recognition tasks [29].
IV Results and discussion
In this section, we present and discuss the experimental results in terms of training efficiency, energy consumption, environmental costs, and prediction accuracy. In addition, we investigate the effect of the number of layers in a deep learning model, since the deeper a model is, the more complex its network structure is and the more training time it needs, which results in more energy consumed and more CO2 emitted.
TABLE III: Training time, GPU power, energy consumption, CO2 emission, and prediction accuracy of each model. Time is for one training run; energy and CO2 are for N_T = 1000 runs.

Model                  | Time (h) | Power (W) | Energy (kWh) | CO2 (lbs)  | Parallelizable | RMSE   | NRMSE  | MAE    | Layers | Steps  | Year
LSTM (3 layers)        | 164.42   | 66.1656   | 17188.7378   | 16398.0559 | No             | 0.6192 | 0.0473 | 0.3636 | 3      | 336    | 2017
LSTM (5 layers)        | 239.22   | –         | –            | –          | No             | 0.6157 | 0.0470 | 0.3511 | 5      | 336    | –
LSTM (8 layers)        | 365.65   | –         | –            | –          | No             | 0.7375 | 0.0563 | 0.4085 | 8      | 336    | –
LSTM-Week              | 164.73   | 68.5650   | 17845.6456   | 17024.7459 | No             | 0.6246 | 0.0477 | 0.3665 | 3      | 336    | 2019
LSTM-EID               | 161.58   | 68.3967   | 17461.4313   | 16658.2055 | No             | 0.6153 | 0.0470 | 0.3639 | 3      | 336    | 2019
GRU                    | 170.30   | 64.5683   | 17373.6508   | 16574.4629 | No             | 0.6156 | 0.0470 | 0.3487 | 3      | 336    | 2018
Skip-RNN               | 190.33   | 64.1756   | 19298.9762   | 18411.2233 | No             | 0.6147 | 0.0469 | 0.3477 | 3      | 336    | –
LSTM-CNN               | 153.2    | 67.3422   | 16300.5835   | 15550.7567 | No             | 0.6184 | 0.0472 | 0.3583 | 3+8    | 336+1  | –
seq2seq (3+3)          | 165.02   | 72.4456   | 18888.8572   | 18019.9698 | No             | 0.6641 | 0.0507 | 0.4101 | 3+3    | 336+48 | –
seq2seq (5+5)          | 274.28   | –         | –            | –          | No             | 0.6771 | 0.0517 | 0.4806 | 5+5    | 336+48 | –
seq2seq (8+8)          | 389.95   | –         | –            | –          | No             | 0.6881 | 0.0525 | 0.4941 | 8+8    | 336+48 | –
seq2seq-pooling (3+3)  | 164.12   | 66.4822   | 17239.4727   | 16446.4570 | No             | 0.6713 | 0.0513 | 0.3922 | 3+3    | 336+48 | 2017
seq2seq-pooling (5+5)  | 246.22   | –         | –            | –          | No             | 0.6581 | 0.0503 | 0.4332 | 5+5    | 336+48 | –
seq2seq-pooling (8+8)  | 382.95   | –         | –            | –          | No             | 0.7252 | 0.0554 | 0.5474 | 8+8    | 336+48 | –
seq2seq-attention      | 180.33   | 87.1394   | 24827.8798   | 23685.7973 | No             | 0.6549 | 0.0500 | 0.4005 | 3+3    | 336+48 | –
TCN                    | 20.55    | 218.3589  | 7089.8951    | 6763.7599  | Yes            | 0.8770 | 0.0670 | 0.4731 | 8      | 1      | –
ResNet                 | 7.15     | 187.5428  | 2118.6710    | 2021.2121  | Yes            | 0.6261 | 0.0478 | 0.3673 | 34     | 1      | –
LoadCNN (ours)         | 2.30     | 69.0600   | 250.6940     | 239.4197   | Yes            | 0.6104 | 0.0466 | 0.3523 | 8      | 1      | –
IV-A Training efficiency, energy consumption and environmental costs
As shown in Table III, our model not only achieves the highest prediction accuracy, but also obtains superior performance in training time, energy consumption, and CO2 emissions compared with all the other models. Specifically, LoadCNN takes the shortest training time, only approximately 1/70 of that of the other RNN-based models. What's more, LoadCNN is CNN-based and very easy to parallelize, so its training time can be further reduced simply by adjusting the implementation and adding more GPUs. As for energy consumption and CO2 emissions, LoadCNN again needs only approximately 1/70 of those of the other RNN-based models.
The reason for these results is that LoadCNN has a simple network structure that is easy to train. In contrast: 1) although an RNN-based model may have only 3 layers, it is unrolled over 336 time steps during training, which is a quite large number and makes its effective structure very complex; 2) the other CNN-based models are also much more complex than our model.
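This structural argument can be quantified roughly: an unrolled RNN performs one dependent step per input point per layer, while a CNN's sequential depth is fixed regardless of the input length. The snippet below is a back-of-the-envelope illustration, not a timing benchmark.

def sequential_depth_rnn(layers, steps):
    return layers * steps    # each time step depends on the previous one

def sequential_depth_cnn(layers):
    return layers            # all positions in a layer run in parallel

print(sequential_depth_rnn(3, 336))   # 1008 dependent operations
print(sequential_depth_cnn(8))        # 8 dependent operations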
In addition, the training time, energy consumption, and CO2 emissions of the models will be even larger in reality than in our experiment. Here the training set only contains the data of the selected customers over the trial period, yet the energy consumption and total training time of the mainstream models already exceed 17,000 kWh and 160,000 h (about 18 years), respectively (of course, given enough GPUs, multiple parameter-adjustment experiments can be performed at the same time). In a real environment, the training set would contain hundreds of thousands or even more customers, which would significantly increase the training time and energy consumption. Therefore, training-efficient and low-energy models like ours are significant.
IV-B The prediction accuracy of different deep learning models
We find that the prediction accuracy is hard to improve merely by constructing different deep learning models. Specifically, as shown in Table III, the best-performing classical RNN-based models, RNN-CNN-based models, and CNN-based models differ little in the accuracy metrics RMSE, NRMSE, and MAE. Such tiny differences are likely to be eliminated by adjusting hyperparameters. Consistent with Table III, Figure 3 shows that, except for the encoder-decoder based models and the TCN model, the prediction performances of the models are not very different. This means it is difficult for current deep learning technology to make a major breakthrough in the forecasting accuracy of day-ahead individual resident load forecasting.
To improve forecasting accuracy, we need to pay more attention to obtaining information about personal activities and external factors, as household electricity consumption depends heavily on the randomness of individual human behavior and external conditions (for example, both a resident's business trip and a change of indoor temperature will cause electricity consumption changes that are difficult to predict from the historical load curve alone).
In addition, it is worth noting that there is a large gap between the performance of the encoder-decoder based models and the state-of-the-art models. This can be explained by the mechanism of the decoder. In the day-ahead individual resident load forecasting task, the decoder predicts the value of the current point based on the value of the previous point and the current state of the model. Since the actual value of the previous point cannot be observed at prediction time, its predicted value is used in its place. Therefore, forecasting errors accumulate and are amplified.
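This accumulation effect is easy to see in a stripped-down autoregressive decoding loop; here decoder_step is a stand-in for one step of any trained decoder, and the +1% bias is a toy error model.

def rollout(decoder_step, state, last_value, horizon=48):
    """Autoregressive decoding: each step consumes the previous
    PREDICTION, so any error it carries is fed back into the model."""
    preds = []
    for _ in range(horizon):
        last_value, state = decoder_step(last_value, state)  # y_hat, not y
        preds.append(last_value)
    return preds

# a decoder whose every step is 1% too high: errors compound over 48 steps
step = lambda y, s: (y * 1.01, s)
print(rollout(step, state=None, last_value=1.0)[-1])         # ~1.61, not 1.0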
Finally, it is also worth noting that, compared with previous very short-term (such as 15-min-ahead) load forecasting work, which predicts electricity consumption more accurately, day-ahead load forecasting tends to predict the electricity consumption pattern of a customer. For example, as shown in Figure 3, the early peak and the three late peaks of the actual load curve cannot be accurately predicted.
IV-C Effect of the number of layers in the model
The recent revival of neural networks has benefited from developments in computer hardware that have allowed neural networks to become deeper and deeper, which is the main cause of the high complexity of deep learning models.
In general, the deeper the neural network, the more precise the prediction. However, as shown in Table III, the deeper variants of the LSTM, seq2seq, seq2seq-pooling, and CNN models do not perform better than the shallow ones; what's more, the 8-layer LSTM, seq2seq, and seq2seq-pooling models perform terribly. This means that, on the one hand, the most powerful means of deep learning, increasing depth, can no longer help improve the accuracy of these models; on the other hand, new technologies are needed to solve the overfitting problem.
In conclusion, it is unnecessary to make the model deeper; doing so only leads to a more complex network structure that needs more training time, consumes more energy, and emits more CO2.
V Conclusion
Day-ahead individual resident load forecasting is very important to real applications of the smart grid (such as demand response). Deep learning models have become commonly used methods in load forecasting. However, deep learning models are computation-hungry: they require plenty of training time and result in considerable energy consumption and plenty of CO2 emissions. Previous load forecasting works only focus on improving prediction accuracy and ignore training efficiency, energy consumption, and environmental costs.
To save resources and promote the application of deep learning models, we propose and develop an efficient green CNN-based model, LoadCNN. It not only achieves state-of-the-art accuracy but also has huge advantages in training efficiency, energy consumption, and environmental costs. The experimental results on a public, large-scale dataset show that the training time, energy consumption, and CO2 emissions of LoadCNN are only approximately 1/70 of the corresponding indicators of other state-of-the-art models.
In addition, we find that it is difficult to improve accuracy simply by adjusting the hyperparameters or structure of deep learning models. In the future, improving accuracy will require obtaining more related information (such as human activities).
Acknowledgment
We are very grateful to the CER Smart Metering Project (Electricity Customer Behaviour Trial, 2009-2010) and ISSDA. This work is supported by the Major Program of the National Natural Science Foundation of China (Grant No. 61432006).
References
 [1] National Bureau of Statistics of China, "Annual data," http://data.stats.gov.cn/easyquery.htm?cn=C01, accessed June 12, 2019.
 [2] W. Kong, Z. Y. Dong, D. J. Hill, F. Luo, and Y. Xu, "Short-term residential load forecasting based on resident behaviour learning," IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 1087–1088, 2017.
 [3] A. Tascikaraoglu and B. M. Sanandaji, "Short-term residential electric load forecasting: A compressive spatio-temporal approach," Energy and Buildings, vol. 111, pp. 380–392, 2016.
 [4] K. Chen, K. Chen, Q. Wang, Z. He, J. Hu, and J. He, "Short-term load forecasting with deep residual networks," IEEE Transactions on Smart Grid, 2018.
 [5] M. Ghofrani, M. Hassanzadeh, M. Etezadi-Amoli, and M. S. Fadali, "Smart meter based short-term load forecasting for residential customers," in 2011 North American Power Symposium. IEEE, 2011, pp. 1–5.
 [6] H. Shi, M. Xu, and R. Li, "Deep learning for household load forecasting—a novel pooling deep RNN," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 5271–5280, 2017.
 [7] M. H. Alobaidi, F. Chebana, and M. A. Meguid, "Robust ensemble learning framework for day-ahead forecasting of household based energy consumption," Applied Energy, vol. 212, pp. 997–1012, 2018.
 [8] Y. Wang, D. Gan, M. Sun, N. Zhang, Z. Lu, and C. Kang, "Probabilistic individual load forecasting using pinball loss guided LSTM," Applied Energy, vol. 235, pp. 10–20, 2019.
 [9] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, "Short-term residential load forecasting based on LSTM recurrent neural network," IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 841–851, 2019.
 [10] Y. Peng, Y. Wang, X. Lu, H. Li, D. Shi, Z. Wang, and J. Li, "Short-term load forecasting at different aggregation levels with predictability analysis," arXiv preprint arXiv:1903.10679, 2019.
 [11] M. Cai, M. Pipattanasomporn, and S. Rahman, "Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques," Applied Energy, vol. 236, pp. 1078–1088, 2019.
 [12] Y. Wang, M. Liu, Z. Bao, and S. Zhang, "Short-term load forecasting with multi-source data using gated recurrent unit neural networks," Energies, vol. 11, no. 5, p. 1138, 2018.
 [13] E. Strubell, A. Ganesh, and A. McCallum, "Energy and policy considerations for deep learning in NLP," arXiv preprint arXiv:1906.02243, 2019.
 [14] X.-X. Zhou, W.-F. Zeng, H. Chi, C. Luo, C. Liu, J. Zhan, S.-M. He, and Z. Zhang, "pDeep: Predicting MS/MS spectra of peptides with deep learning," Analytical Chemistry, vol. 89, no. 23, pp. 12690–12697, 2017.
 [15] Y. Chen, C. Hu, B. Hu, L. Hu, H. Yu, and C. Miao, "Inferring cognitive wellness from motor patterns," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 12, pp. 2340–2353, 2018.
 [16] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, "Deep learning for time series classification: A review," Data Mining and Knowledge Discovery, pp. 1–47, 2019.

 [17] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, "Deep learning for computer vision: A brief review," Computational Intelligence and Neuroscience, vol. 2018, 2018.
 [18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
 [19] Commission for Energy Regulation (CER), "Electricity smart metering customer behaviour trials (CBT) findings report," 2011.
 [20] Google, “Tensorflow,” https://tensorflow.google.cn/versions/r1.10/api_docs/python/tf, accessed June 21, 2019.

 [21] E. Strubell, P. Verga, D. Andor, D. Weiss, and A. McCallum, "Linguistically-informed self-attention for semantic role labeling," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 5027–5038.
 [22] V. Campos, B. Jou, X. Giró-i-Nieto, J. Torres, and S.-F. Chang, "Skip RNN: Learning to skip state updates in recurrent neural networks," in Sixth International Conference on Learning Representations (ICLR), 2018, pp. 1–17.

 [23] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
 [24] C. Tian, J. Ma, C. Zhang, and P. Zhan, "A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network," Energies, vol. 11, no. 12, p. 3493, 2018.
 [25] J. Kim, J. Moon, E. Hwang, and P. Kang, "Recurrent inception convolution neural network for multi short-term load forecasting," Energy and Buildings, vol. 194, pp. 328–341, 2019.
 [26] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
 [27] S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271, 2018.
 [28] M. Voß, C. Bender-Saebelkampf, and S. Albayrak, "Residential short-term load forecasting using convolutional neural networks," in 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). IEEE, 2018, pp. 1–6.

 [29] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.