Wind power has become a significant renewable resource that can be developed and utilized on a large scale. With the mass production of equipment, wind power has become the fastest-growing renewable energy source in the world. By 2017, worldwide installed wind power capacity had reached 539 GW, of which 52 GW was added in 2017 (REN21, 2018), so wind power is expected to be one of the major power sources of the 21st century. However, because wind speed and direction fluctuate, the randomness and volatility of wind turbine output cannot be avoided, which poses severe challenges to the safe and stable operation of power systems. Accurate wind power prediction can enhance the controllability of wind power, ensure the stable operation of the power grid, and improve the ability of the grid to accommodate wind power.
At present, scholars have carried out a great deal of related research, covering physical methods, statistical methods and machine learning methods. Among them, machine learning methods, including support vector machine regression (SVR) (Chen & Yu, 2014) and k-nearest neighbor regression (kNN) (Becker & Thrän, 2017), are used to model wind speed or power time series to achieve prediction. Machine learning methods simplify the wind power forecasting problem, but in recent years it has become difficult to improve their accuracy further.
We believe that wind is a temporally and spatially correlated process. A time series, however, can only express information at the temporal level; it says nothing at the spatial level, let alone about the spatio-temporal process of air flow, and this fundamentally limits progress in wind power prediction. Therefore, finding features that better express the state of the wind farm is the key to breaking through the accuracy bottleneck.
For this reason, this paper puts forward a new feature that can express the spatio-temporal process of air flow, called the spatio-temporal feature (STF). A time series of scenes over a period of time forms a multi-channel image, in which each scene is a sample of the true distribution of physical data in space and expresses spatial information. The scene sequence represents the change of the wind farm state over time and expresses temporal information, so the multi-channel image is called the spatio-temporal feature. Compared with wind speed or power series, the STF implicitly carries factors such as wind speed, wind direction and air density, which greatly expands the ability to express wind-related information and lays a foundation for breaking through the bottleneck of wind power prediction accuracy.
Based on the STF, the spatio-temporal process of the wind farm is modeled and predicted with deep convolutional networks, with good results. Experimental results on 592 wind turbines in one area show that the proposed method outperforms existing state-of-the-art series modeling methods: its prediction error decreases by 26.69% on average and by up to 49.83%, and the time for training models is shortened by a factor of more than 150.
The innovations of this paper are as follows:
The spatio-temporal feature, in the form of a multichannel image, is constructed by embedding the wind turbines into a grid space. It expresses the spatio-temporal variation of the air flow and combines naturally with current deep learning methods.
Two deep convolutional network models suited to the spatio-temporal feature are proposed for wind power prediction. They predict the wind power of a large number of turbines in parallel, and they substantially improve both the accuracy and the time cost of prediction.
2 Related Work
2.1 Machine Learning Methods in WPP
Machine learning methods perform well in short-term prediction. By means of a regression model or neural network, researchers map the time series to the wind power at a future moment to make the prediction. Commonly used methods are SVR (Chen & Yu, 2014), kNN (Becker & Thrän, 2017), the multilayer perceptron (MLP) (Deo et al., 2018; Marvuglia & Messineo, 2012) and the long short-term memory neural network (LSTM) (Qu et al., 2016), among which SVR and kNN are representative.
SVR has a solid mathematical foundation and is among the best-performing regression methods. It realizes regression by finding a hyperplane that lies as close as possible to all the data. This can be stated as an optimization problem: find the parameters that minimize the objective in equation (1) subject to the constraints in equation (2). These two equations involve two empirical parameters, the relaxation (slack) factors, and the parameters that define the hyperplane (Treiber et al., 2016).
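For reference, the standard ε-insensitive SVR problem can be written as below. This is a sketch in common textbook notation, which is an assumption and may differ from the symbols of equations (1) and (2): $w$ and $b$ define the hyperplane, $C$ and $\varepsilon$ are the empirical parameters, and $\xi_i$, $\xi_i^{*}$ are the slack factors for the $n$ training pairs $(x_i, y_i)$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{s.t.} \quad & y_i - \left( w^{\top} x_i + b \right) \le \varepsilon + \xi_i, \\
& \left( w^{\top} x_i + b \right) - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0, \quad \xi_i^{*} \ge 0, \quad i = 1, \dots, n.
\end{aligned}
```

Deviations smaller than $\varepsilon$ cost nothing, while larger deviations are penalized linearly with weight $C$.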
It has been shown in the literature (Treiber et al., 2016; Chen & Yu, 2014; Khosravi et al., 2018; Díaz et al., 2017) that SVR is currently one of the best methods in the field of wind power prediction.
kNN is one of the simplest machine learning models based on a similarity metric, and it still performs well in practice. Because the similarity between two vectors is negatively correlated with the distance between them (for example the Euclidean or Manhattan distance), similarity can be represented by distance. The k vectors in a set with the smallest distance from the target vector are called its k nearest neighbors. Given the indices of the k nearest neighbors of the target vector in the training set, the prediction of the kNN model for the target vector is generated by equation (3), in which the aggregation function can be an arithmetic average, a weighted average, or a more complex method.
With algorithms such as the k-d tree, a kNN model can be trained very quickly.
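The kNN regression described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `knn_predict`, the brute-force neighbor search (instead of a k-d tree) and the inverse-distance weighting are assumptions chosen for brevity.

```python
import math


def knn_predict(train_x, train_y, target, k=3, weighted=False):
    """Predict a value for `target` from its k nearest training vectors."""
    # Euclidean distance from the target to every training vector.
    dists = [math.dist(x, target) for x in train_x]
    # Indices of the k smallest distances, i.e. the k nearest neighbors.
    neighbors = sorted(range(len(train_x)), key=dists.__getitem__)[:k]
    if not weighted:
        # Arithmetic average of the neighbors' target values.
        return sum(train_y[i] for i in neighbors) / len(neighbors)
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (dists[i] + 1e-9) for i in neighbors]
    total = sum(w * train_y[i] for w, i in zip(weights, neighbors))
    return total / sum(weights)
```

For example, with training vectors `[(1.0, 2.0), (2.0, 3.0), (8.0, 9.0)]`, targets `[1.5, 2.5, 8.5]` and `k=2`, the query `(1.5, 2.5)` is equally close to the first two vectors, so the prediction is their mean.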
In recent years, there have been some new ideas in this research field. For example, in (Tascikaraoglu et al., 2016; Lixin & Lei, 2018; Saroha & Aggarwal, 2018), the wavelet transform was used to decompose the power series into multiple sub-series, which were predicted in turn and then combined. This approach needs a model for each sub-series, which leads to much higher costs. Other researchers modeled the prediction error to improve prediction through error analysis (Wang et al., 2018; Giorgi et al., 2011). However, the error is produced by a specific prediction model, so the approach is limited and difficult to apply in production; in addition, error analysis increases the computational cost. Ensemble learning can improve accuracy (Zhao et al., 2016; Pinson et al., 2013; Wang et al., 2017; Heinermann & Kramer, 2016), but running many models at the same time also consumes substantial computational resources. Moreover, in (Wang et al., 2017), series data of length pq were filled into a p × q grid to form a two-dimensional image, after which a convolutional neural network was used for prediction. However, the constructed image had no explicit physical meaning, and the required time series were so long that they added to the computational cost.
In sum, all of the above approaches essentially model series data and achieve higher accuracy through complicated models. However, their computational cost increases considerably, and their models cannot reflect the spatio-temporal variation of air flow.
2.2 Convolutional Neural Network
This section introduces the convolutional neural network (CNN), which lays the foundation for the method proposed in Section 3. CNN is currently one of the most successful methods in deep learning and has been widely used in computer-aided medical treatment, speech recognition, smart cities and autonomous driving systems. In addition, CNN computation can be accelerated by GPUs. With the rapid development of hardware in recent years, computing power has improved greatly, which has led CNN models to significant progress in many fields.
The central operation of a CNN is the convolution. Because both the input and output of a convolution are multichannel images, these images are usually called feature maps. There are many types of CNN models, but on the whole they can be divided into two basic types. The first is the encoder-decoder model, whose core operations are convolution, pooling and deconvolution: convolution extracts deep features, pooling reduces the image size, and deconvolution enlarges the image size by up-sampling. FCN (Long et al., 2015) is typical of this type. The second type is a convolutional network with fully connected layers, whose core operations are convolution, pooling and full connection. In this type of model, convolution and pooling produce deep features, while the fully connected layers map the deep features to predicted values. Owing to the strong expressive power of fully connected layers, such models, VGGNet (Simonyan & Zisserman, 2014) in particular, can fit very complex nonlinear relations.
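As a minimal sketch of the convolution and pooling operations named above, the following pure-Python functions apply one filter to a multichannel feature map and then shrink a channel by max pooling. The function names, the no-padding and stride-1 convention, and the choice of max pooling are illustrative assumptions; the text does not fix these details.

```python
def conv2d(feature_map, kernels, bias=0.0):
    """Apply one filter to a multichannel image (no padding, stride 1).

    feature_map: C channels, each an H x W list of lists.
    kernels: C kernels, each K x K, one per input channel.
    Returns a single (H-K+1) x (W-K+1) output channel.
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    k = len(kernels[0])
    out = [[bias] * (width - k + 1) for _ in range(height - k + 1)]
    # Sum the per-channel responses into one output channel.
    for channel, kernel in zip(feature_map, kernels):
        for r in range(height - k + 1):
            for c in range(width - k + 1):
                out[r][c] += sum(
                    channel[r + i][c + j] * kernel[i][j]
                    for i in range(k)
                    for j in range(k)
                )
    return out


def max_pool2d(channel, size=2):
    """Shrink one channel by taking the maximum of each size x size block."""
    rows, cols = len(channel) // size, len(channel[0]) // size
    return [
        [
            max(
                channel[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

A full convolutional layer would apply several such filters and stack their outputs as the channels of the next feature map.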
3 Proposed Method
Information related to wind power, such as a turbine’s output power and wind speed, fits convolutional networks well. On the one hand, convolutional networks are well suited to grid-structured data: they automatically extract features at different layers and support end-to-end learning. On the other hand, wind turbines are located in geographic space, and their distribution is easily modeled as a planar grid structure. However, existing research has little experience in combining the two.
In this section, the spatio-temporal feature (STF) and its basis, the scene, are introduced, and then two convolutional network models based on the STF are put forward, corresponding to the two basic types of convolutional networks described in Section 2. We use these two models to show that the STF can be combined with various convolutional networks in practice.
The rest of this section elaborates on the above.
3.1 Scene and STF
Feature extraction has always been a hot topic in wind power prediction. In this paper, a feature extracted only from the data of the target turbine itself is called a “single-feature (SF)”, and a feature extracted from the data of the target turbine and several adjacent turbines is called a “local-feature (LF)”. The local-feature is an extended form of the single-feature: when the distance threshold for adjacent turbines is set to 0, it degenerates into the single-feature.
Most of the features used in existing work are single-features, although some researchers have also studied local-features. For example, in (Treiber et al., 2016), the local-feature is generated by concatenating the single-features of the individual turbines. A feature extracted in this way contains more information, but the information is not used efficiently: it covers only the temporal level and lacks the spatial dimension.
To describe the spatial distribution of wind in a certain area at a certain time, this paper introduces the concept of the scene. We map the output power of the wind turbines at a given time onto a plane according to the geographical coordinates of the turbines, forming a two-dimensional image called the scene. Mapping the real coordinates to the plane is the main problem in constructing a scene. The most direct solution is to scale down the real geographic coordinates and draw them onto the plane, as shown in Figure 1.
This method shows the spatial positions correctly, but the constructed image is relatively large and contains only sparse effective pixels, which is not conducive to computation. To solve this problem, this paper proposes embedding the turbines into a grid of as small an area as possible, which is called the grid space embedding method. In this algorithm, the longitude and latitude coordinates are first processed to remove imbalance and then discretized, in order to determine the shape of the scene and generate the grid. After grid generation, each turbine is mapped to the corresponding grid cell in the order given by sorting its horizontal and vertical coordinates. More details are shown in Algorithm 1. The output is the mapping matrix from turbines to grid points, in which each entry is the serial number of a turbine. Vacant positions are filled with a placeholder value to complete the matrix. The outputs of the turbines at a given time are then written into a matrix at the positions specified by the mapping, which yields the scene for that time, as shown in Algorithm 2.
The proposed embedding algorithm uses as small a grid as possible to avoid invalid pixels, and the constructed scene is suitable for convolutional computation.
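The embedding and scene construction can be illustrated with the sketch below. It is not Algorithm 1 or Algorithm 2, which are not reproduced in the text. It assumes a near-square grid, a simple sort by latitude and then by longitude, and the value -1 as the marker for vacant cells; the names `embed_turbines` and `build_scene` are hypothetical.

```python
import math


def embed_turbines(coords):
    """Map turbines to cells of a compact grid, roughly preserving layout.

    coords: list of (longitude, latitude), one pair per turbine.
    Returns a rows x cols matrix of turbine indices, with -1 in vacant cells.
    """
    n = len(coords)
    cols = math.ceil(math.sqrt(n))  # assumed near-square grid shape
    rows = math.ceil(n / cols)
    # Order turbines from north to south, then cut the ordering into rows.
    by_lat = sorted(range(n), key=lambda i: -coords[i][1])
    mapping = []
    for r in range(rows):
        row = by_lat[r * cols:(r + 1) * cols]
        # Within a row, order turbines from west to east.
        row.sort(key=lambda i: coords[i][0])
        mapping.append(row + [-1] * (cols - len(row)))
    return mapping


def build_scene(mapping, power, fill=0.0):
    """Write each turbine's output into its grid cell to form one scene."""
    return [[power[i] if i >= 0 else fill for i in row] for row in mapping]
```

With this layout, 592 turbines would occupy a 25 x 24 grid with only 8 vacant cells, whereas drawing scaled coordinates directly leaves most pixels empty.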
The scene represents the spatial distribution of wind power at a certain time, and connecting several consecutive scenes in series conveys how the spatial state changes with time. Although air motion is complicated, it still shows a certain regularity on the whole, and the scene series can reflect this regularity to some extent. In this paper, the multichannel image obtained by arranging the scenes in time order is named the spatio-temporal feature (STF).
Each channel of the STF independently represents spatial information, and the ordering of the channels represents temporal information. It is a kind of global feature, because it conveys information over a large geographical area and a long time range. Each channel of the STF can also be used to represent a different type of information, such as wind power output, wind speed or temperature. An STF that combines several kinds of data is called an MSTF. The STF can be processed by deep convolutional neural networks. The convolutional neural network is one of the most mature approaches in deep learning at present and, given well-developed tools and frameworks, can take full advantage of technologies such as GPU acceleration.
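Stacking scenes into an STF and pairing it with a prediction target can be sketched as follows. The helper names `build_stf` and `make_samples`, and the parameters `channels` and `horizon`, are hypothetical; the number of channels used in the experiments is not stated here, so it is left as an argument.

```python
def build_stf(scenes, t, channels):
    """Stack the `channels` scenes ending at time index t into one STF.

    scenes: list of H x W scenes ordered by time.
    Returns a list of `channels` scenes, oldest first (a C x H x W image).
    """
    if t + 1 < channels:
        raise ValueError("not enough history before time index t")
    return scenes[t - channels + 1:t + 1]


def make_samples(scenes, channels, horizon):
    """Pair each STF with the scene `horizon` steps after its last channel."""
    return [
        (build_stf(scenes, t, channels), scenes[t + horizon])
        for t in range(channels - 1, len(scenes) - horizon)
    ]
```

With data sampled every 10 minutes, a 30-minute-ahead target corresponds to `horizon=3`. An MSTF could be formed in the same way by appending the channels of further data types, such as wind speed, to the same list.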
3.2 E2E Model
The first convolutional neural network model for wind power prediction based on the STF is introduced in this section. It is called the E2E model and uses the idea of the autoencoder (Vincent et al., 2008).
Once received, the input image is processed in two stages. The first stage is down-sampling, that is, the encoding stage, in which deep features are extracted step by step and the image size is reduced by means of multiple stacked convolutional layers and a pooling layer. The second stage is up-sampling, that is, the decoding stage, which mainly consists of deconvolutional layers. Deconvolution progressively enlarges the feature map until the output has the same size as the input image. As a result, the pixels of the input and output images correspond one-to-one, which realizes the end-to-end mapping.
In the down-sampling stage, following the “short circuit” idea of DenseNet, the outputs of multiple preceding convolutional layers are concatenated and then fed to the next convolutional layer, which preserves the spatial information of the original input image. Since the major task of this stage is to extract features fully, the number of channels in the feature map increases rapidly. The main task of the up-sampling stage is to fuse features in order to produce the output. In this stage, the outputs of the convolutional layers are no longer concatenated, and each deconvolution reduces the number of channels, so that a single-channel image is finally output. The structure of the E2E model is shown in Figure 2.
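The size bookkeeping behind the two stages can be checked with a few lines of arithmetic. The sketch below uses the standard output-size formulas for convolution, pooling and transposed convolution, together with the channel growth caused by concatenation. The layer sizes in the example (a 24-pixel side, 3 x 3 kernels, a growth of 16 channels per layer) are hypothetical and are not taken from Figure 2.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output side length of a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * padding + kernel


def dense_channels(in_channels, growth, layers):
    """Input channel count of each successive layer when every layer's
    output (`growth` channels) is concatenated with all earlier inputs."""
    return [in_channels + i * growth for i in range(layers + 1)]


side = 24                                   # hypothetical scene side length
side = conv_out(side, kernel=3, padding=1)  # 3 x 3 convolution keeps 24
side = conv_out(side, kernel=2, stride=2)   # 2 x 2 pooling halves it to 12
side = deconv_out(side, kernel=2, stride=2)  # deconvolution restores 24
```

Because the decoder restores the input size, each output pixel lines up with one input pixel, and hence with one turbine.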
3.3 FC-CNN Model
The second model is a convolutional neural network containing fully connected layers, called FC-CNN. After receiving the input image, the model also operates in two stages. The first stage is similar to the down-sampling stage of the E2E model, but FC-CNN requires more layers and its feature maps are smaller. The second stage is a fully connected network, which maps the deep features to the output of each turbine by fitting a complex functional relationship. The output vector of the last fully connected layer, whose length equals the number of pixels in the input image, is reshaped into two dimensions and mapped to the pixels of the input image one by one. The down-sampling process of the model also incorporates the idea of DenseNet, and the model structure is shown in Figure 3.
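The final reshaping step can be illustrated as follows. This sketch assumes that the turbine-to-grid mapping is available as a matrix of turbine indices with -1 marking vacant cells; that convention and the function names are assumptions made for illustration.

```python
def reshape_to_grid(vector, rows, cols):
    """Reshape the flat output of the last fully connected layer to 2-D."""
    if len(vector) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [vector[r * cols:(r + 1) * cols] for r in range(rows)]


def turbine_predictions(grid, mapping):
    """Read per-turbine predictions, skipping vacant cells marked -1."""
    predictions = {}
    for r, row in enumerate(mapping):
        for c, turbine in enumerate(row):
            if turbine >= 0:
                predictions[turbine] = grid[r][c]
    return predictions
```

Every turbine's prediction is read from a single forward pass, which is what allows the model to predict all turbines in parallel.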
4 Experiment and Analysis
4.1 Data Sets and Evaluation Criteria
The data set used here is the NREL wind data set (https://www.nrel.gov), which contains the output of wind turbines in the United States at 10-minute intervals from 2004 to 2006. To validate our method, we select an area with longitude from 105.00W to 105.34W and latitude from 41.40N to 41.90N, located in the middle of the United States, where wind turbines are densely distributed and number 592. Based on these data, we predict the wind power output of each turbine 30 minutes ahead.
Accuracy is the most important factor in measuring the effect of wind power prediction, and the main indexes for evaluating accuracy are the mean square error (MSE) and the root mean square error (RMSE), the latter being the square root of the former. A single error metric is therefore chosen as the evaluation standard in this paper. It is calculated by equation (4) from the series of true values, the series of predicted values and the length of the series.
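Both error metrics named above can be computed as in the sketch below. Since the symbol identifying the chosen metric is missing from the text, both definitions are given without assuming which one equation (4) defines; the function and argument names are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean square error between true and predicted series."""
    if len(y_true) != len(y_pred):
        raise ValueError("series must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))
```

For the series `[1, 2, 3, 4]` and predictions `[1, 2, 3, 8]`, the only error is 4 at the last point, which gives an MSE of 4 and an RMSE of 2.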
4.2 Scene Display
Figure 4 shows 8 scenes arranged in time order, with darker regions in each scene representing larger values. This figure illustrates the spatial information expressed by a scene and the spatio-temporal information expressed by the STF.
It can be seen from the figure that the air flow in this region shows clear regularity during this period (70 min). First, the output power of the wind turbines is strongly correlated with spatial position. Second, as time goes by, a visible displacement appears between the scenes, from which it can be inferred that a west wind has crossed the border of the region during this period and expanded the affected area. These regularities are the basis of prediction with machine learning methods, and the results show the advantage of the STF: it can express the spatio-temporal variation of wind. A traditional single-feature can be visualized as a curve, but it is difficult for either human eyes or computer algorithms to find obvious regularity in it, which limits the achievable prediction accuracy.
4.3 Experimental Results and Comparison
Methods based on LF, such as SVR, have reached the state of the art in wind power prediction. To demonstrate the validity of the proposed method, it is compared with SVR, the most accurate existing method, and with kNN, the fastest method to train. In the experiment, the kNN model is trained with SF, while the SVR model is trained with LF. The experimental results are shown in Table 1 and Figure 7.
In the experiment, the error of each method is first obtained for each of the 592 wind turbines. Table 1 compares the methods by the maximum, minimum and average of these values. The average errors of the two methods proposed in this paper are 7.91 and 7.78 respectively, and the integration of the two reaches an average of 7.61. By contrast, the best average among the existing methods is 10.05, so our result is lower by 24.28%. According to these numerical results, the two proposed methods are superior to the other methods in prediction accuracy.
Table 1 provides a quantitative comparison of the overall performance of the methods, and Figure 7 further shows the distribution of the per-turbine error for each method. The bars in each subgraph show the distribution of the error and the curve shows the probability density; the horizontal axis represents the error value and the vertical axis the corresponding probability density. The first five images show each method individually, and the last image compares all the results, showing that the errors of the FC-CNN and E2E models are concentrated in the region of smaller values. On the whole, therefore, the proposed methods outperform SVR and kNN.
The above results demonstrate the advantages of the proposed method. In Figure 5 and Figure 6, the wind turbines are analyzed one by one to compare the improvement quantitatively. In the figures, M denotes the models used for comparison, LF+SVR and LF+kNN. Methods using SF perform worse than methods using LF, so they are not compared further. The values obtained from equation (5) reflect the ratio by which the error of FC-CNN is reduced compared with M, and Figure 5 and Figure 6 show the probability density curves obtained by fitting these values.
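The reduction ratio can be sketched as a relative difference. Equation (5) is not reproduced in the text, so the form below, (baseline error minus model error) divided by baseline error, is an assumption, and the function name is hypothetical. As a check, it reproduces the 24.28% reduction reported for the averages 10.05 and 7.61.

```python
def reduction_ratio(baseline_error, model_error):
    """Fraction by which the model lowers the baseline's error.

    Positive values mean the model is better than the baseline;
    negative values mean it is worse.
    """
    return (baseline_error - model_error) / baseline_error


# Best existing average (10.05) versus the integrated proposed models (7.61).
overall = reduction_ratio(10.05, 7.61)
print(f"{overall:.2%}")
```

Applied to each turbine in turn, this yields one value per turbine, and the fraction of negative values shows how often the baseline wins.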
In Figure 5 and Figure 6, the area of the region with a horizontal coordinate below 0 is almost zero, which means that FC-CNN improves the prediction for almost all wind turbines compared with the above two methods. According to the statistics, compared with LF+SVR, its error is reduced by 24.10% on average and by up to 45.55%; compared with LF+kNN, it is reduced by 30.10% on average and by up to 45.55%.
Figure 8 shows the predicted value curves of each method for a randomly selected turbine. It can be seen that the predictions of the models using the STF are more stable, and are even smoother than the true values. Wind is a natural phenomenon, but the conversion from wind to wind power output is complex, with many interference factors related to the characteristics of the wind turbine itself.
To analyze the experimental results further, wind power prediction is divided into two stages. The first stage predicts wind state information such as wind speed, and the second converts the predicted wind state into wind power output. We believe that the prediction errors arise mainly in the second stage. To verify this idea, the proposed methods are used to predict wind speed and wind power separately. Typical results are shown in Figure 9, which visualizes the true and predicted values of wind power and wind speed at 8 moments. The predicted wind power, the true wind speed and the predicted wind speed are all relatively smooth, but the true wind power is far from smooth. This shows that two wind turbines with similar wind speeds can have different output power even if they are quite close to each other, which indicates that the conversion from wind speed to wind power depends on the characteristics of the individual turbine. In addition, at the same moment and in the same region, the errors of wind speed and wind power predicted with the same method are 0.92 and 7.17, respectively. The error of wind speed is much lower than that of wind power, which further suggests that wind speed is easier to predict, while wind power is difficult to predict because of turbine-specific characteristics.
The STF presented in this paper can express wind-related information over a large geographical area and a long time span. Convolutional networks can predict the overall variation of the wind in the region and reduce the effect of the “noise” caused by the specific characteristics of individual turbines. This is why, in the result shown in Figure 8, the predicted values are more stable than the true values.
4.4 Performance Analysis
The two convolutional networks proposed in this paper achieve end-to-end prediction. Since each pixel at the output corresponds to a turbine, predicting a scene amounts to predicting all turbines in parallel. Meanwhile, convolutional networks can make full use of GPU acceleration, so the training time is greatly shortened. The training times of the models are compared in the last row of Table 1. Overall, the training time is shortened by a factor of more than 150 compared with that of SVR.
4.5 MSTF Experiment
As described in Section 3, an STF carrying multiple types of information is called an MSTF. Using the MSTF can further improve wind power prediction. This paper supports this view with simple experiments but does not discuss it in detail. As shown in Table 2, the error of MSTF+FC-CNN is lower than that of LF+SVR by 26.69% on average and by up to 49.83%, and lower than that of LF+kNN by 32.49% on average and by up to 56.63%. The effect is also better than that of using the STF: the average errors of the E2E and FC-CNN models using the MSTF are lower than those of the same models using the STF by 7.08% and 6.81%, respectively.
5 Conclusion
This paper proposes a global feature, the STF, for wind power prediction, and uses convolutional networks to predict wind power. Compared with the existing methods, the proposed method greatly improves the prediction accuracy and the time cost of training models. In addition, this paper also proposes an approach to fuse various types of data by means of the MSTF, which is shown to be effective in the experiment.
In essence, the STF models the spatio-temporal state of a wind farm, in which the wind turbines play the role of information collectors. The denser the wind turbines, the more complete the collected information, so the STF is well suited to describing the state of a large wind farm. It is worth noting that the STF uses a plane to represent the spatial state, which loses terrain information, so the STF is more suitable for flat areas. In the past several years, offshore wind power has grown rapidly. Because offshore wind farms are large and flat, the STF is a natural fit for modeling and forecasting them. In future work, we will focus on offshore wind farms as the main application area and pursue the following lines of research.
The way in which the MSTF fuses multiple types of data will be studied in order to further improve prediction accuracy.
In this paper, two simple convolutional network models are constructed for prediction, and good results are obtained. Convolutional networks have developed rapidly in recent years, and the next step will be to introduce more advanced models.
References
- Becker & Thrän (2017) Becker, Raik and Thrän, Daniela. Completion of wind turbine data sets for wind integration studies applying random forests and k-nearest neighbors. Applied Energy, 208, 2017.
- Chen & Yu (2014) Chen, Kuilin and Yu, Jie. Short-term wind speed prediction using an unscented kalman filter based state-space support vector regression approach. Applied Energy, 113(6):690–705, 2014.
- Deo et al. (2018) Deo, Ravinesh C., Ghorbani, Mohammad Ali, Samadianfrad, Saeed, Maraseni, Tek, Bilgili, Mehmet, and Biazar, Mustafa. Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data. Renewable Energy, 116, 2018.
- Díaz et al. (2017) Díaz, Santiago, Carta, José A., and Matías, José M. Performance assessment of five mcp models proposed for the estimation of long-term wind turbine power outputs at a target site using three machine learning techniques. Applied Energy, 209, 2017.
- Giorgi et al. (2011) Giorgi, Maria Grazia De, Ficarella, Antonio, and Tarantino, Marco. Error analysis of short term wind power prediction models. Applied Energy, 88(4):1298–1311, 2011.
- Heinermann & Kramer (2016) Heinermann, Justin and Kramer, Oliver. Machine learning ensembles for wind power prediction. Renewable Energy, 89:671–679, 2016.
- Khosravi et al. (2018) Khosravi, A., Machado, L., Nunes, R. O., Energy, Applied, and Yan, J. Time-series prediction of wind speed using machine learning algorithms: A case study osorio wind farm, brazil. Applied Energy, 2018.
- Lixin & Lei (2018) Lixin, M. A. and Lei, W. U. Prediction model of wind power based on lifting wavelet transform. Electric Power Science and Engineering, 2018.
- Long et al. (2015) Long, Jonathan, Shelhamer, Evan, and Darrell, Trevor. Fully convolutional networks for semantic segmentation. In
- Marvuglia & Messineo (2012) Marvuglia, Antonino and Messineo, Antonio. Monitoring of wind farms’ power curves using machine learning techniques. Applied Energy, 98(98):574–583, 2012.
- Pinson et al. (2013) Pinson, P., Nielsen, H. Aa., Madsen, H., and Kariniotakis, G. Skill forecasting from ensemble predictions of wind power. Applied Energy, 86(7):1326–1334, 2013.
- Qu et al. (2016) Qu, Xiaoyun, Kang, Xiaoning, Zhang, Chao, Jiang, Shuai, and Ma, Xiuda. Short-term prediction of wind power based on deep long short-term memory. In Power and Energy Engineering Conference, pp. 1148–1152, 2016.
- REN21 (2018) REN21. Renewables global status report. http://www.ren21.net/status-of-renewables/global-status-report/, 2018.
- Saroha & Aggarwal (2018) Saroha, Sumit and Aggarwal, S. K. Wind power forecasting using wavelet transforms and neural networks with tapped delay. Csee Journal of Power and Energy Systems, 4(2):197–209, 2018.
- Simonyan & Zisserman (2014) Simonyan, Karen and Zisserman, Andrew. Very deep convolutional networks for large-scale image recognition. Computer Science, 2014.
- Tascikaraoglu et al. (2016) Tascikaraoglu, Akin, Sanandaji, Borhan M., Poolla, Kameshwar, and Varaiya, Pravin. Exploiting sparsity of interconnections in spatio-temporal wind speed forecasting using wavelet transform. Applied Energy, 165:735–747, 2016.
- Treiber et al. (2016) Treiber, Nils André, Heinermann, Justin, and Kramer, Oliver. Wind power prediction with machine learning. Computational Sustainability, pp. 13–29, 2016.
- Vincent et al. (2008) Vincent, Pascal, Larochelle, Hugo, Bengio, Yoshua, and Manzagol, Extracting and composing robust features with denoising autoencoders. In International Conference on Machine Learning, pp. 1096–1103, 2008.
- Wang et al. (2017) Wang, Huai Zhi, Li, Gang Qiang, Wang, Gui Bing, Peng, Jian Chun, Jiang, Hui, and Liu, Yi Tao. Deep learning based ensemble approach for probabilistic wind power forecasting. Applied Energy, 188:56–70, 2017.
- Wang et al. (2018) Wang, Zhiwen, Shen, Chen, and Liu, Feng. A conditional model of wind power forecast errors and its application in scenario generation. Applied Energy, 212, 2018.
- Zhao et al. (2016) Zhao, Jing, Guo, Zhen Hai, Su, Zhong Yue, Zhao, Zhi Yuan, Xiao, Xia, and Liu, Feng. An improved multi-step forecasting model based on wrf ensembles and creative fuzzy systems for wind speed. Applied Energy, 162(19):808–826, 2016.