Deep Spatio-Temporal Wind Power Forecasting

09/29/2021 ∙ by Jiangyuan Li, et al.

Wind power forecasting has drawn increasing attention among researchers as the consumption of renewable energy grows. In this paper, we develop a deep learning approach based on an encoder-decoder structure. Our model forecasts the wind power generated by a wind turbine using its spatial location relative to other turbines and historical wind speed data. In this way, we effectively integrate spatial dependency and temporal trends to make turbine-specific predictions. The advantages of our method over existing work can be summarized as follows: 1) it directly predicts wind power from historical wind speed, without first predicting wind speed and then applying a transformation; 2) it can effectively capture long-term dependency; 3) it is more scalable and efficient than other deep learning based methods. We demonstrate the efficacy of our model on benchmark real-world datasets.




1 Introduction

Wind energy has become an essential energy source worldwide because it is pollution-free and widely available [20]. However, its strong volatility can cause substantial power fluctuations and affect the overall operation of the regional power grid. Inaccurate wind forecasts may endanger the safe and stable operation of the entire power system. Therefore, an effective wind power forecasting method is necessary to find the most economical solution for operating the power grid. It helps the power dispatching department organize the generation plan optimally and consequently improves the reliability and security of the power grid [3, 1].

Wind power forecasting is generally viewed as a complex task due to the chaotic and stochastic nature of wind speed time series. Existing methods for wind power forecasting fall into four main categories. 1) Persistence methods assume the wind data remain unchanged over a short time window. 2) Physical methods formulate the problem based on numerical weather prediction (NWP) and usually use weather data such as temperature, pressure, surface roughness, and obstacles. NWP models are built from complete sets of hydrodynamic and thermodynamic equations, which usually carry a huge computational burden yet offer limited temporal and spatial resolution [18]. 3) Statistical methods are based on probabilistic modeling of historical data, such as ARIMA-based approaches for temporal features [7] and the Kriging interpolation method for spatial correlations. 4) Deep learning based methods learn the intricate mapping between inputs and outputs from massive historical data.

Recently developed deep learning approaches show a significant improvement compared to the classical baselines. However, most of them still suffer from several drawbacks: 1) They lack a proper design for temporal features and spatial correlations. 2) Many of them still rely on a power curve transformation to forecast wind power output. Although this approach simplifies wind power forecasting to wind speed time series analysis, the power curve fitting introduces considerable errors and also leaves turbine identity aside. 3) Capturing long-term dependency in current methods requires a large neural network with many parameters, which is not data-efficient and may not be necessary. Furthermore, enlarging the network may lead to severe overfitting and cause difficulties in dealing with the strong seasonality inherent in wind.

To overcome these problems, we develop a deep wind power forecasting model. Our model adopts an encoder-decoder architecture with GRU [4] as the recurrent unit, which can capture temporal features with long-term dependency. With an extra multi-layer perceptron (MLP) attached to the decoder, the model can directly produce forecasts for either wind speed or wind power. Moreover, we utilize the spatial information of turbines and the correlation among neighboring turbines to provide a more robust and accurate forecast. We also learn an embedding vector to produce turbine-specific forecasts, which accounts for each turbine's quality and unique environmental conditions.

2 Problem Formulation

The objective of wind power forecasting is to capture the relation between the historical wind speed data of surrounding wind turbines and the future wind power output of the target wind turbine. Suppose there is a set of $N$ wind turbines in a wind farm, where each turbine $i$ is identified by its location $s_i$. From the historical data, the wind speed at time $t$ is denoted by $v_{i,t}$ and the recorded wind power output is denoted by $p_{i,t}$. At time $t$, for our target turbine $i$, the forecast is predicted based on the previous wind speed data, i.e.,

$$\hat{p}_{i,t+h} = g\big(p_{i,t},\, f(\mathbf{v}_{i,\,t-T+1:t};\, \theta_f);\, \theta_g\big),$$

where $h$ denotes the forecasting horizon, $f$ is an implicit function to extract hidden features for forecasting, and $g$ makes the forecasts, taking the current wind power record and the hidden states as inputs. $\theta_f$ and $\theta_g$ represent the parameters of our model. Note that $\mathbf{v}_{i,\,t-T+1:t}$, the wind speed input over the last $T$ time steps, can be viewed as a multi-dimensional time series. The proper choice of dimension is determined by the spatial correlations. For each turbine $i$, we build its neighbor set, denoted as $\mathcal{N}_k(i)$. Many other features besides wind speed can also be incorporated in this multi-dimensional time series and will be discussed later. Among the model parameters, we set aside one set of parameters $\mathbf{e}_i$ for each turbine's identity to make turbine-specific forecasts. Thus, the forecasting framework takes the following form,

$$\hat{p}_{i,t+h} = g\big(p_{i,t},\, f(\mathbf{v}_{\mathcal{N}_k(i),\,t-T+1:t};\, \theta_f, \mathbf{e}_i);\, \theta_g, \mathbf{e}_i\big).$$

Note that the parameters $\theta_f$, $\theta_g$ and $\mathbf{e}_i$ are trained together, and $\mathbf{e}_i$ as turbine identity only serves our forecasting purpose.

3 Proposed Methodology

In this section, we explain how the proposed model extracts deep spatio-temporal features. First, temporal features are extracted by a gated recurrent unit (GRU) network that works in an encoder-decoder manner, transforming historical wind speed into wind power forecasts. Second, the temporal features are enriched with spatial correlations through graph construction. Last, a turbine identity embedding and relevant time features are included in the model to produce turbine-specific forecasts and enhance model performance.

3.1 Temporal Features

Assume the wind speed values at time steps $t-T+1, \dots, t$ are available at time step $t$. We consider this $T$-length time window to capture the temporal features of the wind speed data. For wind turbine $i$, we aim to extract the corresponding hidden features $\mathbf{z}_{i,t}$ and use them to make multi-step forecasts up to the given forecasting horizon. In the proposed framework, an encoder-decoder GRU network is applied to learn the deep temporal features $\mathbf{z}_{i,t}$. The encoder and decoder are both GRU networks and take inputs sequentially, which captures the sequential nature of the wind time series.
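The windowing described above can be sketched as follows. This is a minimal illustration, not the authors' data pipeline: the function name `make_windows`, the window length `T`, the maximum horizon `H` and the toy series are all assumptions introduced here.

```python
def make_windows(speed, power, T, H):
    """Pair each T-step wind speed window with the next H power values."""
    samples = []
    # t is the index of the last observed step in the window
    for t in range(T - 1, len(speed) - H):
        history = speed[t - T + 1 : t + 1]   # speeds at steps t-T+1, ..., t
        targets = power[t + 1 : t + 1 + H]   # power at steps t+1, ..., t+H
        samples.append((history, targets))
    return samples


# Toy hourly series for a single turbine (made-up values)
speed = [5.0, 5.5, 6.1, 6.0, 5.2, 4.8, 5.9, 6.4]
power = [0.20, 0.26, 0.34, 0.33, 0.22, 0.18, 0.31, 0.39]

windows = make_windows(speed, power, T=3, H=2)
for history, targets in windows:
    print(history, "->", targets)
```

Each pair feeds the encoder with the history and asks the decoder for the targets, so the number of usable samples shrinks by `T - 1 + H` relative to the length of the series.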

The GRU network is a variation of the LSTM network that has fewer parameters and can be trained faster. The GRU block has several special multiplicative computational units. The reset gate controls the flow of the previous hidden states into the subsequent memory block. The update gate controls the balance between the previous hidden states and the current candidate hidden states. GRU utilizes hidden states efficiently instead of using an extra cell state to account for long-term dependency. We randomly pick several wind speed time series and illustrate that they do not exhibit extremely long dependencies; see Figure 1. Having fewer parameters in GRU not only speeds up training but also reduces the risk of overfitting. A further comparison of LSTM and GRU is given in the experiments.
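To make the two gates concrete, here is a minimal pure-Python sketch of one GRU step. It is illustrative rather than the authors' implementation. It assumes the update convention of [4], where the new state is `(1 - z) * h_prev + z * candidate`, one bias vector per gate, and small random weights. The helper `count_params` applies the same per-block count to a GRU (three blocks) and an LSTM (four blocks); deep learning frameworks may count biases differently.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]


def gru_step(params, x, h_prev):
    """One GRU update: gates first, then candidate, then interpolation."""
    r = [sigmoid(a) for a in affine(*params["reset"], x, h_prev)]
    z = [sigmoid(a) for a in affine(*params["update"], x, h_prev)]
    # The reset gate scales the previous state before it enters the candidate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a) for a in affine(*params["cand"], x, gated)]
    # The update gate balances the previous state against the candidate
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def init_params(input_dim, hidden_dim, rng):
    def block():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
             for _ in range(hidden_dim)]
        U = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
        return (W, U, [0.0] * hidden_dim)
    return {"reset": block(), "update": block(), "cand": block()}


def count_params(input_dim, hidden_dim, n_blocks):
    """Weights and one bias per block; a GRU has 3 blocks, an LSTM has 4."""
    return n_blocks * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)


rng = random.Random(0)
params = init_params(input_dim=2, hidden_dim=4, rng=rng)

# Encode a short window of 2-dimensional inputs (made-up wind speeds)
h = [0.0] * 4
for x in [[5.0, 5.2], [5.5, 5.1], [6.1, 5.8]]:
    h = gru_step(params, x, h)

gru_params = count_params(2, 4, n_blocks=3)
lstm_params = count_params(2, 4, n_blocks=4)
print(h, gru_params, lstm_params)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is the source of the savings mentioned above.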

Figure 1: Autocorrelation plots of randomly sampled time series. The dashed and solid horizontal lines indicate the 99% and 95% confidence intervals for the correlation values around zero.
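A plot like Figure 1 rests on the sample autocorrelation and a confidence band around zero. The sketch below shows one common way to compute both; it is an assumption that the figure was produced this way. The band uses the large-sample approximation `z / sqrt(n)`, with `z = 1.96` for 95% and `z = 2.576` for 99%, and the alternating toy series is made up.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return cov / var


def confidence_band(n, z=1.96):
    """Half-width of the band around zero for an uncorrelated series."""
    return z / math.sqrt(n)


series = [1.0, -1.0] * 50          # toy series with a strong lag-1 pattern
acf0 = autocorrelation(series, 0)
acf1 = autocorrelation(series, 1)
band95 = confidence_band(len(series))
band99 = confidence_band(len(series), z=2.576)
print(acf0, acf1, band95, band99)
```

Lags whose autocorrelation stays inside the band are not distinguishable from zero, which is how such a plot supports the claim that dependencies are not extremely long.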

3.2 K-NN Graph

Analysts have noticed that valuable information may be revealed by considering spatial measurements in a local region, as wind characteristics at a site may resemble those at neighboring sites [5]. Deep learning has been widely applied to spatio-temporal data mining tasks such as predictive learning, representation learning and anomaly detection [19]. Spatio-temporal data come in various types that differ in how they are collected and represented across applications. For example, convolutional neural networks (CNNs) are often used in traffic prediction to process image-like data for spatial relationships [21]. Attention mechanisms have also been applied to wind power forecasting across wind farms, where the geographical coordinates of the wind farms cannot provide clear information about wind power forecasting patterns [9].

Our model makes wind power forecasts at the turbine level. Turbines in a local region may share similar air density, air pressure and humidity, so including turbines with similar conditions is beneficial to forecasting. The distance between turbines provides a natural metric to quantify similarity. To incorporate the spatial dependency, we apply the k-nearest neighbors algorithm (k-NN) to the geographical coordinates of the turbines. By incorporating the neighbors of the target turbine, the input of the encoder becomes a k-dimensional time series, while the decoder still generates wind power forecasts for the target turbine only. Let $\mathcal{N}_k(i)$ be the index set of the k-nearest neighbours of turbine $i$, and let $\mathbf{v}_{\mathcal{N}_k(i),t}$ be the wind speeds of these turbines at time $t$, ordered by distance. The objective function would be

$$\min_{\theta} \sum_{i} \sum_{t} \sum_{h} \ell\big(p_{i,t+h},\, \hat{p}_{i,t+h}(\mathbf{v}_{\mathcal{N}_k(i),\,t-T+1:t};\, \theta)\big),$$

where $\ell$ is the forecasting loss, $p_{i,t+h}$ is the recorded power of turbine $i$ at time $t+h$, and $\hat{p}_{i,t+h}$ is its forecast made at time $t$ for horizon $h$ from the last $T$ time steps. Here, we simply denote all trainable parameters as $\theta$. When the spatial dependency is unclear, a self-attention mechanism [9] is often used, which provides a weighted one-dimensional time series as input. This approach explores unknown relationships across wind farms, but can be very restrictive. In our model, distance is a better way to quantify spatial dependency, and we enrich the single time series to a k-dimensional one, which also provides more flexibility to encode wind speed information in the encoder state $\mathbf{z}_{i,t}$.
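The neighbor selection can be sketched as below. This is a minimal illustration under stated assumptions: coordinates are treated as planar so that Euclidean distance applies, the target turbine is excluded from its own neighbor set, and the function names and toy data are hypothetical. The text above does not settle whether the target's own series is part of the k inputs.

```python
import math


def k_nearest_neighbors(coords, target, k):
    """Return indices of the k turbines closest to `target`, nearest first."""
    tx, ty = coords[target]
    distances = [
        (math.hypot(x - tx, y - ty), j)
        for j, (x, y) in enumerate(coords)
        if j != target
    ]
    distances.sort()
    return [j for _, j in distances[:k]]


def neighbor_input(speeds, neighbors, t):
    """Wind speeds of the neighbors at time t, in order of distance."""
    return [speeds[j][t] for j in neighbors]


# Made-up planar coordinates for five turbines
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (2.0, 2.0), (0.5, 0.5)]
# Made-up wind speeds: speeds[j][t] is turbine j at time step t
speeds = [[5.0, 5.1], [6.0, 6.1], [7.0, 7.1], [8.0, 8.1], [9.0, 9.1]]

neighbors = k_nearest_neighbors(coords, target=0, k=3)
encoder_input = neighbor_input(speeds, neighbors, t=1)
print(neighbors, encoder_input)
```

Keeping the neighbors in a fixed distance order means each input dimension always carries the same rank of neighbor, which the recurrent unit can then exploit.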

3.3 Turbine Embedding

In the wind industry, a power curve [13], the relation between the power output and the wind speed at the same time, is often used to assess a turbine's energy production efficiency. Classical approaches [8, 2, 15] to wind power forecasting rely on this power curve to transform the predicted wind speed into the power output, where the wind speed prediction is obtained from a time series or spatio-temporal model. Power curves differ considerably across turbines. Because the power curve is estimated for each turbine, the forecast is turbine-specific. This approach suggests that wind speed is the main feature when predicting wind power. The problem is that the extra curve-fitting step may add error to the wind power forecast, which motivates us to use a Seq2Seq approach.
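For contrast with the direct approach, the two-step pipeline can be sketched with an idealized power curve. The curve below is a textbook-style simplification, not the fitted turbine-specific curve of [13]: power is zero below the cut-in speed and at or above the cut-out speed, grows cubically up to the rated speed, and is normalized to 1 at rated output. The three speed thresholds are hypothetical values.

```python
def power_curve(speed, cut_in=3.5, rated=13.0, cut_out=25.0):
    """Idealized normalized power output for a given wind speed (m/s)."""
    if speed < cut_in or speed >= cut_out:
        return 0.0
    if speed >= rated:
        return 1.0
    # Cubic ramp between cut-in and rated speed
    return (speed ** 3 - cut_in ** 3) / (rated ** 3 - cut_in ** 3)


def two_step_forecast(predicted_speeds):
    """Step 2 of the classical pipeline: map speed forecasts to power."""
    return [power_curve(v) for v in predicted_speeds]


predicted_speeds = [2.0, 8.0, 8.5, 13.0, 30.0]   # made-up speed forecasts
predicted_power = two_step_forecast(predicted_speeds)
print(predicted_power)

# A 0.5 m/s speed error near 8 m/s shifts the power forecast noticeably
sensitivity = power_curve(8.5) - power_curve(8.0)
print(sensitivity)
```

Because the ramp is steep, a small error in the speed forecast becomes a larger relative error in power, on top of whatever error the curve fit itself contributes.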

To accomplish turbine-specific forecasting while allowing the model to share parameters across turbines, we need to tell the model, via its input, which turbine the data comes from. A one-hot encoded vector is the traditional way to identify turbines, but this representation is large, sparse and inefficient, and carries no semantic information. To overcome this inefficiency, we represent each turbine with a latent vector. Latent vectors, or embeddings, are used to represent identities in a wide range of areas, such as social network analysis [11] and natural language processing [14]. An embedding vector usually comes from feature engineering or from embeddings pre-trained for other tasks. It can also be learned by adding an embedding layer to the model. We follow this learned embedding approach and denote the embedding vector of turbine $i$ as $\mathbf{e}_i$.

Figure 2: The graphical illustration of the proposed model

The final model is shown in Figure 2. The embedding layer reads

$$\mathbf{e}_i = E\,\mathbf{o}_i.$$

The embedding matrix is denoted as $E \in \mathbb{R}^{d \times N}$. We denote the dimension of the embedded vector as $d$, while $N$ is the number of turbines and $\mathbf{o}_i \in \{0,1\}^N$ is the one-hot encoded vector of turbine $i$. The similarity of turbines is also revealed in the embeddings, which helps capture the spatial dependency.
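The following sketch shows why an embedding layer is equivalent to multiplying the embedding matrix by a one-hot vector, and why it is implemented as a lookup in practice. The matrix values, its orientation (one column per turbine) and the function names are assumptions for illustration; in the model the entries are learned.

```python
def one_hot(index, size):
    """One-hot vector of length `size` with a 1 at position `index`."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def embed(E, o):
    """Matrix-vector product of a d x N embedding matrix with a one-hot vector."""
    return [sum(w * oj for w, oj in zip(row, o)) for row in E]


def lookup(E, i):
    """Read column i directly, skipping the multiplications by zero."""
    return [row[i] for row in E]


# Hypothetical embedding matrix with d = 2 and N = 3 turbines
E = [
    [0.1, 0.7, -0.5],
    [-0.3, 0.2, 0.4],
]

via_product = embed(E, one_hot(1, 3))
via_lookup = lookup(E, 1)
print(via_product, via_lookup)
```

The lookup costs `d` reads instead of `d * N` multiplications, which is why the dense embedding is cheaper than feeding the one-hot vector itself.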

3.4 Feature Enrichment

Wind speed is known to change seasonally. It also shows a daily pattern, which is closely related to sunrise and sunset. In classical approaches, this periodic pattern is often ignored [7, 17]. Our model can be easily extended to include these time features. Let $\mathbf{c}_t$ denote the time features, such as hour, month and season, at time $t$. We append these time features to the input of each recurrent unit in our model, which further increases the dimension of the input time series. This makes the latent features time-aware and also helps reduce the influence of change points when dealing with temporal features.
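Appending time features to the recurrent input could look like the sketch below. The encoding is an assumption made here, not taken from the paper: hour, month and season are each scaled to the range [0, 1], and seasons follow the meteorological convention for the northern hemisphere (December to February as the first season). The function names are hypothetical.

```python
from datetime import datetime


def time_features(ts):
    """Scaled hour, month and season for a timestamp."""
    season = (ts.month % 12) // 3   # 0: Dec-Feb, 1: Mar-May, 2: Jun-Aug, 3: Sep-Nov
    return [ts.hour / 23.0, (ts.month - 1) / 11.0, season / 3.0]


def enrich(neighbor_speeds, ts):
    """Concatenate the k wind speeds with the time features for one step."""
    return list(neighbor_speeds) + time_features(ts)


ts = datetime(2021, 7, 15, 18)
step_input = enrich([6.1, 5.8, 6.4], ts)   # made-up speeds for k = 3
print(step_input)
```

With `k` neighbors and three time features, each recurrent step receives an input of dimension `k + 3`.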

4 Experiments

Time (h)     1     2     3     4     5     6     7     8     9    10    11    12
PER       .128  .163  .189  .212  .229  .241  .254  .268  .285  .295  .299  .296
CRS       .125  .159  .185  .202  .215  .223  .236  .241  .250  .255  .260  .262
MLP       .131  .162  .184  .201  .214  .223  .230  .236  .253  .257  .259  .263
RNN       .123  .155  .178  .195  .209  .220  .230  .237  .243  .248  .259  .263
LSTM      .123  .157  .180  .198  .255  .262  .263  .269  .271  .270  .272  .273
PSTN      .125  .165  .177  .198  .217  .233  .241  .236  .245  .251  .254  .253
DL-STF    .130  .161  .183  .196  .208  .218  .228  .236  .243  .247  .255  .256
STAN      .132  .154  .173  .190  .203  .215  .223  .231  .238  .244  .249  .253
Ours      .128  .155  .174  .189  .202  .212  .220  .227  .233  .237  .242  .245
Table 1: MAE for wind power forecasting $h$ hours ahead, $h = 1, \dots, 12$.

In this section, we illustrate the performance of our model on two real-world datasets. The first was collected at an onshore wind farm in the United States, where one year of turbine-specific hourly wind speed and power values was measured on each of the 200 turbines. This dataset is provided in [5]. The other is from the Wind Integration National Dataset (WIND), provided by the National Renewable Energy Laboratory (NREL) [6]. Based on WIND, a wind turbine array within a wind farm in Wyoming is selected.

We compare our model with several methods, ranging from classical models to the most recent deep learning based models. The persistence model (PER) provides a baseline, and the calibrated regime-switching model (CRS) achieved the best performance among several traditional methods [2]. Three basic deep learning models are included in the comparison: a multi-layer perceptron (MLP), a recurrent neural network (RNN) and long short-term memory (LSTM) [12]. We also compare our model with several other deep learning models for wind forecasting: PSTN [22], DL-STF [10] and STAN [9]. We use mean absolute error (MAE) and root mean squared error (RMSE) for evaluation.
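The persistence baseline and the two evaluation metrics are simple enough to sketch directly. The snippet below is an illustration with made-up normalized power values, not the evaluation code behind the reported results; the function names are hypothetical.

```python
import math


def persistence_forecast(power, t, horizon):
    """Repeat the last observed value for every step of the horizon."""
    return [power[t]] * horizon


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


power = [0.20, 0.26, 0.34, 0.33, 0.22]   # made-up normalized power series
forecast = persistence_forecast(power, t=1, horizon=3)
actual = power[2:5]

print(forecast, mae(actual, forecast), rmse(actual, forecast))
```

RMSE penalizes large misses more heavily than MAE, so the two metrics can rank models differently when errors are uneven across time.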

The MAE results for wind power forecasting are shown in Table 1. The training period is the first 3 months and the testing period is the remaining 9 months. Vanilla RNN and LSTM perform best for the first-hour forecast (better than the persistence model), which indicates that a simple RNN structure is efficient at capturing the temporal dependency. Our model obtains nearly the best result for forecasting horizons of $h = 2, 3$ hours. Starting from $h = 4$, our model beats all the other methods, which demonstrates its ability to capture long-term dependency. The number of parameters in our model is around 22K, while the other deep learning methods (PSTN, DL-STF and STAN) need at least 2M parameters (Table 2). This indicates that a properly designed deep learning method can capture the spatio-temporal dependency in wind data without an unnecessarily large model.

Model     PSTN     DL-STF   STAN      Ours
#Param    2.20M    17.59M   225.23M   22.40K
Table 2: Number of parameters

Since wind power data is not available in WIND, we train our model on wind speed to make speed forecasts and compare it with other models. The RMSE results are shown in Figure 3. The training period is the first 8 months and the testing period is the remaining 4 months, to align with [22]. The error curve of our model is consistently lower than those of all other models, even though our model is not designed for speed forecasting. This shows that our model is effective at extracting features from the chaotic and stochastic time series of wind data.

Figure 3: RMSE for wind speed forecasting on NREL dataset.
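Both experiments split the data chronologically rather than at random, so the model is never tested on data that precedes its training period. A minimal sketch of such a split follows. The `month_index` field, counting months from the start of the record, is a hypothetical name introduced here.

```python
def chronological_split(samples, train_months):
    """Split samples so that every test sample comes after all training samples."""
    train = [s for s in samples if s["month_index"] < train_months]
    test = [s for s in samples if s["month_index"] >= train_months]
    return train, test


# One made-up sample per month over a year of data
samples = [{"month_index": m, "speed": 5.0 + 0.1 * m} for m in range(12)]

train, test = chronological_split(samples, train_months=8)
print(len(train), len(test))
```

Setting `train_months=3` instead gives the 3-month and 9-month split used for the wind power experiment.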

5 Conclusion

In this work, we proposed a deep spatio-temporal learning approach for wind power forecasting. Our model effectively integrates both spatial dependency and temporal trends by extending the single time series to a multi-dimensional one based on a k-nearest neighbor graph. The embedding of turbine identity enables turbine-specific forecasts. The encoder-decoder structure bypasses the power curve transformation step commonly used in wind power forecasting and improves the forecasting accuracy compared with classical approaches.

For future work, we will investigate probabilistic modeling to improve our model. This would provide uncertainty quantification for the forecasts, which would increase the interpretability of our model. The difficulty lies in the probabilistic modeling of wind power. We plan to utilize auto-regressive recurrent networks [16] and physical laws to tackle this problem.


References

  • [1] H. B. Azad, S. Mekhilef, and V. G. Ganapathy (2014) Long-term wind speed forecasting and general pattern recognition using neural networks. IEEE Transactions on Sustainable Energy 5 (2), pp. 546–553.
  • [2] A. Aziz Ezzat, M. Jun, and Y. Ding (2019) Spatio-temporal short-term wind forecast: a calibrated regime-switching method. Ann. Appl. Stat. 13 (3), pp. 1484–1510.
  • [3] N. Chen, Z. Qian, I. T. Nabney, and X. Meng (2013) Wind power forecasts using gaussian processes and numerical weather prediction. IEEE Transactions on Power Systems 29 (2), pp. 656–665.
  • [4] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • [5] Y. Ding (2019) Data science for wind energy. CRC Press.
  • [6] C. Draxl, B. Hodge, A. Clifton, and J. McCaa (2015) Overview and meteorological validation of the wind integration national dataset toolkit. Technical report, National Renewable Energy Lab. (NREL), Golden, CO (United States).
  • [7] E. Erdem and J. Shi (2011) ARMA based approaches for forecasting the tuple of wind speed and direction. Applied Energy 88 (4), pp. 1405–1414.
  • [8] A. A. Ezzat, M. Jun, and Y. Ding (2018) Spatio-temporal asymmetry of local wind fields and its impact on short-term wind forecasting. IEEE Transactions on Sustainable Energy 9 (3), pp. 1437–1447.
  • [9] X. Fu, F. Gao, J. Wu, X. Wei, and F. Duan (2019) Spatiotemporal attention networks for wind power forecasting. In 2019 International Conference on Data Mining Workshops (ICDMW), pp. 149–154.
  • [10] A. Ghaderi, B. M. Sanandaji, and F. Ghaderi (2017) Deep forecast: deep learning-based spatio-temporal forecasting. arXiv preprint arXiv:1707.08110.
  • [11] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in neural information processing systems, pp. 1024–1034.
  • [12] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780.
  • [13] G. Lee, Y. Ding, M. G. Genton, and L. Xie (2015) Power curve estimation with multivariate environmental factors for inland and offshore wind farms. Journal of the American Statistical Association 110 (509), pp. 56–67.
  • [14] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119.
  • [15] A. Pourhabib, J. Z. Huang, and Y. Ding (2016) Short-term wind speed forecast using measurements from multiple turbines in a wind farm. Technometrics 58 (1), pp. 138–147.
  • [16] D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski (2019) DeepAR: probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting.
  • [17] G. Sideratos and N. D. Hatziargyriou (2012) Probabilistic wind power forecasting using radial basis function neural networks. IEEE Transactions on Power Systems 27 (4), pp. 1788–1796.
  • [18] S. S. Soman, H. Zareipour, O. Malik, and P. Mandal (2010) A review of wind power and wind speed forecasting methods with different time horizons. In North American Power Symposium 2010, pp. 1–8.
  • [19] S. Wang, J. Cao, and P. S. Yu (2019) Deep learning for spatio-temporal data mining: a survey. arXiv preprint arXiv:1906.04928.
  • [20] R. Wiser, E. Lantz, T. Mai, J. Zayas, E. DeMeo, E. Eugeni, J. Lin-Powers, and R. Tusing (2015) Wind vision: a new era for wind power in the united states. The Electricity Journal 28 (9), pp. 120–132.
  • [21] H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li (2019) Revisiting spatial-temporal similarity: a deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 5668–5675.
  • [22] Q. Zhu, J. Chen, D. Shi, L. Zhu, X. Bai, X. Duan, and Y. Liu (2019) Learning temporal and spatial correlations jointly: a unified framework for wind speed prediction. IEEE Transactions on Sustainable Energy 11 (1), pp. 509–523.