A three-stage model for short-term extreme wind speed probabilistic forecasting

10/09/2018 ∙ by Daniela Castro Camilo, et al. ∙ King Abdullah University of Science and Technology 0

Renewable sources of energy such as wind power have become a sustainable alternative to fossil fuel-based energy. However, the uncertainty and fluctuation of the wind speed derived from its intermittent nature bring a great threat to the wind power production stability, and to the wind turbines themselves. Lately, much work has been done on developing models to forecast average wind speed values, yet surprisingly little has focused on proposing models to accurately forecast extreme wind speeds, which can damage the turbines. In this work, we develop a flexible three-stage model to forecast extreme and non-extreme wind speeds simultaneously. Each stage of our model belongs to the class of latent Gaussian models, for which the integrated nested Laplace approximation method is designed. Considering a flexible additive regression structure, we propose two models for the latent linear predictor to capture the spatio-temporal dynamics of wind speeds. The first linear predictor is a temporal model that incorporates spatial information trough lagged off-site predictors, chosen as a function of the dominant wind directions. The second one uses stochastic partial differential approximations to the Matérn covariance of a Gaussian field that varies in time according to a first-order autoregressive process. Our models are fast to fit and can describe both the bulk and the tail of the wind speed distribution while producing short-term extreme and non-extreme wind speed probabilistic forecasts.



There are no comments yet.


page 3

page 21

page 25

page 26

page 27

page 28

page 29

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

1.1 Background

Recent international efforts to mitigate greenhouse gas emissions, such as the Paris agreement (UNFCCC, 2015) whose goal is to hold global warming below 2 degrees Celsius, reflect the urgent need for seeking renewable and clean energy resources. The integration of renewable energies such a wind power into power systems has become a common goal for numerous countries, and it has been developing on a large scale around the world in the last decade. The World Wind Energy Association (WWEA) in its 2017 half-year statistics report (WWEA, 2017), claims that the world wind market has reached over 511 gigawatts (GW), representing an annual growth rate of 5.1%. This growth rate is expected to increase considerably as a consequence of the Paris agreement (Moody’s, 2016).

Wind power generation is sustainable and it is emission-free. Moreover, the cost of a kilowatt (kW) of wind-powered electricity is nearly the same as that of coal or nuclear energy (Hering and Genton, 2010). The advantages of wind energy are counter-balanced by several challenges, such as high variability, limited dispatchability, and non-storability. Wind power production data observed at fine temporal resolutions exhibit successive periods with fluctuations of various dynamic nature and magnitude (Pinson and Madsen, 2012). This high variability makes the wind power a non-steady, non-constant source of energy. Moreover, unlike traditional sources of energy, wind power is not fully dispatchable. Wind farms cannot increase their power generation upon request when there is not sufficient wind; they can only reduce the output. Furthermore, wind cannot be stored for future power generation, and wind-powered energy should be used as it becomes available.

For wind power producers participating in a liberalized electricity market, the natural fluctuation of the wind resource translates into penalties related to regulation costs. Specifically, in these markets, wind energy producers have to propose bids in advance and are then charged for any imbalance between the actual production and the energy bid (Pinson et al., 2007). Making accurate predictions of the future wind power is therefore paramount to reduce the variability and risk faced by wind energy producers.

Wind power forecasts can be made directly if power data are available (Lenzi et al., 2017; Trombe and Pinson, 2012). However, wind speed forecasting can be more precise than wind power forecasting due to the spatial correlation of wind (Zhu and Genton, 2012). In this paper, we focus on wind speed probabilistic forecasting based on turbines installed at the border between Oregon and Washington, along the Columbia River (see Figure 1

). Wind power forecasts can be easily obtained from the forecasted wind speeds using an estimate of the deterministic cubic power curve typically provided by the turbine manufacturer (see, e.g., Figure 5 in

Hering and Genton, 2010).

Figure 1: Map showing the 20 towers along the Columbia River. The red dots correspond to stations that are used later to illustrate some of our methods.

Over the last decades, statistical models have shown to be very effective in capturing the fluctuating characteristics of wind speed to produce accurate short-term wind speed forecasts. Purely temporal models are built assuming that wind speed at each time point is partially predicted by the wind speed in its near past. In this context, different time series models have been proposed for short-term forecasts, such as autoregressive models 

(Huang and Chalabi, 1995), autoregressive moving average models (Erdem and Shi, 2011), and autoregressive integrated moving average models (Palomares-Salas et al., 2009)

. Nevertheless, models that incorporate both temporal and spatial correlations have been found to increase accuracy over conventional time series models. Spatio-temporal models are built under the idea that wind speed characteristics of a region resemble those of neighboring regions. In this framework, the class of regime-based forecasting models describes wind speed conditional on predominant wind patterns in a given region, incorporating off-site information from nearby areas. In this class, the truncated normal regime-switching (TNRS) regression models assume that wind speed follows a truncated normal distribution with regime-dependent mean and variance; see

Gneiting et al. (2006), Hering and Genton (2010), and Kazor and Hering (2015).

1.2 A three-stage hierarchical Bayesian modeling approach

Our main goal is to develop a flexible model that comprises in-site and off-site information to help us forecast future wind speeds at the turbine locations, with a particular focus on extreme wind speeds. The statistical modeling of extremes is challenging, as it usually focuses on the estimation of high (or low) quantiles, where the available data are limited. We here propose two flexible models for both non-extreme (in the bulk) and extreme (in the upper tail) wind speeds, designed to provide forecasts beyond observed wind speed values. Our three-stage hierarchical Bayesian approach is composed of (i) a data layer with suitable distributions for extreme and non-extreme wind speeds, (ii) a latent process that drives the space-time trends and dependence, and (iii) prior distributions. We propose and compare two different models for the latent process. Our models belong to the class of latent Gaussian models, which can be efficiently fitted using the integrated nested Laplace approximation (INLA; 

Rue et al., 2009) conveniently implemented in the R-INLA package.

The data consist of hourly measurements of wind speed and wind direction at 20 towers along the border between Oregon and Washington, and are publicly available at http://transmission.bpa.gov/Business/Operations-/Wind/MetData.aspx. Figure 1 maps the location of the 20 towers labeled by acronym. Figure 7 in the Supplementary Material shows wind roses for all the 20 stations, illustrating the different wind regimes among stations. After basic cleaning and preprocessing, we find that each station encompasses between and hourly measurements of non-zero wind speed from January 2012 to December 2014. Figure 5 in the Supplementary Material displays time series plots exhibiting seasonality patterns at almost every location. Auto-correlation plots shown in Figure 7 of the Supplementary Material confirm the presence of seasonal patterns at most of the stations and show that the data are highly autocorrelated, as expected. Figure 2 shows an example of the persistence of high wind speeds. Persistence is the variable’s tendency to maintain its current state and is one of the most important characteristics of climate variables.

Figure 2: Snapshot of the wind speed time series at Biddle Buttle (left) and Roosevelt (right) showing the persistence of extreme winds (the tendency of high wind speeds to maintain its current state). In this example, extremes are defined as speeds above 22 m/s (blue rectangle). Biddle Buttle exhibits a longer persistence period of mildly high wind speeds, whereas Roosevelt shows a shorter period with a significant wind speed peak.

The remainder of the article is organized as follows. In Section 2 we develop our modeling strategy, while in Section 3 we explain our Bayesian estimation approach using INLA. Section 4 presents the results of our modeling approaches in terms of wind speed forecasting. Conclusions are given in Section 5.

2 Modeling wind speeds using latent Gaussian models

2.1 Extreme wind speeds

High wind speed events are of particular interest for risk assessment. Therefore, while we aim to produce accurate probabilistic forecasts for all possible values of wind speeds, we want to pay particular attention to the upper tail of the distribution, and so we need a model that is able not only to forecast future means and variances but also resilient to extrapolate beyond observed data. Extreme-Value Theory (EVT) offers a rigorous mathematical framework that justifies such an extrapolation; see Davison and Huser (2015) for a review.

In the context of wind-powered energy, a succession of large wind speed values over a period of time may pose a considerable risk to the wind turbines. Therefore a suitable representation for extremes is given by the so-called wind speed exceedances, i.e., wind speeds that exceed a certain high threshold

. A description of the stochastic behavior of these extreme events is given by the conditional probability

, where has distribution . Under broad conditions (Davison and Smith, 1990), it can be shown that as becomes large, the following approximation holds:


where is the generalized Pareto (GP) distribution with scale parameter , shape parameter , and . The shape parameter determines the weight of the upper tail: heavy tails are obtained with , light tails with , and bounded tails with . For sufficiently large we have that


with . Thus, the GP distribution characterizes the intensity of wind speed exceedances, whereas controls the frequency of wind speed exceedances.

In Sections 2.2 and 2.3 we propose a three-stage Bayesian latent Gaussian model, which exploits the GP specification, while accounting for space-time trends and dependence.

2.2 A three-stage model for wind speeds

The statistical modeling of univariate extreme values has been in development since the 1970s. A key component for practical applications has been the development of methodology to account for non-stationarity in the distribution of interest, as advocated by Davison and Smith (1990). Typical approaches to this problem are based on the generalized linear modeling framework, allowing the parameters of a marginal distribution to depend on covariates. Hierarchical Bayesian models for threshold exceedances were introduced by Casson and Coles (1999)Cooley et al. (2007), and Opitz et al. (2018). While Casson and Coles (1999) and Cooley et al. (2007)

use expensive Markov chain Monte Carlo (MCMC) methods to fit their spatial models,

Opitz et al. (2018)

perform approximate Bayesian inference using INLA.

Our approach is similar to the models of Cooley et al. (2007) and Opitz et al. (2018), but it differs in spirit and goals: while Cooley et al. (2007)

aimed at spatially interpolating large return levels, and

Opitz et al. (2018)’s objective was to accurately estimate the 99.8% precipitation quantile over time at some observed and unobserved locations, our goal is to produce accurate short-term wind speed forecasts at the turbine locations. Here, following Opitz et al. (2018), we overcome the computational costs and tricky convergence checks of MCMC methods by using a fast and accurate inference method based on INLA; see Section 3. We now detail our three-stage modeling and estimation strategy for wind speed forecasting.

Stage 1.

As wind speeds are non-negative and usually right-skewed, distributions such as the Gamma, Weibull, and truncated normal are commonly chosen to fit the bulk of the wind speed distribution; see, e.g. 

Hering and Genton (2010), Kazor and Hering (2015). Here we assume that positive non-extreme wind speeds at location and time ,

, can be characterized by a Gamma distribution parametrized in terms of an

-quantile and a precision parameter , assumed to be constant in time:

where is the quantile function of the Gamma distribution with unit rate and shape , evaluated at the probability .

Stage 2. In Stage 1, the probability of exceeding the high threshold is fixed over positive wind speeds. Here we re-estimate this probability to account for the few zero values that were previously excluded. We model exceedance indicators

using the Bernoulli distribution,

, where , and has been previously estimated in Stage 1.

Stage 3. Since the tail of the Gamma distribution decays exponentially fast, probabilities associated with extreme events might be underestimated. To correct the probabilities associated with tail events, we use a GP distribution; recall Section 2.1. Threshold exceedances defined as , are characterized by a GP distribution as in (2.1), but parametrized in terms of a shape parameter (constant in time) and a specific time-varying -quantile . With this parametrization, the GP distribution is

Note that the goal of the Gamma is twofold: to describe the distribution of non-extreme wind speeds and to obtain a suitable time-varying threshold

to define wind speed exceedances. The second and third stages model the frequency and intensity of extremes, respectively, and are connected to the first stage through the time-varying threshold. The shape parameters for the Gamma and GP likelihoods, treated as hyperparameters, are fixed in time for the model presented in Section 

2.3.2, and fixed in space and time (i.e., , ) for the model presented in Section 2.3.3.

2.3 Latent Gaussian models and prior specification

2.3.1 Generalities

Latent Gaussian models are a broad and flexible class of models, ranging from generalized linear models to spatial and spatio-temporal models (Rue et al., 2009). This class of models can be specified using a hierarchical model formulation, where the observations (which here represent wind speeds at any location in , where is the study region shown in Figure 1, or exceedance indicators, or threshold exceedances) are assumed to be conditionally independent given a latent Gaussian random field and hyperparameters ,


The latent Gaussian random field , controlled by the hyperparameters , describes the underlying dependence structure of the data, and its specification in (2.3.1) is key to generate a flexible and versatile class of models. Here, and are the mean and precision matrix of , respectively. We assume that only depends on a linear predictor which has an additive structure with respect to some fixed covariates and random effects, i.e.,


Here, is the overall intercept, are known covariates with coefficients , and are specific Gaussian processes defined in terms of a set of covariates . If we further assume that

has a Gaussian prior, then the joint distribution of

is Gaussian, and yields the latent field in the hierarchical formulation (2.3.1); see e.g., Rue et al. (2017).

To construct a suitable linear predictor for our wind speed forecasting problem, we must take into account the properties detailed in Section 1, i.e., space-time dependence, seasonality, and persistence. In the following sections, we propose two different models for the linear predictors to capture these characteristics.

2.3.2 Temporal model with off-site information

For the modeling of wind speeds, we use the latent Gaussian modeling framework of Section 2.3.1, and we propose using the following linear predictor, for each fixed location :


where for each , is the lagged time series of wind speeds at the -th neighbor of , and is the set of neighbors of of cardinality . The coefficients (fixed effects) quantify the effect that wind speeds at the -th neighbor observed at time lag one have on the response. The random effect corresponds to a zero-mean autoregressive Gaussian process of first order, i.e.,

where is a precision parameter. The random effect captures the hourly variation of wind speeds, and is described by a cyclic Gaussian random walk of second order with precision , defined over the 24 hours within a day (Rue and Held, 2005, Ch. 3). Let denote the hour associated to time , then:

Owing to the natural spatio-temporal variability of wind speeds, spatial information is here included in the linear predictor (2.5) in the form of off-site predictors. Although traditional statistical models for wind speed and wind power forecasting consist of on-site observations and/or meteorological forecasts (see e.g., Trombe et al., 2012; Pinson and Madsen, 2012), other studies have explored the potential of off-site observations as new predictors (Gneiting et al., 2006; Zhu and Genton, 2012; Kazor and Hering, 2015). Specifically, lagged wind speeds at nearby stations can be informative of upcoming changes in the station of interest (Trombe and Pinson, 2012). More details on the selection of off-site predictors are given in Section 4.1.

The two random effects in the linear predictor (2.5) account for unobserved heterogeneity within each station. The random effect captures the autocorrelation structure of order one within each time series of wind speeds. Our experiments show that including greater lags do not improve the fit or predictive power of our model. Moreover, the random effect along with the lagged covariates account for temporal dependence and persistence. The random effect represents the variation of wind speeds within a day. Exploratory analysis (not shown) on the hourly behavior of wind speeds shows a daily pattern that varies between stations. The random effect accounts for this diurnal and volatile nature of wind speeds in a less restrictive way than the harmonic terms in the trigonometric direction diurnal model of Hering and Genton (2010), or the regime switching space-time diurnal model of Gneiting et al. (2006). In both random effects, the precision hyperparameters , , control the strength of dependence among neighboring covariate classes.

The first two layers of our hierarchical model (see representation (2.3.1)) are completed by linking each one of the stages detailed in Section 2.2 to the linear predictor in (2.5) as follows:

Stage 1:
Stage 2:
Stage 3: (2.6)

The third and final layer of our hierarchical model is given by the prior distributions for the hyperparameters, namely and . When little expert knowledge is available, a common practice is to assume non-informative priors. As an alternative, informative priors can be proposed using Penalized Complexity (PC) priors (Simpson et al., 2017). In this framework, model components (in our case, and ) are considered to be flexible extensions of simpler and less flexible base models. Priors for the model components parameters are then developed in such a way that they shrink the components towards their base models. Simpson et al. (2017)

propose to use the Kullback-Leibler divergence to measure the squared distance from the base model to its flexible extension, and to penalize this distance at a constant rate. For our wind speed forecasting problem, we assume fairly informative PC priors for the correlation hyperparameter of the AR(1) process, and the precision hyperparameter of the random walk of order two. Specifically,

and , where

denotes the standard deviation of the temporally aggregated wind speeds. A diffuse prior is assumed for

, the precision hyperparameter of the AR(1). The prior distribution for the Gamma shape is slightly informative and is defined through a Gamma distribution with shape 10 and rate 1, which gives a high probability to values between 5 and 15. Finally, a strong PC prior is assumed for the shape parameter of the GP distribution; since large values of the shape parameter are usually unrealistic, we here assume that .

2.3.3 Space-time model using stochastic partial differential equations

The expression for the linear predictor in (2.5) corresponds to a temporal model that introduces spatial information using lagged off-site predictors. We also explore a proper space-time model, which explicitly accounts for spatial dependence amongst wind speeds at different stations. Specifically, we assume that the space-time dependence between wind speeds at different wind towers can be described by a spatio-temporal term that varies in time according to a first-order autoregressive structure. Specifically, we assume that

where , and is a zero-mean, temporally independent Gaussian field, that is completely determined by its covariance function

where is the stationary Mátern correlation function (see, e.g., Handcock and Stein, 1993). This gives rise to our second linear predictor:


where is an intercept, is the cyclic random effect described in Section 2.3.2, capturing the sub-daily variations that are common to all the stations, and is its precision parameter.

The latent process in (2.7

) has dense covariance and (possibly) precision matrices, which implies that any attempt to make Bayesian inference can be computationally demanding. To avoid this issue, we use the Stochastic Partial Differential Equations (SPDE) approach introduced by 

Lindgren et al. (2011). The SPDE approach consists of constructing a continuous approximation to the Gaussian field by using a continuous SPDE model defined on the entire study area. It can be shown that this SPDE has a Gaussian field with a Matérn covariance function as stationary solution (Whittle, 1954, 1963). Under certain conditions (see, e.g., Bakka et al., 2018), the continuous SPDE solution has the Markovian property. This property produces sparse precision matrices that can be easily factorized, and are the focus of the INLA methodology. An approximate discrete solution of the SPDE in a bounded domain, defined in our case by the location of the towers, is obtained using a finite element method, that allows for flexible boundaries and different levels of accuracy for the discretization; see Section A.3 of the Supplementary Material for details on the mesh that is used to discretize the study region. For more details on spatial modeling using the SPDE approach, see the recent review by Bakka et al. (2018).

Since the Cascade Mountains illustrated in Figure 1 naturally divide the study region in two sub-regions, we consider two sub-models for , one for stations to the East and another for stations to the West of the mountains. We fit each set of stations separately but using the same latent structure specified in (2.7).

The link between this new linear predictor and the model likelihoods is the same as in (2.3.2). The Gaussian field has two parameters, namely the marginal variance and the range of dependence (the smoothness parameter is kept fixed and equal to 2). PC priors on these parameters are chosen such that the variance is shrinked towards zero, whereas the range is shrinked towards infinity (Fuglstad et al., 2018). Specifically, we set and , where is the standard deviation of the temporally aggregated wind speeds, and is the median of the distance between stations. For stations to the East of the Cascade Mountains, km, and for stations to the West, km. A PC prior is also chosen for the correlation coefficient of the autoregressive term in , specifically . Prior for the precision of is the same as before. Priors for the shape parameters of the Gamma and GP distributions, which are now fixed over space and time (i.e., , ), are also the same as before; see Section 2.3.2.

To easily distinguish between our three-stage model using the first or second linear predictor, in the following we will refer to the model using the first linear predictor, , as the temporal model with off-site predictors, whereas the model using the second linear predictor, , will be referred to as the SPDE model.

3 Inference based on INLA

Here we describe the form of the joint posterior distribution for each of the three stages detailed in Section 2.2. In the following, we remove the spatial component from the models for ease of notation. Let y

denote the vector of observations for any of the three stages detailed above, with associated hyperparameters

. As above, let be the latent Gaussian random field, , and . Then, the joint posterior distribution of parameters and hyperparameters at any location , for any of the three stages, can be written as


where and are the mean and precision matrix of , respectively, and for Stages 1 and 2, and is equal to the number of exceedances in Stage 3. The main objectives of the statistical inference are to extract from (3) the marginal posterior distributions for each of the elements of the linear predictor vector, and for each hyperparameter, i.e.,


from which predictive distributions may be derived.

In a Bayesian framework, model estimation is typically performed using simulation-based techniques, such as MCMC methods. Alternatively, approximate methods can be used to cope with the computation of high dimensional integrals needed to obtain posterior distributions. One of such approaches that has become increasingly popular in the last decade is the integrated nested Laplace approximation (INLA; Rue et al., 2009), where posterior distributions of interest are numerically approximated using the Laplace approximation, therefore avoiding the usually complex updating schemes, long running times, and diagnostic convergence checks of simulation-based MCMC. INLA is designed for latent Gaussian models and therefore, it can be successfully used in a wide variety of applications (see for instance Riebler et al., 2012; Lombardo et al., 2018; Krainski et al., 2019). The R-INLA package (Bivand et al., 2015) is a convenient interface to the INLA methodology. A wide variety of models are already implemented, and the package is continuously maintained and updated. In particular, our work has motivated the INLA implementation of the Gamma and Weibull quantile regressions, both now available to the users. Details regarding the numerical approximations to (3.2) are given by INLA are provided in Section A.1 of the Supplementary Material.

4 Wind speed probabilistic forecasting results

4.1 Automatic off-site predictor selection based on wind direction

An important step to fit our temporal model with off-site predictors at each station , is to select a suitable set of off-site predictors . There are different approaches for integrating off-site predictors into forecasting models (Larson and Westrick, 2006; Damousis et al., 2004). Here, we develop a data-driven approach for identifying dominant wind directions, which we then use subsequently to automatically choose the off-site predictors. Specifically, for each station , we fit a mixture of von Mises circular distributions to the time series of wind directions, . The von Mises density with location parameter and concentration parameter is given by , where is the modified Bessel function of order 0. For a mixture of von Mises distributions, we identify the dominant wind directions with the locations parameters , , and construct the sets of angles , with . Then, the set of off-site predictors for the station corresponds to all the stations whose angle with lies in , such that their distance to is less than some maximum distance . We selected the number of components for the mixture of von Mises distributions via the Bayesian Information Criterion (BIC), also guided by the wind roses displayed in Figure 6 of the Supplementary Material. The angle depends on the geographical conditions and the distance between stations. We choose for all stations except Mt. Hebo (HEB) where . The maximum distance between stations is km, which corresponds to the mean distance among all stations. To illustrate our approach, Figure 3 displays the fitted dominant wind directions for Biddle Buttle and Roosevelt stations (see Figure 8 of the Supplementary Material for the fits at all stations). The number of off-site predictors for all the stations varies between 2 and 10, while the distance between off-site predictors and ranges between 13.3km and 175.3km, for all .

Figure 3: Circular histograms and fitted von Mises distribution mixtures with two and three dominating wind directions for Biddle Buttle and Roosevelt stations, respectively.

4.2 Posterior predictive distributions

Here we briefly explain how to obtain posterior predictive distributions for the 1-hour, 2-hour, and 3-hour ahead probabilistic forecasts of hourly wind speeds, produced by our three-stage hierarchical Bayesian model, using the linear predictors introduced in Section 

2.3. For the SPDE model, we use a rolling training period of length 5 days, whereas for the temporal model, we multiply this period by the number of stations in each side of the Cascade Mountains, as a way to balance the effective sample sizes of the SPDE and the temporal models. We generate samples from the posterior predictive distribution, for each station, each forecasting time horizon, and each linear predictor, as follows: we extract the fitted linear predictors and hyperparameters for each stage, and use (2.3.2) to obtain samples for the Gamma, Bernoulli, and GP predictive distributions. We replace Gamma samples by threshold exceedances whenever the threshold is exceeded, i.e., whenever the associated Bernoulli sample is equal to 1. In other words, the tail of the Gamma distribution is corrected by the GP distribution in the presence of exceedances.

4.3 Forecast evaluation for extreme and non-extreme wind speeds

Here we consider performance measures that describe the ability of our three-stage model to forecast extreme and non-extreme wind speeds. For non-extreme wind speeds, we consider the continuous ranked probability score (CRPS) introduced by Gneiting et al. (2007). The CRPS is a robust proper scoring rule that quantifies the calibration and sharpness of the forecast (Gneiting et al., 2007). It is defined as


where is the predictive distribution at station and time , and is the observed wind speed. The average CRPS for the -hour ahead forecast is


To assess the predictive performance in the upper tail of the distribution, we use the quantile loss function and the threshold-weighted continuous ranked probability score (twCRPS) 

(Gneiting and Ranjan, 2011; Lerch et al., 2017). The quantile loss function measures the performance of a model to estimate a specific quantile , and it is defined as

The twCRPS is defined as

where is a non-negative weight function on the real line. For , the twCRPS reduces to the CRPS in (4.1). As suggested by Lerch and Thorarinsdottir (2013), since our interest is on the right tail of the wind speed distribution, we may set , for some . The average twCRPS for -hour ahead forecast is defined as in (4.2).

Table 1 shows performance measures for one-hour ahead forecasts using the first and second proposed linear predictors, respectively. Throughout all the fits, we set the probabilities defined in Section 2.2 as and . The performance measures are the mean CRPS, mean absolute error (MAE), mean twCRPS using an indicator weight function with equal to the 90%, 95%, and 98% quantile of the wind speed distribution at each station, and mean quantile loss (MQL). We can see that the temporal model with off-site predictors performs slightly better than the SPDE model at predicting average wind speed values (CRPS is better at 12 out of 20 stations, whereas MAE is better at 11 out of 20 stations); on the other hand, the SPDE model is much better at predicting extreme wind speeds. The SPDE model overcomes the temporal model with off-site predictors in terms of the twCRPS at almost all stations, whereas the MQL is better at 13 out of 20 stations. This may be due to the difficulty in estimating the GP shape parameter at each station using the temporal model, while a single shape parameter is assumed in the SPDE model, reducing dramatically the estimated posterior predictive uncertainty by borrowing strength across all stations. An additional advantage of the SPDE model is that it can be also used to forecast wind speed at unobserved stations, which it cannot be done with the temporal model with off-site predictors.

AUG 0.90/0.81 1.83/0.83 0.32/0.20 0.14/0.10 0.07/0.03 0.75/0.55
BID 0.69/2.79 1.89/3.49 0.31/0.02 0.17/0.01 0.08/0.00 0.64/2.96
BUT 0.87/1.05 1.89/1.04 0.12/0.01 0.07/0.00 0.04/0.00 0.80/0.20
CNK 0.96/0.76 1.93/1.47 0.17/0.18 0.09/0.17 0.04/0.12 0.96/0.78
FOR 0.35/2.07 0.62/3.08 0.24/0.02 0.11/0.01 0.04/0.00 0.33/0.23
GDH 0.70/0.71 1.23/1.42 0.27/0.22 0.11/0.07 0.05/0.01 0.61/0.59
HOO 0.60/0.40 0.92/0.02 0.27/0.03 0.11/0.01 0.04/0.00 0.54/0.51
HOR 0.73/0.76 1.22/1.25 0.26/0.22 0.12/0.15 0.06/0.58 0.65/0.64
KEN 1.05/1.10 1.91/1.84 0.29/0.00 0.17/0.00 0.06/0.00 0.81/0.59
MAR 1.01/2.12 2.23/3.82 0.25/0.02 0.14/0.01 0.07/0.00 0.96/2.05
MEG 0.64/1.50 1.16/2.59 0.35/0.02 0.15/0.01 0.06/0.00 0.59/1.38
HEB 0.52/1.22 1.07/2.41 0.35/0.02 0.17/0.01 0.08/0.00 0.49/0.73
NAS 0.66/1.46 1.22/2.79 0.30/0.02 0.14/0.01 0.06/0.00 0.60/1.56
ROO 1.00/0.75 2.05/1.51 0.18/0.01 0.10/0.00 0.06/0.00 0.86/0.64
SML 0.87/0.55 1.61/1.32 0.30/0.01 0.14/0.00 0.03/0.00 0.80/0.63
SHA 0.79/0.42 1.28/0.89 0.35/0.34 0.15/0.13 0.06/0.03 0.68/0.55
SUN 1.06/0.27 1.83/1.80 0.29/0.22 0.17/0.16 0.09/0.06 0.98/0.78
TIL 0.49/1.08 0.84/1.79 0.34/0.01 0.21/0.00 0.08/0.00 0.48/0.64
TRO 0.56/0.77 1.21/1.59 0.26/0.01 0.13/0.00 0.05/0.00 0.54/0.68
WAS 0.78/0.73 1.33/1.59 0.36/0.01 0.10/0.00 0.04/0.00 0.72/0.38
Table 1: Performance measures for one-hour ahead forecast using the temporal model with off-site predictors and the SPDE model. Mean continuous ranked probability score (CRPS); mean absolute error (MAE); mean threshold-weighted continuous ranked probability score (twCRPS) using an indicator weight function for equal to the 90%, 95%, and 98% quantile of the wind speed distribution at each station; and mean quantile loss (MQL) with .

We assess the calibration of our probabilistic forecasts using reliability diagrams (see, e.g., Lenzi et al., 2017). Reliability refers to the ability of the model to match the observation frequencies. The diagram is constructed as follows: for every station, we compute the nominal coverage rate, which is the proportion of times that the cumulative distribution of our three-stage model is below a certain threshold. Our model is well calibrated if this proportion is close to the observed frequencies. Figure 4 (a) show reliability diagrams for the temporal model with off-site predictors (red line) and the SPDE model (green line). Both models tend to overestimate the wind speed quantiles smaller than the median. The temporal model with off-site predictors underestimates the wind speed values larger than the median, whereas the SPDE model is better calibrated at higher quantiles.

Figure 4: (a): Reliability diagrams for the temporal model with off-site predictors (red line) and the SPDE model (green line). (b)-(c): PIT values based on the GP distribution for the temporal model with off-site predictors (b) and the SPDE model (c).

Finally, to compare the forecast performance of our two models, we compute predictive measures of fit for extreme wind speeds. Figure 4

(b) and (c) shows Probability Integral Transform (PIT) plots based on the GP distribution, using the temporal model with off-site predictors and the SPDE model. The PIT is the probability of a new response being less than the observed response using a model based on the rest of the data. If the model assumptions are correct, we expect the PIT values to be uniformly distributed. Although there is still room for improvement, we can see that both models perform decently well at predicting the severity of extreme events, and that the SPDE model seems slightly better.

5 Conclusion

Wind power is getting increasing attention from many countries as a source of clean energy with little environmental impact. A recent report from The European Power Sector (Agora-Energiewende and Sandbag, 2018) shows that renewable sources of energy like wind, solar and biomass, overtook coal for the first time. In particular, wind generation increased considerably by 19% during 2017, due to good wind conditions and huge investments in wind power plants.

As cities powered by electricity generated by wind are now a reality, new and accurate methods to forecast wind speed, and therefore wind power, are required. In this work, we explored a three-stage Bayesian model designed to forecast extreme wind speeds. Our model corrects the tail of the Gamma distribution by a generalized Pareto distribution in the presence of exceedances. Each stage of our model belongs to the class of latent Gaussian models, for which the integrated nested Laplace approximation (INLA; Rue et al., 2009) method is well-suited. Considering an additive structure, we proposed two types of linear predictors describing the spatio-temporal dynamics of wind speeds. The first linear predictor includes off-site information in terms of lagged wind speeds from neighboring stations. To select the neighbors, we propose an automatic method to identify dominant wind direction patterns based on mixtures of von Mises circular distributions. The second linear predictor considers a spatio-temporal term with Matérn covariance structure, that varies in time according to a first-order autoregressive dynamic. This model is fitted using the SPDE approach, which links Gaussian random fields with dense precision matrices with Gaussian Markov Random fields, characterized by sparse precision matrices. In terms of extreme wind speed forecasting, both models seem to perform decently well, although the SPDE model seems to be better calibrated at higher quantiles, probably because it better exploits spatial information. It would be interesting to explore the potential of the SPDE model to predict extreme wind speeds at unobserved locations, which could be helpful for optimal design of wind farms.

Thanks to the very powerful and fast INLA approach, we can implement in a reasonable amount of time complex hierarchical spatial models that are well suited to wind speed data. Finally, by selecting a suitable distribution in the first stage, our three-stage model procedure for exceedances can be easily adapted to model and forecast other types of environmental data.

It would be interesting to extend our approach to cases with lighter Weibull-like tails. One option could be to pre-transform the data as , with some , and then proceed identically, or alternatively to fit directly a generalized Gamma distribution. Similarly, it would be interesting to fit the GP distribution with shape parameter (i.e., not only ). Nevertheless, designing suitable shrinkage priors for is still an open question.

Appendix A Supplementary Material

a.1 Inference for our three-stage model using INLA

From a computational point of view, the main goal of INLA is to accurately approximate the integrals in Section 4 of the main paper. Here we discuss general concepts behind this numerical approximation, and the reader is referred to Rue et al. (2009)Blangiardo and Cameletti (2015), and Rue et al. (2017) for more mathematical details.

We need to compute from which also all the relevant marginals can be obtained, and .

  • Computation of . Note that


    where is the Gaussian approximation of and is its mode. This Gaussian approximation can be written as

    where is the location of the mode, , and the vector contains in the diagonal the negative second derivatives of the log-likelihood at the mode with respect to ; see Rue and Held (2005), Section 4.4.1. The proportional sign in (A.1) comes from the fact that the normalizing constant for is unknown. Rue and Martino (2007) show that the approximation is particularly accurate for a wide range of latent Gaussian models.

  • Computation of . Although we could start from the previous result and take the Gaussian marginals derived from (this is the Gaussian approximation method in INLA), the error in this approximation is considerably high (Rue and Martino, 2007). To cope with this issue, Rue et al. (2009) propose to use the Laplace approximation one more time to approximate (this is the nested part in INLA). Specifically, we can write


    As before, is the Gaussian approximation of evaluated at its mode . Although this approximation (Laplace approximation method in INLA) typically works well (Blangiardo et al., 2013), it involves the location of the mode and the factorization of a matrix many times for each . Alternatively, we can use a Taylor expansion on the numerator and denominator of the expression (A.2), thus correcting the Gaussian approximation for location and skewness with a lower computational cost (this is the simplified Laplace approximation method in INLA).

The INLA strategy can be summarized as follows:

  1. Explore the marginal joint posterior to locate the mode.

  2. Conduct a grid search to produce a set of points and integration weights to obtain .

  3. Approximate the marginal posterior using interpolation based on .

  4. For each , evaluate the conditional posteriors on a grid of values for . Obtain marginal posteriors by numerical integration (this is the integrated part in INLA).

As a result, the approximated posterior marginals of interest returned by INLA have the following form:

a.2 Exploratory analysis

a.2.1 Time series plots

Figure 5: Positive wind speed time series (in m/s) measured at the 20 stations detailed in Figure 2 in the main paper. All time series are displayed using a common y-axis.

a.2.2 Wind roses

Figure 6: Wind roses for the period January 2012 to December 2014 at the 20 stations detailed in Figure 2 of the main paper.

a.2.3 Autocorrelation plots

Figure 7: Autocorrelation plots for the period January 2012 to December 2014 at the 20 stations detailed in Figure 2 of the main paper.

a.2.4 Dominant wind directions for the temporal model with off-site predictors

Figure 8: Circular histograms and von Mises distributions fitted to all stations.

a.3 Discretization of the study region for the SPDE model

Figure 9 shows the mesh that is used to discretize the study region, split into two parts for the West and the East side of the Cascade Mountains.

Figure 9: West (left) and East (right) triangle mesh based on the stations’ location (red dots).


We thank Amanda Hering for helpful discussion and suggestions, and for providing the wind speed data. We also extend our thanks to Thomas Opitz for helpful discussion, and to Amanda Lenzi for her constructive suggestions regarding the INLA implementation. Support from the KAUST Supercomputing Laboratory and access to Shaheen is also gratefully acknowledged. This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-CRG2017-3434.


  • Agora-Energiewende and Sandbag (2018) Agora-Energiewende and Sandbag (2018) The European Power Sector in 2017. State of Affairs and Review of Current Developments. Press Release, EU, December 17, 2008, available at: https://sandbag.org.uk/wp-content/uploads/2018/01/EU-power-sector-report-2017.pdf .
  • Bakka et al. (2018) Bakka, H., Rue, H., Fuglstad, G. A., Riebler, A., Bolin, D., Illian, J., Krainski, E., Simpson, D. and Lindgren, F. (2018) Spatial modelling with R-INLA: A review. WIREs Computational Statistics (arxiv:1802.06350) (Invited extended review, to appear).
  • Bivand et al. (2015) Bivand, R., Gómez-Rubio, V. and Rue, H. (2015) Spatial data analysis with R-INLA with some extensions. Journal of Statistical Software 63(1), 1–31.
  • Blangiardo and Cameletti (2015) Blangiardo, M. and Cameletti, M. (2015) Spatial and spatio-temporal Bayesian models with R-INLA. John Wiley & Sons.
  • Blangiardo et al. (2013) Blangiardo, M., Cameletti, M., Baio, G. and Rue, H. (2013) Spatial and spatio-temporal models with R-INLA. Spatial and spatio-temporal epidemiology 4, 33–49.
  • Casson and Coles (1999) Casson, E. and Coles, S. (1999) Spatial regression models for extremes. Extremes 1(4), 449–468.
  • Cooley et al. (2007) Cooley, D., Nychka, D. and Naveau, P. (2007) Bayesian spatial modeling of extreme precipitation return levels. Journal of the American Statistical Association 102(479), 824–840.
  • Damousis et al. (2004) Damousis, I. G., Alexiadis, M. C., Theocharis, J. B. and Dokopoulos, P. S. (2004) A fuzzy model for wind speed prediction and power generation in wind parks using spatial correlation. IEEE Transactions on Energy Conversion 19(2), 352–361.
  • Davison and Huser (2015) Davison, A. C. and Huser, R. (2015) Statistics of extremes. Annual Review of Statistics and its Application 2, 203–235.
  • Davison and Smith (1990) Davison, A. C. and Smith, R. L. (1990) Models for exceedances over high thresholds. Journal of the Royal Statistical Society. Series B (Methodological) 52(3), 393–442.
  • Erdem and Shi (2011) Erdem, E. and Shi, J. (2011) ARMA based approaches for forecasting the tuple of wind speed and direction. Applied Energy 88(4), 1405–1414.
  • Fuglstad et al. (2018) Fuglstad, G.-A., Simpson, D., Lindgren, F. and Rue, H. (2018) Constructing priors that penalize the complexity of Gaussian random fields. Journal of the American Statistical Association pp. 1–8.
  • Gneiting et al. (2007) Gneiting, T., Balabdaoui, F. and Raftery, A. E. (2007) Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(2), 243–268.
  • Gneiting et al. (2006) Gneiting, T., Larson, K., Westrick, K., Genton, M. G. and Aldrich, E. (2006) Calibrated probabilistic forecasting at the stateline wind energy center: The regime-switching space–time method. Journal of the American Statistical Association 101(475), 968–979.
  • Gneiting and Ranjan (2011) Gneiting, T. and Ranjan, R. (2011) Comparing density forecasts using threshold-and quantile-weighted scoring rules. Journal of Business & Economic Statistics 29(3), 411–422.
  • Handcock and Stein (1993) Handcock, M. S. and Stein, M. L. (1993) A Bayesian analysis of kriging. Technometrics 35(4), 403–410.
  • Hering and Genton (2010) Hering, A. S. and Genton, M. G. (2010) Powering up with space-time wind forecasting. Journal of the American Statistical Association 105(489), 92–104.
  • Huang and Chalabi (1995) Huang, Z. and Chalabi, Z. (1995) Use of time-series analysis to model and forecast wind speed. Journal of Wind Engineering and Industrial Aerodynamics 56(2–3), 311–322.
  • Kazor and Hering (2015) Kazor, K. and Hering, A. S. (2015) The role of regimes in short-term wind speed forecasting at multiple wind farms. Stat 4(1), 271–290.
  • Krainski et al. (2019) Krainski, E. T., Gómez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo, D., Simpson, D., Lindgren, F. and Rue, H. (2019) Advanced Spatial Modeling with Stochastic Partial Differential Equations using R and INLA. CRC press. Github version www.r-inla.org/spde-book.
  • Larson and Westrick (2006) Larson, K. A. and Westrick, K. (2006) Short-term wind forecasting using off-site observations. Wind Energy: An International Journal for Progress and Applications in Wind Power Conversion Technology 9(1–2), 55–62.
  • Lenzi et al. (2017) Lenzi, A., Pinson, P., Clemmensen, L. H. and Guillot, G. (2017) Spatial models for probabilistic prediction of wind power with application to annual-average and high temporal resolution data. Stochastic Environmental Research and Risk Assessment 31(7), 1615–1631.
  • Lerch and Thorarinsdottir (2013) Lerch, S. and Thorarinsdottir, T. L. (2013) Comparison of non-homogeneous regression models for probabilistic wind speed forecasting. Tellus A: Dynamic Meteorology and Oceanography 65(1), 21206.
  • Lerch et al. (2017) Lerch, S., Thorarinsdottir, T. L., Ravazzolo, F., Gneiting, T. et al. (2017) Forecaster’s dilemma: Extreme events and forecast evaluation. Statistical Science 32(1), 106–127.
  • Lindgren et al. (2011) Lindgren, F., Rue, H. and Lindström, J. (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.
  • Lombardo et al. (2018) Lombardo, L., Opitz, T. and Huser, R. (2018) Point process-based modeling of multiple debris flow landslides using INLA: an application to the 2009 Messina disaster. Stochastic Environmental Research and Risk Assessment 32(7), 2179–2198.
  • Moody’s (2016) Moody’s (2016) Global wind turbine manufacturers: COP 21 agreement will support wind turbine manufacturers’ sales momentum. Available at hhttps://www.moodys.com .
  • Opitz et al. (2018) Opitz, T., Huser, R., Bakka, H. and Rue, H. (2018) INLA goes extreme: Bayesian tail regression for the estimation of high spatio-temporal quantiles. Extremes 21(3), 441–462.
  • Palomares-Salas et al. (2009)

    Palomares-Salas, J., De La Rosa, J., Ramiro, J., Melgar, J., Aguera, A. and Moreno, A. (2009) Arima vs. neural networks for wind speed forecasting.

    In Computational Intelligence for Measurement Systems and Applications, 2009. CIMSA’09. IEEE International Conference on, pp. 129–133.
  • Pinson et al. (2007) Pinson, P., Chevallier, C. and Kariniotakis, G. N. (2007) Trading wind generation from short-term probabilistic forecasts of wind power. IEEE Transactions on Power Systems 22(3), 1148–1156.
  • Pinson and Madsen (2012) Pinson, P. and Madsen, H. (2012) Adaptive modelling and forecasting of offshore wind power fluctuations with Markov-switching autoregressive models. Journal of forecasting 31(4), 281–313.
  • Riebler et al. (2012) Riebler, A., Held, L., Rue, H. et al. (2012) Estimation and extrapolation of time trends in registry data—borrowing strength from related populations. The Annals of Applied Statistics 6(1), 304–333.
  • Rue and Held (2005) Rue, H. and Held, L. (2005) Gaussian Markov random fields: theory and applications. CRC press.
  • Rue and Martino (2007) Rue, H. and Martino, S. (2007) Approximate Bayesian inference for hierarchical Gaussian Markov random field models. Journal of statistical planning and inference 137(10), 3177–3192.
  • Rue et al. (2009) Rue, H., Martino, S. and Chopin, N. (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal statistical society: Series B (Statistical Methodology) 71(2), 319–392.
  • Rue et al. (2017) Rue, H., Riebler, A., Sørbye, S. H., Illian, J. B., Simpson, D. P. and Lindgren, F. K. (2017) Bayesian computing with INLA: a review. Annual Review of Statistics and Its Application 4, 395–421.
  • Simpson et al. (2017) Simpson, D., Rue, H., Riebler, A., Martins, T. G., Sørbye, S. H. et al. (2017) Penalising model component complexity: A principled, practical approach to constructing priors. Statistical Science 32(1), 1–28.
  • Trombe and Pinson (2012) Trombe, P.-J. and Pinson, P. (2012) High-resolution forecasting of wind power generation with regime-switching models and off-site observations. DTU Informatics.
  • Trombe et al. (2012) Trombe, P.-J., Pinson, P. and Madsen, H. (2012) A general probabilistic forecasting framework for offshore wind power fluctuations. Energies 5(3), 621–657.
  • UNFCCC (2015) UNFCCC, V. (2015) Adoption of the Paris Agreement. Proposal by the President (Draft Decision). United Nations Office, Geneva, Switzerland .
  • Whittle (1954) Whittle, P. (1954) On stationary processes in the plane. Biometrika pp. 434–449.
  • Whittle (1963) Whittle, P. (1963) Stochastic-processes in several dimensions. Bulletin of the International Statistical Institute 40(2), 974–994.
  • WWEA (2017) WWEA (2017) The WWEA half-year report. europa.eu/rapid/press-release_IP-08-1998_en.pdf .
  • Zhu and Genton (2012) Zhu, X. and Genton, M. G. (2012) Short-term wind speed forecasting for power system operations. International Statistical Review 80(1), 2–23.