# Integrative Probabilistic Short-term Prediction and Uncertainty Quantification of Wind Power Generation

We develop an integrative framework to predict wind power output that accounts for multiple sources of uncertainty. For probabilistic wind power forecasts, all sources of uncertainty arising from both wind speed prediction and the wind-to-power conversion process should be addressed collectively. To this end, we model the wind speed using an inhomogeneous geometric Brownian motion and convert the wind speed's predictive density into the wind power density in closed form. The resulting wind power density allows us to quantify prediction uncertainties through prediction intervals and to forecast the power that minimizes the expected prediction cost under unequal penalties on overestimation and underestimation. We evaluate the predictive performance of the proposed approach using data from commercial wind farms located at different sites. The results suggest that our approach outperforms alternative approaches in terms of multiple performance measures.


## I Introduction

The market share of renewable energy in the electricity market has increased significantly over the past few decades [1]. According to [2], annual electricity generation from renewable sources in the U.S., excluding hydropower, has more than doubled since 2004. Moreover, renewable energy has been a key sector in newly added electricity capacity: in 2014, more than half of U.S. electricity capacity additions came from investments in renewable energy [2]. Among the various renewable sources, wind energy has been one of the major contributors to the growth in renewable capacity [2].

Unlike traditional fossil-fuel-based energy sources, wind power generation is highly affected by stochastic weather conditions [3, 4], which poses significant challenges to secure power grid operation [5]. Consequently, accurate forecasting of wind power generation, together with quantification of its uncertainty, has become a critical component of several decision-making processes, including unit commitment, economic dispatch, and reserve determination [6, 7].

Accordingly, wind speed and wind power forecasting have been widely investigated in the literature (e.g., [6, 8, 9]). Many studies focus on generating point forecasts of wind speed or power. However, because wind power is highly volatile and intermittent, probabilistic forecasts have become increasingly important for decision-making in power system operations under large uncertainties [6].

Probabilistic forecasts should fully account for prediction uncertainties. For wind power forecasts, two major sources of uncertainty need to be considered. The first is the uncertainty in predicting future wind speed; the second arises when the wind speed is converted to wind power. This wind-to-power relationship is called the power curve in the wind industry. Figure 1 illustrates how the uncertainties in both the wind speed forecast and the conversion process affect probabilistic wind power prediction. Because power curves are nonlinear, the predictive wind speed distribution does not translate linearly into the probabilistic characteristics of the wind power prediction. This nonlinearity makes it challenging to quantify uncertainties in wind power predictions.

To address this challenge, this study devises a new integrative methodology in which the whole predictive wind speed density is translated into a predictive power density forecast. Specifically, we formulate the wind speed as a continuous stochastic process based on the inhomogeneous geometric Brownian motion (GBM). The inhomogeneous GBM is flexible enough to capture nonstationary and highly volatile wind characteristics. To characterize the nonstationary nature of wind speed, we dynamically update the time-varying parameters of the inhomogeneous GBM model using dual Kalman filtering. Then, by applying Itô's lemma [10] to the stochastic power curve, we translate the predicted wind condition into the predictive distribution of wind power. The resulting predictive wind power density takes a closed form, so it provides a comprehensive characterization of prediction uncertainties, including prediction intervals, the median, and quantiles.

The resulting closed-form density enables us to flexibly assign different weights to overestimating and underestimating future generation. Some wind farm operators want to avoid penalties due to unsatisfied demand (or unsatisfied commitments) and thus prefer underestimation to overestimation of future wind power output, while others may prefer overestimation to avoid salvaging excess generated power [11, 12]. We formulate an optimization problem whose solution is the point prediction that minimizes the expected prediction cost caused by possible over- or underestimation, according to the operator's preference.

We apply the proposed approach to four datasets collected from operating wind farms for short-term prediction (1 min to 10 min ahead). Our results indicate that the proposed approach successfully characterizes the stochastic wind power process and, in accordance with the wind farm operator's preference, provides better predictions than alternative methods.

The remainder of this paper is organized as follows. Section II reviews relevant studies. Section III describes the proposed approach in detail. Section IV presents computational results on real datasets. Finally, Section V summarizes the paper.

## II Literature Review

In general, wind speed prediction models employ either physics-based numerical approaches or data-driven approaches. Physics-based approaches use physical descriptions of the mechanisms of wind flow. One of the most popular models in this category is the numerical weather prediction (NWP) model, which simulates atmospheric processes [6, 8, 9]. Such physics-based approaches are known to be useful for medium-range forecasting, with horizons ranging from hours to days. On the other hand, thanks to rapidly increasing computational capability and data storage capacity, data-driven prediction models have recently received much attention, and they have been employed for shorter-term predictions. Typical time series models such as the auto-regressive moving average (ARMA) model have been widely used to account for temporal correlation patterns [13, 14]. The auto-regressive generalized autoregressive conditional heteroscedasticity (AR-GARCH) model, which allows the variance to vary over time, further characterizes the nonstationary nature of wind conditions [15]. The persistent model, the simplest point forecast model, uses the most recently observed speed as the forecast of the next speed. Despite its simplicity, the persistent model provides strong prediction accuracy at some wind sites [11].

To forecast future wind power generation, the predicted wind speed must be converted to a wind power prediction through the power curve. Studies in the literature estimate the power curve using various methods, such as polynomial regression, splines and other nonparametric models, neural networks, and support vector machines [6, 16, 17, 18]. Once the power curve is constructed, future wind power output is typically predicted by plugging the wind speed forecast into the power curve function. These studies aim to provide point forecasts of wind power.

Some recent studies provide probabilistic forecasts. One approach is to simulate wind speeds from the predictive density of wind speed and convert the sampled wind speeds to power output using the power curve. For example, in [9], ensemble forecasts that integrate predictions from multiple physics-based forecast models under different scenarios are used to provide wind speed density forecasts. Although this approach considers the uncertainties in predicting wind speed, it does not address the probabilistic characteristics and uncertainties of converting wind speed to wind power. Furthermore, as discussed in Section I, because of the nonlinearity of the wind-to-power conversion process, this approach does not provide the predictive wind power distribution in closed form.

Another school of thought uses wind speed forecasts and historical wind conditions as covariates (or inputs) to estimate the probabilistic characteristics of wind power. Using neural networks, Sideratos and Hatziargyriou [6] estimate quantiles of future wind power, whereas prediction intervals of wind power generation are constructed in [19]. In these studies, the whole predictive wind speed density is not used as an input; rather, point wind speed forecasts and/or historical wind speeds and power are typically included as covariates. Therefore, these studies do not fully account for the prediction uncertainties of wind speed.

This study fills these gaps in the literature by collectively accounting for the uncertainties arising from both wind speed prediction and the stochastic power conversion process. The proposed method generates the predictive density of wind power in closed form, so that diverse information can be extracted for probabilistic prediction of wind power generation.

## III Methodology

We first model the dynamics of the wind speed process in Section III-A. The conversion of the wind speed process to the wind power process is then discussed in Section III-B. In Section III-C, we provide an optimization framework that forecasts future wind power output based on the wind farm operator's preference regarding over- and underestimation, and in Section III-D we discuss the implementation procedure.

### III-A Modeling Wind Speed Process

Wind speed can be viewed as a stochastic process in the time domain. The inhomogeneous GBM model has been employed to capture highly volatile stochastic processes [20]. Considering the highly volatile and time-varying wind characteristics, we characterize the dynamics of wind speed using the inhomogeneous GBM model. Let $S(t)$ denote the true wind speed at time $t$. We model the stochastic process of $S(t)$ as

$$dS(t) = \mu_S(t)\,S(t)\,dt + \sigma_S(t)\,S(t)\,dW_S(t), \tag{1}$$

where $\mu_S(t)$ and $\sigma_S(t)$ capture the drift and volatility of the stochastic process, respectively, and both are time-dependent. $W_S(t)$ denotes a standard Brownian motion whose increment, $dW_S(t)$, is assumed to be independently and normally distributed with mean 0 and variance $dt$.

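Let $X(t)$ denote the logarithm of the wind speed, i.e., $X(t) = \ln S(t)$. Given the underlying dynamics of $S(t)$ in (1), the dynamics of $X(t)$ can be represented as

$$dX(t) = \left[\mu_S(t) - \tfrac{1}{2}\sigma_S^2(t)\right]dt + \sigma_S(t)\,dW_S(t). \tag{2}$$

We include the derivation of (2) in Appendix A.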
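As a reading aid for (2), the following sketch applies Itô's lemma to $f(S) = \ln S$, which has no explicit time dependence, using the rule $(dW_S(t))^2 = dt$ so that $(dS(t))^2 = \sigma_S^2(t)S^2(t)\,dt$ to first order. It is the standard calculation under model (1), not a reproduction of Appendix A:

$$\begin{aligned}
f'(S) &= \frac{1}{S}, \qquad f''(S) = -\frac{1}{S^2},\\
dX(t) &= f'(S(t))\,dS(t) + \tfrac{1}{2}f''(S(t))\,\big(dS(t)\big)^2\\
&= \mu_S(t)\,dt + \sigma_S(t)\,dW_S(t) - \tfrac{1}{2}\sigma_S^2(t)\,dt\\
&= \left[\mu_S(t) - \tfrac{1}{2}\sigma_S^2(t)\right]dt + \sigma_S(t)\,dW_S(t).
\end{aligned}$$

The $-\tfrac{1}{2}\sigma_S^2(t)$ correction is the Itô term that distinguishes (2) from a naive change of variables.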

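In general, the stochastic differential equation (SDE) in (2) does not have an analytic solution. However, it can be discretized into a stochastic difference equation and handled numerically. Specifically, applying the Wagner-Platen expansion and the Euler discretization scheme [21] to (2), we obtain

$$X(t+\Delta t) = X(t) + \left[\mu_S(t) - \tfrac{1}{2}\sigma_S^2(t)\right]\Delta t + \sigma_S(t)\,\Delta W_S(t). \tag{3}$$

Because the Brownian increment $\Delta W_S(t)$ is normally distributed with mean 0 and variance $\Delta t$, it follows immediately that $X(t+\Delta t)$ in (3) is normally distributed as

$$X(t+\Delta t) \sim \mathcal{N}\!\left(X(t) + \left[\mu_S(t) - \tfrac{1}{2}\sigma_S^2(t)\right]\Delta t,\ \sigma_S^2(t)\,\Delta t\right). \tag{4}$$

In other words, the wind speed $S(t+\Delta t)$ is log-normally distributed as

$$\ln S(t+\Delta t) \sim \mathcal{N}\!\left(\ln S(t) + \left[\mu_S(t) - \tfrac{1}{2}\sigma_S^2(t)\right]\Delta t,\ \sigma_S^2(t)\,\Delta t\right). \tag{5}$$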
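The one-step distribution in (4)-(5) can be checked numerically. The sketch below, assuming hypothetical values for $S(t)$, $\mu_S$, $\sigma_S$, and $\Delta t$ (not taken from the paper), draws Brownian increments, applies the Euler step (3) to $X = \ln S$, and compares the empirical mean and standard deviation of $\ln S(t+\Delta t)$ with the closed-form parameters in (5).

```python
import numpy as np

def one_step_lognormal_params(s_t, mu_s, sigma_s, dt):
    """Mean and standard deviation of ln S(t + dt), following (5)."""
    mean = np.log(s_t) + (mu_s - 0.5 * sigma_s**2) * dt
    std = sigma_s * np.sqrt(dt)
    return mean, std

def simulate_one_step(s_t, mu_s, sigma_s, dt, n_samples, rng):
    """Draw S(t + dt) by applying the Euler step (3) to X = ln S."""
    dW = rng.normal(0.0, np.sqrt(dt), size=n_samples)   # Brownian increments ~ N(0, dt)
    x_next = np.log(s_t) + (mu_s - 0.5 * sigma_s**2) * dt + sigma_s * dW
    return np.exp(x_next)

rng = np.random.default_rng(0)
s_t, mu_s, sigma_s, dt = 8.0, 0.02, 0.3, 1.0   # hypothetical values, not from the paper
samples = simulate_one_step(s_t, mu_s, sigma_s, dt, 200_000, rng)
mean, std = one_step_lognormal_params(s_t, mu_s, sigma_s, dt)
print(np.log(samples).mean(), mean)
print(np.log(samples).std(), std)
```

With $\mu_S$ and $\sigma_S$ held fixed over the step, the Euler step is exact for the log process, so the two pairs agree up to Monte Carlo error.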

Note that the wind speed distribution in (5) characterizes the stochastic dynamics of wind speed through the time-varying parameters $\mu_S(t)$ and $\sigma_S(t)$. To estimate $\mu_S(t)$ and $\sigma_S(t)$, one must use wind measurements collected from a meteorological tower or turbine anemometers. However, the collected wind speeds may contain measurement errors and/or be perturbed by disturbances such as wake effects [18], so the true wind speed is unobservable in practice. To incorporate such errors and disturbances, we assume that the measured wind speed is linearly related to the unobserved true wind speed on the log scale. Let $\tilde{S}(t)$ denote the measured wind speed at time $t$ and let $Y(t) = \ln \tilde{S}(t)$. We treat $X(t)$ as a state variable and assume that the observation $Y(t)$ is the state perturbed by a normally distributed error term $z$:

$$Y(t) = X(t) + z. \tag{6}$$

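Note that the dynamics of $X(t)$, governed by the linear stochastic difference equation in (3), can be rewritten as

$$X(t+\Delta t) = X(t) + A\,\theta(t) + w(t), \tag{7}$$

where $A = \left[\Delta t,\ -\tfrac{1}{2}\Delta t\right]$, $\theta(t) = \left[\mu_S(t),\ \sigma_S^2(t)\right]^{\top}$, and $w(t) = \sigma_S(t)\,\Delta W_S(t)$ is the process noise.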

Equations (6) and (7) together form a linear state space model. Among the several ways to estimate the parameters of a linear state space model, we employ the Kalman filter because of its flexibility and strong performance in many applications [22, 23]. The Kalman filter is a sequential algorithm that recursively estimates and refines the parameters and updates the system state using the previous estimate and new input data. In particular, we use dual Kalman filtering to estimate the parameter vector $\theta(t)$ and the state $X(t)$ [24]. To model the time-varying parameter $\theta(t)$, we assume that it drifts according to a two-dimensional Gaussian random walk with covariance $\Sigma_\theta$, i.e.,

$$\theta(t+\Delta t) = \theta(t) + \epsilon, \tag{8}$$

where $\epsilon \sim \mathcal{N}(\mathbf{0}, \Sigma_\theta)$. The detailed procedure for updating the parameters and the state is included in Appendix B.
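The dual filtering idea in (6)-(8) can be sketched as two coupled Kalman filters, one for the scalar state $X(t)$ and one for $\theta(t) = [\mu_S(t), \sigma_S^2(t)]^\top$, both driven by the same innovation. This is a simplified illustration under our own assumptions, not the procedure of Appendix B: the measurement-error variance `r` and the random-walk covariance `sigma_theta` are treated as known, the process-noise variance is set to $\sigma_S^2\Delta t$ from the current parameter estimate (floored at a small positive value), and the parameter update treats the previous state estimate as known. In this simplified form the innovation mainly informs the combination $\mu_S - \tfrac{1}{2}\sigma_S^2$. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def dual_kf_step(x, p_x, theta, p_theta, y, dt, r, sigma_theta):
    """One predict/update cycle of a simplified dual Kalman filter for (6)-(8).

    x, p_x         : posterior mean and variance of the state X(t) = ln S(t)
    theta, p_theta : posterior mean (2,) and covariance (2, 2) of theta = [mu_S, sigma_S^2]
    y              : new observation Y(t + dt), the log of the measured wind speed
    r              : variance of the measurement error z in (6), assumed known
    sigma_theta    : covariance (2, 2) of the random-walk noise in (8), assumed known
    """
    a = np.array([dt, -0.5 * dt])              # row vector A in (7)

    # Parameter filter, time update: random walk (8).
    theta_prior = theta
    p_theta_prior = p_theta + sigma_theta

    # State filter, time update: (7) with the prior parameter estimate.
    q = max(theta_prior[1], 1e-8) * dt         # Var[w(t)] = sigma_S^2 * dt, floored to stay positive
    x_prior = x + a @ theta_prior
    p_x_prior = p_x + q

    innovation = y - x_prior                   # shared by both filters

    # State filter, measurement update with (6).
    k_x = p_x_prior / (p_x_prior + r)
    x_post = x_prior + k_x * innovation
    p_x_post = (1.0 - k_x) * p_x_prior

    # Parameter filter, measurement update: y = x + a @ theta + (w + z),
    # with the previous state's uncertainty p_x added to the noise variance.
    s_theta = a @ p_theta_prior @ a + p_x + q + r
    k_theta = p_theta_prior @ a / s_theta
    theta_post = theta_prior + k_theta * innovation
    p_theta_post = p_theta_prior - np.outer(k_theta, a @ p_theta_prior)

    return x_post, p_x_post, theta_post, p_theta_post


# Usage on a synthetic, hypothetical log wind speed path.
rng = np.random.default_rng(1)
dt, r = 1.0, 0.01
true_x = np.log(8.0) + np.cumsum(rng.normal(0.0, 0.05, size=300))
ys = true_x + rng.normal(0.0, np.sqrt(r), size=true_x.size)

x, p_x = ys[0], r
theta, p_theta = np.array([0.0, 0.01]), 1e-3 * np.eye(2)
for y in ys[1:]:
    x, p_x, theta, p_theta = dual_kf_step(x, p_x, theta, p_theta, y, dt, r, 1e-6 * np.eye(2))
```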

### III-B Modeling Wind-to-Power Conversion Process

This section discusses how to convert the wind speed dynamics obtained in the previous section into the dynamics of the wind power process. The relationship between wind speed and wind power generation can be quantified by the power curve function. Let $F(t, S(t))$ denote the power curve at time $t$ given the wind speed $S(t)$. Here, $F$ can represent the power curve of a whole wind farm or of a stand-alone wind turbine.

Note that we model the power curve function $F$ as a function of $t$ (as well as $S$) to incorporate the time-varying nature of power generation efficiency. This is because, in addition to wind speed, wind power output depends on many other environmental factors, such as wind direction, humidity, and ambient temperature [17]. Moreover, the turbines' age and the degradation states of their components (e.g., blades, gearbox) also affect generation efficiency. Including all of these factors would be difficult, if not impossible: it would make the power curve model overly complicated and, more importantly, would require characterizing the dynamics of each factor, as we did for wind speed in Section III-A. Instead, we consider the power curve as a function of wind speed only and let the power curve function itself be time-varying. Our approach is also flexible enough to employ a time-invariant power curve that depends only on wind speed; in this case, the power curve function simply reduces to $F(S(t))$.

In modeling $F$, any type of function, e.g., parametric, semi-parametric such as splines [25], or nonparametric [26, 27], can be employed as long as $F$ satisfies the following mild conditions. Suppose that the power curve function is differentiable with respect to $t$ and $S$ and twice differentiable with respect to $S$. The power output at time $t$ is given by

$$P(t) = F(t, S(t)) + e(t), \tag{9}$$

where $e(t)$ denotes random noise in the wind-to-power conversion process. We assume that $e(t)$ follows a normal distribution with mean 0 and variance $\sigma_F^2 F_S$, where $F_S$ represents the first derivative of $F$ with respect to $S$. We include $F_S$ in the noise variance because the variability of power conversion tends to be high where the power curve changes rapidly, which is mostly in the mid-speed range. For notational brevity, we use $F$ as an abbreviation of $F(t, S(t))$ in the subsequent discussion.

We first model the dynamics of the wind power process with a general power curve function $F$. Later, we derive the dynamics for a specific form of $F$ to illustrate the approach.

#### III-B1 Dynamics of Wind Power Process with General Power Curve Function

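Given the wind speed process modeled in (1), the wind power process also follows an inhomogeneous GBM, whose dynamics are given by

$$dP(t) = \mu_P(t)\,P(t)\,dt + \sigma_P(t)\,P(t)\,dW_P(t) \tag{10}$$

with

$$\mu_P(t) = \frac{F_t + \mu_S S F_S + \tfrac{1}{2}\sigma_S^2 S^2 F_{SS}}{P(t)}, \tag{11}$$

$$\sigma_P(t) = \frac{\sqrt{\sigma_S^2 S^2 F_S^2 + \sigma_F^2 F_S}}{P(t)}, \tag{12}$$

where $W_P(t)$ denotes a standard Brownian motion, $F_t$ represents the first derivative of $F$ with respect to $t$, and $F_{SS}$ is the second derivative of $F$ with respect to $S$. Also, $\mu_S$, $\sigma_S$, and $S$ denote $\mu_S(t)$, $\sigma_S(t)$, and $S(t)$ in (1), respectively. The detailed derivation of (10)-(12) is included in Appendix C.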
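To see where (11) and (12) come from, here is a sketch under one interpretation that we assume (this section does not spell it out): the conversion noise contributes an independent diffusion term $\sigma_F\sqrt{F_S}\,dB(t)$, where $B(t)$ is a standard Brownian motion independent of $W_S(t)$ and $F_S > 0$. Applying Itô's lemma to $F(t, S(t))$ with $(dS(t))^2 = \sigma_S^2 S^2\,dt$ gives

$$\begin{aligned}
dP(t) &= F_t\,dt + F_S\,dS(t) + \tfrac{1}{2}F_{SS}\,\big(dS(t)\big)^2 + \sigma_F\sqrt{F_S}\,dB(t)\\
&= \left[F_t + \mu_S S F_S + \tfrac{1}{2}\sigma_S^2 S^2 F_{SS}\right]dt + \sigma_S S F_S\,dW_S(t) + \sigma_F\sqrt{F_S}\,dB(t).
\end{aligned}$$

The two independent Gaussian diffusion terms combine into a single term $\sqrt{\sigma_S^2 S^2 F_S^2 + \sigma_F^2 F_S}\;dW_P(t)$, and writing the drift and diffusion coefficients relative to $P(t)$ yields $\mu_P(t)$ in (11) and $\sigma_P(t)$ in (12).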

Note that the parameters $\mu_P(t)$ and $\sigma_P(t)$ in (11) and (12) depend on the wind speed parameters (i.e., $\mu_S$ and $\sigma_S$) and on the power curve and its derivatives (i.e., $F$, $F_t$, $F_S$, and $F_{SS}$). This result shows how the stochastic dynamics of wind speed $S(t)$, together with the power curve function, are translated into the dynamics of power generation $P(t)$.

Following a procedure similar to (1)-(5), we can derive the distribution of wind power in closed form. Specifically, the power output at time $t+\Delta t$ is log-normally distributed as

$$\ln P(t+\Delta t) \sim \mathcal{N}\!\left(\ln P(t) + \left[\mu_P(t) - \tfrac{1}{2}\sigma_P^2(t)\right]\Delta t,\ \sigma_P^2(t)\,\Delta t\right). \tag{13}$$
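Equations (11)-(13) translate directly into code. The sketch below, with hypothetical function names, computes $\mu_P$ and $\sigma_P$ from the wind speed parameters and the power curve derivatives, then the mean and standard deviation of $\ln P(t+\Delta t)$. Clipping $F_S$ at zero in the noise term is our own safeguard for regions where an estimated power curve is locally decreasing; (12) itself presumes $F_S \ge 0$. As a sanity check, for a linear curve $F = cS$ with $F_t = F_{SS} = 0$ and $\sigma_F = 0$, the power inherits the wind speed's drift and volatility.

```python
import numpy as np

def power_gbm_params(p_t, s_t, mu_s, sigma_s, f_t, f_s, f_ss, sigma_f):
    """Drift mu_P(t) and volatility sigma_P(t) of the power process, eqs. (11)-(12)."""
    mu_p = (f_t + mu_s * s_t * f_s + 0.5 * sigma_s**2 * s_t**2 * f_ss) / p_t
    # Conversion-noise variance rate sigma_F^2 * F_S; F_S clipped at 0 (our safeguard).
    var_rate = sigma_s**2 * s_t**2 * f_s**2 + sigma_f**2 * max(f_s, 0.0)
    sigma_p = np.sqrt(var_rate) / p_t
    return mu_p, sigma_p

def power_lognormal_params(p_t, mu_p, sigma_p, dt):
    """Mean and standard deviation of ln P(t + dt) in (13)."""
    mean = np.log(p_t) + (mu_p - 0.5 * sigma_p**2) * dt
    return mean, sigma_p * np.sqrt(dt)

# Linear power curve F = 2 S at S = 5 (hypothetical numbers): expect mu_P = mu_S, sigma_P = sigma_S.
mu_p, sigma_p = power_gbm_params(p_t=10.0, s_t=5.0, mu_s=0.01, sigma_s=0.2,
                                 f_t=0.0, f_s=2.0, f_ss=0.0, sigma_f=0.0)
mean_log_p, sd_log_p = power_lognormal_params(10.0, mu_p, sigma_p, dt=1.0)
```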

#### III-B2 Dynamics of Wind Power Process with Nonparametric Power Curve Function

As discussed earlier, the power curve can be flexibly modeled using various functional forms. To illustrate, we employ the nonparametric adaptive learning method of [27] in our analysis. We only outline the method here; for the detailed procedure, the reader is referred to [27].

In the nonparametric approach, the input $S(t)$ is mapped into a feature space through a nonlinear mapping $\phi(\cdot)$. Then $P(t)$ can be modeled by

$$P(t) = F(t, S(t)) + e(t) = \omega_t^{\top}\phi(S(t)) + e(t), \tag{14}$$

where $\omega_t$ is the nonparametric regression coefficient vector at period $t$.

The coefficient vector $\omega_t$ is time-varying, so the power curve can be updated whenever a new sample is observed. Suppose that $\omega_{t-\Delta t}$ was estimated as $\hat{\omega}_{t-\Delta t}$ at time $t-\Delta t$, and we obtain newly observed data $(S(t), P(t))$ at time $t$. We then estimate $\omega_t$ by solving the following optimization problem:

$$\min_{\omega_t,\, e(t)}\ L = \tfrac{1}{2}\left\lVert \omega_t - \hat{\omega}_{t-\Delta t}\right\rVert^2 + \tfrac{1}{2}\gamma\, e(t)^2 \tag{15}$$

$$\text{s.t.}\quad P(t) = \omega_t^{\top}\phi(S(t)) + e(t). \tag{16}$$

Here the first term in the objective function measures the change in the coefficient vector from $\hat{\omega}_{t-\Delta t}$ to $\omega_t$, and the second term penalizes the fitting error, with the regularization parameter $\gamma$ balancing the size of the coefficient update against the quality of model fit.
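Problem (15)-(16) admits a closed-form solution that is consistent with the kernel form in (17) and the finite difference in (19). The following is a sketch from the Lagrangian, using only (15)-(16); we do not claim it reproduces every detail of [27]. With multiplier $\lambda_t$,

$$\mathcal{L} = \tfrac{1}{2}\lVert\omega_t - \hat{\omega}_{t-\Delta t}\rVert^2 + \tfrac{1}{2}\gamma\,e(t)^2 + \lambda_t\big(P(t) - \omega_t^{\top}\phi(S(t)) - e(t)\big).$$

Setting the derivatives with respect to $\omega_t$ and $e(t)$ to zero gives

$$\omega_t = \hat{\omega}_{t-\Delta t} + \lambda_t\,\phi(S(t)), \qquad e(t) = \frac{\lambda_t}{\gamma},$$

and substituting both into (16), with $k(S(t), S(t)) = \phi(S(t))^\top\phi(S(t))$, yields

$$\lambda_t = \frac{P(t) - \hat{\omega}_{t-\Delta t}^{\top}\phi(S(t))}{k\big(S(t), S(t)\big) + 1/\gamma}.$$

Starting from $\hat{\omega} = \mathbf{0}$, each update therefore adds one term $\lambda_t\,\phi(S(t))$ and leaves earlier multipliers unchanged, so the fitted curve is a kernel expansion over past wind speeds and consecutive fits differ only by the newest term.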

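Let $k(S(t_i), S(t_j))$ denote the inner product of $\phi(S(t_i))$ and $\phi(S(t_j))$, i.e., $k(S(t_i), S(t_j)) = \phi(S(t_i))^{\top}\phi(S(t_j))$, called a kernel function. Suppose there are $n$ observations up to time $t$. Then $\hat{F}(t, S(t))$ is updated by

$$\hat{F}(t, S(t)) = \sum_{i=1}^{n} \lambda_i\, k\big(S(t), S(t-(n-i)\Delta t)\big), \tag{17}$$

where $\lambda_i$ is the Lagrange multiplier corresponding to the equality constraint (16) at the $i$th update. Among the many choices of kernel function, we employ the following Gaussian kernel because of its flexibility:

$$k(S(t_i), S(t_j)) = \exp\!\left(-\frac{\big(S(t_i) - S(t_j)\big)^2}{2\delta}\right) \tag{18}$$

with a positive constant $\delta$.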

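The estimated power curve $\hat{F}$ in (17) can then be plugged into the predictive distribution of $P(t+\Delta t)$ in (13). Specifically, to estimate $\mu_P$ and $\sigma_P$ in (11) and (12), respectively, we need to estimate $F_t$, $F_S$, and $F_{SS}$. First, $F_t$ can be estimated by a finite difference:

$$\hat{F}_t = \frac{\partial F}{\partial t} \approx \frac{\hat{F}(t, S(t)) - \hat{F}(t-\Delta t, S(t))}{\Delta t} = \frac{\lambda_n\, k\big(S(t), S(t)\big)}{\Delta t}. \tag{19}$$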

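Next, $F_S$ and $F_{SS}$, the first and second partial derivatives of $F$ with respect to $S$, are estimated by

$$\begin{aligned}\hat{F}_S = \frac{\partial F}{\partial S} &= \sum_{i=1}^{n}\lambda_i\, \frac{\partial k\big(S(t), S(t-(n-i)\Delta t)\big)}{\partial S(t)}\\ &= \sum_{i=1}^{n}\lambda_i\, k\big(S(t), S(t-(n-i)\Delta t)\big)\left(-\frac{S(t) - S(t-(n-i)\Delta t)}{\delta}\right),\end{aligned} \tag{20}$$

$$\hat{F}_{SS} = \frac{\partial^2 F}{\partial S^2} = \sum_{i=1}^{n}\lambda_i\, \frac{\partial^2 k\big(S(t), S(t-(n-i)\Delta t)\big)}{\partial S^2(t)}, \tag{21}$$

$$\hat{F}_{SS} = \sum_{i=1}^{n}\lambda_i\, k\big(S(t), S(t-(n-i)\Delta t)\big)\left(\frac{\big(S(t) - S(t-(n-i)\Delta t)\big)^2}{\delta^2} - \frac{1}{\delta}\right). \tag{22}$$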
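A compact sketch of the kernel power curve in (17)-(22) follows. The class name and hyperparameter values are hypothetical. The coefficient update uses the closed-form multiplier $\lambda_n = \big(P(t) - \hat{F}(t-\Delta t, S(t))\big)/\big(k(S(t),S(t)) + 1/\gamma\big)$, which we obtained from the Lagrangian of (15)-(16); it should be read as an illustration rather than the reference implementation of [27]. Since $k(s,s) = 1$ for the Gaussian kernel (18), the denominator is $1 + 1/\gamma$. The usage lines fit a synthetic logistic curve and compare the analytic derivatives (20)-(22) with finite differences.

```python
import numpy as np

class OnlineKernelPowerCurve:
    """Kernel power curve F_hat(t, s) = sum_i lambda_i k(s, s_i), eqs. (17)-(18)."""

    def __init__(self, gamma, delta):
        self.gamma = gamma        # weight on the fitting error in (15)
        self.delta = delta        # Gaussian kernel parameter in (18)
        self.centers = []         # past wind speeds S(t - (n - i) dt)
        self.lambdas = []         # multipliers lambda_i

    def _kernel(self, s):
        diff = s - np.asarray(self.centers, dtype=float)
        return np.exp(-diff**2 / (2.0 * self.delta)), diff

    def predict(self, s):
        """F_hat(t, s), eq. (17)."""
        k, _ = self._kernel(s)
        return float(np.dot(self.lambdas, k))

    def update(self, s, p):
        """Absorb (S(t), P(t)); return lambda_n, so F_t ~ lambda_n / dt as in (19)."""
        residual = p - self.predict(s)              # P(t) - F_hat(t - dt, S(t))
        lam = residual / (1.0 + 1.0 / self.gamma)   # k(s, s) = 1 for the Gaussian kernel
        self.centers.append(float(s))
        self.lambdas.append(lam)
        return lam

    def dF_dS(self, s):
        """F_hat_S, eq. (20)."""
        k, diff = self._kernel(s)
        return float(np.dot(self.lambdas, k * (-diff / self.delta)))

    def d2F_dS2(self, s):
        """F_hat_SS, eqs. (21)-(22)."""
        k, diff = self._kernel(s)
        return float(np.dot(self.lambdas, k * (diff**2 / self.delta**2 - 1.0 / self.delta)))


rng = np.random.default_rng(2)
curve = OnlineKernelPowerCurve(gamma=10.0, delta=1.0)
for s in rng.uniform(3.0, 15.0, size=400):
    p_true = 1.0 / (1.0 + np.exp(-(s - 9.0)))        # hypothetical logistic power curve
    curve.update(s, p_true + rng.normal(0.0, 0.02))

s0, h = 9.0, 1e-4
fd1 = (curve.predict(s0 + h) - curve.predict(s0 - h)) / (2 * h)
fd2 = (curve.predict(s0 + 1e-3) - 2 * curve.predict(s0) + curve.predict(s0 - 1e-3)) / 1e-6
```

Each update appends one center, so prediction cost grows linearly with the number of observations; a practical version would likely prune or sparsify the dictionary.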

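Finally, we need to estimate $\sigma_F$ in (12). We use a sample standard deviation computed from the first $N_0$ data points:

$$\hat{\sigma}_F = \sqrt{\frac{1}{N_0 - 2}\sum_{i=2}^{N_0}\left(\frac{\Delta P(i\Delta t) - \Delta\hat{F}(i\Delta t, S(i\Delta t))}{\sqrt{\hat{F}_S(i\Delta t, S(i\Delta t))\,\Delta t}}\right)^2}, \tag{23}$$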

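where

$$\Delta P(i\Delta t) = P(i\Delta t) - P((i-1)\Delta t), \tag{24}$$

$$\Delta\hat{F}(i\Delta t, S(i\Delta t)) = \hat{F}(i\Delta t, S(i\Delta t)) - \hat{F}((i-1)\Delta t, S((i-1)\Delta t)). \tag{25}$$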

By plugging the estimates $\hat{F}_t$, $\hat{F}_S$, $\hat{F}_{SS}$, and $\hat{\sigma}_F$ from (17)-(23) into $\mu_P$ and $\sigma_P$ in (11) and (12), we obtain the predictive distribution of power at $t+\Delta t$ in (13). Recall that the other parameters, associated with the wind speed dynamics, i.e., $\mu_S$ and $\sigma_S$, are estimated by the dual Kalman filtering process discussed in Section III-A.
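Estimator (23)-(25) can be written in a few lines. The sketch below (function name hypothetical) takes the first $N_0$ observed powers together with the fitted power curve values $\hat{F}(i\Delta t, S(i\Delta t))$ and slopes $\hat{F}_S(i\Delta t, S(i\Delta t))$, which are assumed strictly positive; how these fitted values are produced, for example by the kernel curve of Section III-B2, is left to the caller.

```python
import numpy as np

def estimate_sigma_f(power, f_hat, f_s_hat, dt):
    """sigma_F estimate from (23)-(25), using the first N0 observations.

    power   : P(i dt), i = 1, ..., N0
    f_hat   : F_hat(i dt, S(i dt)) at the same time points
    f_s_hat : F_hat_S(i dt, S(i dt)); assumed strictly positive
    """
    power, f_hat, f_s_hat = (np.asarray(v, dtype=float) for v in (power, f_hat, f_s_hat))
    n0 = power.size
    d_power = np.diff(power)                     # Delta P(i dt), eq. (24), i = 2, ..., N0
    d_f = np.diff(f_hat)                         # Delta F_hat(i dt, S(i dt)), eq. (25)
    z = (d_power - d_f) / np.sqrt(f_s_hat[1:] * dt)
    return float(np.sqrt(np.sum(z**2) / (n0 - 2)))

# Toy check: constant fitted curve, unit slope, and power rising by 2 each step.
sigma_f_hat = estimate_sigma_f([0.0, 2.0, 4.0, 6.0, 8.0], np.zeros(5), np.ones(5), dt=1.0)
```

In the toy check, each of the four normalized increments equals 2, so the estimate is $\sqrt{16/3}$.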

This section has presented the procedure for estimating the parameters of the power output density when $F$ is modeled by a nonparametric function. A similar analysis can be performed when other functional forms are used to model $F$.

### III-C Uncertainty Quantification and Wind Power Prediction

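The closed-form predictive distribution of wind power output in (13) provides comprehensive information for characterizing prediction uncertainties, such as prediction intervals and quantiles. First, following the procedure presented in [28], we obtain the $100(1-\alpha)\%$ prediction interval for the power generation at time $t+\Delta t$ as

$$\left[\exp(\mu' + \sigma' A),\ \exp(\mu' + \sigma' B)\right], \tag{26}$$

where $\mu' = \ln P(t) + \left[\mu_P(t) - \tfrac{1}{2}\sigma_P^2(t)\right]\Delta t$ and $\sigma' = \sigma_P(t)\sqrt{\Delta t}$, and $A$ and $B$ are the solution of

$$\begin{cases}\Phi(B) - \Phi(A) = 1 - \alpha,\\ A + B = -2\sigma'.\end{cases} \tag{27}$$

Here $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution.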

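Next, we can obtain the $\beta$-quantile $Q_\beta$, such that $\Pr\{P(t+\Delta t) \le Q_\beta\} = \beta$, as

$$Q_\beta = \exp\!\left(\mu' + \sigma'\,\Phi^{-1}(\beta)\right). \tag{28}$$

In particular, the median of $P(t+\Delta t)$ is given by $Q_{0.5} = \exp(\mu')$ for $\beta = 0.5$.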
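The interval (26)-(27) and the quantile (28) can be computed with standard numerical tools. The sketch below uses SciPy's `norm` and `brentq`; the function names are hypothetical, and `mu_prime` and `sigma_prime` stand for $\mu'$ and $\sigma'$. Substituting $B = -2\sigma' - A$ reduces (27) to the single equation $\Phi(-2\sigma' - A) - \Phi(A) = 1 - \alpha$, whose left-hand side decreases monotonically from 1 to 0 as $A$ increases to $-\sigma'$, so a bracketing root finder suffices. Here $A$ and $B$ are the scalars of (27), not the matrix in (7).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def lognormal_quantile(mu_prime, sigma_prime, beta):
    """Q_beta in (28)."""
    return float(np.exp(mu_prime + sigma_prime * norm.ppf(beta)))

def prediction_interval(mu_prime, sigma_prime, alpha):
    """Solve (27) for A and B, then return the interval (26)."""
    target = 1.0 - alpha

    def g(a):
        # Phi(B) - Phi(A) - (1 - alpha) with B = -2 sigma' - A
        return norm.cdf(-2.0 * sigma_prime - a) - norm.cdf(a) - target

    lo = -sigma_prime - 1.0
    while g(lo) <= 0.0:                 # widen the bracket until g changes sign
        lo -= 1.0
    a = brentq(g, lo, -sigma_prime)     # g(-sigma') = -(1 - alpha) < 0
    b = -2.0 * sigma_prime - a
    return float(np.exp(mu_prime + sigma_prime * a)), float(np.exp(mu_prime + sigma_prime * b))

mu_prime, sigma_prime, alpha = 0.5, 0.4, 0.1      # hypothetical values
lower, upper = prediction_interval(mu_prime, sigma_prime, alpha)
median = lognormal_quantile(mu_prime, sigma_prime, 0.5)
```

The constraint $A + B = -2\sigma'$ is the first-order condition for the shortest interval on the original scale, so the result is narrower than the equal-tailed interval $[Q_{\alpha/2}, Q_{1-\alpha/2}]$ while having the same coverage.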

Such quantile information can be used to determine the predicted value. In time series modeling and analysis, quantities that represent central tendency, e.g., the mean or median, are typically used as point predictions. However, in wind power operations, the costs of underestimation and overestimation can differ [11]. Quantiles can be used to estimate the power flexibly by penalizing under- and overestimation differently. Let $p$ denote the predicted power output at time $t+\Delta t$, and let $f(\cdot)$ be the probability density function (pdf) of the log-normal distribution of the power output at $t+\Delta t$ described in (13). The expected amounts of underestimation and overestimation, denoted by $u(p; t+\Delta t)$ and $o(p; t+\Delta t)$, respectively, are given by

$$u(p; t+\Delta t) = \mathbb{E}_{P(t+\Delta t)}\big[\max\{0,\ P(t+\Delta t) - p\}\big] = \int_{p}^{+\infty} (x - p)\, f(x)\, dx, \tag{29}$$

$$o(p; t+\Delta t) = \mathbb{E}_{P(t+\Delta t)}\big[\max\{0,\ p - P(t+\Delta t)\}\big] = \int_{0}^{p} (p - x)\, f(x)\, dx. \tag{30}$$

Intuitively, we would like to predict the power output that minimizes the expected cost due to possible under- or overestimation. Therefore, the optimal $p$, denoted by $p^*$, can be obtained by solving the following unconstrained optimization problem:

$$p^* = \arg\min_{p}\ \Big(\alpha\, u(p; t+\Delta t) + (1 - \alpha)\, o(p; t+\Delta t)\Big), \tag{31}$$

where $\alpha \in (0, 1)$ represents the penalty on underestimation. When underestimation (overestimation) is more costly, an $\alpha$ greater (less) than 0.5 can be used.

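It is then straightforward to show that the optimal solution of (31) is the $\alpha$-quantile of the predictive density of $P(t+\Delta t)$ in (13) [11]: differentiating the objective gives $-\alpha\,[1 - G(p)] + (1 - \alpha)\,G(p) = G(p) - \alpha$, where $G$ is the predictive cumulative distribution function of $P(t+\Delta t)$, and this derivative vanishes at $G(p^*) = \alpha$. In other words, the solution of (31) is $Q_\alpha$ in (28).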
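The claim that (31) is solved by $Q_\alpha$ can be checked numerically. The sketch below uses hypothetical values of $\mu'$, $\sigma'$, and $\alpha$, replaces the integrals in (29)-(30) by Monte Carlo averages over draws from the log-normal predictive distribution, minimizes the weighted cost over a grid, and compares the minimizer with the analytic quantile from (28).

```python
import numpy as np
from scipy.stats import norm

def expected_cost(p, samples, alpha):
    """Monte Carlo estimate of alpha * u(p) + (1 - alpha) * o(p), eqs. (29)-(31)."""
    under = np.maximum(samples - p, 0.0).mean()   # u(p): expected underestimation
    over = np.maximum(p - samples, 0.0).mean()    # o(p): expected overestimation
    return alpha * under + (1.0 - alpha) * over

rng = np.random.default_rng(3)
mu_prime, sigma_prime, alpha = 0.2, 0.3, 0.73     # hypothetical values
samples = np.exp(rng.normal(mu_prime, sigma_prime, size=50_000))  # draws of P(t + dt)

q_alpha = np.exp(mu_prime + sigma_prime * norm.ppf(alpha))        # Q_alpha from (28)
grid = np.linspace(0.5 * q_alpha, 1.5 * q_alpha, 801)
costs = np.array([expected_cost(p, samples, alpha) for p in grid])
p_star_numeric = grid[np.argmin(costs)]
```

Here $\alpha = 0.73$ is only an example of an underestimation-averse setting; the grid minimizer lands near $Q_\alpha$ up to sampling and grid error.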

Figure 2 depicts the outline of the proposed methodology.

### III-D Algorithm Summary

We summarize the implementation algorithm for one-step-ahead prediction of the wind farm power output in Algorithm 1. For one-step-ahead prediction, we set $\Delta t$ equal to one sampling interval. We use the first $N_0$ data points to initialize the model parameters. Then, at each subsequent time step, we predict the power at the next time step and update (or filter) the parameters when a new observation is obtained.

Note that in the initialization step, we initialize the parameters and the power curve function. To obtain $\gamma$ and $\delta$, we apply cross-validation to the first $N_0$ data points and choose the values that minimize the prediction error [29]. Also, considering that the log wind speed is normally distributed, as shown in (4), we use the sample mean and sample standard deviation of the measured wind speeds to initialize $\mu_S$ and $\sigma_S$ (see lines #6-#8 in the algorithm). In the prediction step (lines #11-#14), the Kalman filter produces prior estimates of the model parameters and states, whereas in the filtering step (lines #15 and #18), the corresponding posterior estimates are computed after observing the wind speed and power at time $t+\Delta t$; the detailed prediction and filtering procedures are included in Appendix B.

## IV Case Studies

We apply the proposed approach to real datasets collected from four operating wind farms, WF1-WF4; Table I summarizes their information. Because of the confidentiality required by the data providers, detailed information about the wind farms is omitted. Each dataset includes wind measurements and power outputs of the whole wind farm. In all wind farms, the power outputs are rescaled to a common range.

We divide each wind farm dataset into training and testing sets. The training set consists of the first 70% of the samples in the dataset. The parameters $\mu_S$ and $\sigma_S$ of the wind speed process, the error parameters of the Kalman filtering, and the power curve are initialized using the observations in the training set. The testing set contains the remaining 30% of the samples and is used to evaluate one-step-ahead prediction of the wind farm output, i.e., 1-minute-ahead prediction for WF1 and 10-minute-ahead prediction for the other farms.

### IV-A Implementation Results

Figure 3 presents the prediction results on the testing set of WF1 for three different values of $\alpha$, reflecting different preferences of the wind farm operator. In [11], an $\alpha$ greater than 0.5 is considered, under which underestimation is penalized more than overestimation. We also consider the opposite case, with $\alpha$ less than 0.5, as well as $\alpha = 0.5$, where overestimation and underestimation are equally penalized. Our predictions are close to the real power outputs for all three $\alpha$ values. With $\alpha$ greater (less) than 0.5, our predictions are generally higher (lower) than the real observations, as intended. We observe similar patterns in the other wind farms.

Figure 4 depicts prediction intervals at two confidence levels for each dataset. The majority of the observations fall within the prediction intervals, indicating that our approach successfully captures the uncertainties in all datasets. We also observe that, in general, the more volatile the power output (i.e., the more rapidly it changes), the wider the prediction intervals. For example, during a period in WF4 when the power output increases rapidly, the prediction interval widens, reflecting larger prediction uncertainty. A similar phenomenon can be observed when $t$ is about 950 in WF2, where the power output increases and decreases rapidly and the prediction interval becomes wider.

### IV-B Comparison with Alternative Methods

We compare our approach with alternative methods. Specifically, we consider three time series methods for predicting wind speed: the persistent model, the ARMA model, and the AR-GARCH model. A typical approach to predicting wind power is to predict the future wind speed and apply the power curve to the predicted wind speed. Therefore, in the alternative methods, we first predict the wind speed at time $t+\Delta t$ as $\hat{S}(t+\Delta t)$ and plug the predicted wind speed into the power curve to obtain $\hat{P}(t+\Delta t)$. In all three methods, we apply the same nonparametric power curve discussed in Section III-B.

In the persistent model, the current wind speed is used to predict the speed at the next time step, i.e., $\hat{S}(t+\Delta t) = S(t)$. In both the ARMA and AR-GARCH methods, the wind speed is assumed to follow a normal distribution. We use built-in functions in Matlab to implement both models and select the model orders by minimizing the Bayesian information criterion (BIC). We update the model orders and parameters of ARMA and AR-GARCH whenever a new observation is obtained.

We first evaluate prediction performance with unequal penalties on overestimation and underestimation. In the proposed approach, we use the $\alpha$-quantile of the predictive power output density, as discussed in Section III. For a fair comparison, in ARMA and AR-GARCH we also use the $\alpha$-quantile of their predictive wind speed densities and plug the resulting quantile estimates into the power curve [11]. Note that the forecast values of the persistent method do not change with $\alpha$.

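To measure prediction quality with unequal penalties, Hering and Genton [12] proposed the power curve error (PCE), defined as

$$\mathrm{PCE}\big(P(t), \hat{P}(t)\big) = \begin{cases}\alpha\big(P(t) - \hat{P}(t)\big), & \text{if } \hat{P}(t) \le P(t),\\ (1 - \alpha)\big(\hat{P}(t) - P(t)\big), & \text{if } \hat{P}(t) > P(t),\end{cases}$$

where $P(t)$ is the observed power at time $t$ and $\hat{P}(t)$ is the power predicted by each method.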
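The PCE is straightforward to compute. The sketch below (function names hypothetical) evaluates it elementwise and averages it over a test set, as in the comparisons that follow. Assigning ties $\hat{P}(t) = P(t)$ to the underestimation branch is immaterial, since both branches are zero there.

```python
import numpy as np

def power_curve_error(p_obs, p_pred, alpha):
    """Elementwise PCE: alpha weights underestimation, (1 - alpha) weights overestimation."""
    p_obs = np.asarray(p_obs, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return np.where(p_pred <= p_obs,
                    alpha * (p_obs - p_pred),          # underestimation
                    (1.0 - alpha) * (p_pred - p_obs))  # overestimation

def average_pce(p_obs, p_pred, alpha):
    return float(np.mean(power_curve_error(p_obs, p_pred, alpha)))

# One underestimate by 0.5 and one overestimate by 0.5 with alpha = 0.75.
print(average_pce([1.0, 1.0], [0.5, 1.5], alpha=0.75))  # (0.375 + 0.125) / 2 = 0.25
```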

Table II summarizes the average PCE of each method for the three $\alpha$ values, and Figure 5 shows the average PCE over a range of $\alpha$ values. AR-GARCH yields lower PCEs than ARMA because it accounts for the time-varying variance of wind speed, but its PCEs are still higher than those of the proposed approach in all datasets. Our approach consistently produces the lowest PCEs in all cases, indicating that it better reflects wind farm operators' preferences regarding overestimation and underestimation.

We also evaluate prediction performance using the median of each method's predictive distribution, as the median represents the central tendency of predictions. In ARMA and AR-GARCH, the median of the predictive wind speed density equals its mean, because the density is normal. In the proposed approach, we use $Q_{0.5}$, i.e., the solution of (31) with $\alpha = 0.5$. We use the root-mean-square error, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(P(t_i) - \hat{P}(t_i)\big)^2}$, and the mean absolute error, $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\big|P(t_i) - \hat{P}(t_i)\big|$, where $t_i$ is the $i$th time point in the testing set and $N$ is the total number of observations in the testing set. The RMSEs and MAEs are summarized in Table III. Our approach yields the lowest prediction errors in terms of both RMSE and MAE in all cases, demonstrating strong prediction capability for the highly volatile wind power process.

## V Summary

We present a new integrative methodology for predicting wind power under the assumption that the underlying dynamics of wind speed can be represented by an inhomogeneous GBM. The nonstationary characteristics of wind power generation are captured through time-varying parameters in the wind speed model and the power curve function. The proposed approach accounts for uncertainties in both the wind speed process and the wind-to-power conversion process, and its closed-form predictive density provides rich information for probabilistic forecasting. The closed-form density allows us to extract diverse information, e.g., prediction intervals and quantiles, and to determine the forecast according to the wind farm operator's preference regarding overestimation and underestimation of future wind power output. In this way, the framework minimizes the expected cost associated with prediction errors. The implementation results demonstrate that our method provides strong prediction capability, achieving lower prediction errors than alternative approaches.

We believe our approach could potentially benefit power grid operations. In the future we plan to incorporate our prediction results into the optimization framework for solving decision-making problems such as economic dispatch. We also plan to apply the approach to predict the mechanical and structural load responses in the wind turbine system for the reliability analysis and maintenance optimization [30, 31]. The proposed methodology is also applicable to other engineering systems subject to nonstationary operating conditions, such as solar power systems [32].

## References

• [1] U.S. Energy Information Administration. (2018) Wind expected to surpass hydro as largest renewable electricity generation source. [Online]. Available: https://www.eia.gov/todayinenergy/detail.php?id=34652
• [2] P. Beiter, K. Haas, and S. Buchanan, “2014 renewable energy data book,” National Renewable Energy Laboratory, Washington DC, Technical Report, 2015. [Online]. Available: http://www.nrel.gov/docs/fy16osti/64720.pdf
• [3] L. Hirth, “The market value of variable renewables: The effect of solar wind power variability on their relative price,” Energy Econ., vol. 38, pp. 218–236, Jul. 2013.
• [4] F. Teng and G. Strbac, “Full stochastic scheduling for low-carbon electricity systems,” IEEE Trans. Autom. Sci. and Eng., vol. 14, no. 2, pp. 461–470, Apr. 2017.
• [5] F. Bouffard and F. D. Galiana, “Stochastic security for operations planning with significant wind power generation,” IEEE Trans. Power Syst., vol. 23, no. 2, pp. 306–316, May 2008.
• [6] G. Sideratos and N. D. Hatziargyriou, “Probabilistic wind power forecasting using radial basis function neural networks,” IEEE Trans. Power Syst., vol. 27, no. 4, pp. 1788–1796, Nov. 2012.
• [7] Q. Kang, M. Zhou, J. An, and Q. Wu, “Swarm intelligence approaches to optimal power flow problem with distributed generator failures in power networks,” IEEE Trans. Autom. Sci. and Eng., vol. 10, no. 2, pp. 343–353, Apr. 2013.
• [8] J. Jeon and J. W. Taylor, “Using conditional kernel density estimation for wind power density forecasting,” J. Am. Stat. Assoc., vol. 107, no. 497, pp. 66–79, Jan. 2012.
• [9] J. W. Taylor, P. E. McSharry, and R. Buizza, “Wind power density forecasting using ensemble predictions and time series models,” IEEE Trans. Energy Convers., vol. 24, no. 3, pp. 775–782, Sep. 2009.
• [10] T. Björk, Arbitrage Theory in Continuous Time. Oxford University Press, 2009.
• [11] A. Pourhabib, J. Z. Huang, and Y. Ding, “Short-term wind speed forecast using measurements from multiple turbines in a wind farm,” Technometrics, vol. 58, no. 1, pp. 138–147, Feb. 2016.
• [12] A. S. Hering and M. G. Genton, “Powering up with space-time wind forecasting,” J. Am. Stat. Assoc., vol. 105, no. 489, pp. 92–104, 2010.
• [13] E. Erdem and J. Shi, “ARMA based approaches for forecasting the tuple of wind speed and direction,” Appl. Energy, vol. 88, no. 4, pp. 1405–1414, Apr. 2011.
• [14] P. Pinson, “Very-short-term probabilistic forecasting of wind power with generalized logit–normal distributions,” J. Royal Stat. Soc. C-Appl., vol. 61, no. 4, pp. 555–576, Feb. 2012.
• [15] Y. Zhang, J. Wang, and X. Wang, “Review on probabilistic forecasting of wind power generation,” Renew. Sust. Energ. Rev., vol. 32, pp. 255–270, Apr. 2014.
• [16] N. Yampikulsakul, E. Byon, S. Huang, S. Shawn, and M. You, “Condition monitoring of wind turbine system with nonparametric regression-based analysis,” IEEE Trans. Energy Convers., vol. 29, no. 2, pp. 288–299, Jun. 2014.
• [17] G. Lee, Y. Ding, M. G. Genton, and L. Xie, “Power curve estimation with multivariate environmental factors for inland and offshore wind farms,” J. Am. Stat. Assoc., vol. 110, no. 509, pp. 56–67, Oct. 2015.
• [18] M. You, E. Byon, J. J. Jin, and G. Lee, “When wind travels through turbines: A new statistical approach for characterizing heterogeneous wake effects in multi-turbine wind farms,” IISE Trans., vol. 49, no. 1, pp. 84–95, 2017.
• [19] C. Wan, Z. Xu, P. Pinson, Z. Y. Dong, and K. P. Wong, “Optimal prediction intervals of wind power generation,” IEEE Trans. Power Syst., vol. 29, no. 3, pp. 1166–1174, May 2014.
• [20] B. Øksendal, Stochastic Differential Equations. Springer, 2003.
• [21] E. Platen, “An introduction to numerical methods for stochastic differential equations,” Acta Numer., vol. 8, pp. 197–246, Jan. 1999.
• [22] S. Haykin, Kalman Filtering and Neural Networks. John Wiley & Sons, 2004, vol. 47.
• [23] B. Sun, P. B. Luh, Q.-S. Jia, Z. O’Neill, and F. Song, “Building energy doctors: An SPC and Kalman filter-based method for system-level fault detection in HVAC systems,” IEEE Trans. Autom. Sci. and Eng., vol. 11, no. 1, pp. 215–229, Jan. 2014.
• [24] E. A. Wan and A. T. Nelson, “Dual Kalman filtering methods for nonlinear prediction, smoothing and estimation,” in Adv. Neural Inf. Process Syst., 1997, pp. 793–799.
• [25] G. Lee, E. Byon, L. Ntaimo, and Y. Ding, “Bayesian spline method for assessing extreme loads on wind turbines,” Ann. Appl. Stat., vol. 7, no. 4, pp. 2034–2061, 2013.
• [26] G. Lee, Y. Ding, L. Xie, and M. G. Genton, “A kernel plus method for quantifying wind turbine performance upgrades,” Wind Energy, vol. 18, no. 7, pp. 1207–1219, 2015.
• [27] E. Byon, Y. Choe, and N. Yampikulsakul, “Adaptive learning in time-variant processes with application to wind power systems,” IEEE Trans. Autom. Sci. and Eng., vol. 13, no. 2, pp. 997–1007, Apr. 2016.
• [28] R. C. Dahiya and I. Guttman, “Shortest confidence and prediction intervals for the log-normal,” Can. J. Stat., vol. 10, no. 4, pp. 277–291, Dec. 1982.
• [29] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, ser. Springer Series in Statistics. Springer, 2nd ed., 2009.
• [30] Y. M. Ko and E. Byon, “Condition-based joint maintenance optimization for a large-scale system with homogeneous units,” IISE Trans., vol. 49, no. 5, pp. 493–504, 2017.
• [31] K. Liu and S. Huang, “Integration of data fusion methodology and degradation modeling process to improve prognostics,” IEEE Trans. Autom. Sci. and Eng., vol. 13, no. 1, pp. 344–354, 2016.
• [32] Y. Choe, W. Guo, E. Byon, J. Jin, and J. Li, “Change-point detection on solar panel performance using thresholded LASSO,” Qual. Reliab. Eng. Int., vol. 32, no. 8, pp. 2653–2665.

## Appendix A Derivation of $dX(t)$ in (2)

Let $X(t) = f(t,S(t))$ with $f(t,S) = \ln S$. We use Itô's lemma [10, chap. 4] to derive $dX(t)$ in (2) as follows.

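$$
\begin{align}
dX(t) = df(t,S(t)) &= \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial S}\,dS(t) + \frac{1}{2}\frac{\partial^2 f}{\partial S^2}\,dS(t)^2 \tag{33}\\
&= \left\{\frac{\partial f}{\partial t} + \mu(t)\frac{\partial f}{\partial S(t)} + \frac{1}{2}\sigma(t)^2\frac{\partial^2 f}{\partial S(t)^2}\right\}dt + \sigma(t)\frac{\partial f}{\partial S(t)}\,dW(t), \tag{34}
\end{align}
$$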

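where in (34) the dynamics of $S(t)$, i.e., $dS(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$ with $\mu(t) = \mu_S(t)S(t)$ and $\sigma(t) = \sigma_S(t)S(t)$, are used. We also set $(dt)^2$ and $dt\,dW(t)$ to zero because they approach zero faster than $dt$, and substitute $dt$ for $(dW(t))^2$.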

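By plugging the derivatives of $f$ with respect to $t$ and $S(t)$, i.e., $\partial f/\partial t = 0$, $\partial f/\partial S = 1/S(t)$, and $\partial^2 f/\partial S^2 = -1/S(t)^2$, together with $\mu(t)$ and $\sigma(t)$, into (34), we get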

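$$
\begin{align}
dX(t) &= \left\{0 + \mu_S(t)S(t)\cdot\frac{1}{S(t)} - \frac{1}{2}\big(\sigma_S(t)S(t)\big)^2\frac{1}{S(t)^2}\right\}dt + \sigma_S(t)S(t)\frac{1}{S(t)}\,dW(t) \tag{35}\\
&= \left[\mu_S(t) - \frac{1}{2}\sigma_S(t)^2\right]dt + \sigma_S(t)\,dW(t). \tag{36}
\end{align}
$$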
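As a quick numerical sanity check of (36), the Python sketch below simulates $S(t)$ directly with the Euler-Maruyama scheme (on $S$ itself, not on $\ln S$) and compares the Monte Carlo mean of $\ln S(T) - \ln S(0)$ with $(\mu_S - \sigma_S^2/2)T$. For simplicity it assumes constant $\mu_S$ and $\sigma_S$, whereas the paper's parameters are time-varying; the function name and parameter values are illustrative assumptions.

```python
import math
import random


def mean_log_increment(mu_s, sigma_s, T=1.0, n_steps=250, n_paths=4000, seed=0):
    """Monte Carlo mean of ln S(T) - ln S(0) for dS = mu_s*S dt + sigma_s*S dW.

    S is simulated with Euler-Maruyama on S itself, so the -sigma_s**2 / 2
    correction in (36) is not built into the simulation; it should emerge
    from the average of the log increments.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s = 1.0  # S(0) = 1, so ln S(0) = 0
        for _ in range(n_steps):
            s += mu_s * s * dt + sigma_s * s * sqrt_dt * rng.gauss(0.0, 1.0)
        total += math.log(s)
    return total / n_paths
```

Calling `mean_log_increment(0.5, 0.8)` should return a value near $0.5 - 0.8^2/2 = 0.18$ rather than the naive $\mu_S T = 0.5$; the gap is the $-\frac{1}{2}\sigma_S^2$ term contributed by Itô's lemma.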

## Appendix B Dual Kalman Filtering Procedure

Recall that the parameter vector is $\theta(t)$ and the state is $X(t)$. We use for . Let $\hat{X}(t\mid t)$ and $\hat{X}(t+\Delta t\mid t)$ denote the posterior and prior estimates of the state variable, with associated estimation error variances $P_X(t\mid t)$ and $P_X(t+\Delta t\mid t)$, respectively. Similarly, $\hat{\theta}(t\mid t)$ and $\hat{\theta}(t+\Delta t\mid t)$ denote the posterior and prior estimates of the parameter vector, respectively, and $P_\theta(t\mid t)$ and $P_\theta(t+\Delta t\mid t)$ represent the corresponding estimation error covariance matrices. We let $K_X(t)$ and $K_\theta(t)$ denote the Kalman gains associated with the state and parameter filters at time $t$, respectively. The dual Kalman filtering then proceeds as follows:

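• Parameter prediction:

$$
\begin{align}
\hat{\theta}(t+\Delta t\mid t) &= \hat{\theta}(t\mid t), \tag{37}\\
P_\theta(t+\Delta t\mid t) &= P_\theta(t\mid t) + Q. \tag{38}
\end{align}
$$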
• State prediction:

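$$
\begin{align}
\hat{X}(t+\Delta t\mid t) &= \hat{X}(t\mid t) + A\hat{\theta}(t+\Delta t\mid t), \tag{39}\\
P_X(t+\Delta t\mid t) &= P_X(t\mid t) + \Delta t\,\hat{\theta}^2(t+\Delta t\mid t). \tag{40}
\end{align}
$$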
• State filtering:

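$$
\begin{align}
K_X(t+\Delta t) &= P_X(t+\Delta t\mid t)\left[P_X(t+\Delta t\mid t) + \sigma_z^2\right]^{-1}, \tag{41}\\
\hat{X}(t+\Delta t\mid t+\Delta t) &= \hat{X}(t+\Delta t\mid t) + K_X(t+\Delta t)\left[Y(t+\Delta t) - \hat{X}(t+\Delta t\mid t)\right], \tag{42}\\
P_X(t+\Delta t\mid t+\Delta t) &= \left[I - K_X(t+\Delta t)\right]P_X(t+\Delta t\mid t). \tag{43}
\end{align}
$$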
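• Parameter filtering:

$$
\begin{align}
K_\theta(t+\Delta t) &= P_\theta(t+\Delta t\mid t)A^T\left[AP_\theta(t+\Delta t\mid t)A^T + \sigma_z^2\right]^{-1}, \tag{44}\\
\hat{\theta}(t+\Delta t\mid t+\Delta t) &= \hat{\theta}(t+\Delta t\mid t) \tag{45}\\
&\quad + K_\theta(t+\Delta t)\left[Y(t+\Delta t) - \hat{X}(t+\Delta t\mid t)\right], \tag{46}\\
P_\theta(t+\Delta t\mid t+\Delta t) &= \left[I - K_\theta(t+\Delta t)A\right]P_\theta(t+\Delta t\mid t). \tag{47}
\end{align}
$$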

Then , which is the posterior estimate of , is used to estimate and similarly, for estimating and in (5).
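The recursion (37)-(47) can be sketched in Python for a scalar state. Because the exact form of $\theta$ and $A$ is not restated here, the sketch assumes a hypothetical two-component parameter vector $\theta = [\mu_S - \sigma_S^2/2,\ \sigma_S]$ with $A = [\Delta t,\ 0]$, and reads the $\Delta t\,\hat{\theta}^2$ term in (40) as $\Delta t$ times the squared diffusion component, which matches the increment implied by (36). With this $A$ and diagonal covariances, only the drift component is updated by the innovation, so the diffusion component stays at its initial value. Function names and all numerical settings are illustrative, not the paper's implementation.

```python
import math
import random


def dual_kf_step(x, px, theta, ptheta, y, dt, q, sigma_z2):
    """One cycle of (37)-(47) for a scalar state and 2-component parameters.

    Hypothetical parameterization (assumption): theta = [drift, diffusion]
    with drift = mu_S - sigma_S**2 / 2, A = [dt, 0], and dt * theta^2 in (40)
    read as dt * diffusion**2.
    """
    a = (dt, 0.0)
    # (37)-(38): parameters follow a random walk with covariance q
    th_pr = list(theta)
    p_pr = [[ptheta[i][j] + q[i][j] for j in range(2)] for i in range(2)]
    # (39)-(40): state prediction
    x_pr = x + a[0] * th_pr[0] + a[1] * th_pr[1]
    px_pr = px + dt * th_pr[1] ** 2
    innov = y - x_pr  # innovation shared by (42) and (46)
    # (41)-(43): state filtering
    kx = px_pr / (px_pr + sigma_z2)
    x_new = x_pr + kx * innov
    px_new = (1.0 - kx) * px_pr
    # (44): K_theta = P A^T [A P A^T + sigma_z^2]^{-1}
    pa = [p_pr[i][0] * a[0] + p_pr[i][1] * a[1] for i in range(2)]
    s = a[0] * pa[0] + a[1] * pa[1] + sigma_z2
    k = [pa[i] / s for i in range(2)]
    # (45)-(46): parameter update
    th_new = [th_pr[i] + k[i] * innov for i in range(2)]
    # (47): P = [I - K A] P_prior
    m = [[(1.0 if i == j else 0.0) - k[i] * a[j] for j in range(2)] for i in range(2)]
    p_new = [[sum(m[i][r] * p_pr[r][j] for r in range(2)) for j in range(2)]
             for i in range(2)]
    return x_new, px_new, th_new, p_new


def run_demo(n=5000, dt=1.0, drift=0.02, sigma_s=0.05, sigma_z=0.02, seed=1):
    """Simulate X(t) = ln S(t) per (36), observe Y = X + noise, and filter."""
    rng = random.Random(seed)
    x_true = 0.0
    x, px = x_true + rng.gauss(0.0, sigma_z), sigma_z ** 2
    theta = [0.0, sigma_s]  # drift unknown; diffusion fixed at its true value
    ptheta = [[1.0, 0.0], [0.0, 0.0]]
    q = [[1e-10, 0.0], [0.0, 0.0]]
    for _ in range(n):
        x_true += drift * dt + sigma_s * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        y = x_true + rng.gauss(0.0, sigma_z)
        x, px, theta, ptheta = dual_kf_step(x, px, theta, ptheta, y, dt, q, sigma_z ** 2)
    return theta, x, x_true


if __name__ == "__main__":
    theta_hat, x_hat, x_true = run_demo()
    print(f"drift estimate: {theta_hat[0]:.4f} (true 0.02)")
    print(f"state estimate: {x_hat:.3f} vs true {x_true:.3f}")
```

A small `Q` on the drift component keeps the parameter filter close to a recursive average of the innovations while still allowing slow drift changes; a larger `Q` tracks faster nonstationarity at the cost of noisier estimates.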

## Appendix C Derivation of $dP(t)$ in (10)

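We follow a procedure similar to (33)-(36) and use the dynamics of $S(t)$, i.e., $dS(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$ with $\mu(t) = \mu_S(t)S(t)$ and $\sigma(t) = \sigma_S(t)S(t)$. Below, $F_t$, $F_S$, and $F_{SS}$ denote $\partial F/\partial t$, $\partial F/\partial S$, and $\partial^2 F/\partial S^2$ evaluated at $(t,S(t))$. Based on Itô's lemma [10, chap. 4], we obtain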

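$$
\begin{aligned}
dF(t,S(t)) &= \left\{\frac{\partial F}{\partial t} + \mu(t)\frac{\partial F}{\partial S(t)} + \frac{1}{2}\sigma(t)^2\frac{\partial^2 F}{\partial S(t)^2}\right\}dt + \sigma(t)\frac{\partial F}{\partial S(t)}\,dW(t)\\
&= \left\{F_t + \mu_S(t)S(t)F_S + \frac{1}{2}\big(\sigma_S(t)S(t)\big)^2F_{SS}\right\}dt + \sigma_S(t)S(t)F_S\,dW(t)\\
&= \frac{F_t + \mu_S(t)S(t)F_S + \frac{1}{2}\sigma_S(t)^2S(t)^2F_{SS}}{P(t)}\,P(t)\,dt + \frac{\sigma_S(t)S(t)F_S}{P(t)}\,P(t)\,dW(t).
\end{aligned}
$$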
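The drift of $dF$ above can be checked numerically. The Python sketch below assumes a hypothetical time-invariant logistic power curve (so $F_t = 0$; this is not the paper's power curve model) and constant $\mu_S$, $\sigma_S$. It takes one exact GBM step from (36) and computes $\big(E[F(S(t+\Delta t))] - F(S(t))\big)/\Delta t$ by trapezoidal integration over the standard normal; for small $\Delta t$ this should agree with $\mu_S S F_S + \frac{1}{2}\sigma_S^2 S^2 F_{SS}$. The curve parameters `CAP`, `K`, `S0` and all function names are illustrative.

```python
import math

CAP, K, S0 = 1.0, 1.2, 8.0  # hypothetical logistic power curve parameters


def F(s):
    """Hypothetical power curve F(S); time-invariant, so F_t = 0."""
    return CAP / (1.0 + math.exp(-K * (s - S0)))


def F_S(s):
    f = F(s)
    return K * f * (1.0 - f / CAP)


def F_SS(s):
    return K * F_S(s) * (1.0 - 2.0 * F(s) / CAP)


def ito_drift(s, mu_s, sigma_s):
    """Drift of F(S(t)) from Ito's lemma with F_t = 0."""
    return mu_s * s * F_S(s) + 0.5 * sigma_s ** 2 * s ** 2 * F_SS(s)


def numerical_drift(s, mu_s, sigma_s, dt=1e-4, half_width=10.0, n=4000):
    """(E[F(S(t+dt))] - F(s)) / dt for an exact GBM step, with the expectation
    over z ~ N(0, 1) computed by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * h
        w = 0.5 if i in (0, n) else 1.0
        s_next = s * math.exp((mu_s - 0.5 * sigma_s ** 2) * dt + sigma_s * math.sqrt(dt) * z)
        total += w * F(s_next) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (total * h - F(s)) / dt
```

With `s = 7.0`, `mu_s = 0.05`, and `sigma_s = 0.2`, the second-order term $\frac{1}{2}\sigma_S^2S^2F_{SS}$ is larger than the first-order term, so omitting it would noticeably misstate the drift of the power output.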

We note that over the interval from $t$ to $t+\Delta t$, the increment of $P(t)$ is

$$
\begin{aligned}
\Delta P(t) &= P(t+\Delta t) - P(t)\\
&= F(t+\Delta t, S(t+\Delta t)) - F(t,S(t)) + e(t+\Delta t) - e(t)\\
&= \Delta F(t,S(t)) + \Delta e_t,
\end{aligned}
$$

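where $\Delta e_t = e(t+\Delta t) - e(t)$ is assumed to follow the normal distribution with mean $0$ and variance $\sigma_F^2F_S(t,S(t))\,\Delta t$, i.e., $\Delta e_t \sim N\big(0, \sigma_F^2F_S(t,S(t))\,\Delta t\big)$. Equivalently,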

$$
de(t) = \sigma_F\sqrt{F_S(t,S(t))}\,dW_e(t),
$$

where $W_e(t)$ denotes a standard Brownian motion.
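The two noise sources in $P(t)$, the term $\sigma_S S F_S\,dW(t)$ inherited from $dF$ and $\sigma_F\sqrt{F_S}\,dW_e(t)$, add in variance when $W$ and $W_e$ are independent, so over $\Delta t$ they behave like a single Brownian term with coefficient $\sqrt{\sigma_S^2S^2F_S^2 + \sigma_F^2F_S}$. The Monte Carlo sketch below checks this; the value used for $F_S$ and all other numbers are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random


def diffusion_terms(s, f_s, sigma_s, sigma_f):
    """Coefficients multiplying dW(t) and dW_e(t) in the dynamics of P(t)."""
    return sigma_s * s * f_s, sigma_f * math.sqrt(f_s)


def merged_sd(s, f_s, sigma_s, sigma_f):
    """Coefficient of a single Brownian term with the same increment variance,
    assuming W and W_e are independent."""
    a, b = diffusion_terms(s, f_s, sigma_s, sigma_f)
    return math.hypot(a, b)


def sample_increment_variance(s, f_s, sigma_s, sigma_f, dt, n, seed=0):
    """Sample variance of a*dW + b*dW_e with independent N(0, dt) draws."""
    rng = random.Random(seed)
    a, b = diffusion_terms(s, f_s, sigma_s, sigma_f)
    sd = math.sqrt(dt)
    xs = [a * rng.gauss(0.0, sd) + b * rng.gauss(0.0, sd) for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


if __name__ == "__main__":
    # Illustrative values: S = 8, F_S = 0.3, sigma_S = 0.1, sigma_F = 0.05, dt = 0.5
    v = sample_increment_variance(8.0, 0.3, 0.1, 0.05, dt=0.5, n=200000)
    print(f"sample variance {v:.5f} vs merged {merged_sd(8.0, 0.3, 0.1, 0.05) ** 2 * 0.5:.5f}")
```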

Therefore, the dynamics of $P(t)$ become

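$$
\begin{aligned}
dP(t) &= dF(t,S(t)) + de(t)\\
&= \frac{F_t + \mu_S(t)S(t)F_S + \frac{1}{2}\sigma_S(t)^2S(t)^2F_{SS}}{P(t)}\,P(t)\,dt + \frac{\sigma_S(t)S(t)F_S}{P(t)}\,P(t)\,dW(t) + \sigma_F\sqrt{F_S(t,S(t))}\,dW_e(t),
\end{aligned}
$$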

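where $W(t)$ and $W_e(t)$ are two independent Brownian motions. Because the sum of the two independent Gaussian increments is itself Gaussian with variance $\big(\sigma_S(t)^2S(t)^2F_S^2 + \sigma_F^2F_S(t,S(t))\big)\,dt$, the two diffusion terms can be merged into a single term driven by a standard Brownian motion $W_P(t)$, which leads to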

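$$
\begin{aligned}
dP(t) &= \frac{F_t + \mu_S(t)S(t)F_S + \frac{1}{2}\sigma_S(t)^2S(t)^2F_{SS}}{P(t)}\,P(t)\,dt + \frac{\sqrt{\sigma_S(t)^2S(t)^2F_S^2 + \sigma_F^2F_S(t,S(t))}}{P(t)}\,P(t)\,dW_P(t)\\
&= \mu_P(t,P)P(t)\,dt + \sigma_P(t,P)P(t)\,dW_P(t),
\end{aligned}
$$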

where

$$
\mu_P(t,P) = \frac{F_t + \mu_S(t)S(t)F_S + \frac{1}{2}\sigma_S(t)^2S(t)^2F_{SS}}{P(t)}
$$