1 Introduction
Despite the Covid19 pandemic causing “the biggest shock to the global energy system in more than seven decades” the recent report released on April 30, 2020, by the International Energy Agency (https://www.iea.org/reports/globalenergyreview2020) stresses that projections of energy demand will fall 6% in 2020 – seven times the decline after the 2008 global financial crisis – It is emphasized that “renewables are set to be the only energy source that will grow in 2020, with their share of global electricity generation projected to jump thanks to their priority access to grids and low operating cost” ([IEA]).
Although the growth of renewables in electricity generation in 2020 is set to be lower than in previous years, solar photovoltaic (PV) and wind are on track to help lift renewable electricity generation by 5% in 2020, aided by higher output from hydropower.
Wind and solar energy are expanding renewable generation capacity, experiencing record growth in the last years.
Reliable wind power generation forecasting is crucial for the following applications (see, for example, [gieb, 5], [chang, 162], [zhbo]):

Allocation of energy reserves such as water levels in dams or oil, and gas reserves.

Operation scheduling of controllable power plants.

Optimization of the price of electricity for different parties such as electric utilities, Transmission system operator (TSOs), Electricity service providers (ESPs), Independent power producers (IPPs), and energy traders.

Maintenance planning such as that of power plants components and transmission lines.
Different methods have been applied to wind power forecasting. They can be generally categorized as follows: physical models, statistical methods, artificial intelligence methods, and hybrid approaches. The output of such methods is usually a deterministic forecast. Occasionally probabilistic forecasts are produced through uncertainty propagation in the data, parameters, or forecast ensembles. However, there is a lack of simulating and producing datadriven stochastic forecasts based on forecasting models. It is crucial to capture the forecast’s actual performance as it has been known that different forecasting technologies exhibit different behavior for different wind farms and seasons (
[chang]). This is due to many factors that forecast are challenged to capture, such as the surrounding terrains of the wind farm and the condition of the blades such as icing, wear and tear, or dirt. It is known that complex terrains in both off shore and on shore locations decrease the accuracy of wind power forecasts significantly ([schicker2017short]). It also has been shown that the performance of forecasts varies from month to month. Thus the performance of wind power forecasts is location and time dependent.Many approaches have been taken to evaluate the uncertainty of a given forecast. There are two types of errors: level errors and phase errors. The use of mean or median errors in this context may be misleading as wind power forecast errors are asymmetric. This is a natural consequence of wind power being nonnegative and bounded by the maximum capacity of production. This is important as the associated cost to power forecast errors is also asymmetric due to different costs for up and down power regulations, which are determined by the electricity market ([tsitsiklis2015pricing]).
We propose to model wind power forecast errors using parametric stochastic differential equations (SDEs) whose solution defines a stochastic process. This resultant stochastic process describes the time evolution dynamics of wind power forecast errors while capturing properties such as a correlation structure and the inherent asymmetry. Additionally, the model we propose is agnostic of the forecasting technology and serves to complement forecasting procedures by providing a datadriven stochastic forecast. Hence, we can evaluate wind power forecasts according to their realworld performance, and we can compare different forecasting technologies. Most notably, we can simulate future wind power production given a deterministic wind power forecast. Future wind power production using Monte Carlo methods, as well as the analytic form of the proposed SDE, can be used in optimal control problems involving wind power production.
Some interesting works have been devoted to probabilistic forecasting related to renewable energies based on stochastic differential equations, among them ([mozuma]) on wind power forecast, ([immm]) and ([bggk]) on forecasts of solar irradiance.
Here, we propose an improved model featuring time derivative tracking of the forecast, timedependent mean reversion, modified diffusion, and nonGaussian approximations. We apply the model to Uruguayan wind power forecasts together with historical wind power production data pertaining to the year 2019.
The rest of the paper is organized as follows. In Section 2, we introduce the main characteristics of a real data set encompassing the normalized wind power production in Uruguay between April and December 2019, with the most accurate predictions, as highlighted in our posterior analysis, performed by one out of the three sources of forecast providers. The significant steps for constructing phenomenological models of the normalized wind power production and the forecast error based on stochastic differential equations are described in Section 3. The application of the Lamperti transform with unknown parameters in Section 4 leads to model the forecast error through a stochastic differential equation with a unit diffusion coefficient. In Section 5, we write down the expressions for the likelihood functions of the forecast error in its original space and the Lamperti space. We also derive simple approximations of the likelihood functions. Section 5
concludes with the description of the optimization algorithm to compute approximate maximum likelihood estimates, including the case where we expand the model comprising an initial transition from the time the forecast is performed to the time of the first forecast.
In Section 6, we apply our proposed numerical estimation procedures to the Uruguay wind and forecast dataset, comparing two alternative models and assessing, for the best candidate model, the performance of three different forecast providers. Section 7 concludes the paper. The proofs of the existence, strong uniqueness, and boundedness of the SDE solutions used to model normalized wind power production and its forecast error are given in the Appendix.
2 Wind power production data in Uruguay and forecast providers
In recent years, Uruguay has triggered a remarkable change in its energy matrix. In ([irena], p.23), Uruguay was among those countries showcasing innovation, like Denmark, Ireland, Germany, Portugal, and Spain, with proven feasibility of managing annual variable renewable energy (VRE) higher than 25% in power systems.
According to ([ren21], pp.118–119), in 2018, Uruguay achieved 36% of its electricity production from variable wind energy and solar PV, raising the share of generation from wind energy more than fivefold in just four years, from 6.2% in 2014 to 33% in 2018.
At present, Uruguay is fostering even higher levels of wind penetration by boosting regional power trading with Argentina and Brazil. In this rapidly evolving scenario, it is essential to analyze national data on wind power production with wind power shortterm forecastings to orientate and assess the strategies and decisions of wind energy actors and businesses.
Our study is based on publicly available data (source: Administrator of Electric Market) on the wind power production in Uruguay between April and December 2019, that we adequately normalized with respect to the present maximum installed wind power capacity. Each day, wind power production recordings are available every ten minutes. In this work, we also considered data from three different forecast providers, available each day starting at 1 pm.
The next Figure 2 shows the wind power real production during four segments 24 hours selected from the observation period together with their corresponding hourly shortterm forecast, computed by a forecast provider. For the sake of visualization clarity, this section relies only on forecasts from one provider, called “provider A” from now on, ranked as the most accurate forecast provider, as it emerged from our posterior analysis.
A view of the global discrepancy between the real production and the forecasted production, during the nine months observation period, is summarized through the forecast error histograms in Figure 3, where we also partitioned the forecast errors according to three contiguous categories of normalized generated power. Low normalized generated power corresponds to the range , midpower refers to the range , and highpower to the range .
We may observe that all the histograms in Figure 3
exhibit skewed patterns, to a different extent, as well as extreme observations. The presence of these features can be partly explained. The data analysis highlighted that, during several 24hour segments, the system operators decided to reduce or even cease the wind power production. Indeed, as recalled in (
[irena2], p.8), “Uruguay experiences high curtailment levels because generation exceeds demand.” Despite the large country’s interconnection capacity with Argentina and Brazil, there is no active crossborder market; the energy is traded via ad hoc shortterm agreements. ([irena2], p.3) “Even with interconnection capacity exceeding peak demand, the power system experiences high VRE curtailment, mostly at night when wind generation exceeds demand.”The curtailment of the wind power production imposed by the system operators has a strong influence on the forecast error. To build a model that, driven by the available forecast, allows the inclusion of true power production with a prescribed degree of uncertainty, it is necessary to remove the data segments affected by wind curtailment.
Once we removed all the 24hour segments showing wind curtailment, we set up a dataset containing 147 daily segments. In the absence of the curtailment intervention, the forecast error histograms shown below in Figure 4, can appreciate skewness reduction, except for low power forecast error histogram.
In this stage of data preprocessing, we obtain another useful result by applying the firstorder difference operator to the forecast errors. The forecast error transition histograms, displayed in the next Figure 5, will later constitute a reference for the visual assessment of the global fit of the proposed models.
The histograms in Figure 5 feature a nonGaussianity trait and provide initial input for the modelbuilding stage, which is described in the next section. Later, in Section 4, guided from inferring the unknown model parameters, we will also propose transforming data as a strategy that naturally leads to an alternative model.
3 Phenomenological Model
After analyzing the available dataset, we are now in the condition to build a type of phenomenological model for the normalized wind power generation forecasts that, in its most general form, is a stochastic process defined by the following stochastic differential equation (SDE):
(1) 
where

denotes a drift function,

a diffusion function,

is a vector of unknown parameters,

is a timedependent deterministic function valued and is its time derivative,

is a standard realvalued Wiener process.
In this work, is to be considered a deterministic forecast for the normalized wind power generation, which is provided by Administrator of Electric Market.
Our goal is to achieve a specification of the model (1) to follow the available normalized wind power forecasts closely while ensuring its unbiasedness with respect to the forecast.
3.1 Physical Constraints
Let be the available prediction function for the normalized wind power, which is an input to this approach. To start with the model specification, first, we introduce a timedependent drift function that features the meanreverting property as well as derivative tracking:
(2) 
where is a positive deterministic function, whose range depends on , as will be explained shortly.
Now, looking at the normalized wind power generation forecast process , modeled as solution to the Itô stochastic differential equation (1) with the drift specified in (2), it is straightforward to check that , given . The application of Itô’s lemma leads to
(3) 
Taking expectation on (3) we obtain
(4) 
At this stage, the process defined by (1) with drift (2) satisfies the two following properties:

it reverts to its mean , with a timevarying speed that is proportional to the deviation of the process from its mean,

it tracks the time derivative .
Remark: Observe that a meanreverting model without derivative tracking shows a delayed path behavior. For instance, consider the diffusion model (1) with . In this case, given , the diffusion has mean . The next Figure (6) illustrates how different behave the estimated confidence bands for two diffusion models with and without derivative tracking, fitting the same daily segment.
The forecast and production wind power data of Uruguay are normalized with respect to the installed power capacity during the period of observation. Thus, the meanreverting level lies in , and the process must take values in the same interval, a requirement that is not automatically fulfilled through the derivative tracking. To impose that the state space of is , we may choose a convenient diffusion term, and require that the timevarying parameter satisfies an adhoc condition.
Let , and choose a statedependent diffusion term that avoids the process exiting from the range as follows:
(5) 
where is an unknown parameter that controls the path variability. This diffusion term belongs to the Pearson diffusion family and, in particular, it defines a Jacobi type diffusion. It is useful to recall that ([foso, 440]) a Pearson diffusion is a stationary solution to a stochastic differential equation of the form
(6) 
where , and , , and are parameters such that the square root is well defined when is in the state space. These parameters, together with , the mean of the invariant distribution, determine the state space of the diffusion as well as the shape of the invariant distribution.
An exhaustive classification of the (stationary) Pearson diffusions is presented in ([foso, 440443]) where, in particular, it is discussed the case and
, where the invariant distribution is a Beta distribution with parameters
that leads to the wellknown Jacobi diffusions, socalled because the eigenfunctions of the infinitesimal generator of these processes are the Jacobi polynomials (see, for example,
[leph, 28602861]).It is worth mentioning that Jacobi diffusions have been successfully applied in several disciplines, among them finance (see [vago] and references therein) and neuroscience ([dotala]).
However, a distinctive feature in our proposed model
(7) 
is that the drift term contains the timevarying parameter , rendering the solution of (7) to a nonstationary and timeinhomogeneous process. To ensure that the process is the unique strong solution of (7) for all with state space a.s., the meanreversion timevarying parameter must satisfy the condition:
(B) 
The proof of this theoretical statement is presented in Section 8.
Remark: Condition (B) shows that the timevarying parameter becomes unbounded when or . Therefore, we consider the following truncated prediction function
(8) 
that satisfies for any and , providing that is bounded for every .
Remark: In this work, the analysis of the three different forecast datasets shows that there exists, for any forecast provider, a small to define the truncated prediction function fulfilling the above condition.
From now on, we will keep the notation to denote the truncated prediction function (8), unless specified otherwise.
3.2 A model specification for the forecast error
After applying to (7) the simple change of variables
we may introduce the following model for the forecast error of the normalized wind power production:
(9) 
4 State independent diffusion term: Lamperti transform
Our model (9) for the forecast error has a diffusion term that depends on the state variable . Under the conditions that permit the use of Itô’s formula on a wellchosen transformation of the process , John Lamperti ([lamp]) first showed that the transformed process is again a diffusion process that is solution to a SDE with unit coefficient for the diffusion term. The vast literature nowadays refers to this result as the socalled Lamperti transform (see, for example, [iacus1, 40–41]; [moma]; [pani, 199–203]; [saso, 98–100]), which is a basic tool to obtain a SDE for the transformed process whose diffusion term does not depend anymore on the state variable. A remarkable effect of removing the state dependency from the random noise term is to increase the numerical stability of the simulated paths of the transformed process. For this reason, some estimation methods of the unknown parameters of nonlinear SDE models incorporated the Lamperti’s change of variable as part of a more complex approximation procedure (for example, in the case of onedimensional diffusions, the local linearization method in [shoz], or the expansion method in [ait], later extended to timeinhomogeneous SDEs in [eglix]).
We consider the following Lamperti transform with unknown parameters
(10) 
that, after applying Itô’s formula on , leads to the following SDE with state independent unit diffusion term
(11) 
After replacing in (11), we obtain that the process satisfies the SDE
(12) 
To summarize the effect of the Lamperti transform visually, we can see in the next Figure (7) how the forecast error transition histograms (without curtailment) modify in comparison with Figure (5).
In this case, the Lamperti transform has been applied using the optimal estimates of the parameters in the SDE model (9), obtained applying our numerical procedure detailed later in Subsection 5.4.
The shape of the forecast error transition histograms after applying the Lamperti transform has similarities with the Gaussian distribution, motivating toward the use of Gaussianlike approximations of the unknown density transition functions of the process
.Remark: Moreover, this obtained Gaussian distribution supports the validity of the choice of our model diffusion coefficient given by (6).
5 Likelihood functions of the forecast error data and optimization algorithm
5.1 Likelihood in the space
Suppose that any of nonoverlapping paths of the continuoustime Itô process , each one starting at a different time with , is sampled at equispaced discrete points with given length interval . Let denote this random sample, with .
Let
be the conditional probability density of
given evaluated at , where are the unknown model parameters.The Itô process defined by the SDE (9) is Markovian, and the likelihood function of the sample can be written as the following product of transition densities:
(13) 
where for any and .
Remark: In the last subsection of this section, we will extend the statistical model (13) by adding the transition that occurs during the time interval, say of length
, between the epoch when the forecast is done and the first epoch (1 pm) of each dayahead forecast. To this purpose, the likelihood function (
13) must include for any of the paths an additional factor, say , expressing the conditional density of the early transition. The parameter can be calibrated together or after the estimation of , suggesting an optimal time for the scheduling of the forecasts.The exact computation of the likelihood (13) relies on the availability of a closedform expression for the transition densities of that, on the basis of the Markovian property of , are characterized for , as solutions of the FokkerPlanckKolmogorov equation ([iacus1, 36]; [saso, 6168]):
(14) 
subject to the initial conditions where is the Diracdelta generalized function centered at
However, closedform solutions to initialboundary value problems for timeinhomogeneous diffusions can be obtained only in a few cases (see, for example, [eglix, Section 3.1]). In our case, solving numerically (14) for the transition densities of the process at every transition step is computationally expensive. Several numerical techniques have been devised to obtain estimates for the unknown parameters of continuoustime SDE models with discrete observations (see, for example, [prewo] for likelihoodbased inference techniques, [Sor] for an estimating function approach). As explained in the next subsection, we have considered approximate likelihood methods, similar in spirit to [Section 11.4]saso.
5.2 Approximate likelihood in the space
Gaussian approximations to the transition densities of nonlinear timeinhomogeneous SDEs are available through different algorithms [Chapter 9]saso. However, as Figure 5 may suggest at first glance, the choice of a Gaussian density could be inadequate when straightly applied to approximate the transition density of the forecast error of the normalized wind power production.
Therefore, we propose to use a surrogate transition density for
other than Gaussian. The moments of the SDE model (
9) are then matched to the surrogate density moments.From (4), we have , for any , and .
For , using Itô’s lemma we derive
(15) 
For any , the first two moments of , and , can be computed by solving the following system
(16) 
with initial conditions and
5.2.1 Moment Matching
A suitable candidate for a surrogate transition density of is a Beta distribution on a compact interval parameterized by two positive shape parameters, . Recall that the choice of the beta proxy distribution is a natural choice as it is the invariant distribution of the Jacobi type processes.
For any , we approximate the transition densities of the process using a Beta distribution. We equal the first two central moments of with the corresponding moments of the Beta surrogate distribution on with shape parameters .
The shape parameters are given by
(17) 
where and
5.3 Approximate likelihood in the space
The transition density of the process , which has been defined through the Lamperti transformation (10) of , can be conveniently approximated by a Gaussian surrogate density.
The drift coefficient of the process that satisfies (12) is nonlinear. After linearizing the drift around the mean of , , we obtain the following system of ODEs to compute, for any , the approximations of the first two central moments of , say and :
(19) 
with initial conditions and and where
The approximate Lamperti loglikelihood for the observed sample is given by
(20) 
where the limits and are computed solving numerically the initialvalue problem (19).
5.4 Algorithm for the approximate maximum likelihood estimations
In this subsection, we aim to infer the model’s parameters using optimization techniques. We start by finding an initial guess close enough to the optimal value, and from that point, start the optimization.
5.4.1 Initial guess
To guarantee the good behave for our optimization algorithm, we aim to start the optimization as close as we can from the optimal parameters. We use least square minimization and quadratic variation over the data to find an initial guess .

Least square minimization: We consider the observed data with length between observations , where and . For any
, the random variable
has a conditional mean that can be approximated by the solution of the systemin the limit , i.e., . Then, the random variable has zero mean. If we assume that for all , then . If we have a total of transitions, we can write the regression problem for the conditional mean with loss function as
(21) As Equation (21) is convex in , it is enough to verify the first order optimality conditions. From
it follows that
(22) We approximate by Equation (22) setting .

Quadratic variation: We approximate the quadratic variation of the Itô’s process
with the discrete sum .
As initial guess for the diffusion variability coefficient , we choose
(23) where is the length of the time interval between two consecutive measurements.
5.4.2 Negative loglikelihood minimization in the space
To find the optimal parameters, we minimize the negative loglikelihood (negative version of (18)) using the derivativefree function fminsearch from MATLAB R2019b over the parameters . At each step of the iteration, we:

Use the training dataset to find the SDE’s first and second moments as explained in Subsection 5.2.

Match the proxy distribution moments with the SDE’s moments.

Evaluate the negative loglikelihood using the training dataset.
5.4.3 Negative loglikelihood minimization in the space
Let be the observed data, and the Lamperti transform of the observation . As we can see in Section 4, the transformed observations depend on the vector .
The problem of maximizing the approximated Lamperti loglikelihood (20), i.e.,
is not totally defined as the data depend on . However, we propose to find a fixed point such that
(24) 
At a fixed point, the likelihood has a maximum for the data set corresponding to that point. We are interested in finding solutions for (24). If solves (24), then
Given , we call to the solution of
For each , we define the relative error function as
(25) 
and remark that is a fixed point if and only if . To find minimizers for , we proceed in the same way as described in Subsection 5.4.2, but using Gaussian surrogate densities.
5.5 Model specification with the additional parameter
We observe that for most days, the forecast error at time is not zero. According to the forecasts procedure, we may assume that there is a time in the past , such that the forecast error .
For any , we extrapolate backward linearly the truncated prediction function to get its value at time , , and set . We assume that the initial transition has a Beta distribution and apply to it the same moment matching method used above. Given a vector of parameters , we estimate solving the following problem
(26) 
where is the approximated likelihood. To solve this problem, we repeat the steps described in Subsection 5.4.2, with the additional initial step of creating the linear extrapolation for at each .
As remarked, we extend the statistical model (13) to include the extra parameter . The approximated complete likelihood , which estimates the vector , is given by
(27) 
where is the nonlog version of (13). As we can provide initial guesses for and , we have a starting point for the numerical optimization of the approximated complete likelihood (27).
6 Application to the AprilDecember 2019 Uruguay wind and forecast dataset
Our statistical analysis starts with partitioning the 147 segments of normalized wind power production, each 24hours long. We select 73 noncontiguous segments for the models’ calibration procedure, assigning them to the training set. The other 74 noncontiguous segments compose the test set. Such an allocation mechanism guarantees independence among the segments, matching the assumption we did in Section 5 to formulate the statistical models. Additional crosscorrelation tests were performed to ensure this assumption.
All the following results involving a single provider refer to provider A. Furthermore, all calibrations involve the training sets and all simulations, the test sets. Following the instruction for the initial guesses from Subsection 5.4 and assuming that
we obtain the initial guess .
6.1 Calibration of the approximate negative loglikelihood in space and space
As an auxiliary verification, we plot the negative loglikelihood (negative version of (18)) as a function of the parameters, and we use additional minimization functions from MATLAB R2019b. Moreover, we realized an additional inference utilizing the test sets to guarantee the robustness of our numerical methods.
On Figure (8), we can see the level sets for the negative loglikelihood for both training and test sets. The numerical values of each relevant point can be seen in Table (1). We set the optimal parameters in the space , as it is where the negative loglikelihood for the training sets reaches its minimum value.
Training sets  Test sets  

Initial guess  1.54  0.072  0.111  1.96  0.053  0.104 
fminsearch  1.14  0.076  0.097  1.64  0.054  0.089 
fmincon  1.58  0.062  0.097  1.63  0.055  0.089 
fminunc  1.54  0.063  0.097  1.96  0.045  0.089 
Evaluations  1.93  0.050  0.097  1.59  0.056  0.089 
We observe that all the local (possibly global) minimizers are located over the curves and for the training and test sets, respectively. This effect shows that the optimization is more sensitive to the diffusion than to the drift.
In the space, we obtain the optimal parameters as minimizers for the relative error function (25) using fminsearch from MATLAB R2019b and the training sets.
To verify and compare these two vector of parameters, (i.e., and ), we simulate error paths in the space. We simulate five error paths for each day in the test set and construct histograms with the transitions. The histograms can be seen in Figure (9). We observe a slightly better approximation using .
6.2 Model comparison and assessment of the forecast providers
We compare two candidate models to find the bestfit that maximizes the retained information, the Model 1, introduced in ([elkate], p.383), and our proposed model (7), from hereafter called Model 2.

Model 1: This model does not feature derivative tracking:
(28) with .

Model 2: This model features derivative tracking and timevarying meanreversion parameter:
Comments
There are no comments yet.