# A Derivative Tracking Model for Wind Power Forecast Error

Reliable wind power generation forecasting is crucial for applications such as the allocation of energy reserves, optimization for electricity price, and operation scheduling of conventional power plants. We propose a data-driven model based on parametric Stochastic Differential Equations (SDEs) to capture the real asymmetric dynamics of wind power forecast errors. Our SDE framework features time-derivative tracking of the forecast, time-varying mean-reversion parameter, and an improved state-dependent diffusion term. The methodology we developed allows the simulation of future wind power production paths and to obtain sharp empirical confidence bands. All the procedures are agnostic of the forecasting technology, and they enable comparisons between different forecast providers. We apply the model to historical Uruguayan wind power production data and forecasts between April and December 2019.

## Authors

• 1 publication
• 4 publications
• 4 publications
• 30 publications
08/22/2021

### Wind Power Projection using Weather Forecasts by Novel Deep Neural Networks

The transition from conventional methods of energy production to renewab...
10/09/2020

### Physics-Informed Gaussian Process Regression for Probabilistic States Estimation and Forecasting in Power Grids

Real-time state estimation and forecasting is critical for efficient ope...
07/17/2019

### Feature-driven Improvement of Renewable Energy Forecasting and Trading

Inspired from recent insights into the common ground of machine learning...
05/31/2019

### Influences in Forecast Errors for Wind and Photovoltaic Power: A Study on Machine Learning Models

Despite the increasing importance of forecasts of renewable energy, curr...
11/03/2017

### Cost-Optimal Operation of Energy Storage Units: Impact of Uncertainties and Robust Estimator

The rapid expansion of wind and solar energy leads to an increasing vola...
09/04/2019

### Mape_Maker: A Scenario Creator

We describe algorithms for creating probabilistic scenarios for the situ...
07/31/2018

### Deep Belief Networks Based Feature Generation and Regression for Predicting Wind Power

Wind energy forecasting helps to manage power production, and hence, red...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Despite the Covid-19 pandemic causing “the biggest shock to the global energy system in more than seven decades” the recent report released on April 30, 2020, by the International Energy Agency (https://www.iea.org/reports/global-energy-review-2020) stresses that projections of energy demand will fall 6% in 2020 – seven times the decline after the 2008 global financial crisis – It is emphasized that “renewables are set to be the only energy source that will grow in 2020, with their share of global electricity generation projected to jump thanks to their priority access to grids and low operating cost” ([IEA]).

Although the growth of renewables in electricity generation in 2020 is set to be lower than in previous years, solar photo-voltaic (PV) and wind are on track to help lift renewable electricity generation by 5% in 2020, aided by higher output from hydropower.

Wind and solar energy are expanding renewable generation capacity, experiencing record growth in the last years.

Reliable wind power generation forecasting is crucial for the following applications (see, for example, [gieb, 5], [chang, 162], [zhbo]):

• Allocation of energy reserves such as water levels in dams or oil, and gas reserves.

• Operation scheduling of controllable power plants.

• Optimization of the price of electricity for different parties such as electric utilities, Transmission system operator (TSOs), Electricity service providers (ESPs), Independent power producers (IPPs), and energy traders.

• Maintenance planning such as that of power plants components and transmission lines.

Different methods have been applied to wind power forecasting. They can be generally categorized as follows: physical models, statistical methods, artificial intelligence methods, and hybrid approaches. The output of such methods is usually a deterministic forecast. Occasionally probabilistic forecasts are produced through uncertainty propagation in the data, parameters, or forecast ensembles. However, there is a lack of simulating and producing data-driven stochastic forecasts based on forecasting models. It is crucial to capture the forecast’s actual performance as it has been known that different forecasting technologies exhibit different behavior for different wind farms and seasons (

[chang]). This is due to many factors that forecast are challenged to capture, such as the surrounding terrains of the wind farm and the condition of the blades such as icing, wear and tear, or dirt. It is known that complex terrains in both off shore and on shore locations decrease the accuracy of wind power forecasts significantly ([schicker2017short]). It also has been shown that the performance of forecasts varies from month to month. Thus the performance of wind power forecasts is location and time dependent.

Many approaches have been taken to evaluate the uncertainty of a given forecast. There are two types of errors: level errors and phase errors. The use of mean or median errors in this context may be misleading as wind power forecast errors are asymmetric. This is a natural consequence of wind power being non-negative and bounded by the maximum capacity of production. This is important as the associated cost to power forecast errors is also asymmetric due to different costs for up and down power regulations, which are determined by the electricity market ([tsitsiklis2015pricing]).

We propose to model wind power forecast errors using parametric stochastic differential equations (SDEs) whose solution defines a stochastic process. This resultant stochastic process describes the time evolution dynamics of wind power forecast errors while capturing properties such as a correlation structure and the inherent asymmetry. Additionally, the model we propose is agnostic of the forecasting technology and serves to complement forecasting procedures by providing a data-driven stochastic forecast. Hence, we can evaluate wind power forecasts according to their real-world performance, and we can compare different forecasting technologies. Most notably, we can simulate future wind power production given a deterministic wind power forecast. Future wind power production using Monte Carlo methods, as well as the analytic form of the proposed SDE, can be used in optimal control problems involving wind power production.

Some interesting works have been devoted to probabilistic forecasting related to renewable energies based on stochastic differential equations, among them ([mozuma]) on wind power forecast, ([immm]) and ([bggk]) on forecasts of solar irradiance. Here, we propose an improved model featuring time derivative tracking of the forecast, time-dependent mean reversion, modified diffusion, and non-Gaussian approximations. We apply the model to Uruguayan wind power forecasts together with historical wind power production data pertaining to the year 2019.

The rest of the paper is organized as follows. In Section 2, we introduce the main characteristics of a real data set encompassing the normalized wind power production in Uruguay between April and December 2019, with the most accurate predictions, as highlighted in our posterior analysis, performed by one out of the three sources of forecast providers. The significant steps for constructing phenomenological models of the normalized wind power production and the forecast error based on stochastic differential equations are described in Section 3. The application of the Lamperti transform with unknown parameters in Section 4 leads to model the forecast error through a stochastic differential equation with a unit diffusion coefficient. In Section 5, we write down the expressions for the likelihood functions of the forecast error in its original space and the Lamperti space. We also derive simple approximations of the likelihood functions. Section 5

concludes with the description of the optimization algorithm to compute approximate maximum likelihood estimates, including the case where we expand the model comprising an initial transition from the time the forecast is performed to the time of the first forecast.

In Section 6, we apply our proposed numerical estimation procedures to the Uruguay wind and forecast dataset, comparing two alternative models and assessing, for the best candidate model, the performance of three different forecast providers. Section 7 concludes the paper. The proofs of the existence, strong uniqueness, and boundedness of the SDE solutions used to model normalized wind power production and its forecast error are given in the Appendix.

## 2 Wind power production data in Uruguay and forecast providers

In recent years, Uruguay has triggered a remarkable change in its energy matrix. In ([irena], p.23), Uruguay was among those countries showcasing innovation, like Denmark, Ireland, Germany, Portugal, and Spain, with proven feasibility of managing annual variable renewable energy (VRE) higher than 25% in power systems.

According to ([ren21], pp.118–119), in 2018, Uruguay achieved 36% of its electricity production from variable wind energy and solar PV, raising the share of generation from wind energy more than five-fold in just four years, from 6.2% in 2014 to 33% in 2018.

At present, Uruguay is fostering even higher levels of wind penetration by boosting regional power trading with Argentina and Brazil. In this rapidly evolving scenario, it is essential to analyze national data on wind power production with wind power short-term forecastings to orientate and assess the strategies and decisions of wind energy actors and businesses.

Our study is based on publicly available data (source: Administrator of Electric Market) on the wind power production in Uruguay between April and December 2019, that we adequately normalized with respect to the present maximum installed wind power capacity. Each day, wind power production recordings are available every ten minutes. In this work, we also considered data from three different forecast providers, available each day starting at 1 pm.

The next Figure 2 shows the wind power real production during four segments 24 hours selected from the observation period together with their corresponding hourly short-term forecast, computed by a forecast provider. For the sake of visualization clarity, this section relies only on forecasts from one provider, called “provider A” from now on, ranked as the most accurate forecast provider, as it emerged from our posterior analysis.

A view of the global discrepancy between the real production and the forecasted production, during the nine months observation period, is summarized through the forecast error histograms in Figure 3, where we also partitioned the forecast errors according to three contiguous categories of normalized generated power. Low normalized generated power corresponds to the range , mid-power refers to the range , and high-power to the range .

We may observe that all the histograms in Figure 3

exhibit skewed patterns, to a different extent, as well as extreme observations. The presence of these features can be partly explained. The data analysis highlighted that, during several 24-hour segments, the system operators decided to reduce or even cease the wind power production. Indeed, as recalled in (

[irena2], p.8), “Uruguay experiences high curtailment levels because generation exceeds demand.” Despite the large country’s interconnection capacity with Argentina and Brazil, there is no active cross-border market; the energy is traded via ad hoc short-term agreements. ([irena2], p.3) “Even with interconnection capacity exceeding peak demand, the power system experiences high VRE curtailment, mostly at night when wind generation exceeds demand.”

The curtailment of the wind power production imposed by the system operators has a strong influence on the forecast error. To build a model that, driven by the available forecast, allows the inclusion of true power production with a prescribed degree of uncertainty, it is necessary to remove the data segments affected by wind curtailment.

Once we removed all the 24-hour segments showing wind curtailment, we set up a dataset containing 147 daily segments. In the absence of the curtailment intervention, the forecast error histograms shown below in Figure 4, can appreciate skewness reduction, except for low power forecast error histogram.

In this stage of data preprocessing, we obtain another useful result by applying the first-order difference operator to the forecast errors. The forecast error transition histograms, displayed in the next Figure 5, will later constitute a reference for the visual assessment of the global fit of the proposed models.

The histograms in Figure 5 feature a non-Gaussianity trait and provide initial input for the model-building stage, which is described in the next section. Later, in Section 4, guided from inferring the unknown model parameters, we will also propose transforming data as a strategy that naturally leads to an alternative model.

## 3 Phenomenological Model

After analyzing the available dataset, we are now in the condition to build a type of phenomenological model for the normalized wind power generation forecasts that, in its most general form, is a stochastic process defined by the following stochastic differential equation (SDE):

 {\difXt=a(Xt;pt,˙pt,θ)\dift+b(Xt;pt,˙pt,θ)\difWt,t∈[0,T]X0=x0∈[0,1], (1)

where

• denotes a drift function,

• a diffusion function,

• is a vector of unknown parameters,

• is a time-dependent deterministic function -valued and is its time derivative,

• is a standard real-valued Wiener process.

In this work, is to be considered a deterministic forecast for the normalized wind power generation, which is provided by Administrator of Electric Market.

Our goal is to achieve a specification of the model (1) to follow the available normalized wind power forecasts closely while ensuring its unbiasedness with respect to the forecast.

### 3.1 Physical Constraints

Let be the available prediction function for the normalized wind power, which is an input to this approach. To start with the model specification, first, we introduce a time-dependent drift function that features the mean-reverting property as well as derivative tracking:

 a(Xt;pt,˙pt,θ)=˙pt−θt(Xt−pt), (2)

where is a positive deterministic function, whose range depends on , as will be explained shortly.

Now, looking at the normalized wind power generation forecast process , modeled as solution to the Itô stochastic differential equation (1) with the drift specified in (2), it is straightforward to check that , given . The application of Itô’s lemma leads to

 e∫t0θs\difsXt−X0=∫t0(˙ps+θsps)e∫s0θu\difu\difs+∫t0b(Xs;ps,˙ps,θ)e∫s0θu\difu\difWs. (3)

Taking expectation on (3) we obtain

 E[Xt]=e−∫t0θs\difs(E[X0]+pte∫t0θs\difs−p0)=pt. (4)

At this stage, the process defined by (1) with drift (2) satisfies the two following properties:

• it reverts to its mean , with a time-varying speed that is proportional to the deviation of the process from its mean,

• it tracks the time derivative .

Remark: Observe that a mean-reverting model without derivative tracking shows a delayed path behavior. For instance, consider the diffusion model (1) with . In this case, given , the diffusion has mean . The next Figure (6) illustrates how different behave the estimated confidence bands for two diffusion models with and without derivative tracking, fitting the same daily segment.

The forecast and production wind power data of Uruguay are normalized with respect to the installed power capacity during the period of observation. Thus, the mean-reverting level lies in , and the process must take values in the same interval, a requirement that is not automatically fulfilled through the derivative tracking. To impose that the state space of is , we may choose a convenient diffusion term, and require that the time-varying parameter satisfies an ad-hoc condition.

Let , and choose a state-dependent diffusion term that avoids the process exiting from the range as follows:

 b(Xt;θ)=√2αθ0Xt(1−Xt) (5)

where is an unknown parameter that controls the path variability. This diffusion term belongs to the Pearson diffusion family and, in particular, it defines a Jacobi type diffusion. It is useful to recall that ([foso, 440]) a Pearson diffusion is a stationary solution to a stochastic differential equation of the form

 \difXt=−θ(Xt−μ)\dift+√2θ(aX2t+bXt+c)\difWt (6)

where , and , , and are parameters such that the square root is well defined when is in the state space. These parameters, together with , the mean of the invariant distribution, determine the state space of the diffusion as well as the shape of the invariant distribution.

An exhaustive classification of the (stationary) Pearson diffusions is presented in ([foso, 440-443]) where, in particular, it is discussed the case and

, where the invariant distribution is a Beta distribution with parameters

that leads to the well-known Jacobi diffusions, so-called because the eigenfunctions of the infinitesimal generator of these processes are the Jacobi polynomials (see, for example,

[leph, 2860-2861]).

It is worth mentioning that Jacobi diffusions have been successfully applied in several disciplines, among them finance (see [vago] and references therein) and neuroscience ([dotala]).

However, a distinctive feature in our proposed model

 {\difXt=(˙pt−θt(Xt−pt))\dift+√2αθ0Xt(1−Xt)\difWt,t∈[0,T]X0=x0∈[0,1], (7)

is that the drift term contains the time-varying parameter , rendering the solution of (7) to a non-stationary and time-inhomogeneous process. To ensure that the process is the unique strong solution of (7) for all with state space a.s., the mean-reversion time-varying parameter must satisfy the condition:

 θt≥max(αθ0+˙pt1−pt,αθ0−˙ptpt). (B)

The proof of this theoretical statement is presented in Section 8.

Remark: Condition (B) shows that the time-varying parameter becomes unbounded when or . Therefore, we consider the following truncated prediction function

that satisfies for any and , providing that is bounded for every .

Remark: In this work, the analysis of the three different forecast datasets shows that there exists, for any forecast provider, a small to define the truncated prediction function fulfilling the above condition.

From now on, we will keep the notation to denote the truncated prediction function (8), unless specified otherwise.

### 3.2 A model specification for the forecast error

After applying to (7) the simple change of variables

 Vt=Xt−pt,

we may introduce the following model for the forecast error of the normalized wind power production:

 {\difVt=−θtVt\dift+√2αθ0(Vt+pt)(1−Vt−pt)\difWt,t∈[0,T]V0=v0∈[−1+p0,1−p0]. (9)

## 4 State independent diffusion term: Lamperti transform

Our model (9) for the forecast error has a diffusion term that depends on the state variable . Under the conditions that permit the use of Itô’s formula on a well-chosen transformation of the process , John Lamperti ([lamp]) first showed that the transformed process is again a diffusion process that is solution to a SDE with unit coefficient for the diffusion term. The vast literature nowadays refers to this result as the so-called Lamperti transform (see, for example, [iacus1, 40–41]; [moma]; [pani, 199–203]; [saso, 98–100]), which is a basic tool to obtain a SDE for the transformed process whose diffusion term does not depend anymore on the state variable. A remarkable effect of removing the state dependency from the random noise term is to increase the numerical stability of the simulated paths of the transformed process. For this reason, some estimation methods of the unknown parameters of non-linear SDE models incorporated the Lamperti’s change of variable as part of a more complex approximation procedure (for example, in the case of one-dimensional diffusions, the local linearization method in [shoz], or the expansion method in [ait], later extended to time-inhomogeneous SDEs in [eglix]).

We consider the following Lamperti transform with unknown parameters

 Zt=h(Vt,t;θ) =1√2αθ0∫1√(v+pt)(1−v−pt)\difv∣∣∣v=Vt=−√2αθ0arcsin(√1−Vt−pt) (10)

that, after applying Itô’s formula on , leads to the following SDE with state independent unit diffusion term

 \difZt=[ ˙pt√2αθ0(Vt+pt)(1−Vt−pt)+−θtVt√2αθ0(Vt+pt)(1−Vt−pt)−14√2αθ0(1−2(Vt+pt))√(Vt+pt)(1−Vt−pt)]\dift+\difWt. (11)

After replacing in (11), we obtain that the process satisfies the SDE

 \difZt =⎡⎢ ⎢ ⎢ ⎢⎣˙pt−θt(1−pt−sin2(−√αθ02Zt))√2αθ0cos(−√αθ02Zt)sin(−√αθ02Zt)−14√2αθ0(1−2cos2(−√αθ02Zt))cos(−√αθ02Zt)sin(−√αθ02Zt)⎤⎥ ⎥ ⎥ ⎥⎦\dift+\difWt =[2˙pt−θt(1−2pt)+(αθ0−θt)cos(−√2αθ0Zt)√2αθ0sin(−√2αθ0Zt)]\dift+\difWt. (12)

To summarize the effect of the Lamperti transform visually, we can see in the next Figure (7) how the forecast error transition histograms (without curtailment) modify in comparison with Figure (5).

In this case, the Lamperti transform has been applied using the optimal estimates of the parameters in the SDE model (9), obtained applying our numerical procedure detailed later in Subsection 5.4.

The shape of the forecast error transition histograms after applying the Lamperti transform has similarities with the Gaussian distribution, motivating toward the use of Gaussian-like approximations of the unknown density transition functions of the process

.

Remark: Moreover, this obtained Gaussian distribution supports the validity of the choice of our model diffusion coefficient given by (6).

## 5 Likelihood functions of the forecast error data and optimization algorithm

### 5.1 Likelihood in the V−space

Suppose that any of non-overlapping paths of the continuous-time Itô process , each one starting at a different time with , is sampled at equispaced discrete points with given length interval . Let denote this random sample, with .

Let

be the conditional probability density of

given evaluated at , where are the unknown model parameters.

The Itô process defined by the SDE (9) is Markovian, and the likelihood function of the sample can be written as the following product of transition densities:

 L(θ;VM,N+1)=M∏j=1{N∏i=1ρ(Vj,i|Vj,i−1;p[tj,i−1,tj,i],θ)}, (13)

where for any and .

Remark: In the last subsection of this section, we will extend the statistical model (13) by adding the transition that occurs during the time interval, say of length

, between the epoch when the forecast is done and the first epoch (1 pm) of each day-ahead forecast. To this purpose, the likelihood function (

13) must include for any of the paths an additional factor, say , expressing the conditional density of the early transition. The parameter can be calibrated together or after the estimation of , suggesting an optimal time for the scheduling of the forecasts.

The exact computation of the likelihood (13) relies on the availability of a closed-form expression for the transition densities of that, on the basis of the Markovian property of , are characterized for , as solutions of the Fokker-Planck-Kolmogorov equation ([iacus1, 36]; [saso, 61-68]):

 ∂f∂t ρ(v,t|vj,i−1,tj,i−1;θ)=−∂∂v(−θtvρ(v,t|vj,i−1,tj,i−1;θ)) +12∂2∂v2(2θ0α(v+pt)(1−v−pt)ρ(v,t|vj,i−1,tj,i−1;θ)), (14)

subject to the initial conditions where is the Dirac-delta generalized function centered at

However, closed-form solutions to initial-boundary value problems for time-inhomogeneous diffusions can be obtained only in a few cases (see, for example, [eglix, Section 3.1]). In our case, solving numerically (14) for the transition densities of the process at every transition step is computationally expensive. Several numerical techniques have been devised to obtain estimates for the unknown parameters of continuous-time SDE models with discrete observations (see, for example, [prewo] for likelihood-based inference techniques, [Sor] for an estimating function approach). As explained in the next subsection, we have considered approximate likelihood methods, similar in spirit to [Section 11.4]saso.

### 5.2 Approximate likelihood in the V−space

Gaussian approximations to the transition densities of nonlinear time-inhomogeneous SDEs are available through different algorithms [Chapter 9]saso. However, as Figure 5 may suggest at first glance, the choice of a Gaussian density could be inadequate when straightly applied to approximate the transition density of the forecast error of the normalized wind power production.

Therefore, we propose to use a surrogate transition density for

other than Gaussian. The moments of the SDE model (

9) are then matched to the surrogate density moments.

From (4), we have , for any , and .

For , using Itô’s lemma we derive

 \difE[Vmt]\dift=−mθtE[Vmt]+m(m−1)αθ0E[−Vmt+(1−2pt)Vm−1t+pt(1−pt)Vm−2t]. (15)

For any , the first two moments of , and , can be computed by solving the following system

 ⎧⎪⎨⎪⎩\difm1(t)\dift=−m1(t)θt\difm2(t)\dift=−2(θt+αθ0)m2(t)+2αθ0(1−2pt)m1(t)+2αθ0pt(1−pt) (16)

with initial conditions and

#### 5.2.1 Moment Matching

A suitable candidate for a surrogate transition density of is a Beta distribution on a compact interval parameterized by two positive shape parameters, . Recall that the choice of the beta proxy distribution is a natural choice as it is the invariant distribution of the Jacobi type processes.

For any , we approximate the transition densities of the process using a Beta distribution. We equal the first two central moments of with the corresponding moments of the Beta surrogate distribution on with shape parameters .

The shape parameters are given by

 ξ1(t)=−(μt+1−ϵ)(μ2t+σ2t−(1−ϵ)2)2(1−ϵ)σ2t,ξ2(t)=(μt−1+ϵ)(μ2t+σ2t−(1−ϵ)2)2(1−ϵ)σ2t, (17)

where and

The approximate log-likelihood of the observed sample can be expressed as

 ~ℓ(θ;vM,N+1)=M∑j=1N∑i=1log⎧⎪⎨⎪⎩12(1−ϵ)1B(ξ1(t−j,i),ξ2(t−j,i))(vj,i+1−ϵ2(1−ϵ))ξ1(t−j,i)−1(1−ϵ−vj,i2(1−ϵ))ξ2(t−j,i)−1⎫⎪⎬⎪⎭, (18)

where the shape parameters and , according to (17), depend on the limit quantities and as that are computed solving numerically the initial-value problem (16). denotes the Beta distribution with parameters and .

### 5.3 Approximate likelihood in the Z−space

The transition density of the process , which has been defined through the Lamperti transformation (10) of , can be conveniently approximated by a Gaussian surrogate density.

The drift coefficient of the process that satisfies (12) is nonlinear. After linearizing the drift around the mean of , , we obtain the following system of ODEs to compute, for any , the approximations of the first two central moments of , say and :

 ⎧⎪⎨⎪⎩\dif~μZ(t)\dift=a(~μZ(t);pt,˙pt,θ)\dif~vZ(t)\dift=2a′(~μZ(t);pt,˙pt,θ)~vZ(t)+1 (19)

with initial conditions and and where

 a′(~μZ(t);pt,˙pt,θ)=(αθ0−θt)−cos(√2αθ0Zt)[θt(1−2pt)−2˙pt]sin2(√2αθ0Zt).

The approximate Lamperti log-likelihood for the observed sample is given by

 ~ℓZ(θ;zM,N+1)=M∑j=1N∑i=1log⎧⎪ ⎪⎨⎪ ⎪⎩1√2π~vZ(t−j,i;θ)exp(−(zj,i−~μZ(t−j,i;θ))22~vZ(t−j,i;θ))⎫⎪ ⎪⎬⎪ ⎪⎭, (20)

where the limits and are computed solving numerically the initial-value problem (19).

### 5.4 Algorithm for the approximate maximum likelihood estimations

In this subsection, we aim to infer the model’s parameters using optimization techniques. We start by finding an initial guess close enough to the optimal value, and from that point, start the optimization.

#### 5.4.1 Initial guess

To guarantee the good behave for our optimization algorithm, we aim to start the optimization as close as we can from the optimal parameters. We use least square minimization and quadratic variation over the data to find an initial guess .

• Least square minimization: We consider the observed data with length between observations , where and . For any

, the random variable

has a conditional mean that can be approximated by the solution of the system

 {\difE[V](t)=−θtE[V](t)\diftE[V](tj,i−1)=vj,i−1,

in the limit , i.e., . Then, the random variable has zero mean. If we assume that for all , then . If we have a total of transitions, we can write the regression problem for the conditional mean with loss function as

 ^c=argminc ≥ 0[M∑j=1N∑i=1(vj,i−E[V](t−j,i))2]=argminc ≥ 0[M∑j=1N∑i=1(vj,i−vj,i−1e−cΔ)2]≈argminc ≥ 0[M∑j=1N∑i=1(vj,i−vj,i−1(1−cΔ))2]. (21)

As Equation (21) is convex in , it is enough to verify the first order optimality conditions. From

 ∂∂c[M∑j=1N∑i=1(vj,i−vj,i−1(1−cΔ))2]=M∑j=1N∑i=12vj,i−1Δ(vj,i−vj,i−1(1−cΔ)),

it follows that

 ^c≈∑Mj=1∑Ni=1vj,i−1(vj,i−1−vj,i)Δ⋅∑Mj=1∑Ni=1(vj,i−1)2. (22)

We approximate by Equation (22) setting .

• Quadratic variation: We approximate the quadratic variation of the Itô’s process

 ⟨V⟩t=∫t0b(Vs;θ,ps)2\difs, % where b(Vs;θ,ps)=√2αθ0(Vs+ps)(1−Vs−ps),

with the discrete sum .

As initial guess for the diffusion variability coefficient , we choose

 θ∗0α∗=∑Mj=1∑Ni=1(vj,i−vj,i−1)22Δ⋅∑Mj=1∑Ni=1(vj,i+pj,i)(1−vj,i−pj,i), (23)

where is the length of the time interval between two consecutive measurements.

#### 5.4.2 Negative log-likelihood minimization in the V−space

To find the optimal parameters, we minimize the negative log-likelihood (negative version of (18)) using the derivative-free function fminsearch from MATLAB R2019b over the parameters . At each step of the iteration, we:

• Use the training dataset to find the SDE’s first and second moments as explained in Subsection 5.2.

• Match the proxy distribution moments with the SDE’s moments.

• Evaluate the negative log-likelihood using the training dataset.

#### 5.4.3 Negative log-likelihood minimization in the Z−space

Let be the observed data, and the Lamperti transform of the observation . As we can see in Section 4, the transformed observations depend on the vector .
The problem of maximizing the approximated Lamperti log-likelihood (20), i.e.,

 maxθ~ℓZ(θ;zM,N+1),

is not totally defined as the data depend on . However, we propose to find a fixed point such that

 θ⋆=argmaxθ~ℓZ(θ;{h(vj,i,tj,i;θ⋆)}M,Nj=1,i=0). (24)

At a fixed point, the likelihood has a maximum for the data set corresponding to that point. We are interested in finding solutions for (24). If solves (24), then

 θ⋆−argmaxθ~ℓZ(θ;{h(vj,i,tj,i;θ⋆)}M,Nj=1,i=0)=0.

Given , we call to the solution of

 argmaxθ~ℓZ(θ;{h(vj,i,tj,i;θ⋆)}M,Nj=1,i=0).

For each , we define the relative error function as

 Ψ(θ⋆)=|θ⋆0−θ⋆⋆0||θ⋆0|+|α⋆−α⋆⋆||α⋆|, (25)

and remark that is a fixed point if and only if . To find minimizers for , we proceed in the same way as described in Subsection 5.4.2, but using Gaussian surrogate densities.

### 5.5 Model specification with the additional parameter δ

We observe that for most days, the forecast error at time is not zero. According to the forecasts procedure, we may assume that there is a time in the past , such that the forecast error .

For any , we extrapolate backward linearly the truncated prediction function to get its value at time , , and set . We assume that the initial transition has a Beta distribution and apply to it the same moment matching method used above. Given a vector of parameters , we estimate solving the following problem

 argmaxδ~Lδ(θ,δ;vM,1)=argmaxδM∏j=1ρ0(vj,0|vj,−δ;θ,δ), (26)

where is the approximated likelihood. To solve this problem, we repeat the steps described in Subsection 5.4.2, with the additional initial step of creating the linear extrapolation for at each .

As remarked, we extend the statistical model (13) to include the extra parameter . The approximated complete likelihood , which estimates the vector , is given by

 ~Lc(θ,δ;vM,N+1)=~L(θ;vM,N+1)~Lδ(θ,δ;vM,1), (27)

where is the non-log version of (13). As we can provide initial guesses for and , we have a starting point for the numerical optimization of the approximated complete likelihood (27).

## 6 Application to the April-December 2019 Uruguay wind and forecast dataset

Our statistical analysis starts with partitioning the 147 segments of normalized wind power production, each 24-hours long. We select 73 non-contiguous segments for the models’ calibration procedure, assigning them to the training set. The other 74 non-contiguous segments compose the test set. Such an allocation mechanism guarantees independence among the segments, matching the assumption we did in Section 5 to formulate the statistical models. Additional cross-correlation tests were performed to ensure this assumption.

All the following results involving a single provider refer to provider A. Furthermore, all calibrations involve the training sets and all simulations, the test sets. Following the instruction for the initial guesses from Subsection 5.4 and assuming that

 θt=max(θ0,αθ0+|˙pt|min(pt,1−pt)),

we obtain the initial guess .

### 6.1 Calibration of the approximate negative log-likelihood in V-space and Z-space

As an auxiliary verification, we plot the negative log-likelihood (negative version of (18)) as a function of the parameters, and we use additional minimization functions from MATLAB R2019b. Moreover, we realized an additional inference utilizing the test sets to guarantee the robustness of our numerical methods.

On Figure (8), we can see the level sets for the negative log-likelihood for both training and test sets. The numerical values of each relevant point can be seen in Table (1). We set the optimal parameters in the -space , as it is where the negative log-likelihood for the training sets reaches its minimum value.

We observe that all the local (possibly global) minimizers are located over the curves and for the training and test sets, respectively. This effect shows that the optimization is more sensitive to the diffusion than to the drift.

In the -space, we obtain the optimal parameters as minimizers for the relative error function (25) using fminsearch from MATLAB R2019b and the training sets.

To verify and compare these two vector of parameters, (i.e., and ), we simulate error paths in the space. We simulate five error paths for each day in the test set and construct histograms with the transitions. The histograms can be seen in Figure (9). We observe a slightly better approximation using .

### 6.2 Model comparison and assessment of the forecast providers

We compare two candidate models to find the best-fit that maximizes the retained information, the Model 1, introduced in ([elkate], p.383), and our proposed model (7), from hereafter called Model 2.

• Model 1: This model does not feature derivative tracking:

 {\difXt=−θ0(Xt−pt)\dift+√2αθ0Xt(1−Xt)\difWt,t∈[0,T]X0=x0∈[0,1],. (28)

with .

• Model 2: This model features derivative tracking and time-varying mean-reversion parameter:

 {\difXt=(˙pt−θt(Xt