Mape_Maker: A Scenario Creator

09/04/2019, by Guillaume Goujard, et al.

We describe algorithms for creating probabilistic scenarios for the situation when the underlying forecast methodology is modeled as being more (or less) accurate than it has been historically. Such scenarios can be used in studies that extend into the future and may need to consider the possibility that forecast technology will improve. Our approach can also be used to generate alternative realizations of renewable energy production that are consistent with historical forecast accuracy, in effect serving as a method for creating families of realistic alternatives – which are often critical in simulation-based analysis methodologies.


1 Introduction

Uncertainty associated with the forecasted output of renewable energy sources such as wind and solar mandates analysis and management techniques that take stochastics into account. A growing literature describes methods for creating and evaluating probabilistic scenarios, which are forecasts of renewables power generation with an attached probability. A representative sample of this literature can be found in [2, 3, 4, 5, 6, 8]. Here, we are interested in creating probabilistic scenarios for the situation when the underlying forecast methodology is modeled as being more (or less) accurate than it has been historically. Such scenarios can be used in studies that extend into the future and may need to consider the possibility that forecast technology will improve. Our approach can also be used to generate alternative realizations of renewable energy production that are consistent with historical forecast accuracy, in effect serving as a method for creating families of realistic alternatives – which are often critical in simulation-based analysis methodologies. A general open-source software implementation of the methods described here – a package called mape_maker – is publicly available at https://github.com/mape-maker/mape-maker.

Given a time series of forecasts (e.g., daily over a year), we create a set of scenarios for renewable power production that, based on a forecast system with a specified accuracy, could reasonably correspond to the forecasts. We often refer to these scenarios as actuals, to distinguish these values from historical forecasts. We can also create a set of forecasts that could reasonably correspond to a given time series of actuals. In other words, the process can be inverted. The correspondence between forecasts and actuals is based on analysis of historic forecast error distributions. Subsequently, the word “reasonably” is replaced with mathematical criteria concerning the error distribution, temporal correlation, and in the case of the forecast, curvature. As a preview of the output of this capability, consider Figure 1. This figure provides a simple example where a set of 5 alternative “actual” scenarios is constructed for a few days in July of 2013 based on wind forecast error data obtained from the California Independent System Operator (CAISO) in the US for July 2013 through May 2015. The target error – specifically, the mean absolute percentage error or MAPE – is the value that was realized in the forecast error data. Because the scenarios are created for days in the past, we are able to show both the forecast and realized actuals on the same plot as the constructed scenarios.

Figure 1: Illustration of 5 scenarios of wind production in CAISO representing alternative actuals. The forecast and realized actuals are also shown.

1.1 Measures of Forecast Error

Let $x = (x_t)_{t=1}^{n}$ and $y = (y_t)_{t=1}^{n}$ denote two time-series. For simplicity, we subsequently refer to these time-series as $x$ and $y$. We then define the following functions:

$\mathrm{MAE}(x, y) = \frac{1}{n}\sum_{t=1}^{n} |x_t - y_t|, \qquad \mathrm{MARE}(x, y) = \frac{1}{n}\sum_{t=1}^{n} \frac{|x_t - y_t|}{x_t}.$

The MAPE (Mean Absolute Percentage Error) is simply the MARE (Mean Absolute Relative Error) given as a percentage. Our software library communicates with users in terms of MAPE, but in our discussions here it is convenient to use MARE and sometimes MAE (Mean Absolute Error) variants.

While MAPE is a very popular way of characterizing forecast accuracy for renewables production, it is well-known to have a number of undesirable properties (see, e.g., [7]). One undesirable property is that values of zero must be ignored in the calculation. We have organized our methods in such a way as to avoid division by zero. Most of the development here is based on converting the MAPE target to an absolute error conditional on the value of $x$, so it would be a relatively straightforward extension to convert our algorithms to use some measure of accuracy other than the MAPE.
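As a concrete illustration of these measures, the following sketch (not the mape_maker API; the function names are ours) computes the MAE, MARE, and MAPE of two equal-length series, skipping zeros in the denominator series as discussed above.

import numpy as np

def mae(x, y):
    # mean absolute error
    return np.mean(np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

def mare(x, y):
    # mean absolute relative error, ignoring zeros in the denominator series
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nz = x != 0
    return np.mean(np.abs(x[nz] - y[nz]) / x[nz])

def mape(x, y):
    # MAPE is simply the MARE expressed as a percentage
    return 100.0 * mare(x, y)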

1.2 Notation Scheme

We use $x$ and $y$ to denote paired input data of length $n$. Note that which of these series is the forecast and which is the actual depends on the user objective, i.e., what is being simulated. For example, if one desires to obtain alternative actuals from forecasts, then the simulated values will be actuals. Recalling the canonical goal of constructing a vector of simulated values from input for a specified range of dates, we use SID to denote the input data upon which the construction is based (Simulation Input Data). It may or may not be the case that the SID is a subset of the original paired data. For the next few sections, we assume that both input datasets are sorted according to the $x$ values, e.g., with $n$ equal to the cardinality of the dataset and $x_1 \le x_2 \le \cdots \le x_n$. We will return to a temporal sorting in Section 3.5 when we consider auto-correlation.

We use bold upper case font to denote random variables. As indicated above, the roles of the forecasts and the actuals can be reversed. If we want to compute simulated forecasts, then the actuals are the input data for the simulation. We let $E$ denote a random vector of errors, so that the simulated series is the input series plus the errors. We let $e$ denote a vector of observed errors. We will focus on the modeling of $E$ in the following. The title of the paper and the name of our software library derive from the requirement that simulated values must result in a MAPE close enough to the target MAPE. We formalize this constraint by requiring that the MARE of the simulated values match $\tilde{r}$, where $\tilde{r}$ is the target MAPE divided by 100%.

1.3 Plausibility Criteria

A main theme underlying this work that we will use to justify some of our design choices involves what we refer to as plausibility criteria. For any requested MAPE, the distributions of errors computed should be as close as possible to the original error distributions while satisfying the target MAPE. If a user were to select the estimated MAPE as the requested one, one would naturally expect the distribution of errors drawn from the simulated distributions to be somehow “close” to the estimated distribution. For example, if the system of forecasts is producing a wide range of errors at very low forecasted power output, then even if the forecast technology is improving, one would expect it to still produce a relatively wider range of errors at low power regardless of the requested MAPE. We formalize these criteria in Definition 1.1.

Definition 1.1.

A scenario set is said to be plausible if:

  1. The error distribution for the set is close to the empirical distribution of errors, i.e., its plausibility score is close to 1 (as defined later in Section 3.4).

  2. The computed auto-correlation coefficients for the set are close to the empirical values.

  3. The computed curvature for the set is close to the empirical value, especially when the scenarios are forecasts (because we observe that forecasts typically have lower curvature than actuals).

2 Modeling the Joint Distribution of the Input and the Error

Let us consider the pair $(X, E)$. Here, $X$ denotes a random variable with values in $[0, \max_t x_t]$ – or, if the production capacity is known by the forecaster, values in $[0, cap]$. We denote by $f$ the density of $(X, E)$, and denote by $f_X$ and $f_E$ the marginals of $f$.

We also define the conditional density of $E$ given $X = x$ as $f_{E|X=x}(e) = f(x, e) / f_X(x)$.

Modeling the conditional distribution of errors is important because these distributions can vary significantly with the value of the input data. For example, when the forecasts and the actuals are both low, the errors will be biased because the power cannot be below zero. Symmetrically, close to the maximum capacity $cap$, errors are bounded by the fact that power cannot exceed the maximum production capacity.

In this context, we introduce the functional $m(x)$ to denote the expected value of the absolute error of the distribution conditioned on $X = x$, defined as $m(x) = \mathbb{E}\left[\,|E|\;\middle|\;X = x\,\right]$.

We then introduce $r$ to denote the mean absolute relative error, defined as $r = \mathbb{E}\!\left[\frac{m(X)}{X}\right]$.

In Figure 2, we provide an illustrative visualization of the relative error as a function of the actuals. We note that because actuals are correlated with forecasts, the figure would be very similar if forecasts were used instead. The data are CAISO wind power data ranging from July 1, 2013 to June 30, 2015. We will use this dataset for illustration throughout the paper, and refer to it informally as the CAISO Wind data set. These data are available in the mape_maker software distribution; the file is wind_total_forecast_actual_070113_063015.csv.

Figure 2: Empirical joint distribution of the input and the error for CAISO wind power.

2.1 Estimating the Conditional Distribution of $E$ Given $X = x$

Given the notation above, we use the beta distribution on $[l(x), l(x) + s(x)]$ to model the error conditioned on $X = x$. In addition to $l$ and $s$, which we will refer to as location parameters, a beta distribution requires two additional parameters, $\alpha$ and $\beta$, i.e., the shape parameters. We chose the beta distribution because it has finite support, which helps us avoid power values below zero or above $cap$, and because the shape parameters provide the flexibility necessary to model different behaviors for each $x$. We then model the conditional distribution of the error at $x$ as a beta distribution with parameters $(\alpha(x), \beta(x), l(x), s(x))$.
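For readers who want to experiment, the following sketch shows one way such a conditional distribution can be represented with scipy.stats.beta, whose loc and scale arguments play the role of the location parameters $l$ and $s$; the numerical values below are purely illustrative.

from scipy import stats

alpha_, beta_ = 2.0, 5.0          # shape parameters
l, s = -300.0, 1200.0             # location parameters: support is [l, l + s]
cond_dist = stats.beta(alpha_, beta_, loc=l, scale=s)

print(cond_dist.mean(), cond_dist.std())        # conditional error moments
print(cond_dist.ppf(0.0), cond_dist.ppf(1.0))   # smallest and largest possible error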

2.2 Intervals for Conditional Estimation

We now define a rule that will be used to estimate the parameters of the conditional density based on each of the input datasets. We choose to take a fraction $a$ (e.g., 0.05) of the data before and after each $x$. Let $\hat{F}_X$ denote the empirical cumulative distribution function of $X$. Then, let $I_x = [\hat{F}_X^{-1}(\hat{F}_X(x) - a),\; \hat{F}_X^{-1}(\hat{F}_X(x) + a)]$. Thus $I_x$ is centered on $x$ in terms of the empirical CDF and contains a fraction $2a$ of the data. We fit the parameters on the sample of pairs whose input values fall in $I_x$. Note that for production values near zero and near the capacity, there could be as few as a fraction $a$ of the values used.

To compute the estimation for a particular value $x$, our method uses the interval whose center is closest to $x$ and uses the corresponding sample to compute the parameters for $x$. For values of $x$ that are not close to zero or $cap$, the closest interval will often be the one centered at $x$ itself. However, for very small or large values of $x$, and for values of $x$ that do not appear in the dataset, the use of the interval with the closest mean is most appropriate.
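A minimal sketch of this interval rule, assuming the interval around a value $x$ is taken as the data whose empirical CDF value lies within $a$ of the empirical CDF value of $x$ (function and variable names are ours):

import numpy as np

def interval_sample(x_data, e_data, x, a=0.05):
    # Return the (input, error) pairs falling in the estimation interval for x.
    x_data = np.asarray(x_data, dtype=float)
    e_data = np.asarray(e_data, dtype=float)
    n = len(x_data)
    ecdf = np.argsort(np.argsort(x_data)) / (n - 1)     # empirical CDF rank of each point
    fx = np.searchsorted(np.sort(x_data), x) / n        # empirical CDF value at x
    mask = (ecdf >= fx - a) & (ecdf <= fx + a)          # fraction a on each side of x
    return x_data[mask], e_data[mask]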

We will now describe how our method fits the parameters of the beta distributions. Because every estimated quantity will depend on $x$, we drop $x$ as a subscript or function parameter for notational simplicity.

2.3 Fixing $l$ and $s$, and Estimating $\alpha$, $\beta$

2.3.1 Constraints on the Location Parameters

An informed choice of the location parameters will avoid simulating errors leading to values lower than zero or greater than the $cap$ of the dataset. We now consider the maximum possible simulated value at $x$ according to a conditional distribution with location parameters $l$ and $s$. Because the inverse of the corresponding cumulative distribution function (CDF) evaluated at one, $l + s$, is the maximum of the simulated error; because the inverse CDF evaluated at zero, $l$, is the minimum; and because we want to avoid simulating values above the cap or below zero, we have

$x + l + s \le cap.$

Similarly,

$x + l \ge 0.$

These two conditions give

$-x \le l \quad \text{and} \quad s \le cap - x - l.$

Thus, we can define the estimators of the location parameters for each $x$ as values that satisfy these bounds.

2.3.2 Choosing the Shape Parameters by the Method of Moments

The mean and variance of a beta($\alpha$, $\beta$, $l$, $s$) distribution are:

$\mu = l + s\,\frac{\alpha}{\alpha + \beta}, \qquad \sigma^2 = s^2\,\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$

We can now choose shape parameters by equating $\mu$ and $\sigma^2$ with the sample mean and variance of the errors in the estimation sample and solving these two equations for $\alpha$ and $\beta$ to obtain $\hat{\alpha}$ and $\hat{\beta}$. For any $x$, we assign the parameters estimated from its interval sample.
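The moment equations can be inverted in closed form. The sketch below, using our own function name, recovers the shape parameters from the sample mean and variance of the errors in an estimation interval, for given location parameters $l$ and $s$. For example, the sample moments can come from the errors returned by interval_sample above.

def beta_shape_from_moments(mean, var, l, s):
    # Standardize to the unit interval, then invert the standard beta moment equations.
    m = (mean - l) / s                     # mean of the beta on [0, 1]
    v = var / s ** 2                       # variance of the beta on [0, 1]
    common = m * (1.0 - m) / v - 1.0       # equals alpha + beta
    return m * common, (1.0 - m) * common  # (alpha, beta)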

2.4 Selecting the Data Fraction $a$

We now develop an empirical way to select the best $a$. If $a$ is small, the sample on which to fit each conditional distribution will be small. Fitting a distribution on very little data is of course dangerous. On the other hand, if $a$ is large, then the sample is too large to provide us with an estimation of the conditional density. In the extreme where the interval covers the whole dataset, every conditional density will be equal to the density of the relative error.

One way to select $a$ is to compute a discrepancy score between the empirical distribution and the one obtained by estimating each conditional distribution with a fraction $a$ of the data. Let $\hat{f}$ be the empirical joint density of $(X, E)$. Let $f_a$ be the joint density implied by the conditional distributions estimated with fraction $a$. We choose the $a$ that minimizes the deviation between the real density and the simulated density.

3 Adjusting the conditional densities to fit a MARE target

We will use a tilde to specify the distributions and variables that we are simulating.

  • While $E$ is the random variable of the error, with properties that can be estimated from the data, $\tilde{E}$ is the random variable of error defined by a distribution that we will develop with desired properties for the simulation.

  • We make use of three conditional densities: the population density $f_{E|X=x}$, the estimated density $\hat{f}_{E|X=x}$, and a simulation density $\tilde{f}_{E|X=x}$.

We are now interested in modeling the conditional distribution of $\tilde{E}$ so that the expected relative absolute error of the simulated random variable is the target $\tilde{r}$.

3.1 Adjusting the location parameters to fit a target MAE

We want to adjust each conditional distribution so that the global distribution of simulated errors satisfies the targeted MARE while keeping the same shape parameters as the original distributions. To do this, we compute analytically the mean absolute error of a beta distribution when $\alpha$ and $\beta$ are fixed. Let $g_{\alpha,\beta,l,s}$ be an arbitrary beta density function with parameters $(\alpha, \beta, l, s)$, for which we define a mean absolute error function of $l$ and $s$, given values for $\alpha$ and $\beta$, as

$\mathrm{mae}_{\alpha,\beta}(l, s) = \int |e| \, g_{\alpha,\beta,l,s}(e) \, de.$

We make two remarks. First, since $\mathrm{mae}_{\alpha,\beta}$ is continuous (it is a sum of continuous functions), the intermediate value theorem applies, which means that $\mathrm{mae}_{\alpha,\beta}$ can achieve any value in its range and, in particular, the value needed in order to hit the specified error target. Second, once we are given $\alpha$, $\beta$, and a target value for the absolute error at a particular value of $x$, we need to find the intersection between the hyperplane defined by the target and the surface defined by $\mathrm{mae}_{\alpha,\beta}(l, s)$ to establish values for $l$ and $s$. We will want to choose the solution that minimizes the distance to the estimated values $\hat{l}$ and $\hat{s}$ while hitting the target mean absolute error and without changing the shape parameters.

(1)   $\min_{l,\,s} \; \| (l, s) - (\hat{l}, \hat{s}) \|$
s.t.  $\mathrm{mae}_{\alpha,\beta}(l, s)$ equals the target mean absolute error at $x$, and the bound constraints of Section 2.3.1 on $l$ and $s$ hold.

However, in our case, since there are bound constraints on $l$ and $s$ (see Section 2.3.1), the program cannot hit every target. We compute a maximum target function $m_{\max}(x)$ that can be hit given these bounds. The target function $\tilde{m}$ must then be bounded for every $x$ by:

(2)   $\tilde{m}(x) \le m_{\max}(x).$

Given a mean absolute error target function satisfying inequality (2), we obtain for any $x$ a beta distribution with parameters that satisfy the mean absolute error target and that is the closest possible to the estimated distribution. We now proceed to allocate an error target $\tilde{m}(x)$ to each $x$ that depends on the target MARE and on a weight function.
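One way to carry out the adjustment numerically is sketched below; it is not the mape_maker implementation, but it solves a small program of the same form as (1) with scipy: the mean absolute error of the shifted and scaled beta is computed numerically, and the location parameters are moved as little as possible subject to hitting the target and to the bound constraints of Section 2.3.1.

from scipy import stats, optimize

def beta_mae(l, s, a, b):
    # E|E| for a beta(a, b) error shifted and scaled to the interval [l, l + s]
    return stats.beta.expect(abs, args=(a, b), loc=l, scale=s)

def adjust_location_params(l_hat, s_hat, a, b, target_mae, bounds):
    # Stay as close as possible to (l_hat, s_hat) while hitting target_mae.
    cons = {"type": "eq",
            "fun": lambda v: beta_mae(v[0], v[1], a, b) - target_mae}
    res = optimize.minimize(lambda v: (v[0] - l_hat) ** 2 + (v[1] - s_hat) ** 2,
                            x0=[l_hat, s_hat], bounds=bounds,
                            constraints=cons, method="SLSQP")
    return res.x   # adjusted (l, s)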

3.2 Changing the conditional distributions

3.2.1 Weight functions

Let us define the set of weight functions as the nonnegative functions $\omega$ defined on the range of the input such that their average over the simulation input data equals one. Weight functions will be used to assign a target MAE to each of the conditional distributions $\tilde{f}_{E|X=x}$, for all $x$ in the SID. A weight function can also be seen as the function that weights the contribution of the absolute error of each conditional distribution to the mean absolute relative error of the simulation.

3.2.2 Target function generator

We also define the following functional, which we call the target function generator: given a weight function $\omega$ and a target MARE $\tilde{r}$, it produces the target function $\tilde{m}(x) = \omega(x)\,\tilde{r}\,x$. For a fixed $\omega$ and $\tilde{r}$, $\tilde{m}$ is a target function. Since the target function will be used to directly adjust the conditional distributions, it must respect inequality (2). Finally, we say that a target MARE is feasible for a given $\omega$ if the resulting target function satisfies (2) for every $x$ in the SID.

3.2.3 Zero power input

We recall that zero power inputs do not count in the computation of the MARE. However, we still want the simulated errors at zero input to be drawn from the estimated distribution. We therefore assign the estimated parameters to the conditional distribution at zero input. To avoid big discontinuities in the parameters of the beta distributions, we could instead take parameters close to those of the neighboring conditional distributions.

3.2.4 Convergence to the requested MARE

Using the target function generator to assign a target MAE to each SID input will allow us to hit the targeted MARE using the simulation distribution. Indeed, let us define the random variable $\tilde{E}_x$ with density $\tilde{f}_{E|X=x}$. If we establish the distribution parameters as described in Section 3.1 and solve program (1) with target $\tilde{m}(x) = \omega(x)\,\tilde{r}\,x$, we have $\mathbb{E}[\,|\tilde{E}_x|\,] = \omega(x)\,\tilde{r}\,x$. Then the expected MARE with the errors drawn from these distributions and with the inputs $\tilde{x}_1, \ldots, \tilde{x}_{\tilde{n}}$ in the SID is

$\frac{1}{\tilde{n}} \sum_{i=1}^{\tilde{n}} \frac{\mathbb{E}[\,|\tilde{E}_{\tilde{x}_i}|\,]}{\tilde{x}_i} \;=\; \frac{\tilde{r}}{\tilde{n}} \sum_{i=1}^{\tilde{n}} \omega(\tilde{x}_i) \;=\; \tilde{r}.$

This is true for any weight function whose average over the SID equals one. We now proceed to describe the construction of a sensible weight function.
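A small sketch of this allocation argument, assuming the product form of the target function generator described above (omega is any weight function whose values average to one over the SID; the function names are ours):

import numpy as np

def target_maes(x_sid, omega, r_target):
    # One absolute-error target per SID input: omega(x) * r_target * x.
    x_sid = np.asarray(x_sid, dtype=float)
    w = np.array([omega(x) for x in x_sid])
    return w * r_target * x_sid

def implied_mare(x_sid, maes):
    # Average relative target; equals r_target whenever the weights average to one.
    x_sid = np.asarray(x_sid, dtype=float)
    return np.mean(np.asarray(maes) / x_sid)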

3.3 Weight function when the SID is the full dataset

We recall the plausibility criteria: we want our simulations of errors to be as close as possible to the population distribution. In particular, suppose that we want to do a simulation with a target MARE that happens to be the same as the MARE estimated from the original data, and further suppose that we want to simulate using values from the entire data set (i.e., the SID is the full dataset). Then we expect the simulated conditional distributions to be equal to the estimated conditional distributions.

Solving program (1) defined in Subsection 3.1 then leads to the estimated location parameters $\hat{l}$ and $\hat{s}$, i.e., the simulated distributions coincide with the estimated ones.

If we define the function

$\hat{\omega}(x) = \frac{\hat{m}(x)/x}{\hat{r}},$

where $\hat{m}(x)$ is the estimated conditional expected absolute error and $\hat{r}$ is the MARE implied by the estimated distributions, we can first verify that its average over the dataset is one; it is thus a weight function. The choice of this weight function is natural when the SID is the full dataset because it is the ratio of the expected relative error simulated at $x$ over the mean relative error when the errors are distributed according to the estimated joint distribution. However, choosing it for an arbitrary SID would satisfy our requirement for plausibility, but it would prevent us from hitting the requested MARE.

Figure 3 illustrates that for the full CAISO wind dataset, the weight function has a hyperbolic shape. The low power values account for the biggest part of the MAPE.

Figure 3: The ratio $\hat{\omega}(x)$ for the CAISO wind dataset.

3.4 Weight function for an arbitrary SID

Let us define the following real number, which we call the plausibility score: the average of $\hat{\omega}$ over the SID inputs.

When the SID differs from the estimation dataset, the distribution of the SID is different from the distribution of the input dataset. Thus the plausibility score is not necessarily equal to one. A goal of our method is to meet the requested MARE, at least in expectation, no matter the SID. If the plausibility score is greater than 1, it means that the distribution of the SID has more data in the range where the weight function takes high values. This means that if we use the dataset weight function directly, we are going to simulate too many errors with high values. While this has some physical sense, we would nonetheless simulate a greater MAPE than requested. Symmetrically, if the score is smaller than one, we would retrieve a lower MAPE than requested. This is illustrated in Figure 4: the density of the SID covering December 2013 through March 2014 indicates more values at lower power than the density of the entire dataset. If we simply used the dataset weight function for this SID, then meeting the target AREs for each input would result in a MARE much greater than specified. In other words, since the ARE/MARE ratio is very high for low power inputs, and since these inputs are over-represented under the distribution of the December 2013 through March 2014 SID, we would simulate too many errors with a high target mean absolute error. To meet the target MARE, a re-scaled weight function must thus be computed.

Let us define the following SID weight function: $\tilde{\omega}(x) = \hat{\omega}(x)$ divided by the plausibility score. With this re-scaling factor, the average of $\tilde{\omega}$ over the SID is one, so $\tilde{\omega}$ is a weight function for the SID.

Finally, for a given feasible target MARE $\tilde{r}$, we compute a weight function $\tilde{\omega}$ which allocates the absolute errors across the SID based on the allocation from $\hat{\omega}$. With these two ingredients we can compute the target function $\tilde{m}(x) = \tilde{\omega}(x)\,\tilde{r}\,x$. According to Section 3.2.4, defining the simulation distributions from this target function will get us an expected MARE of $\tilde{r}$.

We can also obtain the feasibility region for the target MARE. For a given $\tilde{r}$ to be a feasible target MARE, the induced target function must satisfy inequality (2) at every SID input. Thus the feasibility region is the set of $\tilde{r}$ for which $\tilde{\omega}(x)\,\tilde{r}\,x \le m_{\max}(x)$ for all $x$ in the SID.
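A sketch of the re-scaling step, using our own function names and reflecting the reconstruction above: the plausibility score is the average of the dataset weight function over the SID inputs, and dividing by it restores an average weight of one.

import numpy as np

def plausibility_score(omega_hat, x_sid):
    # Average of the dataset weight function over the (nonzero) SID inputs.
    return float(np.mean([omega_hat(x) for x in x_sid if x > 0]))

def sid_weight_function(omega_hat, x_sid):
    # Re-scaled weight function whose SID average is one.
    score = plausibility_score(omega_hat, x_sid)
    return lambda x: omega_hat(x) / score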

Figure 4: Comparison of the December 2013 through March 2014 SID density versus the full dataset density.

3.5 Simulating Without Auto-correlation

This is now straightforward. First, to obtain a simulation of errors, we simulate a vector of values that are independently and identically distributed, uniformly on [0, 1]. Then, letting $\tilde{F}_{E|X=\tilde{x}_i}^{-1}$ denote the inverse CDF of the simulation distribution at $\tilde{x}_i$, we set $\tilde{e}_i = \tilde{F}_{E|X=\tilde{x}_i}^{-1}(u_i)$, so the simulated series is $\tilde{y}_i = \tilde{x}_i + \tilde{e}_i$.

While we are hitting the target MARE, the entire auto-correlation of the simulated errors relies solely on the auto-correlation of the input. In the extreme case where the errors do not depend on the input (which is the case for the middle power range of the CAISO wind data), our simulations would have a null auto-correlation function. Implementing a base process to replace the independent uniform draws will generate the needed auto-correlation to satisfy the second point of the plausibility criteria.
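A minimal sketch of this simulation step (not the package API): independent uniform draws are pushed through the inverse CDF of each adjusted conditional beta distribution, and the simulated series is the input plus the simulated errors.

import numpy as np
from scipy import stats

def simulate_errors(x_sid, params, seed=None):
    # params maps each SID input value to its adjusted (alpha, beta, l, s)
    rng = np.random.default_rng(seed)
    errors = []
    for x in x_sid:
        a, b, l, s = params[x]
        u = rng.uniform()                                 # iid uniform draw
        errors.append(stats.beta.ppf(u, a, b, loc=l, scale=s))
    return np.asarray(errors)

# The simulated actuals (or forecasts) are then x_sid + simulate_errors(x_sid, params).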

4 Inferring a Base Process

The idea is to simulate a base process whose marginal distribution is uniform on [0, 1] and that depends on the past $p$ lags of the process and the past $q$ lags of its innovations. Then, as in Section 3.5, we simulate the errors by applying the inverse CDF of the conditional simulation distributions to the base process.

We model the base process as a Gaussian process, and more specifically as an ARMA process. Heuristically, we will show that this method gets us a good auto-correlation function for the simulations.

Our approach is inspired by the ARTA fitting method (see [1]). We denote by $\Phi$ the CDF of the standard normal distribution and by $\hat{F}_{E|X=x_t}$ the CDF of the conditional distribution, which is a beta distribution fit as described in Section 2.3. Let us define the following time-series:

$z_t = \Phi^{-1}\!\left(\hat{F}_{E|X=x_t}(e_t)\right).$

We use $z$ to denote the base process time-series of the dataset. Its empirical distribution is close to a standard Gaussian. Indeed, in Section 2.3 we estimate the conditional distribution so that $e_t$ has a distribution that is approximated by $\hat{F}_{E|X=x_t}$; thus $\hat{F}_{E|X=x_t}(e_t)$ is approximately uniform on [0, 1] and $z_t$ is approximately standard normal. We fit an ARMA process to this base process. The standard definition of an ARMA process of order $p$ and $q$ uses $\alpha$ and $\beta$ as coefficients, so we temporarily reuse those symbols in this section.
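A sketch of this transformation in Python (names are ours): each observed error is mapped through its estimated conditional beta CDF and then through the standard normal quantile function.

import numpy as np
from scipy import stats

def base_process(x_data, e_data, params):
    # params maps each input value to its estimated (alpha, beta, l, s)
    z = []
    for x, e in zip(x_data, e_data):
        a, b, l, s = params[x]
        u = stats.beta.cdf(e, a, b, loc=l, scale=s)   # approximately uniform on [0, 1]
        z.append(stats.norm.ppf(u))                   # approximately standard normal
    return np.asarray(z)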

Definition 4.1.

$(Z_t)$ is a base process if:

  • $(Z_t)$ follows an ARMA process of order $p$ and $q$:

    $Z_t = \sum_{i=1}^{p} \alpha_i Z_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \beta_j \varepsilon_{t-j},$

    where the $\varepsilon_t$ are iid Gaussian errors with mean 0 and variance $\sigma^2$.

  • The variance $\sigma^2$ is chosen so that for all $t$, $\mathrm{Var}(Z_t) = 1$.

We run a grid search over multiple orders $(p, q)$ and select the ARMA model that minimizes the BIC criterion. The coefficients $\alpha$ and $\beta$ found during the process define a recursion that enables us to generate base processes which will create the auto-correlation that we are looking for; however, fitting an ARMA model imposes no constraint on the variance of the noise $\varepsilon_t$, so we are free to specify $\sigma$ so that we get $\mathrm{Var}(Z_t) = 1$.
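A sketch of the order selection, assuming statsmodels is used for the ARMA fits (mape_maker may proceed differently): candidate orders are fit to the base process and the model with the smallest BIC is kept.

from statsmodels.tsa.arima.model import ARIMA

def fit_base_process_arma(z, max_p=4, max_q=4):
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            if p == 0 and q == 0:
                continue
            try:
                res = ARIMA(z, order=(p, 0, q)).fit()   # ARMA(p, q) as ARIMA with d = 0
            except Exception:
                continue                                # skip orders that fail to converge
            if best is None or res.bic < best.bic:
                best = res
    return best   # the innovation variance can then be rescaled so that Var(Z_t) = 1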

Then, we can simulate the errors directly by $\tilde{e}_i = \tilde{F}_{E|X=\tilde{x}_i}^{-1}\!\left(\Phi(\tilde{z}_i)\right)$, and, since the marginal of $\Phi(\tilde{z}_i)$ is uniform on [0, 1], we also retain the result for the expected MARE established in Section 3.5.

5 Enforcing Curvature

Let $\tilde{y}$ denote a simulation output time series. We define the curvature at a point $t$ in $\tilde{y}$ as

$c_t = \tilde{y}_{t+1} - 2\tilde{y}_t + \tilde{y}_{t-1},$

i.e., a second difference.

Methods described in Section 4 successfully model temporal correlation between the errors while satisfying a target MARE. However, some scenarios might not “look right” because of their lack of smooth curvature. This is especially unsatisfying in the case of forecasts of renewables power production, which are much less sharp and erratic than the actual quantities.

Figure 5: (a) Illustration of forecast scenarios without curvature adjustment. (b) Illustration of the same forecast scenarios with curvature adjustment.

We now concretely illustrate this issue with analysis of the CAISO wind power production data introduced previously. In Figure 5(a), we show baseline scenarios resulting from our proposed methods, i.e., without adjustment for curvature. As is clearly observed, the simulated forecasts in this case closely mirror the actuals – and not the one “true” forecast. In contrast, we show in Figure 5(b) a closely related set of simulated forecasts – obtained by the procedure we now describe – that instead exhibit significantly smoother and more realistic curvature. Ultimately, the need for such adjustment depends entirely on the application.

In order to adjust the curvature of a forecast while still achieving a target MARE, one approach is to a posteriori adjust a time series that already satisfies a target MARE such that specific curvature characteristics are imposed. We now formalize this general approach.

We introduce a minimization problem in which we penalize deviations from both a target second difference and the simulated forecast error. Per the earlier analysis, we can simulate a scenario using an ARMA base process. Then, we define the target second differences and let two user inputs denote the weights on the two penalty terms.

We then let the adjusted scenario denote the solution of the following mathematical program:

(3)
s.t.

For practical computation, we now transform this mathematical program so that the objective function is quadratic and the constraints are linear – such that widely available mathematical programming solvers can be leveraged. The transformation yields the following equivalent mixed-integer program, with additional variables, equality constraints, and inequality constraints (if we consider that the three real vectors are bounded below by 0):

(4)
s.t.

where $M$ denotes a large constant; a safe value can be derived from the bounds on the variables.

To verify the equivalence of the two mathematical programs, we note that the last two constraints force the auxiliary variables to zero or to the corresponding deviation, depending on the value taken by the binary variable, so the original objective terms are recovered in both cases.

Numerous open-source and commercial solvers are available for such a mathematical program. However, solution time does generally increase with the length of the time series. In many applications the restrictions on curvature are motivated by aesthetic or heuristic considerations. Thus, it can be reasonable to specify a “loose” optimality gap to avoid excessive computation time.
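As an illustration only, the following sketch poses a simplified quadratic-penalty variant of the curvature adjustment (a convex relaxation, not the mixed-integer program above). It trades off closeness to the simulated series against small second differences while keeping values in [0, cap]; the weight lam is a hypothetical user input.

import cvxpy as cp

def smooth_scenario(y_sim, cap, lam=10.0):
    n = len(y_sim)
    y = cp.Variable(n)
    second_diff = y[2:] - 2 * y[1:-1] + y[:-2]             # curvature terms
    objective = cp.Minimize(cp.sum_squares(y - y_sim)       # stay near the simulated scenario
                            + lam * cp.sum_squares(second_diff))
    cp.Problem(objective, [y >= 0, y <= cap]).solve()
    return y.value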

6 Putting it all Together

In this section we summarize the process to deliver a simulation with correct targets.

6.1 Procedures for estimation

First, as shown in Algorithm 1, we preprocess the data and estimate the conditional distributions using the methods explained in Section 2.3. This results in a set of beta distribution parameters for each input from the whole dataset. To estimate the parameters, we recall that the user should specify a data fraction $a$ (e.g., 0.05) for the sampling. (The software provides an option to produce a curve of the scores described in Section 2.4.)

1: Input: time-series $x$, $y$ and data fraction $a$
2:
3: procedure Computing_Estimation_Parameters($x$, $y$, $a$)
4:     parameters ← empty collection
5:     for each distinct input value in $x$ do     Applying the methodology explained in section 2.3
6:         Compute the interval of estimation and the corresponding sample.
7:         Estimate the location parameters $l$ and $s$.     See section 2.3.1
8:         Estimate the shape parameters $\alpha$ and $\beta$ by the method of moments.     See section 2.3.2
9:     for each input value needing parameters do     Take the closest computed point of estimation
10:         Assign the parameters of the closest estimation point.
11:     return parameters
Algorithm 1 Estimating the beta distributions
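A Python sketch of Algorithm 1, reusing the interval_sample and beta_shape_from_moments helpers from the earlier sketches; the error convention (e = y - x) and the choice of location parameters (the widest support allowed by Section 2.3.1) are our assumptions, not necessarily those of the package.

import numpy as np

def compute_estimation_parameters(x_data, y_data, a, cap):
    x_data = np.asarray(x_data, dtype=float)
    e_data = np.asarray(y_data, dtype=float) - x_data      # assumed error convention: e = y - x
    params = {}
    for x in np.unique(x_data):                            # methodology of Section 2.3
        _, e_sample = interval_sample(x_data, e_data, x, a)
        l, s = -x, cap                                     # widest support satisfying Section 2.3.1
        alpha_, beta_ = beta_shape_from_moments(e_sample.mean(), e_sample.var(), l, s)
        params[x] = (alpha_, beta_, l, s)
    return params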

Next, as shown in Algorithm 2, we estimate the partitioning of the mean absolute percent errors according to the input and we encode this information in the weight function. An important feature of this procedure is the computation of $\hat{r}$, which is the expected mean absolute relative error from the conditional distributions (which may be close in value to, but is different from, the MARE computed directly from the data). This procedure is explained in Section 3.3.

1: Input: $x$ and the estimated beta distribution parameters
2:
3: procedure Computing_Estimated_Weight_Function($x$, parameters)
4:     for each input value in $x$ do     Applying the methodology explained in section 3.3
5:         Compute the conditional expected absolute error $\hat{m}(x)$.
6:         Compute the maximum target that can be hit.     See constraints on target function (2)
7:     Compute $\hat{r}$, the expected MARE implied by the conditional distributions.
8:     Compute the weight $\hat{\omega}(x)$ for each input value.
9:     return $\hat{\omega}$, $\hat{r}$
Algorithm 2 Estimating the weight function

The next phase, shown in Algorithm 3, is estimation of the underlying base process that generates auto-correlation in the time-series of the errors. This is done by using the CDF of the beta distributions whose parameters were inferred in the first step. Then we run a grid search over the $p$ and $q$ parameters to select the order of the model that minimizes the BIC criterion. We save the coefficients. Recalling that we want the marginal of the base process to be standard normal, we set the variance of the errors of the base process so that $\mathrm{Var}(Z_t) = 1$. This procedure is explained in Section 4.

1: Input: $x$, $e$, and the estimated beta distribution parameters
2:
3: procedure Fit_Arma_Process($x$, $e$, parameters)
4:     for each $t$ do     Estimating the base process, see section 4
5:         $z_t \leftarrow \Phi^{-1}(\hat{F}_{E|X=x_t}(e_t))$
6:     best ← none
7:     for each candidate order $(p, q)$ do     Grid searching
8:         Fit an ARMA($p$, $q$) model to $z$ and compute its BIC.
9:         if the BIC improves on the best found so far then
10:             Save $(p, q)$ and the fitted coefficients.
11:     Set the innovation variance so that $\mathrm{Var}(Z_t) = 1$.
12:     return the selected ARMA model
Algorithm 3 Fitting the Base Process ARMA process

6.2 Procedures to deliver the target mare

First, as shown in Algorithm 4, given a target MARE $\tilde{r}$ and a SID, we verify that $\tilde{r}$ is feasible. If it is, we aim at targeting a mean absolute error for each conditional distribution with input in the SID. For this we compute a target function using the estimated weight function (see Section 3.4).

1: Input: target MARE $\tilde{r}$ and the SID
2:
3: procedure Computing_Simulation_Target_Function($\tilde{r}$, SID)
4:     Compute the plausibility score.
5:     for each input value in the SID do
6: