Developing a forecasting model is not a trivial task, mostly because no single method achieves high accuracy in every scenario. Nevertheless, time series competitions such as the M-competitions (makridakis2018statistical; makridakis2018m4) have identified statistical and machine learning methods that achieve consistent results. For the most recent edition, the M4 competition, makridakis2018m4 provide evidence that combinations of mostly statistical approaches are among the most accurate methods. The second most accurate method combines statistical methods through a weighted average whose weights are estimated with machine learning. The authors also argue that one must accept that all forecasting approaches and individual methods have both advantages and drawbacks. They also conclude that seasonal time series tend to be easier to predict.
The motivation for this paper and its case study is wind speed forecasting, which involves highly complex data. Although seasonality is generally present, the time series can behave erratically. Furthermore, wind speed forecasting is crucial for the development of alternative energy sources, mainly wind farms. To integrate wind farms into the electrical energy system, estimates of both the expected energy demand and the expected generated energy are required.
Successful wind speed forecasting models are often based on machine learning approaches such as neural networks, fuzzy systems, support vector machines (SVM) and hybrid systems, mostly because of the complexity of the time series. For instance, the literature on neural networks for wind time series forecasting is vast bilgili2007application; cadenas2009short; li2010comparing; haque2012new. In addition, standard statistical time series models are widely used for comparison, such as the Autoregressive Moving Average (ARMA) alexiadis1998short, Autoregressive Integrated Moving Average (ARIMA) wang2004wind; chen2011comparison; cao2012forecasting; more2003forecasting and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) guan2011short models, among others. Pre-processing of the time series is also common in the literature, using Singular Spectrum Analysis (SSA) souza2012artificial, wavelets junior2011analise; safavieh2007new; turbelin2009wavelet; bhaskar2010wind and Fourier analysis.
An alternative class, known as hybrid models, combines machine learning models with other methods. Examples include the Focused Time Delay Neural Network (FTDNN), neural networks with fuzzy inputs hong2010hour, finite-impulse response neural networks (FIR-NN) barbounis2006locally, the locally feedback dynamic fuzzy neural network (LF-DFNN), the TSK-type recurrent fuzzy network (TRFN), the adaptive neuro-fuzzy inference system (ANFIS) monfared2009new; potter2006very; pessanha2010previsao, ARIMA-ANN shi2012evaluation and ARIMA-SVM shi2012evaluation, among others.
This work proposes a new model that scans the time series for matching windows, i.e., past segments of the time series that are similar to the last observed values immediately preceding the forecast horizon. The observations that follow the selected windows, transformed by similarity functions, are used as forecast values. The method, named dynamic time scan forecasting (DTSF), is intuitive and can be used both as an exploratory tool to identify similar patterns in the time series and as a forecasting model. Compared with soft computing approaches such as neural networks, the proposed method is much faster. Results using wind time series from a Brazilian power plant show that the proposed method provides similar or improved predictions as compared to statistical and machine learning models.
The proposed dynamic time scan forecasting model was inspired by scan statistics glaz2009scan. Scan statistics comprise a class of statistical methods that scan data streams in order to find anomalous behavior. They were introduced by Joseph Naus in 1965 naus1965distribution and extended to epidemiological surveillance using spatial kulldorff1997spatial, temporal and spatio-temporal data kulldorff2001prospective; kulldorff1998evaluating. Briefly, a scanning window with a fixed shape, such as a circular or cylindrical window, scans the data, and a test statistic is calculated for each position of the window. The position with the largest value of the test statistic is a potential candidate for anomalous behavior. Statistical inference is obtained using Monte Carlo simulation mooney1997monte under the null hypothesis that the data stream does not contain anomalous data. Further information about scan statistics can be found in glaz2009scan.
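The Monte Carlo inference described above can be illustrated with a minimal sketch of a purely temporal scan. It assumes event counts in equally spaced time bins, a window of fixed width, the largest window total as the test statistic, and a null hypothesis under which the observed total is spread uniformly over the bins. This is a simplified stand-in, not Kulldorff's likelihood-ratio statistic, and the function names are hypothetical.

```python
import numpy as np


def max_window_sum(counts, width):
    """Largest event count over all contiguous windows of a fixed width."""
    sums = np.convolve(counts, np.ones(width, dtype=int), mode="valid")
    return int(sums.max()), int(sums.argmax())


def scan_test(counts, width, n_sim=999, seed=0):
    """Return (window start, observed maximum, Monte Carlo p-value).

    Null hypothesis assumed here: the observed total number of events is
    distributed uniformly at random over the time bins.
    """
    counts = np.asarray(counts, dtype=int)
    rng = np.random.default_rng(seed)
    observed, start = max_window_sum(counts, width)
    n_bins = counts.size
    uniform = np.full(n_bins, 1.0 / n_bins)
    exceed = 0
    for _ in range(n_sim):
        # Redistribute the same total number of events over the bins.
        simulated = rng.multinomial(counts.sum(), uniform)
        if max_window_sum(simulated, width)[0] >= observed:
            exceed += 1
    # Monte Carlo p-value: the observed data count as one replicate.
    p_value = (exceed + 1) / (n_sim + 1)
    return start, observed, p_value


counts = [2, 1, 3, 2, 1, 9, 8, 10, 2, 1, 3, 2]
start, observed, p_value = scan_test(counts, width=3)
print(start, observed, p_value)   # the burst 9, 8, 10 starts at index 5
```

Because the observed data are counted as one replicate, the smallest attainable p-value with 999 simulations is 0.001.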
Similarly, the DTSF method scans a time series using a fixed-length window. The objective is to find past windows that are similar to the most recently observed values. To this end, a test statistic, or similarity statistic, is calculated for each window, and a similarity function is estimated for each window. After the most similar windows are detected, forecast values are estimated by applying the similarity functions to the observed values that follow the selected windows. Further details about DTSF are given below.
With respect to forecasting performance, the mean squared error (MSE) wang2004wind, mean relative error (MRE) wang2004wind, mean absolute percentage error (MAPE) chen2011comparison; cao2012forecasting; bilgili2007application; velo2014wind; senjyu2006application and root mean square error (RMSE) currie2014wind; velo2014wind; bechrakis2004wind are the most common performance statistics. makridakis2018statistical suggest using the symmetric mean absolute percentage error (sMAPE) and model fitting (MF).
2 The dynamic time scan forecasting model
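Let $y_t$, $t = 1, \ldots, n$, be a time series of length $n$. Initially, let the vector $\mathbf{y}^{*}$ be defined as the last $w$ observations of the time series:

$$\mathbf{y}^{*} = \left[ y_{n-w+1}, y_{n-w+2}, \ldots, y_{n} \right].$$

The objective of DTSF is to identify patterns in the time series that are strongly correlated with vector $\mathbf{y}^{*}$. Thus, the candidate vectors can be written as:

$$\mathbf{x}_{\tau} = \left[ y_{\tau-w+1}, y_{\tau-w+2}, \ldots, y_{\tau} \right],$$

where $\tau = w, w+1, \ldots, n-w$. The upper bound of the time index ($\tau \le n-w$) guarantees that vector $\mathbf{x}_{\tau}$ does not overlap with vector $\mathbf{y}^{*}$. Figure 1 illustrates the DTSF method. Given the last $w$ observed values, which comprise vector $\mathbf{y}^{*}$, a scanning window of the same size ($w$) is used to scan the previous values of the time series.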
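A minimal sketch of the candidate-window construction follows, assuming a NumPy array holds $y_1, \ldots, y_n$, the last-$w$ vector is $\mathbf{y}^{*} = [y_{n-w+1}, \ldots, y_n]$, and candidates $\mathbf{x}_\tau = [y_{\tau-w+1}, \ldots, y_\tau]$ end at $\tau = w, \ldots, n-w$. The helper name `candidate_windows` is hypothetical and not part of the DTScanF package; the only subtlety is translating the 1-based index $\tau$ into 0-based slices.

```python
import numpy as np


def candidate_windows(y, w):
    """Return y*, the window end indices tau, and the candidate windows x_tau.

    tau follows the 1-based convention tau = w, ..., n - w; in 0-based
    Python slicing the window ending at tau is y[tau - w:tau].
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 2 * w:
        raise ValueError("the series must contain at least 2 * w observations")
    y_star = y[n - w:]                      # last w observations
    taus = np.arange(w, n - w + 1)          # upper bound n - w: no overlap with y*
    windows = np.stack([y[tau - w:tau] for tau in taus])
    return y_star, taus, windows


y_star, taus, windows = candidate_windows(np.arange(1, 11), w=3)
print(y_star)        # [ 8.  9. 10.]
print(taus)          # [3 4 5 6 7]
print(windows[-1])   # [5. 6. 7.], the latest window, ending just before y*
```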
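The final goal of DTSF is to provide a $k$-steps-ahead forecast of the time series, $\hat{y}_{n+1}, \ldots, \hat{y}_{n+k}$. To achieve this goal, DTSF scans the time series to find the best patterns $\mathbf{x}_{\tau}$. The values of the time series that follow the best patterns are used as the forecast values:

$$\left[ \hat{y}_{n+1}, \ldots, \hat{y}_{n+k} \right] = f\left( y_{\tau+1}, \ldots, y_{\tau+k} \right),$$

where $f$ is a function that relates the elements of vector $\mathbf{x}_{\tau}$ to the elements of vector $\mathbf{y}^{*}$.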
A first constraint can be imposed on $k$: $k \le w$. This constraint guarantees that, if the most correlated window comprises the most recent values prior to vector $\mathbf{y}^{*}$ (i.e., $\tau = n-w$), then the forecast values are a function of vector $\mathbf{y}^{*}$.
As observed in Equations (1) and (2), the forecast values depend on the window length $w$ and the function $f$. A first proposal for $f$ is a linear scaling of the elements of vector $\mathbf{x}_{\tau}$, i.e., a linear model. The rationale is that past values may resemble the last $w$ values except for a scale and/or offset shift. Thus, the method searches for past windows that are similar to the last $w$ values after a similarity function has been applied.
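Assuming a linear similarity function, $f(x) = \beta_0 + \beta_1 x$, its parameters are estimated by minimizing the sum of squared differences between the elements of vector $\mathbf{y}^{*}$ and the linear equation:

$$\left( \hat{\beta}_0, \hat{\beta}_1 \right) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{w} \left( y^{*}_{i} - \beta_0 - \beta_1 x_{\tau,i} \right)^2,$$

where $x_{\tau,i}$ is the $i$-th element of vector $\mathbf{x}_{\tau}$. Furthermore, the similarity statistic can be defined as the coefficient of determination of the linear regression montgomery2012introduction:

$$R^2_{\tau} = 1 - \frac{\sum_{i=1}^{w} \left( y^{*}_{i} - \hat{y}^{*}_{i} \right)^2}{\sum_{i=1}^{w} \left( y^{*}_{i} - \bar{y}^{*} \right)^2},$$

where $y^{*}_{i}$ is the $i$-th value of vector $\mathbf{y}^{*}$, $\hat{y}^{*}_{i} = \hat{\beta}_0 + \hat{\beta}_1 x_{\tau,i}$ is the $i$-th predicted value using the estimated linear function, and $\bar{y}^{*}$ is the mean of $\mathbf{y}^{*}$. Note that $R^2_{\tau}$ lies in the unit interval $[0, 1]$. If $R^2_{\tau} \approx 1$, the estimated values are very close to the observed values, i.e., the past values ending at time $\tau$ are similar to the last $w$ observed values after scale and shift correction. The scanning procedure is illustrated in Figure 2 using a window of length 36 (hours). The 7 past windows with high similarity statistics ($R^2_{\tau}$) are indicated by rectangles, and a linear model (similarity function) was estimated for each of them.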
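The least-squares fit and its coefficient of determination can be sketched in a few lines. This assumes NumPy, uses `np.polyfit` for the simple linear regression of $\mathbf{y}^{*}$ on $\mathbf{x}_\tau$, and the helper name `linear_similarity` is hypothetical.

```python
import numpy as np


def linear_similarity(x_tau, y_star):
    """Least-squares fit of y*_i = b0 + b1 * x_tau_i; returns (b0, b1, R^2).

    Assumes y* is not constant, otherwise R^2 is undefined (0/0).
    """
    x = np.asarray(x_tau, dtype=float)
    y = np.asarray(y_star, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)       # coefficients, highest degree first
    fitted = b0 + b1 * x                   # predicted values of y*
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return b0, b1, 1.0 - ss_res / ss_tot


# A past window that equals the last values up to scale and offset: y* = 5 + 2 x.
b0, b1, r2 = linear_similarity([1.0, 2.0, 3.0, 4.0], [7.0, 9.0, 11.0, 13.0])
print(b0, b1, r2)   # b0 ~ 5, b1 ~ 2, R^2 ~ 1

# A window that is only partly similar gives an intermediate R^2.
_, _, r2_partial = linear_similarity([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(r2_partial)   # ~ 0.64
```

With an intercept in the model, the least-squares $R^2_\tau$ equals the squared Pearson correlation between the window and $\mathbf{y}^{*}$, which is why it stays within $[0, 1]$.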
Using the similarity functions, the values that follow each selected window are mapped into forecast values, as illustrated in Figure 3. Point estimates are then obtained with an aggregation function, such as the median.
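The scan, similarity and median-aggregation steps can be combined into one short sketch. The helper `dtsf_forecast` is hypothetical and is not the DTScanF implementation; it assumes polynomial similarity functions fitted with `np.polyfit`, ranks windows by $R^2_\tau$, and additionally requires $\tau \le n-k$ so that the $k$ follow-up values exist in the observed series.

```python
import numpy as np


def dtsf_forecast(y, w, k, n_matches, degree=1):
    """Sketch of a DTSF-style k-step-ahead forecast (hypothetical helper).

    For every candidate window x_tau, a polynomial similarity function of the
    given degree is fitted by least squares to map x_tau onto y* (the last w
    values); the n_matches windows with the largest R^2 are kept, each fitted
    function is applied to the k values that follow its window, and the
    projections are aggregated with the median.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    y_star = y[n - w:]
    ss_tot = np.sum((y_star - y_star.mean()) ** 2)
    scored = []
    # 1-based window end tau: tau <= n - w avoids overlapping y*, and
    # tau <= n - k keeps the k follow-up values inside the observed series.
    for tau in range(w, min(n - w, n - k) + 1):
        x = y[tau - w:tau]
        coefs = np.polyfit(x, y_star, deg=degree)
        r2 = 1.0 - np.sum((y_star - np.polyval(coefs, x)) ** 2) / ss_tot
        follow_up = y[tau:tau + k]          # y_{tau+1}, ..., y_{tau+k}
        scored.append((r2, np.polyval(coefs, follow_up)))
    scored.sort(key=lambda item: item[0], reverse=True)
    projections = np.array([proj for _, proj in scored[:n_matches]])
    return np.median(projections, axis=0), projections


# Illustrative series with an exact 24-step cycle (480 = 20 full cycles).
t = np.arange(24 * 20)
series = 10 + 3 * np.sin(2 * np.pi * t / 24)
point, projections = dtsf_forecast(series, w=24, k=12, n_matches=5)
print(np.abs(point - series[:12]).max())   # ~0: the next 12 values repeat series[:12]
```

The synthetic series repeats exactly, so every best match reproduces its continuation; on real wind data the projections disagree, and the median acts as a robust summary of them.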
The DTSF method requires three parameters: the length of the scanning window, the specification of the similarity function and the number of best matches, i.e., the number of similar windows retained from the time series. To improve computational speed, similarity functions that are linear in their parameters, such as linear, quadratic and cubic polynomials, are preferable. The number of best matches can be selected dynamically using, for instance, a threshold on the similarity statistic. In this paper, a pre-defined grid of values for the length of the scanning window and the number of best matches is used, and the final estimate is selected based on the minimum forecasting error on the previous day.
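As one possible reading of the threshold rule mentioned above, the sketch below keeps every window whose similarity statistic reaches a cut-off, with a fallback to a minimum number of matches. The threshold value, the fallback and the helper name `select_matches` are all assumptions for illustration.

```python
import numpy as np


def select_matches(r2_values, threshold=0.9, min_matches=1):
    """Indices of windows whose similarity statistic reaches a threshold.

    Hypothetical rule: keep every window with R^2 >= threshold, ordered from
    most to least similar; if fewer than min_matches qualify, fall back to
    the min_matches most similar windows.
    """
    r2 = np.asarray(r2_values, dtype=float)
    order = np.argsort(r2)[::-1]                 # most similar first
    selected = order[r2[order] >= threshold]
    if selected.size < min_matches:
        selected = order[:min_matches]
    return selected


r2 = [0.50, 0.95, 0.92, 0.30, 0.99]
print(select_matches(r2, threshold=0.9))                    # [4 1 2]
print(select_matches(r2, threshold=0.999, min_matches=2))   # [4 1]
```

A threshold adapts the number of matches to how repetitive the recent pattern is, while the fallback keeps a forecast available when no window is very similar.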
An important requirement of DTSF is the availability of a long time series, since it is a data-driven method. Furthermore, the method can use other time series as secondary data. For example, if wind time series are available from different weather stations across a geographical region, the method can search for matches in neighboring weather stations to improve prediction at one specific location. The proposed DTSF method is available in the R package DTScanF marcelo_costa_2019_2603008.
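A minimal sketch of the secondary-series idea follows, assuming a linear similarity function and two aligned NumPy arrays: windows of a neighboring station's series are scored against the last $w$ values of the target series, and each fitted function is applied to that station's follow-up values. The helper `scan_secondary` and the toy data are hypothetical.

```python
import numpy as np


def scan_secondary(target, secondary, w, k):
    """Score windows of a secondary series against the target's last w values.

    Returns a list of (R^2, projection) pairs, one per secondary window of
    length w that is followed by at least k observations. Assumes the
    secondary series ends no later than the target, so follow-up values are
    not taken from the target's future.
    """
    y_star = np.asarray(target, dtype=float)[-w:]
    s = np.asarray(secondary, dtype=float)
    ss_tot = np.sum((y_star - y_star.mean()) ** 2)
    pairs = []
    for end in range(w, s.size - k + 1):          # 0-based, exclusive window end
        x = s[end - w:end]
        b1, b0 = np.polyfit(x, y_star, deg=1)     # linear similarity function
        r2 = 1.0 - np.sum((y_star - (b0 + b1 * x)) ** 2) / ss_tot
        pairs.append((r2, b0 + b1 * s[end:end + k]))
    return pairs


target = [4.0, 9.0, 11.0, 13.0, 15.0]              # last 3 values: 11, 13, 15
neighbor = [0.0, 1.0, 0.0, 5.0, 6.0, 7.0, 3.0, 0.0, 1.0, 0.0]
pairs = scan_secondary(target, neighbor, w=3, k=1)
best_r2, best_projection = max(pairs, key=lambda p: p[0])
print(best_r2, best_projection)   # window 5, 6, 7 maps as 2x + 1, so its follow-up 3 gives 7
```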
3 Case study
The case study data come from a wind power plant located in the south of the state of Bahia, in the northeastern region of Brazil. The plant is owned by the company Renova Energia (Renew Energy), which belongs to the major electricity group CEMIG. Wind speed (m/s) was measured at a 78-meter-high measuring tower (torre Ventos do Nordeste) at 10-minute intervals from 21 November 2011 to 22 June 2016. The data were aggregated into 30-minute intervals using the mean, yielding 80,398 observations.
Five days within each weather season were randomly selected from the final year of observations to evaluate the forecasting methods. Table 1 shows the selected dates for each season.
DTSF requires three parameters: the length of the scanning window, the similarity function and the number of best matches. These parameters were chosen based on prediction performance on the previous forecasting day. Four similarity functions were evaluated: the linear function, $f(x) = \beta_0 + \beta_1 x$, the quadratic function, $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2$, the cubic function, $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$, and the fourth-order polynomial, $f(x) = \beta_0 + \sum_{j=1}^{4} \beta_j x^j$. In addition, both the length of the scanning window and the number of best matches were selected from a grid of values. The investigated numbers of matches were 15, 25 and 50, and the investigated window lengths were 24, 48, 72, 96 and 120 observations, i.e., 0.5, 1, 1.5, 2 and 2.5 days, respectively. In total, 60 combinations (4 similarity functions × 3 numbers of matches × 5 window lengths) were investigated. The optimal parameters were chosen based on the mean absolute error observed on the day before the forecasting day.
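The parameter selection can be sketched as a grid search in which each combination is scored by the MAE of its forecast for the previous day, and the winning combination is then used on the full series. The helpers, the reduced grid and the synthetic 48-step cycle (mimicking one day of 30-minute data) are illustrative assumptions; the study's grid would use polynomial degrees 1 to 4, 15, 25 and 50 matches, and windows of 24 to 120 observations.

```python
import itertools

import numpy as np


def dtsf_point(y, w, k, n_matches, degree):
    """Median projection of the n_matches windows with the largest R^2."""
    n = y.size
    y_star = y[n - w:]
    ss_tot = np.sum((y_star - y_star.mean()) ** 2)
    scored = []
    for tau in range(w, min(n - w, n - k) + 1):    # 1-based window end
        x = y[tau - w:tau]
        coefs = np.polyfit(x, y_star, deg=degree)  # polynomial similarity function
        r2 = 1.0 - np.sum((y_star - np.polyval(coefs, x)) ** 2) / ss_tot
        scored.append((r2, np.polyval(coefs, y[tau:tau + k])))
    scored.sort(key=lambda item: item[0], reverse=True)
    return np.median([proj for _, proj in scored[:n_matches]], axis=0)


def select_by_previous_day(y, k, degrees, match_counts, window_lengths):
    """Choose (degree, n_matches, w) by MAE on the last k values,
    then forecast the next k values from the full series."""
    y = np.asarray(y, dtype=float)
    history, previous_day = y[:-k], y[-k:]
    best = None
    for degree, m, w in itertools.product(degrees, match_counts, window_lengths):
        forecast = dtsf_point(history, w, k, m, degree)
        mae = np.mean(np.abs(forecast - previous_day))
        if best is None or mae < best[0]:
            best = (mae, degree, m, w)
    _, degree, m, w = best
    return (degree, m, w), dtsf_point(y, w, k, m, degree)


t = np.arange(48 * 30)
series = 10 + 3 * np.sin(2 * np.pi * t / 48)       # one cycle per "day" of 48 steps
params, next_day = select_by_previous_day(
    series, k=48, degrees=[1, 2], match_counts=[3, 5], window_lengths=[48, 96]
)
print(params, np.abs(next_day - series[:48]).max())
```

Scoring on the previous day only is cheap but noisy; averaging the MAE over several past days would be a natural variation, though it is not what the text describes.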
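Finally, forecasting performance was evaluated using the MAE, RMSE, sMAPE and MF statistics, where

$$\mathrm{MAE} = \frac{1}{k}\sum_{t=1}^{k} \left| y_t - \hat{y}_t \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{k}\sum_{t=1}^{k} \left( y_t - \hat{y}_t \right)^2}, \qquad \mathrm{sMAPE} = \frac{2}{k}\sum_{t=1}^{k} \frac{\left| y_t - \hat{y}_t \right|}{\left| y_t \right| + \left| \hat{y}_t \right|} \times 100\%,$$

$y_t$ is the observed value, $\hat{y}_t$ is the forecast value and $k$ is the forecasting horizon. The MF statistic follows makridakis2018statistical.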
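The MAE, RMSE and sMAPE statistics can be computed as follows. This is a minimal NumPy sketch with illustrative values; sMAPE uses the M4 form $\frac{2}{k}\sum_t |y_t - \hat{y}_t| / (|y_t| + |\hat{y}_t|)$ expressed in percent, and MF is not included.

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error over the forecast horizon."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    """Root mean squared error over the forecast horizon."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def smape(y, y_hat):
    """Symmetric mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(100.0 * np.mean(2.0 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat))))


observed = [8.0, 10.0, 12.0]      # illustrative wind speeds (m/s)
forecast = [9.0, 10.0, 10.0]
print(mae(observed, forecast))    # 1.0
print(rmse(observed, forecast))   # sqrt(5/3), about 1.291
print(smape(observed, forecast))  # about 9.98
```

RMSE penalizes the single 2 m/s miss more heavily than MAE does, while sMAPE scales each error by the magnitude of the observed and forecast values.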
3.1 Forecasting methods
In addition to DTSF, eight forecasting methods were evaluated for wind speed prediction. The naïve method uses the wind speed observed on the previous day, i.e., the last 48 observations, as the forecast values. The ARIMA, ETS (exponential smoothing state space model) and TBATS (exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend and seasonal components) models were implemented using the forecast package (robHyndman2008; robHyndman2017). The STL + ETS model (STL: seasonal decomposition of time series by Loess) was implemented using the stats package (Rmanual). The NNET.1 model (feed-forward neural network with a single hidden layer and lagged inputs for forecasting univariate time series) was also implemented using the forecast package. The NNET.2 model (extreme learning machines for time series forecasting) was implemented using the nnfor package (Rnnfor). The Hybrid model (hybrid time series modeling) was implemented using the forecastHybrid package (RforecastHybrid).
Figure 4 shows the case study time series. As mentioned, it comprises 80,398 observations. The average wind speed is 8.65 m/s, the maximum and minimum values are 21.95 m/s and 0.25 m/s, respectively, and the standard deviation is 3.16 m/s.
Table 2 shows the error statistics and the computing time for each method. Regarding computing time, the naïve approach is the fastest, followed by ETS, STL+ETS, ARIMA and DTSF. Asterisks indicate methods that required a shorter time series in order to run. Neither the Hybrid nor the NNET.1 method could run on the full time series; therefore, both were fitted using only the last 365 days (one year) of data, i.e., the 17,520 observations prior to the forecasting day. Even though the Hybrid and NNET.1 methods used only about 20% of the database, their computing times were large. Among the methods fitted on the complete time series, TBATS and NNET.2 had the largest computing times. Regarding the MAE, RMSE, sMAPE and MF error statistics, the proposed DTSF achieved the minimum values, shown in bold, followed by NNET.2 (MAE), Hybrid (RMSE, MF and sMAPE) and TBATS (sMAPE). Figure 5(a) shows the MAE boxplots and Figure 5(b) shows the RMSE boxplots for each method. In general, the proposed DTSF presented the lowest median value and the smallest interquartile range, i.e., the smallest dispersion.
Table 3 shows the average MAE for each method on each forecasting day. The two methods with the lowest MAE on each day are highlighted. DTSF was among the two best methods on 14 days (70%), whereas each of the remaining methods was among the two best on three days (15%), on average. The ETS method was among the two best on five days, followed by NNET.1 and STL+ETS on four days each. Interestingly, the naïve method achieved the second-best prediction on two days.
Figure 6 illustrates the location of the best matches for wind speed forecasting on October 26, 2015. The grid search procedure selected 15 matches. Notably, the first match is located 210.3 days before the end of the time series, and the second match 694.3 days before it. The farthest match is located 1273.35 days, or 3.49 years, before the end of the time series. As mentioned, after scaling and bias correction with the similarity function, DTSF is able to find matches that are not necessarily located close to the forecasting day.
Figure 7(a) shows the projected values of the 15 best matches, obtained with their respective similarity functions, for October 26, 2015. The median of the projections is used as the final forecast. The observed wind speed is also shown in Figure 7(a), and most of the projections follow a pattern similar to it. Furthermore, using a boxplot representation, prediction intervals can be estimated from the interquartile range of the projections, as shown in Figure 7(b).
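The median point forecast and a quartile-based band can be sketched as follows, assuming the projections of the best matches are stored row-wise in a NumPy array of shape (matches, horizon). Whether the intervals in Figure 7(b) use the quartiles themselves or boxplot whiskers is not stated, so the 25th to 75th percentile band here is an assumption, and the helper name is hypothetical.

```python
import numpy as np


def median_and_quartile_band(projections):
    """Median forecast and 25th/75th percentile band for each horizon step.

    projections: array of shape (n_matches, k), one row per best match.
    """
    p = np.asarray(projections, dtype=float)
    lower, median, upper = np.percentile(p, [25, 50, 75], axis=0)
    return median, lower, upper


projections = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
median, lower, upper = median_and_quartile_band(projections)
print(median, lower, upper)   # median [3, 30], band from [2, 20] to [4, 40]
```

Percentiles are computed column-wise, so each forecast step gets its own interval; the band widens on steps where the matched windows disagree.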
5 Discussion and conclusion
As described in the previous section, the results indicate that DTSF is highly competitive: on average, it achieved the best performance in the real wind speed forecasting case as compared to standard statistical and machine learning models. Furthermore, DTSF is intuitive and based on elementary statistical procedures, such as scan statistics and linear regression. One limitation of DTSF is the requirement of a large database; for shorter time series, standard statistical models such as exponential smoothing and ARIMA models are preferable.
DTSF can also be applied to qualitative analysis, for example, to investigate the causes of similar patterns in a time series. Future work aims at combining DTSF with statistical and machine learning models to further improve prediction, as suggested by makridakis2018statistical. Ongoing work compares DTSF performance using multiple time series.
The authors thank CEMIG and CNPq for financial support, project numbers PQ-308361/2014-8, CNPq-402070/2016-0 and APQ-03813-12/GT555.