Déjà vu: forecasting with similarity

08/31/2019 ∙ by Yanfei Kang, et al.

Accurate forecasts are vital for supporting the decisions of modern companies. In order to improve statistical forecasting performance, forecasters typically select the most appropriate model for each series. However, statistical models presume a data generation process, while making strong distributional assumptions about the errors. In this paper, we present a new approach to time series forecasting that relaxes these assumptions. A target series is forecasted by identifying similar series from a reference set (déjà vu). Then, instead of extrapolating, the future paths of the similar reference series are aggregated and serve as the basis for the forecasts of the target series. Thus, forecasting with similarity is a data-centric approach that tackles model uncertainty without depending on statistical forecasting models. We evaluate the approach using a rich collection of real data and show that it results in good forecasting accuracy, especially for yearly series.


1 Introduction

Effective forecasting is crucial for the various functions of modern companies. Forecasts are used to make decisions with regard to operations, finance, strategy, planning, and scheduling, among others. Despite its importance, forecasting is not a straightforward task. The inherent uncertainty renders the provision of perfect forecasts impossible. Regardless, reducing the forecast error as much as possible is expected to translate to significant monetary savings.

We identify the search for an ‘optimal’ model as the main challenge to forecasting. Existing statistical forecasting models implicitly assume an underlying data generating process (DGP) coupled with distributional assumptions about the forecast errors that do not necessarily hold in practice. Petropoulos et al. (2018) suggested that three sources of uncertainty exist in forecasting: model, parameter, and data. By exploring these, they found that merely tackling the model uncertainty is enough to bring most of the performance benefits. This result reconfirms George Box’s famous quote, “all models are wrong, but some are useful.” It is not surprising that researchers increasingly avoid using a single model and opt for combinations of forecasts from multiple models (Jose and Winkler, 2008; Kolassa, 2011; Bergmeir et al., 2016; Montero-Manso et al., 2019). We argue that there is another way to avoid selecting a single model: to select no model at all.

This study provides a new approach to forecasting that does not require the estimation of any forecasting model, while also exploiting the benefits of cross-learning (Makridakis et al., 2019). According to the proposed approach, a target series is compared against a set of reference series in an attempt to identify similar ones (déjà vu). Then, the forecasts for the target series are obtained by aggregating the future paths of the most similar reference series. Note that no model extrapolation takes place. The proposed approach has a number of advantages compared to existing approaches, namely (i) it tackles both model and parameter uncertainty, (ii) it does not use time series features or other statistics as a proxy for determining similarity, and (iii) no explicit assumptions are made with regard to the DGP or the distribution of the forecast errors. The proposed forecasting approach is evaluated on the M3 Competition data (Makridakis and Hibon, 2000), with promising results, especially when a sufficient number of similar series is available.

The rest of the paper is organized as follows: in the next section we present an overview of the existing literature and provide our motivation behind forecasting with similarity. Section 3 describes the methodology for the proposed forecasting approach, while section 4 presents the empirical design and the results. Section 5 offers our discussion and insights as well as implications for research and practice. Finally, section 6 provides our concluding remarks.

2 Background research

2.1 Selecting a forecasting model

When dealing with numerous time series, forecasters typically try to enhance forecasting accuracy by selecting the most appropriate method from a set of alternatives. The solution might involve either aggregate selection, where a single method is used to extrapolate all the series, or individual selection, where the most appropriate method is used per case (Fildes, 1989). The latter approach could provide substantial improvements if forecasters were indeed able to select the best model (Fildes, 2001). Unfortunately, this is far from reality due to the data, model, and parameter uncertainty present (Kourentzes et al., 2014; Petropoulos et al., 2018).

In this respect, individual selection becomes a complicated problem, and forecasters have to balance the potential gains in forecasting accuracy against the additional complexity introduced. Automatic forecasting algorithms test multiple forecasting methods and select the ‘best’ of these based on some criterion. Such criteria include information criteria, e.g., the likelihood of a model penalised by its complexity (Hyndman et al., 2002; Hyndman and Khandakar, 2008), or forecasting performance on past windows of the data (Tashman, 2000). Other approaches to model selection involve discriminant analysis (Shah, 1997), time series features (Petropoulos et al., 2014), and expert rules (Adya et al., 2001). An interesting alternative is to apply cross-learning, in the sense that the series are grouped according to their similarity on an array of features and the best method is selected for their extrapolation (Kang et al., 2017; Spiliotis et al., 2019b).

In any case, the difference between two models might be small, and the selection of one over the other might be purely due to chance. This also results in different models being selected when different criteria or cost functions are used (Billah et al., 2006). Moreover, the features and rules considered may not be adequate for describing every possible data pattern. As a result, in most cases a clear-cut ‘best’ model does not exist, with all models simply being rough approximations of reality.

2.2 The non-existence of a DGP and forecast combinations

Time series models that are usually offered by off-the-shelf forecasting software have over-simplified assumptions (such as the normality of the residuals and stationarity), which do not necessarily hold in practice. As a result, it is impossible for such models to perfectly capture the actual DGP of the data. One could work towards defining a complex multivariate model (Svetunkov, 2016), but this would lead to all kinds of new problems, such as data limitations and the inability to accurately forecast some of the exogenous variables identified as important.

As a solution to the above problem, forecasting researchers have been combining the forecasts from different models (Bates and Granger, 1969; Clemen, 1989; Makridakis and Winkler, 1983; Timmermann, 2006; Claeskens et al., 2016). The main advantage of combination is that it reduces the uncertainty related to model and parameter determination, thereby decreasing the risk of selecting a single inadequate model. Moreover, combining different methods enables capturing multiple patterns. Thus, forecast combinations lead to more accurate and robust forecasts with lower error variances (Hibon and Evgeniou, 2005).

Through the years, the forecast combination puzzle (Claeskens et al., 2016), i.e., the fact that optimal weights often perform poorly in applications, has been examined both theoretically and empirically. Many alternatives have been proposed to exploit the benefits of combining, including, among others, Akaike’s weights (Kolassa, 2011), temporal aggregation levels (Kourentzes et al., 2014), bagging (Bergmeir et al., 2016; Petropoulos et al., 2018), and hierarchies (Hyndman et al., 2011; Athanasopoulos et al., 2017). In any case, simple combinations have been shown to perform well in practice (Petropoulos and Svetunkov, 2019).

Despite the improved performance offered by forecast combinations, combining still has some disadvantages, the primary ones being (i) the determination of the pool of methods being averaged, (ii) the identification of their weights, and (iii) the estimation of multiple methods.

2.3 Forecasting with similar series

An alternative to fitting statistical models to the historical data would be to explore if similar patterns have appeared in the past. The motivation behind this argument originates from the work on structured analogies by Green and Armstrong (2007). Structured analogies is a framework for eliciting human judgment in forecasting. Given a forecasting challenge, a panel of experts is assembled and asked to independently and anonymously provide a list of analogies that are similar to the target problem together with the degree of similarity and their outcomes. A facilitator calculates the forecast for the target situation as the average of the outcomes of the analogous cases weighted by the degree of their similarity.
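To make the facilitator's aggregation step concrete, the following minimal Python sketch computes a similarity-weighted average of the outcomes of the analogies. The function name `structured_analogies_forecast` and the example ratings are hypothetical; how similarity is elicited and scaled in practice is left to the facilitator and is not prescribed here.

```python
def structured_analogies_forecast(outcomes, similarities):
    """Similarity-weighted average of the outcomes of analogous cases.

    outcomes: outcome observed for each analogy (hypothetical inputs)
    similarities: non-negative similarity rating of each analogy to the target
    """
    if len(outcomes) == 0 or len(outcomes) != len(similarities):
        raise ValueError("need one similarity rating per analogy")
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarity ratings must sum to a positive value")
    # Each outcome contributes in proportion to its similarity rating.
    return sum(o * s for o, s in zip(outcomes, similarities)) / total


# Three hypothetical analogies rated 0-10 for similarity by an expert.
print(structured_analogies_forecast([120.0, 95.0, 110.0], [8, 3, 5]))  # 112.1875
```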

While several modifications have been proposed in the literature, the core of the framework is the one described above. Such an approach is practical in cases where no historical data are available for the current situation (e.g., Nikolopoulos et al., 2015), which renders the application of statistical algorithms impossible. Forecasting by analogy has also been used extensively in new product forecasting, in order to estimate the demand and the life-cycle curve parameters based on the historical demands and life-cycles of similar products.

Even when historical information is available, sharing information across series has been shown to improve forecasting performance. A series of studies attempted to estimate the seasonality at the group level instead of the series level (e.g., Mohammadipour and Boylan, 2012; Zhang et al., 2013; Boylan et al., 2014). When series are arranged in hierarchies, products that belong to the same category are likely to have similar seasonal patterns. This renders their estimation at an aggregate level more accurate, especially for shorter series where few seasonal cycles are available.

The use of cross-sectional information for time series forecasting tasks was also one of the features of the two best-performing approaches in the recent M4 forecasting competition (Makridakis et al., 2019). Smyl (2019) proposed a hybrid approach that combines exponential smoothing with neural networks. The hierarchical estimation of the parameters utilises learning across series but also focuses on the idiosyncrasies of each individual series.

Montero-Manso et al. (2019) used cross-learning based on the similarity of the features in collections of series to estimate the combination weights assigned to a pool of forecasting methods.

Nikolopoulos et al. (2016) explored the value of identifying similar patterns within series of an intermittent nature (where the demand for some periods is zero). They proposed an approach that uses nearest neighbours to predict incomplete sequences of consecutive periods with non-zero demands based on past occurrences of non-zero demands. To our knowledge, this is the first statistical approach to directly use similar observed instances from the past to predict future outcomes. We suggest that searching for similar patterns can be extended from within series to across series, and also from intermittent to fast-moving demand data.

3 Methodology

Given a rich and diverse set of reference series, the objective of this approach is to find the series most similar to a target series, aggregate their future paths, and use this aggregate as the forecast for the target series. We assume that the target series, $\boldsymbol{y}$, has a length of $n$ observations and a forecasting horizon of $h$. Series in the reference set shorter than $n + h$ are not considered. Series longer than $n + h$ are truncated, keeping the last $n + h$ values, where the first $n$ values are used for measuring similarity and the last $h$ values serve as a future path. We end up with a matrix, $\boldsymbol{Q}$, of reference values of order $m \times (n + h)$, so that each row represents the values of a (truncated) reference series, and $m$ is the number of reference series. A particular reference series is denoted with $\boldsymbol{Q}^{(i)} = [\boldsymbol{A}^{(i)}, \boldsymbol{B}^{(i)}]$, where $i = 1, \dots, m$, $\boldsymbol{A}^{(i)}$ is the historical data, and $\boldsymbol{B}^{(i)}$ represents the future path. The approach consists of the following steps.

  1. Remove seasonality, if a series is identified as seasonal.

  2. Smoothing by estimating the trend component through time series decomposition.

  3. Scaling to render the target and possible similar series comparable.

  4. Measuring similarity by using a set of distance measures.

  5. Forecasting by aggregating the paths of the most similar series.

  6. Inverse scaling to bring the forecasts for the target series back to its original scale.

  7. Add seasonality, if the target series was found seasonal in step 1.

In the following subsections, we describe these steps in detail. Section 3.1 describes the preprocessing of the data (steps 1, 2, 3, 6, and 7) while section 3.2 provides the details regarding the similarity measurement and forecasting (steps 4 and 5).
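As an illustration of how the reference matrix described at the start of this section could be assembled, here is a minimal Python sketch (NumPy assumed). The helper `build_reference_matrix` is hypothetical and ignores preprocessing; it only applies the length filter and splits the last $n + h$ values into the historical part $\boldsymbol{A}$ and the future paths $\boldsymbol{B}$.

```python
import numpy as np


def build_reference_matrix(reference_series, n, h):
    """Keep reference series with at least n + h values and split their last
    n + h values into a historical part A (first n) and a future path B (last h).

    Returns A with shape (m, n) and B with shape (m, h), where m is the number
    of retained reference series.
    """
    rows = [np.asarray(s, dtype=float)[-(n + h):]
            for s in reference_series if len(s) >= n + h]
    if not rows:
        raise ValueError("no reference series has at least n + h observations")
    Q = np.vstack(rows)  # matrix of order m x (n + h)
    return Q[:, :n], Q[:, n:]


A, B = build_reference_matrix([range(10), range(3), range(5, 12)], n=4, h=2)
print(A)  # rows [4, 5, 6, 7] and [6, 7, 8, 9]; range(3) is too short and dropped
print(B)  # rows [8, 9] and [10, 11]
```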

3.1 Preprocessing

When dealing with diverse data, preprocessing becomes essential for effectively forecasting with similarity. This is because the process of identifying similar series is complicated when multiple seasonal patterns and randomness are present, as well as when the scales of the series being compared differ. If the reference series are not representative of the target series or the size of the reference set is small, the chances of observing similar patterns are further decreased.

In order to deal with this problem, we consider three steps which are applied sequentially. The first one removes the seasonality, if present. By doing so, the target series is more likely to be effectively matched with multiple reference ones, at least if the dissimilarities present are attributed to different seasonal patterns. In the second step, we smooth the seasonally adjusted series to remove randomness and possible outliers from the data which, once again, complicate the process and reduce the chances of identifying many similar series. Finally, we scale the target and the reference series, so that their values are directly comparable. Preprocessing is applied both to the reference and target series.

3.1.1 Seasonal adjustment

Seasonal adjustment is performed by utilizing the “Seasonal and Trend decomposition using Loess” (STL) method, as presented by Cleveland et al. (1990) and implemented in the stats package for R. In brief, STL decomposes the series into the trend, seasonal, and remainder components, assuming an additive interaction between them. An adjustment is considered only if the series is identified as seasonal, which is determined through a seasonality test that checks for autocorrelation significance on the $s$th term of the ACF, where $s$ is the frequency of the series (e.g., $s = 12$ for monthly data). Thus, given a series of $l$ observations, frequency $s$, and a confidence level of 90%, a seasonal adjustment is considered only if

$$|r_s| > 1.645 \sqrt{\frac{1 + 2\sum_{i=1}^{s-1} r_i^2}{l}}, \tag{1}$$

where $r_i$ is the lag-$i$ autocorrelation and $l$ is equal to $n$ and $n + h$ for the target and the reference series, respectively. Non-seasonal series ($s = 1$) and series with fewer observations than three seasonal periods ($l < 3s$) are not tested and are assumed to be non-seasonal.
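A minimal sketch of the seasonality test in Eq. (1), assuming NumPy and the usual sample autocorrelation estimator. The helpers `acf` and `is_seasonal` are hypothetical names, and 1.645 is the two-sided critical value for 90% confidence; this is an illustration, not the authors' code.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (r[k - 1] is lag k)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    if denom == 0:
        return np.zeros(max_lag)  # constant series: no autocorrelation
    return np.array([np.sum(d[:len(x) - k] * d[k:]) / denom
                     for k in range(1, max_lag + 1)])


def is_seasonal(x, s, z=1.645):
    """Seasonality test of Eq. (1); z = 1.645 corresponds to 90% confidence."""
    length = len(x)
    if s <= 1 or length < 3 * s:
        return False  # not tested: assumed non-seasonal
    r = acf(x, s)
    limit = z * np.sqrt((1 + 2 * np.sum(r[:s - 1] ** 2)) / length)
    return bool(abs(r[s - 1]) > limit)


t = np.arange(72)
monthly = 10 + np.sin(2 * np.pi * t / 12)
print(is_seasonal(monthly, 12))       # True
print(is_seasonal(monthly[:30], 12))  # False: fewer than three seasonal periods
```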

Given that some series may display multiplicative seasonality, the Box-Cox transformation (Box and Cox, 1964) is applied to the data before STL to effectively estimate the seasonal component in both cases (Bergmeir et al., 2016). The Box-Cox transformation is defined as

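$$\boldsymbol{y}^{(\lambda)} = \begin{cases} \log(\boldsymbol{y}), & \lambda = 0, \\ (\boldsymbol{y}^{\lambda} - 1)/\lambda, & \lambda \neq 0, \end{cases} \tag{2}$$

where $\boldsymbol{y}$ is a vector (the operations are applied element-wise) and $\lambda$ is selected using the method of Guerrero (1993), as implemented in the forecast package for R (Hyndman et al., 2019). Note that after removing the seasonal component from the transformed series, the inverse transformation is applied to the rest of the components (sum of the trend and remainder) to obtain the seasonally adjusted one.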

As the forecasts produced from the seasonally adjusted data will not be seasonal, we need to reseasonalise them (step 7). Given that the seasonal component removed is on the Box-Cox transformed scale, the forecasts must also be transformed using the same $\lambda$ calculated earlier. Having added the seasonality to the transformed forecasts, a final inverse transformation is applied.
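The Box-Cox round trip used around STL and in step 7 can be sketched as follows. This assumes that $\lambda$ has already been chosen (Guerrero's method is not implemented here) and that a seasonal component on the transformed scale is available for the forecast periods; the function names are hypothetical.

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transform of Eq. (2), applied element-wise (requires y > 0)."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1) / lam


def inv_boxcox(w, lam):
    """Inverse of boxcox for the same lambda."""
    w = np.asarray(w, dtype=float)
    return np.exp(w) if lam == 0 else (lam * w + 1) ** (1 / lam)


def reseasonalise(adjusted_forecasts, seasonal, lam):
    """Step 7: transform the seasonally adjusted forecasts, add the seasonal
    component (estimated on the Box-Cox scale), and transform back."""
    transformed = boxcox(adjusted_forecasts, lam) + np.asarray(seasonal, dtype=float)
    return inv_boxcox(transformed, lam)


# With lambda = 0 (log scale), adding log(2) doubles the first forecast.
print(reseasonalise([10.0, 10.0], [np.log(2.0), 0.0], lam=0))  # approx. [20, 10]
```

With $\lambda = 0$ the seasonal component acts multiplicatively on the original scale, which is why the transformation lets an additive STL handle multiplicative seasonality.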

3.1.2 Smoothing

Smoothing is performed by utilizing the Loess method, as presented by Cleveland et al. (1992) and implemented in the stats package for R. In short, a local model is computed, with the fit at point $x$ obtained from a weighted fit to the neighbourhood points, where the weights depend on the distances between the neighbours and point $x$ (closer neighbours receive larger weights). Similarly to STL, Loess decomposes the series into trend and remainder components. Thus, by using the trend component, outliers and noise are effectively removed and it is easier to find similar series. Moreover, smoothing can help us obtain a more representative forecast origin (last historical value of the series), potentially improving forecasting accuracy (Spiliotis et al., 2019a).
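For intuition about the smoothing step, the following simplified sketch computes a local linear fit with tricube weights at every time index (NumPy assumed). It is not R's `loess()`, which by default fits local quadratics and offers further options; `loess_smooth` and its default `span` are assumptions for illustration.

```python
import numpy as np


def loess_smooth(y, span=0.75):
    """Local linear loess fit at every time index (a simplified sketch).

    Each fitted value comes from a weighted least-squares line through the
    ceil(span * n) nearest points, using tricube weights that shrink to zero
    as the distance from the point being fitted grows.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    t = np.arange(n, dtype=float)
    q = min(max(int(np.ceil(span * n)), 3), n)  # neighbourhood size
    trend = np.empty(n)
    for i in range(n):
        dist = np.abs(t - t[i])
        idx = np.argsort(dist, kind="stable")[:q]  # q nearest neighbours
        d_max = dist[idx].max()
        w = (1.0 - (dist[idx] / d_max) ** 3) ** 3  # tricube weights
        X = np.column_stack([np.ones(q), t[idx] - t[i]])  # centred at t[i]
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        trend[i] = beta[0]  # intercept = fitted value at t[i]
    return trend


rng = np.random.default_rng(0)
noisy = 0.5 * np.arange(30) + rng.normal(scale=2.0, size=30)
trend = loess_smooth(noisy)
remainder = noisy - trend  # the part discarded before searching for similar series
```

Because the local model is linear, a perfectly linear series is reproduced exactly, while noise and isolated spikes are damped.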

3.1.3 Scaling

Scaling refers to bringing the target and the reference series to the same level so that they are comparable to each other. This can be done in various ways, such as dividing each value of a time series by a simple summary statistic (max, min, mean, etc.), restricting the values to a specific range (such as $[0, 1]$), or applying a standard score. Since the forecast origin is the most important observation in terms of forecasting, we divide each point by this specific value. A similar approach has been successfully applied by Smyl (2019). A different scaling would be needed if either the target or the reference series contain zero values. Finally, inverse scaling is needed once the forecasts have been produced to return to the original level of the target series (step 6). This is achieved by multiplying each forecast by the forecast origin.
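Scaling by the forecast origin and its inverse (step 6) might be implemented as below (NumPy assumed). We assume that, for each reference series, the forecast origin is its last historical value, i.e., the $n$th column of the truncated series, so that $\boldsymbol{A}^{(i)}$ and $\boldsymbol{B}^{(i)}$ are divided by the same number; the function names are hypothetical.

```python
import numpy as np


def scale_by_origin(target, A, B):
    """Divide the target by its forecast origin (last value) and each reference
    row of A and B by that row's last historical value, i.e., A[:, -1]."""
    target = np.asarray(target, dtype=float)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    origin = target[-1]
    ref_origins = A[:, -1:]  # column vector, broadcast across each row
    if origin == 0 or np.any(ref_origins == 0):
        raise ValueError("zero forecast origin: a different scaling is needed")
    return target / origin, A / ref_origins, B / ref_origins, origin


def inverse_scale(scaled_forecasts, origin):
    """Step 6: bring forecasts back to the level of the target series."""
    return np.asarray(scaled_forecasts, dtype=float) * origin


y_s, A_s, B_s, origin = scale_by_origin(
    [2.0, 4.0], [[1.0, 2.0], [5.0, 10.0]], [[3.0], [20.0]])
print(y_s, A_s, B_s, origin)
```

After scaling, every series ends its historical part at 1, so the distance measures compare shapes rather than levels.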

3.2 Similarity & forecasting

One disadvantage of forecasting using a statistical model is that a DGP is explicitly assumed, although it might be difficult or even impossible to capture in practice. On the other hand, the proposed methodology searches in a set of reference series to identify similar patterns to those of the target series we need to forecast.

Given the preprocessed target series, $\tilde{\boldsymbol{y}}$, and the preprocessed reference series, $\tilde{\boldsymbol{Q}}$, we search for similar series as follows: for each series, $\tilde{\boldsymbol{Q}}^{(i)}$, in the reference set, $\tilde{\boldsymbol{Q}}$, we calculate the distance between its historical values, $\tilde{\boldsymbol{A}}^{(i)}$, and those of the target series using a distance measure. The result of this process is a vector of $m$ distances that corresponds to the $m$ pairs of the target and the reference series available.

In terms of measuring the distance, we consider three alternatives. The first one is the $L_1$ norm, which is equivalent to the sum of the absolute deviations between $\tilde{\boldsymbol{y}}$ and $\tilde{\boldsymbol{A}}^{(i)}$. The second measure is the $L_2$ norm (Euclidean distance), which is equivalent to the square root of the sum of the squared deviations. The third alternative involves the utilization of Dynamic Time Warping (DTW), an algorithm for identifying alternative alignments between the points of two series so that their total distance is minimized. In contrast to the previous two measures, DTW allows various matches among the points of the series being compared, meaning that $\tilde{y}_t$ can be matched either with $\tilde{A}_t^{(i)}$, as done with $L_1$ and $L_2$, or with previous/following points of $\tilde{\boldsymbol{A}}^{(i)}$, even if these points have already been used in other matches. Although some restrictions are still present when employing DTW, it does introduce more flexibility to the process, allowing the identification of similar series that may display differences when examined locally.

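The three distance measures are formally expressed as

$$d_{L_1}^{(i)} = \sum_{t=1}^{n} \left| \tilde{y}_t - \tilde{A}_t^{(i)} \right|, \tag{3}$$

$$d_{L_2}^{(i)} = \sqrt{\sum_{t=1}^{n} \left( \tilde{y}_t - \tilde{A}_t^{(i)} \right)^2}, \tag{4}$$

$$d_{\mathrm{DTW}}^{(i)} = D(n, n), \tag{5}$$

where $D(v, w)$ is computed recursively as

$$D(v, w) = \left| \tilde{y}_v - \tilde{A}_w^{(i)} \right| + \min\{D(v-1, w),\ D(v, w-1),\ D(v-1, w-1)\}. \tag{6}$$

Equation 6 returns the minimum cumulative distance between the partial vectors $(\tilde{y}_1, \dots, \tilde{y}_v)$ and $(\tilde{A}_1^{(i)}, \dots, \tilde{A}_w^{(i)})$. Note that DTW assumes a mapping path from $(1, 1)$ to $(n, n)$ and an initial condition of $D(1, 1) = |\tilde{y}_1 - \tilde{A}_1^{(i)}|$, with terms that have a zero index excluded from the minimum.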
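The three distance measures in Eqs. (3) to (6) can be written directly in Python with NumPy. The DTW function below is a plain $O(n^2)$ dynamic-programming sketch with no window constraint, so it may differ from the DTW implementation used in the paper; the function names are hypothetical.

```python
import numpy as np


def l1_distance(x, y):
    """Eq. (3): sum of absolute deviations."""
    return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float))))


def l2_distance(x, y):
    """Eq. (4): Euclidean distance."""
    return float(np.sqrt(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)))


def dtw_distance(x, y):
    """Eqs. (5)-(6): cumulative cost of the best alignment from (1, 1) to (n, n)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    nx, ny = len(x), len(y)
    D = np.full((nx + 1, ny + 1), np.inf)  # row/column 0 act as "excluded" cells
    D[0, 0] = 0.0  # makes D[1, 1] = |x_1 - y_1|, the initial condition
    for v in range(1, nx + 1):
        for w in range(1, ny + 1):
            cost = abs(x[v - 1] - y[w - 1])
            D[v, w] = cost + min(D[v - 1, w], D[v, w - 1], D[v - 1, w - 1])
    return float(D[nx, ny])


target = [0, 1, 2, 3]
reference = [0, 0, 1, 2]  # the same pattern, delayed by one period
print(l1_distance(target, reference), dtw_distance(target, reference))  # 3.0 1.0
```

In this example the $L_1$ distance is 3, whereas DTW returns 1: it aligns the delayed pattern point by point and only pays for the final mismatch between 3 and 2.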

Having computed the distances between $\tilde{\boldsymbol{y}}$ and each $\tilde{\boldsymbol{A}}^{(i)}$, a subset of $k$ reference series is chosen for aggregating their future paths and, therefore, forecasting the target series. This is done by selecting the $k$ most similar series, i.e., the series that display the smallest distances, as determined by the selected measure. In our experiment we consider various $k$ values to investigate the effect of pool size on forecasting accuracy, but show that any value greater than 100 is a good choice.

Essentially, we propose that the future paths of the most similar series can form the basis for calculating the forecasts for the target series. We do so by statistically aggregating these future paths, calculating the median for each planning horizon. This approach is appealing in the sense that it does not involve statistical forecasting in the traditional way: fitting statistical models and extrapolating patterns. Instead, the real outcomes of a set of similar series are used to derive the forecasts.
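Selecting the $k$ nearest reference series and taking the horizon-wise median of their future paths might look like this (NumPy assumed; `similarity_forecast` is a hypothetical name). The result is still on the scaled, seasonally adjusted level, so steps 6 and 7 would follow.

```python
import numpy as np


def similarity_forecast(distances, B, k):
    """Median, per horizon, of the future paths of the k most similar series.

    distances: one distance per reference series (e.g., from DTW)
    B: future paths with shape (m, h), rows aligned with distances
    """
    distances = np.asarray(distances, dtype=float)
    B = np.asarray(B, dtype=float)
    k = min(k, len(distances))
    nearest = np.argsort(distances, kind="stable")[:k]  # smallest distances first
    return np.median(B[nearest], axis=0)


paths = [[1.0, 1.0], [2.0, 4.0], [100.0, 100.0], [4.0, 6.0]]
print(similarity_forecast([0.5, 0.1, 0.9, 0.2], paths, k=3))  # [2. 4.]
```

The median, rather than the mean, keeps a single unusual future path from pulling the forecast away from the bulk of the similar series.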

The proposed forecasting approach is demonstrated via a toy example, visualized in Figure 1. The plot on the top presents the original target series, as well as the seasonally adjusted and smoothed one. The plot in the middle presents the preprocessed series (scaled values) together with the 100 most similar reference series used for extrapolation. Finally, the plot at the bottom compares the rescaled and reseasonalised forecasts to the actual future values of the target series.

Figure 1: A toy example visualizing the methodology proposed for forecasting with similarity. First, the target series is seasonally adjusted and smoothed (plot on the top). Then, the series is scaled and similar reference series are used to determine its future path through aggregation (plot in the middle). Finally, the computed forecast is rescaled and reseasonalised to obtain the final forecast. The M495 series of the M3 Competition data set is used as the target series. (For interpretation of the references to colour in this figure, the reader is referred to the web version of this article.)

4 Evaluation

4.1 Design

In this paper, we aim to forecast the yearly, quarterly, and monthly series of the M3 forecasting competition (Makridakis and Hibon, 2000). This is a widely used data set in the forecasting literature with the corresponding research paper having been cited more than times according to Google Scholar (as of 22/08/2019). The number of the yearly, quarterly, and monthly series is presented in table 1, together with a five-number summary of their lengths and the forecast horizon per frequency.

Frequency   Number of series   Min   Q1    Q2    Q3    Max   Forecasting horizon
Yearly      645                14    15    19    30    41    6
Quarterly   756                16    36    44    44    64    8
Monthly     1428               48    78    115   116   126   18
Total       2829
Table 1: The number of the target series, their lengths, and the forecasting horizon for each data frequency. Min, Q1, Q2 (median), Q3, and Max summarize the number of historical observations.

In order to assess the impact of the series length, we produce forecasts not only using all the available history for each target series, but also considering shorter historical samples by truncating the long series and keeping the last few years of their history. This is of particular interest for forecasting practice as in many enterprise resource planning systems, such as SAP, only a limited number of years is usually available. Table 2 shows the cuts considered per frequency.

Frequency   Length of history kept, up to (in years)
Yearly      6    10    14    18    22    26    30    34
Quarterly   3    4     5     6     7     8     9     10
Monthly     3    4     5     6     7     8     9     10
Table 2: The cuts of the target series considered.

In order to implement the approach based on similarity described in the previous section, we need a sufficiently rich and diverse set of reference series. For this purpose, we use the yearly, quarterly, and monthly subsets of the M4 competition (Makridakis et al., 2019), which consist of 23,000, 24,000, and 48,000 series, respectively. The lengths of these series are, on average, greater than those of the M3 series, with median values of 29, 88, and 202 for the yearly, quarterly, and monthly frequencies, respectively.

The forecast accuracy is measured in terms of the Mean Absolute Scaled Error (MASE; Hyndman and Koehler, 2006). MASE is a scaled version of the mean absolute error, with the scaling factor being the in-sample mean absolute error of the seasonal naive method. It is widely accepted in the forecasting literature (see, for example, Franses, 2016). Makridakis et al. (2019) also used this measure to evaluate the point forecasts of the entries submitted to the M4 forecasting competition. Across all horizons of a single series, the MASE can be calculated as

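$$\mathrm{MASE} = \frac{\frac{1}{h}\sum_{t=n+1}^{n+h} \left| y_t - \hat{y}_t \right|}{\frac{1}{n-s}\sum_{t=s+1}^{n} \left| y_t - y_{t-s} \right|}, \tag{7}$$

where $y_t$ and $\hat{y}_t$ are the actual observation and the forecast for period $t$, $n$ is the sample size, $s$ is the length of the seasonal period, and $h$ is the forecasting horizon. The MASE is scale-independent, so averaging across series is appropriate. Lower MASE values are better.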

4.2 Investigating the performance of forecasting with similarity

In this section, we focus on the performance of forecasting with similarity and explore the different settings, such as the choice of the distance measure, the pool size of similar reference series (number of aggregates, ), as well as the effect of preprocessing. Once the optimal settings are identified, in the next subsection we compare the performance of our proposition against that of two robust benchmarks for various sizes of the historical sample.

Table 3 presents the MASE results of forecasting with similarity for each data frequency separately as well as across all frequencies (Total). The summary across frequencies is a weighted average based on the series counts for each frequency. The results for each distance measure ($L_1$, $L_2$, and DTW) are presented in rows, and for various values of $k$ in columns.

Frequency   Distance   Number of aggregated reference series (k)
                       1       5       10      50      100     500     1000
Yearly      L1         3.289   2.837   2.787   2.689   2.668   2.632   2.634
            L2         3.333   2.866   2.785   2.703   2.684   2.638   2.639
            DTW        3.270   2.835   2.730   2.656   2.641   2.623   2.637
Quarterly   L1         1.312   1.205   1.175   1.136   1.135   1.127   1.126
            L2         1.336   1.199   1.162   1.138   1.134   1.126   1.127
            DTW        1.293   1.177   1.158   1.117   1.115   1.115   1.116
Monthly     L1         1.004   0.908   0.887   0.871   0.870   0.867   0.869
            L2         1.008   0.910   0.891   0.871   0.869   0.866   0.868
            DTW        1.001   0.895   0.875   0.861   0.861   0.857   0.857
Total       L1         1.607   1.427   1.397   1.356   1.351   1.339   1.340
            L2         1.626   1.433   1.395   1.360   1.354   1.339   1.341
            DTW        1.597   1.413   1.373   1.339   1.335   1.329   1.332
Table 3: The performance (MASE) of the forecasting with similarity approach for different distance measures and pool sizes of similar reference series (k).

A comparison across the different values of the number of reference series, $k$, suggests that large pools of representative series provide better performance. At the same time, the improvements seem to taper off when . Based on the reference set we use in this study, we identify a sweet spot at . The analysis presented in the next subsection focuses on this aggregate size. In any case, we believe that the value of $k$ should be selected based on the set of reference series considered and its homogeneity with the target series.

Table 3 also shows that $L_1$ and $L_2$ perform almost indistinguishably across all frequencies. DTW almost always outperforms the other two distance measures. However, the differences are small, to the degree of . Given that DTW is more computationally intensive than $L_1$ and $L_2$ (approximately , , and for yearly, quarterly, and monthly frequencies, respectively), we need to investigate whether the achieved performance improvements are statistically significant.

To this end, we apply the Multiple Comparisons from the Best (MCB) test, which examines whether the average (across series) rank of each distance measure is significantly different from those of the others (for more details on the MCB, see Koning et al., 2005). When the intervals of two methods overlap, their ranked performances are not statistically different. The analysis is done for . The results are presented in Figure 2. We observe that DTW results in the best ranked performance, which, however, is not statistically different from that of the other two distance measures. We argue that if computational cost is a concern, one should choose between $L_1$ and $L_2$. Otherwise, DTW is preferable, both in terms of average forecast accuracy and mean ranks. In the analysis below, we focus on the DTW distance measure.
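To show what the MCB comparison operates on, the sketch below computes the mean rank of each method across series, together with the half-width of the Nemenyi-style intervals commonly used to draw MCB plots. We assume this interval form, which may not match every MCB implementation; the critical value `q_alpha` (a studentized range quantile for the given number of methods and infinite degrees of freedom) must be supplied by the caller, and the helper names are hypothetical.

```python
import numpy as np


def mean_ranks(errors):
    """Average rank of each method (column) across series (rows); 1 = best.

    Ties are broken by column order instead of being averaged (a simplification).
    """
    errors = np.asarray(errors, dtype=float)
    ranks = np.argsort(np.argsort(errors, axis=1, kind="stable"), axis=1) + 1
    return ranks.mean(axis=0)


def mcb_half_width(n_series, n_methods, q_alpha):
    """Half-width of the interval drawn around each mean rank; two methods
    differ significantly when their intervals do not overlap."""
    return 0.5 * q_alpha * np.sqrt(n_methods * (n_methods + 1) / (12.0 * n_series))


# Hypothetical MASE values for two series (rows) and three distance measures.
errors = [[1.0, 2.0, 3.0],
          [1.0, 3.0, 2.0]]
print(mean_ranks(errors))  # mean ranks 1, 2.5 and 2.5
```

With thousands of series, as in this study, the half-width becomes small, so even modest differences in mean rank can be detected.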

Figure 2: MCB significance tests for the three distance measures for each data frequency.

The results presented above are based on applying preprocessing (as described in section 3.1) before searching for similar series. Here, we investigate the added value of preprocessing to the forecasting with similarity approach. Table 4 presents the MASE results for DTW across different $k$ values with and without preprocessing. Preprocessing does not appear to improve forecast accuracy at the yearly frequency. In contrast, preprocessing is of great importance for the seasonal data (quarterly and monthly), with substantial drops in MASE. The difference between the yearly and the other frequencies can be explained by the lack of seasonal patterns in the former case, which allows for easier identification of similar series. Overall, preprocessing provides similar or better accuracy at every frequency, so we suggest that it is always applied when forecasting with similar series.

Frequency   Preprocessing   Number of aggregated reference series (k)
                            1       5       10      50      100     500     1000
Yearly      NO              3.544   2.821   2.735   2.644   2.639   2.626   2.641
            YES             3.270   2.835   2.730   2.656   2.641   2.623   2.637
Quarterly   NO              1.657   1.411   1.359   1.384   1.396   1.419   1.422
            YES             1.293   1.177   1.158   1.117   1.115   1.115   1.116
Monthly     NO              1.263   1.077   1.020   1.011   1.012   1.040   1.060
            YES             1.001   0.895   0.875   0.861   0.861   0.857   0.857
Total       NO              1.888   1.564   1.502   1.483   1.486   1.503   1.517
            YES             1.597   1.413   1.373   1.339   1.335   1.329   1.332
Table 4: The performance (MASE) of forecasting with similarity, with and without preprocessing. The DTW distance measure is considered.

4.3 Similarity versus model-based forecasts

Having identified the optimal settings for forecasting with similarity (the DTW distance, the pool size $k$ identified above, and preprocessing), abbreviated from now on simply as Similarity, in this subsection we turn our attention to comparing the accuracy of our approach against well-known forecasting benchmarks. We use two benchmark methods. The forecasts of the first method (hereafter ETS) derive from the optimally selected exponential smoothing model when applying selection with Akaike’s Information Criterion corrected for small sample sizes (AICc). This optimal selection occurs per series individually, so different optimal models may be selected for different series. We use the implementation available in the forecast package for the R statistical software, and in particular the ets() function (Hyndman and Khandakar, 2008). The second benchmark is the simple (equally weighted) combination of three exponential smoothing models: Simple Exponential Smoothing, Holt’s linear trend Exponential Smoothing, and Damped trend Exponential Smoothing. This combination is applied to the seasonally adjusted data (multiplicative classical decomposition) if the data were found to be seasonal, using the seasonality test described in section 3.1.1. This combination approach has been used as a benchmark in international forecasting competitions (Makridakis and Hibon, 2000; Makridakis et al., 2019) and is usually abbreviated as SHD.
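As a rough illustration of the SHD benchmark, the following sketch averages the forecasts of simple, Holt's linear trend, and damped trend exponential smoothing (NumPy assumed). Unlike the benchmark, it uses fixed, hypothetical smoothing parameters instead of estimating them, simple initial values, and no seasonal adjustment, so it should be read as applying to already seasonally adjusted or non-seasonal data; all function names are hypothetical.

```python
import numpy as np


def ses_forecast(y, h, alpha):
    """Simple exponential smoothing with a fixed alpha; flat forecasts."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(h, level)


def damped_trend_forecast(y, h, alpha, beta, phi):
    """Additive (damped) trend exponential smoothing; phi = 1 gives Holt's method."""
    level, trend = y[0], y[1] - y[0]  # simple initial values (an assumption)
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    damping = np.cumsum(phi ** np.arange(1, h + 1))  # phi + phi^2 + ... + phi^j
    return level + damping * trend


def shd_forecast(y, h, alpha=0.3, beta=0.1, phi=0.9):
    """Equally weighted combination of SES, Holt, and damped trend forecasts."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2:
        raise ValueError("need at least two observations")
    return (ses_forecast(y, h, alpha)
            + damped_trend_forecast(y, h, alpha, beta, 1.0)
            + damped_trend_forecast(y, h, alpha, beta, phi)) / 3


y = 2.0 * np.arange(10) + 1.0  # a perfectly linear series
print(damped_trend_forecast(y, 3, 0.3, 0.1, 1.0))  # Holt continues the line: 21, 23, 25
print(shd_forecast(y, 3))
```

On this linear series, SES stays flat below the last observation, Holt extrapolates the line, and the damped trend lies in between, which illustrates how the equal-weight average hedges against over- or under-estimating the trend.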

Figure 3 shows the accuracy of Similarity against the two benchmarks, ETS and SHD. The comparison is done for various historical sample sizes to examine the effect of data availability. We observe:

  • In the yearly frequency, Similarity always outperforms the two benchmarks, regardless of the length of the available history. Interestingly, ETS improves when not all available observations are used for model fitting (truncated target series). In fact, ETS achieves its best yearly accuracy with just 14 years of historical data. SHD and Similarity, by contrast, perform better when more data are available.

  • In the quarterly frequency, Similarity is again better overall than the two benchmarks. The only exception is when the series are extremely short (3 years of history), where ETS outperforms Similarity. Finally, for long series the performance of SHD is close to that of Similarity.

  • In the monthly frequency, ETS is better than Similarity, which in turn is better than SHD. This is especially true for short histories; the differences among the three approaches become indistinguishable when longer series are available. Longer monthly series generally improve performance up to a point: once more than 7 or 8 years of data are available, further changes in forecasting accuracy are small.

Figure 3: Benchmarking the performance of Similarity against ETS and SHD for various historical sample sizes.

Figure 3 also shows the performance of the simple forecast combination of ETS and Similarity (“ETS-Similarity”). Other simple combinations of ETS, SHD, and Similarity were also tested, but on average they performed the same as or worse than the ETS-Similarity combination. The argument is that these two forecasting approaches are diverse in nature (model-based versus data-centric) but also robust when applied separately, so we expect their combination to perform well too (Lichtendahl and Winkler, 2019). We observe that this simple combination performs on par with Similarity for the yearly frequency and is much better than any other approach at the seasonal frequencies. Overall, the ETS-Similarity combination is the best approach. This suggests that both model-based and data-centric approaches bring benefits to forecasting; focusing on only one of them might not be ideal.
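The intuition behind combining diverse forecasts can be illustrated with a small, hypothetical example. The sketch assumes the mean absolute scaled error (MASE; Hyndman and Koehler, 2006) as the accuracy measure and uses invented numbers in which one method overshoots and the other undershoots; it is not a reproduction of the results in Figure 3.

```python
from typing import List


def mase(insample: List[float], actual: List[float],
         forecast: List[float], m: int = 1) -> float:
    """Mean absolute scaled error; the scale is the in-sample MAE of the
    seasonal naive method with period m (m = 1 gives the naive method)."""
    scale = sum(abs(insample[t] - insample[t - m])
                for t in range(m, len(insample))) / (len(insample) - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def equal_weight_combination(first: List[float], second: List[float]) -> List[float]:
    """Simple (equally weighted) combination of two forecast vectors."""
    return [(a + b) / 2 for a, b in zip(first, second)]


# Hypothetical numbers: one method overshoots, the other undershoots.
history = [10.0, 12.0, 14.0, 16.0, 18.0]
actual = [20.0, 22.0]
ets_like = [21.0, 23.0]
similarity_like = [19.0, 21.0]
combined = equal_weight_combination(ets_like, similarity_like)

for name, fc in [("ETS", ets_like), ("Similarity", similarity_like),
                 ("ETS-Similarity", combined)]:
    print(name, mase(history, actual, fc))
```

Here both individual forecasts have a MASE of 0.5, while their equal-weight average cancels the opposite errors and reaches 0. In practice errors only partially offset, which is why diversity between the combined methods matters.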

Finally, we compare the differences in the ranked performance of the three approaches (ETS, SHD, and Similarity) and the combination (ETS-Similarity) in terms of their statistical significance, using multiple comparisons with the best (MCB). The results are presented in the nine panels of Figure 4, one for each frequency (rows) and historical sample size: short, medium, and long (columns). We observe:

  • Similarity is significantly better than ETS and SHD for the short and long yearly series. For the other frequencies, Similarity is not statistically different from ETS and SHD.

  • The simple combination of ETS and Similarity is always ranked first. Moreover, it performs statistically significantly better than ETS and SHD for all frequencies and historical sample sizes (their intervals do not overlap). Similarity and ETS-Similarity are not statistically different at the yearly frequency, but the combination is better at the seasonal ones.
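The MCB comparison can be approximated from per-series ranks. The sketch below ranks the methods on each series, averages the ranks, and draws an interval of half the Nemenyi critical distance around each mean rank; methods whose intervals do not overlap are deemed significantly different. This Nemenyi-style approximation is an assumption made for illustration: the critical value (2.569, the tabulated value for four methods at the 5% level) and all function names are ours, and the exact procedure of Koning et al. (2005) may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_row(errors: Sequence[float]) -> List[float]:
    """Rank the methods on one series (1 = lowest error); ties share the average rank."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ranks = [0.0] * len(errors)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mcb_intervals(errors: List[Sequence[float]], names: List[str],
                  q_alpha: float = 2.569) -> Dict[str, Tuple[float, float]]:
    """Mean rank of each method +/- half the Nemenyi critical distance.

    q_alpha = 2.569 is the tabulated Nemenyi value for four methods at the
    5% level; a different number of methods needs a different value.
    """
    n_series, k = len(errors), len(names)
    mean_ranks = [0.0] * k
    for row in errors:
        for j, r in enumerate(rank_row(row)):
            mean_ranks[j] += r / n_series
    half_width = q_alpha * math.sqrt(k * (k + 1) / (6 * n_series)) / 2
    return {name: (mr - half_width, mr + half_width)
            for name, mr in zip(names, mean_ranks)}


def overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two intervals overlap; no overlap indicates a significant difference."""
    return a[0] <= b[1] and b[0] <= a[1]
```

The interval width shrinks with the square root of the number of series, so with collections as large as those used here even modest differences in mean rank can become significant.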

Figure 4: MCB significance tests for ETS, SHD, Similarity, and ETS-Similarity for each data frequency and various sample sizes.

5 Discussions

Statistical time series forecasting typically involves selecting or combining the most accurate forecasting model(s) per series, a complicated task that is significantly affected by data, model, and parameter uncertainty. On the other hand, Big Data now allow forecasters to improve forecasting accuracy through cross-learning, i.e., by extracting information from multiple series with similar characteristics. This practice has proven highly promising, especially through the exploitation of advanced Machine Learning algorithms and fast computers (Makridakis et al., 2019). Our results confirm that data-centric solutions offer several advantages over traditional model-based ones, relaxing the assumptions made by the models while also allowing for more flexibility. Thus, we believe that extending forecasting from within series to across series is a promising alternative.

An important advancement of our approach over other cross-learning ones is that similarity derives directly from the data, rather than from a feature vector that indirectly summarizes the characteristics of the series (Petropoulos et al., 2014; Kang et al., 2017, 2019). As a result, the uncertainty related to the choice and definition of the features used for matching the target to the reference series is effectively mitigated. Moreover, no explicit rules are required for determining which statistical forecasting model(s) should be used in each case (Montero-Manso et al., 2019). Instead of specifying a pool of forecasting models and an algorithm for assigning these models to the series, a distance measure is defined and used to evaluate similarity. Finally, forecasting models are replaced by the actual future paths of the similar reference series.
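The core mechanism described above can be sketched in a few lines: measure the DTW distance between the target and the most recent part of each reference series, keep the k closest references, and aggregate their future paths. This is a minimal sketch under assumptions: the preprocessing of section 3 is skipped (series are assumed to be on a comparable scale already), the median is assumed as the aggregation operator, and `forecast_with_similarity` is a hypothetical name.

```python
import statistics
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def forecast_with_similarity(target: Sequence[float],
                             references: List[Tuple[Sequence[float], Sequence[float]]],
                             h: int, k: int = 2) -> List[float]:
    """Hypothetical helper: median of the future paths of the k most similar references.

    Each reference is a (history, future) pair; all series are assumed to be
    preprocessed to a comparable scale beforehand.
    """
    scored = []
    for history, future in references:
        if len(history) < len(target) or len(future) < h:
            continue  # too short to compare or to supply h future values
        window = history[-len(target):]  # align with the end of the target
        scored.append((dtw_distance(target, window), list(future[:h])))
    scored.sort(key=lambda pair: pair[0])
    nearest = [future for _, future in scored[:k]]
    return [statistics.median(values) for values in zip(*nearest)]
```

Note that no model is fitted at any point: the forecast consists solely of values already observed in the reference set, aggregated across neighbours.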

Our results are significant for the practice of Operational Research (OR) and Operations Management (OM), as more accurate forecasts translate to better decisions. Forecasting is an important driver for reducing inventory-related costs and waste in supply chains (for a comprehensive review on supply chain forecasting, see Syntetos et al., 2016). In fact, small improvements in forecast accuracy are usually amplified in terms of inventory utility, namely inventory holding and achieved target service levels (Syntetos et al., 2010). At the same time, forecast accuracy is also important in other areas of OR, such as humanitarian operations and logistics (Rodríguez-Espíndola et al., 2018; Kovacs and Moshtari, 2019) and healthcare management (Brailsford and Vissers, 2011; Willis et al., 2018).

Our study also has implications for software providers of forecasting support systems. We offer our code as an open-source solution together with a web interface (developed in R and Shiny; available at https://fotpetr.shinyapps.io/similarity/) where a target series can be forecasted through similarity, as described in section 3, using the large M4 competition data set as the reference set. We argue that our approach is straightforward to implement within existing solutions, offering a competitive alternative to traditional statistical modelling. Forecasting with similarity can expand the existing toolboxes of forecasting software. Given that no single approach is best for all cases, a selection framework (such as time series cross-validation) can pick between statistical models and forecasting with similarity based on past forecasting performance.
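A minimal sketch of such a selection framework, assuming rolling-origin evaluation with the mean absolute error: each candidate forecaster is re-run from successive forecast origins, and the one with the lowest average error is chosen. The two candidates below (`naive` and `drift`) are hypothetical stand-ins for a statistical model and for forecasting with similarity; any function with the same signature could be plugged in.

```python
from typing import Callable, Dict, List

Forecaster = Callable[[List[float], int], List[float]]


def rolling_origin_mae(y: List[float], forecaster: Forecaster,
                       h: int, min_train: int) -> float:
    """Mean absolute error over all forecast origins from min_train onwards."""
    errors = []
    for origin in range(min_train, len(y) - h + 1):
        train, test = y[:origin], y[origin:origin + h]
        forecast = forecaster(train, h)
        errors.extend(abs(a - f) for a, f in zip(test, forecast))
    return sum(errors) / len(errors)


def select_method(y: List[float], candidates: Dict[str, Forecaster],
                  h: int, min_train: int) -> str:
    """Return the name of the candidate with the lowest rolling-origin MAE."""
    scores = {name: rolling_origin_mae(y, fn, h, min_train)
              for name, fn in candidates.items()}
    return min(scores, key=scores.get)


# Hypothetical stand-ins for "a statistical model" and "forecasting with similarity".
def naive(train: List[float], h: int) -> List[float]:
    return [train[-1]] * h


def drift(train: List[float], h: int) -> List[float]:
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * step for step in range(1, h + 1)]
```

For instance, `select_method(series, {"naive": naive, "drift": drift}, h=2, min_train=5)` picks `drift` on a series with a steady trend. The selection could equally be made per series or per group of series, depending on how much history is available for evaluation.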

However, computational time is a critical factor that should be carefully considered, especially when forecasting massive data collections. This is particularly true in supply chain management, where millions of item-level forecasts must be produced on a daily basis (Seaman, 2018). Since the DTW distance measure is more computationally intensive than the two other measures presented in this study, one option would be to select between them based on the results of an ABC-XYZ analysis (Ramanathan, 2006). This analysis is based on the Pareto principle (the 80/20 rule), i.e., the expectation that a minority of cases has a disproportionate impact on the whole. In this respect, the target series could first be classified as A, B, or C according to their importance/cost, and as X, Y, or Z based on how difficult they are to forecast accurately. Then, series in the AZ class (important but difficult to forecast) could be forecasted using DTW, and the rest using another, less computationally intensive distance measure.
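The routing rule described above could look as follows. This is a sketch under assumptions: importance is measured by each series' value (for example, annual revenue), forecastability by the coefficient of variation of its history, and the cut-offs (80% and 95% of cumulative value; coefficient of variation of 0.5 and 1.0) are common illustrative choices rather than values from this study. The alternative distance is labelled `euclidean` only as an example of a cheaper measure; all names are hypothetical.

```python
import statistics
from typing import Dict, List


def abc_classes(values: Dict[str, float], a_cut: float = 0.8,
                b_cut: float = 0.95) -> Dict[str, str]:
    """Pareto-style ABC classes from each series' share of total value.

    A series is A if the cumulative share of the more valuable series before
    it is below a_cut, B if below b_cut, and C otherwise (cut-offs assumed).
    """
    total = sum(values.values())
    classes, cumulative = {}, 0.0
    for name in sorted(values, key=values.get, reverse=True):
        classes[name] = "A" if cumulative < a_cut else ("B" if cumulative < b_cut else "C")
        cumulative += values[name] / total
    return classes


def xyz_class(history: List[float], x_cut: float = 0.5, y_cut: float = 1.0) -> str:
    """Forecastability proxy: coefficient of variation of the history (cut-offs assumed)."""
    mean = statistics.mean(history)
    if mean == 0:
        return "Z"
    cv = statistics.pstdev(history) / abs(mean)
    return "X" if cv <= x_cut else ("Y" if cv <= y_cut else "Z")


def distance_for(abc: str, xyz: str) -> str:
    """Spend the expensive DTW computation only on important, hard-to-forecast series."""
    return "dtw" if (abc, xyz) == ("A", "Z") else "euclidean"
```

The payoff comes from the cost difference: classic DTW fills an n × m cost table for every target-reference pair, whereas a point-wise distance between equal-length series needs a single pass.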

Forecasting with similarity relies on the availability of a rich collection of reference series. For good forecasting performance, the reference set should be as representative of the target series as possible (see Kang et al. (2019) for a more rigorous definition), which becomes easier to achieve in business settings as data accumulate. To illustrate and empirically demonstrate the effectiveness of the approach, we used the M4 competition data set as a reference. This data set is considered representative of reality (Spiliotis et al., 2019b). However, if our approach were to be applied to the data of a specific company or sector, it would make sense to derive the reference set from data of that company/sector so that it is as representative as possible. If it is difficult to identify appropriate reference series for the target series, generating series with the desired characteristics (Kang et al., 2019) is an option.

We have empirically tested our approach on three representative data frequencies: yearly, quarterly, and monthly. We have no reason to believe that our approach would not perform well for higher-frequency data, such as weekly, daily, or hourly. If multiple seasonal patterns appear, as could be the case for hourly data with periodicity within a day (every 24 hours) and within a week (every 168 hours), then a multiple seasonal decomposition needs to be applied instead of the standard STL (the forecast package for R offers the mstl() function for this purpose). On the other hand, our approach is not suitable as is for intermittent demand data, where the demand values for several periods are equal to zero. In this case, one could try forecasting with similarity without applying data preprocessing. A similar approach was proposed by Nikolopoulos et al. (2016), who focused on identifying patterns within intermittent demand series rather than across series.

6 Concluding remarks

In this paper, we introduced a new approach to forecasting that uses the future paths of similar reference series to forecast a target series. The main advantage of our proposition is that it is model-free, in the sense that it does not rely on statistical forecasting models and, as a result, does not assume an explicit DGP. Instead, we argue that history repeats itself (déjà vu) and that current data patterns will resemble the patterns of other, already observed series. The proposed approach is data-centric and relies on the availability of a rich, representative reference set of series, which is not an unreasonable requirement in the era of Big Data.

We examined the performance of the new approach on a widely used data set and benchmarked it against two robust forecasting methods, namely the automatic selection of the best model from the exponential smoothing family (ETS) and the equally weighted combination of Simple, Holt, and Damped exponential smoothing (SHD). We found that for most frequencies (yearly and quarterly) the new approach is more accurate than the benchmarks. Moreover, we proposed a simple combination of model-based and model-free forecasts that is always ranked first: it is significantly better than ETS and SHD across all frequencies, and better than Similarity alone at the seasonal frequencies.

The innovative proposition of forecasting with similarity and without models points towards several future research paths. First, in this paper we focused on point forecast accuracy. An obvious extension would be to evaluate the uncertainty of the produced forecasts, focusing either on specific prediction intervals or on the entire forecast distribution. The future paths of the similar series provide a good basis for extending our proposition in this direction. Furthermore, in this study we did not restrict the reference series to match the industry/field of the target series. It would be interesting to explore whether such matching would further improve the accuracy of forecasting with similarity.
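As a first step in this direction, one could read prediction intervals directly off the neighbours' future paths. The sketch below, an illustrative assumption rather than a method evaluated in this paper, computes pointwise empirical quantiles at each horizon across the k future paths, using linear interpolation between order statistics. With few neighbours such intervals are coarse, and their coverage would need to be evaluated empirically. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)  # floor, since pos >= 0
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def path_intervals(future_paths: List[Sequence[float]],
                   level: float = 0.8) -> List[Tuple[float, float]]:
    """Pointwise central interval at each horizon across the neighbours' future paths."""
    tail = (1 - level) / 2
    return [(empirical_quantile(step, tail), empirical_quantile(step, 1 - tail))
            for step in zip(*future_paths)]
```

For example, five neighbour paths `[[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]` give 50% intervals of (2, 4) at the first horizon and (20, 40) at the second.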

References

  • M. Adya, F. Collopy, J. Armstrong, and M. Kennedy (2001) Automatic identification of time series features for rule-based forecasting. International Journal of Forecasting 17 (2), pp. 143–157.
  • G. Athanasopoulos, R. J. Hyndman, N. Kourentzes, and F. Petropoulos (2017) Forecasting with temporal hierarchies. European Journal of Operational Research 262 (1), pp. 60–74.
  • J. M. Bates and C. W. J. Granger (1969) The combination of forecasts. Operational Research Society 20 (4), pp. 451–468.
  • C. Bergmeir, R. J. Hyndman, and J. M. Benítez (2016) Bagging exponential smoothing methods using STL decomposition and Box-Cox transformation. International Journal of Forecasting 32 (2), pp. 303–312.
  • B. Billah, M. L. King, R. Snyder, and A. B. Koehler (2006) Exponential smoothing model selection for forecasting. International Journal of Forecasting 22 (2), pp. 239–247.
  • G. E. P. Box and D. R. Cox (1964) An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological) 26 (2), pp. 211–252.
  • J. E. Boylan, H. Chen, M. Mohammadipour, and A. Syntetos (2014) Formation of seasonal groups and application of seasonal indices. The Journal of the Operational Research Society 65 (2), pp. 227–241.
  • S. Brailsford and J. Vissers (2011) OR in healthcare: a European perspective. European Journal of Operational Research 212 (2), pp. 223–234.
  • G. Claeskens, J. R. Magnus, A. L. Vasnev, and W. Wang (2016) The forecast combination puzzle: a simple theoretical explanation. International Journal of Forecasting 32 (3), pp. 754–762.
  • R. T. Clemen (1989) Combining forecasts: a review and annotated bibliography. International Journal of Forecasting 5, pp. 559–583.
  • R. B. Cleveland, W. S. Cleveland, J. McRae, and I. Terpenning (1990) STL: a seasonal-trend decomposition procedure based on loess. Journal of Official Statistics 6, pp. 3–73.
  • W. S. Cleveland, E. Grosse, and W. M. Shyu (1992) Local regression models. In Statistical Models in S, pp. 68.
  • R. Fildes (1989) Evaluation of aggregate and individual forecast method selection rules. Management Science 35 (9), pp. 1056–1065.
  • R. Fildes (2001) Beyond forecasting competitions. International Journal of Forecasting 17 (4), pp. 556–560.
  • P. H. Franses (2016) A note on the mean absolute scaled error. International Journal of Forecasting 32 (1), pp. 20–22.
  • K. C. Green and J. S. Armstrong (2007) Structured analogies for forecasting. International Journal of Forecasting 23 (3), pp. 365–376.
  • V. M. Guerrero (1993) Time-series analysis supported by power transformations. Journal of Forecasting 12 (1), pp. 37–48.
  • M. Hibon and T. Evgeniou (2005) To combine or not to combine: selecting among forecasts and their combinations. International Journal of Forecasting 21, pp. 15–24.
  • R. Hyndman, G. Athanasopoulos, C. Bergmeir, G. Caceres, L. Chhay, M. O'Hara-Wild, F. Petropoulos, S. Razbash, E. Wang, and F. Yasmeen (2019) forecast: forecasting functions for time series and linear models. R package version 8.7.
  • R. J. Hyndman, A. B. Koehler, R. D. Snyder, and S. Grose (2002) A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting 18 (3), pp. 439–454.
  • R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang (2011) Optimal combination forecasts for hierarchical time series. Computational Statistics & Data Analysis 55 (9), pp. 2579–2589.
  • R. J. Hyndman and Y. Khandakar (2008) Automatic time series forecasting: the forecast package for R. Journal of Statistical Software 27 (3), pp. 1–22.
  • R. J. Hyndman and A. B. Koehler (2006) Another look at measures of forecast accuracy. International Journal of Forecasting 22, pp. 679–688.
  • V. R. R. Jose and R. L. Winkler (2008) Simple robust averages of forecasts: some empirical results. International Journal of Forecasting 24, pp. 163–169.
  • Y. Kang, R. J. Hyndman, and F. Li (2019) GRATIS: GeneRAting TIme series with diverse and controllable characteristics. arXiv 1903.02787.
  • Y. Kang, R. J. Hyndman, and K. Smith-Miles (2017) Visualising forecasting algorithm performance using time series instance spaces. International Journal of Forecasting 33 (2), pp. 345–358.
  • S. Kolassa (2011) Combining exponential smoothing forecasts using Akaike weights. International Journal of Forecasting 27 (2), pp. 238–251.
  • A. J. Koning, P. H. Franses, M. Hibon, and H. O. Stekler (2005) The M3 competition: statistical tests of the results. International Journal of Forecasting 21 (3), pp. 397–409.
  • N. Kourentzes, F. Petropoulos, and J. R. Trapero (2014) Improving forecasting by estimating time series structural components across multiple frequencies. International Journal of Forecasting 30 (2), pp. 291–302.
  • G. Kovacs and M. Moshtari (2019) A roadmap for higher research quality in humanitarian operations: a methodological perspective. European Journal of Operational Research 276 (2), pp. 395–408.
  • K. C. Lichtendahl and R. L. Winkler (2019) Why do some combinations perform better than others? International Journal of Forecasting, in press.
  • S. Makridakis and M. Hibon (2000) The M3-competition: results, conclusions and implications. International Journal of Forecasting 16 (4), pp. 451–476.
  • S. Makridakis and R. L. Winkler (1983) Averages of forecasts: some empirical results. Management Science 29, pp. 987–996.
  • S. Makridakis, E. Spiliotis, and V. Assimakopoulos (2019) The M4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, in press.
  • M. Mohammadipour and J. E. Boylan (2012) Forecast horizon aggregation in integer autoregressive moving average (INARMA) models. Omega 40 (6), pp. 703–712.
  • P. Montero-Manso, G. Athanasopoulos, R. J. Hyndman, and T. S. Talagala (2019) FFORMA: feature-based forecast model averaging. Monash Econometrics and Business Statistics Working Papers 19.
  • K. I. Nikolopoulos, M. Z. Babai, and K. Bozos (2016) Forecasting supply chain sporadic demand with nearest neighbor approaches. International Journal of Production Economics 177, pp. 139–148.
  • K. Nikolopoulos, A. Litsa, F. Petropoulos, V. Bougioukos, and M. Khammash (2015) Relative performance of methods for forecasting special events. Journal of Business Research 68 (8), pp. 1785–1791.
  • F. Petropoulos, R. J. Hyndman, and C. Bergmeir (2018) Exploring the sources of uncertainty: why does bagging for time series forecasting work? European Journal of Operational Research 268 (2), pp. 545–554.
  • F. Petropoulos, S. Makridakis, V. Assimakopoulos, and K. Nikolopoulos (2014) Horses for courses in demand forecasting. European Journal of Operational Research 237 (1), pp. 152–163.
  • F. Petropoulos and I. Svetunkov (2019) A simple combination of univariate models. International Journal of Forecasting, in press.
  • R. Ramanathan (2006) ABC inventory classification with multiple-criteria using weighted linear optimization. Computers & Operations Research 33 (3), pp. 695–700.
  • O. Rodríguez-Espíndola, P. Albores, and C. Brewster (2018) Disaster preparedness in humanitarian logistics: a collaborative approach for resource management in floods. European Journal of Operational Research 264 (3), pp. 978–993.
  • B. Seaman (2018) Considerations of a retail forecasting practitioner. International Journal of Forecasting 34 (4), pp. 822–829.
  • C. Shah (1997) Model selection in univariate time series forecasting using discriminant analysis. International Journal of Forecasting 13 (4), pp. 489–500.
  • S. Smyl (2019) A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting, in press.
  • E. Spiliotis, V. Assimakopoulos, and K. Nikolopoulos (2019a) Forecasting with a hybrid method utilizing data smoothing, a variation of the theta method and shrinkage of seasonal factors. International Journal of Production Economics 209, pp. 92–102.
  • E. Spiliotis, A. Kouloumos, V. Assimakopoulos, and S. Makridakis (2019b) Are forecasting competitions data representative of the reality? International Journal of Forecasting, in press.
  • I. Svetunkov (2016) True model. https://forecasting.svetunkov.ru/en/2016/06/25/true-model/ (accessed 2019-05-30).
  • A. A. Syntetos, K. Nikolopoulos, and J. E. Boylan (2010) Judging the judges through accuracy-implication metrics: the case of inventory forecasting. International Journal of Forecasting 26, pp. 134–143.
  • A. A. Syntetos, Z. Babai, J. E. Boylan, S. Kolassa, and K. Nikolopoulos (2016) Supply chain forecasting: theory, practice, their gap and the future. European Journal of Operational Research 252 (1), pp. 1–26.
  • L. J. Tashman (2000) Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting 16 (4), pp. 437–450.
  • A. Timmermann (2006) Forecast combinations. In Handbook of Economic Forecasting, G. Elliott, C. W. J. Granger, and A. Timmermann (Eds.), Vol. 1, pp. 135–196.
  • G. Willis, S. Cave, and M. Kunc (2018) Strategic workforce planning in healthcare: a multi-methodology approach. European Journal of Operational Research 267 (1), pp. 250–263.
  • K. Zhang, H. Chen, J. Boylan, and P. Scarf (2013) Generalised estimators for seasonal forecasting by combining grouping with shrinkage approaches. Journal of Forecasting 32 (2), pp. 137–150.