Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective

02/03/2020 ∙ by Tine Van Calster, et al.

Choosing the technique that best forecasts your data is a problem that arises in any forecasting application. Decades of research have resulted in an enormous number of forecasting methods that stem from statistics, econometrics and machine learning (ML), which makes the choice in any forecasting exercise difficult and elaborate. This paper aims to facilitate this process for high-level tactical sales forecasts by comparing a large array of techniques on 35 time series that consist of both industry data from The Coca-Cola Company and publicly available datasets. However, instead of focusing solely on the accuracy of the resulting forecasts, this paper introduces a novel, completely automated, profit-driven approach that takes into account the expected profit that a technique can create during both model building and evaluation. The expected profit function used for this purpose is easy to understand and adaptable to any situation, as it combines forecasting accuracy with business expertise. Furthermore, we examine the added value of ML techniques, the inclusion of external factors and the use of seasonal models in order to ascertain which type of model works best in tactical sales forecasting. Our findings show that simple seasonal time series models consistently outperform other methodologies and that the profit-driven approach can lead to selecting a different forecasting model.




1 Introduction

This paper focuses on one of the most frequently asked questions in forecasting theory and practice: which technique(s) should I choose to forecast this time series? This question has been posed many times in the literature and has indeed been addressed by benchmarks and competitions (Armstrong and Fildes, 2006; Crone et al., 2011; Petropoulos et al., 2014), as forecasting has been an integral part of the business decision-making process for decades and is used for this purpose in many industries (Armstrong and Fildes, 2006; Cang and Yu, 2014; Lessmann and Voß, 2017). However, most studies take only one evaluation criterion into account, i.e. the performance of the techniques on a test set, while the final choice of a model in a business context depends on more considerations. Undoubtedly, the costs associated with inaccurate forecasts ensure that accuracy will always remain an important evaluation standard (Kahn, 2003). From a decision-making perspective, however, other questions immediately arise in the mind of the business expert as well, such as the potential impact of the forecast on the revenue of the company or the maintenance cost of the model. This paper therefore proposes an expected profit function that can be integrated into several steps of the forecasting process, while also taking a closer look at which types of models perform best in a sales forecast on a tactical or strategic level.

Recent publications have introduced a wide range of forecasting techniques, from statistical methods to machine learning techniques. Given all of these theoretical and technological developments, it is becoming increasingly difficult to select the right type of technique for a given use case. Machine learning (ML) techniques in particular have received a lot of attention recently, as they constitute one of the most popular topics in the forecasting literature (Fildes, 2006). Most articles on ML techniques report favourable results when compared to more traditional methodologies, both for single use cases and for more extensive comparisons (Crone et al., 2011), although publications generally tend to report only positive outcomes (Armstrong, 2006). However, several authors have expressed reservations about these complex techniques (Makridakis and Hibon, 2000). In contrast, Crone et al. (2011) have shown that machine learning has caught up with statistical modelling and should not be dismissed lightly in forecasting exercises. This paper therefore also investigates whether these more complex ML techniques truly outperform the classical models for a tactical sales forecast.

In this paper, we focus on the field of sales forecasting, as successful sales forecasts are vital in both short- and long-term strategic and financial planning (Ramos et al., 2015). This research specifically deals with high-level forecasts, which are primarily meant for decision-making purposes, as opposed to inventory planning for specific products. In practice, this typically entails a monthly, non-intermittent time series that is prone to display a trend, a seasonal pattern or a combination of both. This type of time series is common in other fields as well and has therefore frequently been used for benchmarking purposes (Armstrong and Fildes, 2006; Crone et al., 2011; Petropoulos et al., 2014). Because seasonality is a typical characteristic of sales time series, we compare techniques that model seasonality with methods that lack this ability. While trying out non-seasonal models might seem counterintuitive for this data, many of the more recently developed techniques do not have a seasonal component and still appear to perform very well in many applications. This paper therefore also investigates whether this type of model can perform well on seasonal time series, given the necessary pre-processing of the data. Furthermore, this high-level data raises the question of whether incorporating external factors into the forecast is useful. While the addition of variables has obvious benefits, such as explanatory value, it frequently leads to higher model maintenance costs. Thus, we also compare univariate techniques with and without external drivers.

Our contributions are twofold, as we aim both to benchmark a large set of forecasting techniques and to integrate a practical construct, i.e. profit, into the model building and evaluation process. Firstly, we propose a new strategy to inject a profit-oriented view into the entire forecasting process without explicitly forecasting profit itself. In practice, this constitutes a different way of performing feature selection, tuning hyperparameters and evaluating the forecasting techniques, with the goal of obtaining the models that yield the highest expected profit. The expected profit function used for this purpose is easy to understand and adaptable to any situation, as it combines forecasting accuracy with business expertise (Van Calster et al., 2017). Furthermore, our methodology ensures a completely automated and data-driven model building process. Secondly, we benchmark a large range of forecasting techniques according to three different categorizations. As mentioned above, we contrast complex ML techniques with traditional techniques, in order to assess whether the ML techniques truly perform on par with the traditional ones for tactical sales forecasts. We also take the seasonal characteristics of sales time series into consideration by distinguishing between techniques that model seasonality themselves and methods that require seasonal dummy variables to achieve the same goal. Finally, we contrast techniques with and without variables, as we investigate the value of external factors in a high-level sales forecast. In terms of evaluating the techniques, we take accuracy, expected profit, model complexity and model interpretability into consideration in order to integrate the business aspect of forecasting into the benchmark. In the end, we aim to quantitatively select the techniques that forecast accurately, lead to the highest expected profit for any business case, and make the most sense from a business perspective. We address these research questions by means of a total of 35 monthly sales datasets, collected both from The Coca-Cola Company and from publicly available sources in order to improve the generalizability of the study.

The paper is organized as follows. Section 2 discusses the related work that provides the necessary background to the research questions. Section 3 describes the datasets, the forecasting techniques and the general methodology of the experiments. Section 4 presents the results of the research, and Section 5 concludes.

2 Related work

This section on related work focuses on the necessary background literature for the research questions. We take a closer look at the forecasting literature on benchmarking, while also considering recent literature on profit-oriented analytics.

2.1 Benchmarking in forecasting

Forecasts are typically performed by three categories of techniques (Cang and Yu, 2014): traditional time series analysis (Aboagye-Sarfo et al., 2015; Akın, 2015; Arunraj and Ahrens, 2015; Athanasopoulos et al., 2011; Franses and Van Dijk, 2005; Gil-Alana et al., 2008; Gunter and Önder, 2015; Petropoulos et al., 2014; Ramos et al., 2015; Santos et al., 2012), causal regression techniques (Akın, 2015; Arunraj and Ahrens, 2015; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Lessmann et al., 2015; Ma et al., 2016; Nikolopoulos et al., 2007), and more complex artificial intelligence techniques (Akın, 2015; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Crone et al., 2011; Fagiani et al., 2015; Lessmann et al., 2015; Taylor et al., 2006). The emergence of new techniques often requires a comparison with former methods, which leads to an extensive literature on benchmarking, both for individual use cases (Aboagye-Sarfo et al., 2015; Arunraj and Ahrens, 2015; Bozos and Nikolopoulos, 2011; Gil-Alana et al., 2008; Gunter and Önder, 2015; Lessmann et al., 2015) and for larger sets of time series (Athanasopoulos et al., 2011; Cang and Yu, 2014; Crone et al., 2011; Franses and Van Dijk, 2005; Ma et al., 2016; Makridakis and Hibon, 2000; Petropoulos et al., 2014; Weller and Crone, 2012). This research consists of both field-specific (Aboagye-Sarfo et al., 2015; Akın, 2015; Athanasopoulos et al., 2011; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Fagiani et al., 2015; Lessmann et al., 2015; Ma et al., 2016; Weller and Crone, 2012) and industry-neutral benchmarks, which are oriented towards general conclusions (Crone et al., 2011; Makridakis and Hibon, 2000; Petropoulos et al., 2014). While some studies use a combination of generated data and industry data (Petropoulos et al., 2014), most use real-life datasets to answer their research questions (Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Lessmann et al., 2015; Weller and Crone, 2012).

In terms of the conclusions that have come out of the larger studies, some discrepancies arise. While several studies point out that the newer ML techniques do not perform as well as the more traditional methods for classical time series (Makridakis and Hibon, 2000), others claim that these complex techniques have caught up in recent years (Crone et al., 2011). In this paper, we therefore take a look at a wider range of techniques from all three categories mentioned above. Furthermore, we also contrast techniques with and without external factors, which adds a dimension that has not been part of many larger benchmarking studies, with the exception of Athanasopoulos et al. (2011). Our paper combines these elements in an extensive benchmark that is based on publicly available data and recent sales time series.

2.2 Profit-driven analytics

Profit-driven analytics has recently become a hot topic, as businesses are interested in the actual value that predictive models generate and the influence that they have on eventual net profits. Integrating this value-centric view into analytics has led to a growing number of profit-driven methodologies, techniques and metrics (Verbeke et al., 2017). These profit functions can be used in different steps of the model building and model selection process. For example, profit has been used as an evaluation metric for benchmarks in different fields (Óskarsdóttir et al., 2017; Verbraken et al., 2013), and it has inspired entire profit-driven algorithms as well (Stripling et al., 2015; Verbeke et al., 2012). In this paper, we aim to integrate this profit-oriented view into multiple steps of the forecasting process instead of only using it as an additional evaluation criterion.

In forecasting, research on the profit aspect is scarcer than in other fields. While the monetary value of classification models has been extensively reviewed, the same cannot be said for regression models. However, the impact of forecasting accuracy on net profit is an interesting subject, as under- and over-forecasting lead to different kinds of costs. The former might lead to lost sales and out-of-stock products, while the latter can lead to overstock and storage costs. While errors in either direction inevitably bring about a loss of profit, the two losses are often not equal. Completely symmetric profit loss functions that are based solely on accuracy measures are therefore not representative of the real world. The ultimate goal of profit-oriented analytics is to find the model with the best balance between costs and accuracy. While these two concepts are inevitably linked in a forecasting exercise, they are not exactly the same. Therefore, profit-oriented benchmarking should take into account both traditional accuracy metrics and metrics that point to the costs of the forecast, such as expected profit functions or model complexity. So far, two different views on the integration of profit into forecasting exercises have been proposed in recent literature. The first perspective optimizes an asymmetric loss function during model training to model the imbalance between over- and under-forecasting. Crone et al. (2005) apply this methodology to neural networks, while Yang et al. (2002) take a closer look at support vector regression models. The second way of integrating profit into a forecasting function takes place after the training process. Bansal et al. (2008) propose a tuning procedure that modifies the predictions so that they are cost-optimal, and Zhao et al. (2011) further fine-tune this procedure. Bozos and Nikolopoulos (2011) also take a monetary value into account when evaluating their forecasts, but do not modify the models in any way.
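To make the asymmetry concrete, the Python sketch below contrasts a symmetric absolute-error loss with a simple piecewise-linear ("lin-lin") asymmetric loss in which under- and over-forecasting carry different unit costs. The cost values `cost_under` and `cost_over` are hypothetical illustration parameters; this is neither the loss used by Crone et al. (2005) or Yang et al. (2002) nor the expected profit function of this paper.

```python
def symmetric_loss(actual: float, forecast: float) -> float:
    """Absolute error: over- and under-forecasts of the same size cost the same."""
    return abs(actual - forecast)


def linlin_loss(actual: float, forecast: float,
                cost_under: float = 3.0, cost_over: float = 1.0) -> float:
    """Piecewise-linear asymmetric loss.

    cost_under: hypothetical cost per unit of unmet demand (lost sales).
    cost_over:  hypothetical cost per unit of excess forecast (overstock, storage).
    """
    if actual > forecast:  # forecast too low: under-forecast
        return cost_under * (actual - forecast)
    return cost_over * (forecast - actual)  # forecast too high (or exact)


# Two forecasts that miss an actual demand of 100 by the same 10 units:
for f in (90.0, 110.0):
    print(f, symmetric_loss(100.0, f), linlin_loss(100.0, f))
# prints "90.0 10.0 30.0" and then "110.0 10.0 10.0"
```

The symmetric loss cannot tell the two forecasts apart, whereas the asymmetric loss penalizes the under-forecast three times as heavily under these assumed costs.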

In this paper, we take profit into account in all of the steps that are mentioned above. We optimize the parameters of our models, select features when necessary and evaluate our forecasts based on an asymmetric expected profit function that can easily be adjusted to any business case.

3 Methodology

This methodology section is divided into five parts. We begin by describing the datasets and by explaining the profit function that was used for both optimization and evaluation purposes in this paper. Next, the general experimental set-up is introduced, which also includes the description of the feature selection procedure. The fourth subsection is dedicated to an overview of the forecasting techniques, while the last subsection focuses on evaluation metrics.

3.1 Data

The datasets in this paper stem from two sources. Firstly, The Coca-Cola Company has provided a total of 20 time series, which represent two of its product categories in ten different countries. These monthly time series all range from January 2004 until September 2016. The external variables that correspond to these datasets were collected from in-company data sources and are all based on information about the location of the data. Concretely, they consist of 20 variables that contain information on weather, macro-economic indicators, holidays and pricing. Four variables describe the weather, such as temperature and precipitation, while nine variables capture macro-economic information, such as GDP and CPI. Additionally, three factors refer to calendar effects of public holidays, while the final four variables relate to both in-company and competitor pricing. An overview of these variables can be found in Table 1. These external factors were selected according to data availability, but also take into account the literature on the types of variables that are relevant for sales forecasting. Several types of information have proven to be useful in this field, although this generally depends on the aggregation level (Syntetos et al., 2016) and the volatility of the time series (Currie and Rowley, 2010). Research has shown that factors such as weather (Bertrand et al., 2015), macro-economic influences (Sagaert et al., 2017) and pricing and promotional information (Huang et al., 2014; Ma et al., 2016) all have an impact on sales.

Variable name                   Explanation
Weather
Maximum temperature             Average daily maximum temperature, weighted by population
Maximum temperature squared     Square of the average daily maximum temperature, weighted by population
Precipitation                   Average daily precipitation volume
Sunshine hours                  Average daily number of sunshine hours
Macro-economic indicators
Consumer Price Index            Seasonally adjusted percentage change of CPI with regard to the previous month
Unemployment rate               Percentage of unemployment for the entire population
Exchange rate                   Exchange rate with the US dollar
Short-term interest rate        Short-term interest rate in percentage per annum
Industrial production           Seasonally adjusted percentage change of industrial production with regard to the previous month
Merchandise import              Seasonally adjusted percentage change of merchandise import with regard to the previous month
Merchandise export              Seasonally adjusted percentage change of merchandise export with regard to the previous month
Gross Domestic Product          Seasonally adjusted annual rate, percentage change of GDP with regard to the previous month
Private Consumption             Seasonally adjusted annual rate, percentage change of PC with regard to the previous month
Holidays
Public holiday                  Number of public holidays per month
Weekend                         Number of public holidays in the weekend per month (possibility of long weekend)
Tuesday/Thursday                Number of public holidays on Tuesday or Thursday per month (possibility of long weekend)
Pricing
Company price                   Average product category price in US dollars
Company price deflated          Average product category price in US dollars, deflated by CPI
Competitor price                Average product category price of the main competitor in US dollars
Competitor price deflated       Average product category price of the main competitor in US dollars, deflated by CPI
Table 1: Summary of external variables

Secondly, in order to increase the generalizability of our findings, we include a total of 15 publicly available datasets with similar characteristics in the analyses, most of which can be found in the Time Series Data Library (footnotes 1 and 2). The general features of these monthly time series are summarized in Table 2. As all of these datasets also include information on location, we collected twelve external variables that contain information on weather, macro-economic indicators and holidays. Concretely, we include four weather variables, seven macro-economic indicators and one holiday variable. The weather variables consist of the same features as defined in Table 1, while the macro-economic information includes all features in Table 1 except Merchandise import and Merchandise export. Finally, the models with external factors also contain the number of public holidays for each month. Pricing information was not available for these datasets. The sources for these three categories are publicly available (footnotes 3, 4 and 5).

Dataset       Number of product categories   Period                          Number of data points   Location
Beer          1                              January 1956 - August 1995      476                     Australia
Car sales 1   1                              January 1996 - December 2008    156                     California
Car sales 2   1                              January 1960 - December 1968    108                     Canada
Champagne     1                              January 1964 - September 1972   105                     France
Paper         1                              January 1963 - December 1972    120                     France
Petrol        4                              January 1971 - December 1991    252                     USA
Wine          6                              January 1980 - July 1995        187                     Australia
Table 2: Public data summary

3.2 Expected profit function

The evaluation of any predictive model generally focuses on the accuracy that it achieves on a test set. In this paper, however, we take into account both accuracy and a more business-oriented profit measure. The profit measure is given by Equation 2 and depends on our definition of the Percentage Error (PE) in Equation 1. This profit measure was first defined in Van Calster et al. (2017) and represents an estimate of the expected profit of the target variable. The formula is easy to interpret and can easily be adjusted to any business use case. The two fundamental components of the expected profit measure are the volume of the sales, as more sales lead to more profit, and the accuracy of the forecast, as bad forecasts inevitably lead to a loss of profit. Next to these two core elements, we introduce several parameters that integrate expert knowledge into the profit function.




Firstly, the business user can influence the impact of the forecasting error on the expected profit by setting two parameters. The first one determines how strongly the size of the error is penalized, as both over- and under-forecasting have proven to lead to various costs (Kahn, 2003). This penalization factor can be adapted to a specific circumstance in a data-driven manner by executing a sensitivity analysis on a validation set. In this instance, α is set at , which was determined in Van Calster et al. (2017). Furthermore, the business expert can set penalization boundaries and , which indicate that any forecast with a PE within these boundaries does not have a significant impact on the final profit. Note that should always be larger than . For example, we set boundaries of error in both directions for The Coca-Cola Company use case ( and ). This leads to a preference for models with a small over-forecast, as the expected profit is higher because of the larger volume of sales in that case. This tendency is deemed appropriate for the use case because the company considers this an investment in the future. However, the and parameters can also be set unequally, if the forecasting error has a larger impact on profit in one particular direction, or even be omitted completely, if every inaccuracy while forecasting leads to a loss of profit.

Secondly, the weight refers to the profit margin for the product or product category at hand. This weight can be expressed both relatively between different products and in absolute numbers, such as currencies. For The Coca-Cola Company use case, these weights were determined by the profit that the product actually generated in the last year of the original training set. It is important to note that these weights remain constant throughout the analyses once they are set by the training set of the first prediction. The actual profit of a product will fluctuate over time and is driven by many external factors that are not captured in the function. We have chosen to keep this parameter constant for two reasons: ease of use and availability of profit data. While the first reason is self-explanatory, the second one is tied to the particular use case of this paper. If data about the actual profit of a product are more readily available, this parameter can be used dynamically by updating it during the testing process. The profit in the analyses of this paper can therefore be viewed as the profit that the product will generate if business stays the same, and must truly be interpreted as the expected profit. The weights for the publicly available datasets were chosen randomly with values between 0 and 3, and are displayed in Table 3.

Dataset       Weights
Beer          0.1
Car sales 1   2.1
Car sales 2   0.6
Champagne     1.0
Paper         1.9
Petrol        0.1, 1.2, 0.1, 2.8
Wine          2.2, 2.2, 2.1, 1.5, 0.4, 2.7
Table 3: Weights of public datasets
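Equation 2 itself is defined in Van Calster et al. (2017). To illustrate how the ingredients described in this section can interact (forecast volume, an error penalty governed by α, a tolerance band between a lower and an upper PE boundary, and a product weight), the Python sketch below uses a hypothetical functional form. It is an assumption made for illustration only and must not be read as the paper's Equation 2; the function names, the PE sign convention (positive means over-forecast) and the example boundary values are likewise assumptions.

```python
def percentage_error(actual: float, forecast: float) -> float:
    # Assumed convention: positive PE = over-forecast, negative PE = under-forecast.
    return (forecast - actual) / actual


def illustrative_profit(actual: float, forecast: float, weight: float,
                        alpha: float, lower: float, upper: float) -> float:
    """Hypothetical expected-profit score (NOT the paper's Equation 2).

    weight:       profit margin of the product (Section 3.2)
    alpha:        exponent controlling how strongly errors are penalized
    lower, upper: PE boundaries; errors inside [lower, upper] are not penalized
    """
    if not lower < upper:
        raise ValueError("the upper boundary must be larger than the lower boundary")
    pe = percentage_error(actual, forecast)
    if lower <= pe <= upper:
        penalty = 0.0
    elif pe > upper:
        penalty = (pe - upper) ** alpha  # over-forecast beyond the band
    else:
        penalty = (lower - pe) ** alpha  # under-forecast beyond the band
    # Profit grows with the forecast volume and shrinks with the penalty (floored at 0).
    return weight * forecast * max(0.0, 1.0 - penalty)


def total_profit(actuals, forecasts, weight, alpha, lower, upper) -> float:
    """Sum the monthly scores, e.g. over the 24 months of a test set."""
    return sum(illustrative_profit(a, f, weight, alpha, lower, upper)
               for a, f in zip(actuals, forecasts))
```

With hypothetical boundaries of -5 % and +5 % and α = 2, a 2 % over-forecast of an actual value of 100 scores 102.0, while a 2 % under-forecast scores 98.0, which mirrors the preference for small over-forecasts described above.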

3.3 Experimental set-up

The general experimental set-up consists of hold-out sample forecasts for all datasets. Concretely, the time series are split into training, validation and test sets. The test set includes the final two years of the data, which leads to 24 data points to forecast. The validation set consists of the year before the date that will be forecast and is only used for feature selection and parameter tuning when necessary for the given technique. Parameter tuning is performed only once, on the first validation set of the testing procedure, in order to limit the computational cost. The feature selection procedure, however, is repeated every three months in order to keep the model up-to-date. Once the necessary variables and hyperparameters have been selected, the training and validation sets are merged in order to forecast the test set. Both the training and validation sets change with every forecast, as the set-up consists of an expanding window. In the end, we therefore collect 24 one-month-ahead forecasts for each technique and for each dataset. The complete experimental set-up is visualized in Figure 1.

Figure 1: Experimental set-up
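The expanding-window scheme can be sketched in Python as follows. The function name and the dictionary keys are hypothetical; the sketch assumes that observations are indexed from 0 and that feature re-selection is triggered at the first test month and every third month after it.

```python
def expanding_window_origins(n_obs: int, n_test: int = 24, n_val: int = 12,
                             reselect_every: int = 3):
    """Yield one dict per one-month-ahead forecast in the test period.

    For a target month t:
      train:      observations [0, t - n_val)
      validation: observations [t - n_val, t)  (the year before t)
      refit:      observations [0, t)          (train and validation merged)
    """
    first_target = n_obs - n_test
    for step, t in enumerate(range(first_target, n_obs)):
        yield {
            "target": t,
            "train": range(0, t - n_val),
            "validation": range(t - n_val, t),
            "refit": range(0, t),
            # Hyperparameters are tuned once, on the first validation set.
            "tune_hyperparameters": step == 0,
            # Features are re-selected every `reselect_every` months.
            "reselect_features": step % reselect_every == 0,
        }


# January 2004 - September 2016 gives 153 monthly observations.
origins = list(expanding_window_origins(153))
print(len(origins), origins[0]["target"], origins[0]["validation"])
# prints: 24 129 range(117, 129)
```

For a series of that length, the first test target (index 129) corresponds to October 2014, so the 24 forecasts cover October 2014 to September 2016.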

The feature selection procedure consists of a hybrid method that combines the Minimum Redundancy Maximum Relevance (mRMR) criterion of Peng et al. (2005), used as a filter, with a simple incremental wrapper method. The mRMR method is a mutual-information-based algorithm that ranks the external factors according to the information they share with the target variable, while also taking into account their dependency on the other external factors. This is achieved by finding the feature set that maximizes the relevance with respect to the target and minimizes the dependency among the independent variables. In short, this filter finds the features that maximize Equation 3, which combines Equation 4 (Dependency) and Equation 5 (Redundancy).
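As a hedged illustration of the filter step, the Python sketch below implements the widely used incremental "difference" form of mRMR from Peng et al. (2005): the first feature is the one with the highest mutual information I(x; c) with the target c, and each subsequent feature x_j maximizes I(x_j; c) − (1/|S|) Σ_{x_i ∈ S} I(x_j; x_i), where S is the set of already ranked features. Mutual information is estimated after an equal-width discretization into a few bins. This estimator, the bin count and all function names are assumptions for illustration; the text does not state which mRMR variant or estimator was used.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def discretize(values: Sequence[float], n_bins: int = 3) -> List[int]:
    """Equal-width binning (an assumed, deliberately simple estimator choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Plug-in estimate of I(a; b) in nats for two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def mrmr_ranking(features: Dict[str, Sequence[float]],
                 target: Sequence[float], n_bins: int = 3) -> List[str]:
    """Rank all features with the incremental mRMR difference criterion."""
    disc = {name: discretize(vals, n_bins) for name, vals in features.items()}
    tgt = discretize(target, n_bins)
    relevance = {name: mutual_information(d, tgt) for name, d in disc.items()}

    ranking: List[str] = []
    remaining = sorted(disc)  # sorted so that ties are broken deterministically
    while remaining:
        scores = {}
        for name in remaining:
            score = relevance[name]  # max-relevance term I(x; c)
            if ranking:  # min-redundancy term: mean MI with already-ranked features
                score -= sum(mutual_information(disc[name], disc[s])
                             for s in ranking) / len(ranking)
            scores[name] = score
        best = max(remaining, key=lambda name: scores[name])
        ranking.append(best)
        remaining.remove(best)
    return ranking


x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 0, 1, 1, 0, 0, 1, 1]
target = [a + b for a, b in zip(x1, x2)]
print(mrmr_ranking({"x1": x1, "x1_copy": list(x1), "x2": x2}, target))
# prints: ['x1', 'x2', 'x1_copy']
```

In this toy example all three features are equally relevant, but once x1 is selected its exact copy becomes fully redundant and drops to last place, while the complementary feature x2 moves up.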


The first step of the procedure selects the top 15 or the top 10 features of the ranking, for The Coca-Cola Company datasets and the public datasets respectively, and passes them on to the next step. Next, a simple forward incremental wrapper method starts with the top-ranked feature and forecasts the validation set. Subsequently, one feature is added at a time to the feature set until the entire top 15 or top 10 ranking is used in the forecasting model. This methodology therefore takes advantage of the initial ranking produced by the mRMR filter. The feature set used to forecast the test set is selected out of these 15 or 10 options by maximizing the profit function defined in Section 3.2. This entire procedure is explained in Algorithm 1. In our benchmark, the initial number of features is either 15 or 10, depending on the dataset at hand, and the size of the validation set is equal to 12 months.

choose size v of validation set
split time series into training set and validation set
choose initial number of features n
rank features according to the mRMR criterion into ranking R
for k = 1 to n do
     select top k features from R
     for t = 1 to v do
         train model with k features on training set
         forecast validation point t and calculate profit P_{k,t}
         add validation point t to training set
     end for
     calculate profit P_k by summing P_{k,t} over all t
     reset training set and validation set to original split
end for
Select the k features with the highest profit P_k
Algorithm 1: Pseudo code for feature selection procedure
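The wrapper part of Algorithm 1 can be sketched in Python as follows, assuming the mRMR ranking has already been computed. The callables `fit_forecast` and `profit_fn` are hypothetical placeholders for the forecasting technique and the expected profit function of Section 3.2; only the control flow is meant to mirror the algorithm.

```python
from typing import Callable, List, Sequence


def forward_wrapper_selection(
    ranking: Sequence[str],
    actuals: Sequence[float],
    n_val: int,
    fit_forecast: Callable[[List[str], int], float],
    profit_fn: Callable[[float, float], float],
) -> List[str]:
    """Return the top-k prefix of `ranking` with the highest validation profit.

    fit_forecast(features, t) is a hypothetical callable that trains a model on
    observations [0, t) using `features` and returns a forecast for month t.
    profit_fn(actual, forecast) scores a single month.
    """
    val_start = len(actuals) - n_val
    best_features: List[str] = []
    best_profit = float("-inf")
    for k in range(1, len(ranking) + 1):
        features = list(ranking[:k])  # top-k features of the ranking
        # Expanding window over the validation set: after month t is forecast,
        # it joins the training data. Restarting at val_start for every k
        # corresponds to resetting the original split.
        total = sum(profit_fn(actuals[t], fit_forecast(features, t))
                    for t in range(val_start, len(actuals)))
        if total > best_profit:  # strict '>' keeps the smaller set on ties
            best_features, best_profit = features, total
    return best_features
```

Keeping the smaller feature set on ties is a parsimony choice made for this sketch; Algorithm 1 itself does not specify a tie-breaking rule.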

Feature selection is generally important for two entirely different reasons. Firstly, some of the variables might be correlated or influenced by the same underlying information, which can lead to less accurate forecasts (Boivin and Ng, 2006). A feature selection procedure is therefore used to determine which set of variables has the highest predictive power, while also eliminating any possible multicollinearity. Secondly, feature selection is equally important from a business perspective, as transparent models also have an explanatory advantage. Business analysts are interested in gaining knowledge on which external factors might influence their target variable, which can be useful for strategic decisions (Athanasopoulos et al., 2011). This knowledge also relates to the maintenance of the model, as variables that never survive the feature selection procedure during testing are no longer needed.

3.4 Forecasting techniques

In order to conduct the necessary experiments, a total of 17 forecasting techniques were selected, which are summarized in Table 4. These techniques are categorized according to three different attributes in order to answer our research questions. Firstly, we organize the methods according to whether they can be used as univariate techniques with and/or without external drivers. We define a univariate technique without variables as a technique that only uses the sales time series itself to predict the next month. Techniques that are able to include variables, however, also integrate external drivers, such as the weather, to generate a prediction. Nine techniques can be used in both ways, such as regression models, when past sales values are encoded as independent variables next to the aforementioned external factors. We therefore benchmark a total of 26 techniques in our final analysis. Secondly, Table 4 displays the ability of a technique to explicitly model the seasonality of a time series, as seasonality is a typical characteristic of the sales time series that we consider in this paper. Thirdly, the forecasting techniques are classified into Machine Learning (ML) techniques and non-ML techniques. Recently, a lot of forecasting literature has focused on ML techniques and often reports them to be more accurate than traditional techniques. In order to simplify the issue of what is considered an ML technique and what is not, we consider a method ML if it belongs to one of the following four categories: decision tree learning, neural networks, support vector machines and k-nearest neighbours, the last of which is an instance-based learning method. These three categorizations underpin the answer to which type of technique is best used to achieve an accurate sales forecast. Finally, the table also contains the hyperparameters that were selected beforehand and their possible values. Tuning hyperparameters has proven to be essential for truly assessing how well a certain technique can perform (Carrizosa et al., 2014), and is therefore an essential part of benchmarking in general. In this paper, the parameters were selected by evaluating model performance on the validation set with an exhaustive grid search. The evaluation metric that was optimized is again the expected profit function defined in Section 3.2. Note that only the parameters listed in Table 4 are set in this way.
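An exhaustive grid search that scores each hyperparameter combination by validation-set profit can be sketched as below. The `validation_profit` callable is a hypothetical placeholder that would fit the model with the given parameters on the training set and return the summed expected profit on the validation set. The parameter names in the example mirror those of scikit-learn's `KNeighborsRegressor` (`n_neighbors`, `weights`), but the paper does not state which implementation was used, and reading the range [2, 5] as the integers 2 to 5 is an assumption.

```python
from itertools import product
from typing import Any, Callable, Dict, Mapping, Sequence


def grid_search(grid: Mapping[str, Sequence[Any]],
                validation_profit: Callable[[Dict[str, Any]], float]) -> Dict[str, Any]:
    """Evaluate every combination in `grid` and keep the most profitable one."""
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_profit = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        profit = validation_profit(params)  # e.g. summed profit on the validation set
        if profit > best_profit:
            best_params, best_profit = params, profit
    return best_params


# Hypothetical example based on the k-nearest-neighbours grid of Table 4.
knn_grid = {"n_neighbors": [2, 3, 4, 5], "weights": ["uniform", "distance"]}


def toy_profit(params: Dict[str, Any]) -> float:
    # Stand-in for "fit on the training set, sum the profit on the validation set".
    bonus = 0.5 if params["weights"] == "distance" else 0.0
    return -abs(params["n_neighbors"] - 4) + bonus


print(grid_search(knn_grid, toy_profit))
# prints: {'n_neighbors': 4, 'weights': 'distance'}
```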

Model | Variables? | Seasonal? | ML? | Hyperparameters | Possible values
Holt-Winters exponential smoothing | No | Yes | No | / | /
Seasonal ARIMA | No | Yes | No | AR, MA, SAR and SMA terms | [0, 5]
Seasonal decomposition by Loess model | No | Yes | No | / | /
Seasonal random walk | No | Yes | No | / | /
ARMA-GARCH | No | No | No | AR and MA terms | [0, 5]
Random walk | No | No | No | / | /
Seasonal ARIMAX | Yes | Yes | No | AR, MA, SAR and SMA terms | [0, 5]
Vector Autoregression | Yes | No | No | AR term | [0, 5]
Conditional Inference Regression Tree | Both | No | Yes | / | /
Multiple Linear Regression | Both | No | No | / | /
Multivariate Adaptive Regression Splines | Both | No | No | Maximum degree of interaction | [1, 2]
Recursive Partitioning Regression Tree | Both | No | Yes | / | /
K Nearest Neighbors Regression | Both | No | Yes | Number of neighbors; weights for neighboring response values | [2, 5]; uniform, by distance
Long Short Term Memory RNN | Both | No | Yes | Number of hidden neurons | [1, 10]
Random Forests | Both | No | Yes | / | /
Simple Multilayer Perceptron | Both | No | Yes | Number of hidden neurons | [1, 10]
Support Vector Regression | Both | No | Yes | Penalty parameter of the error term; gamma (for the radial basis function (RBF) kernel) | 1e0, 1e1, 1e2, 1e3; [1e-2, 1e2]
Table 4: Overview of forecasting techniques
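The exhaustive grid search over the hyperparameter values in Table 4 can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `evaluate` is an assumed callback that fits a model with the given hyperparameters on the training data and returns its expected profit (Section 3.2) on the validation set. That profit function is not reproduced here, so the usage example plugs in a toy score, and the integer ranges assume that "[0, 5]" means the orders 0 to 5 inclusive.

```python
from itertools import product
from typing import Callable, Dict, Mapping, Sequence, Tuple

Params = Dict[str, object]


def grid_search(
    grid: Mapping[str, Sequence[object]],
    evaluate: Callable[[Params], float],
) -> Tuple[Params, float]:
    """Try every combination of candidate values in `grid` and return the
    combination with the highest validation score, plus that score.

    `evaluate` is a hypothetical callback: fit on the training window with
    `params`, forecast the validation window, return e.g. expected profit.
    Ties are resolved in favour of the combination evaluated first.
    """
    names = list(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Seasonal ARIMA grid from Table 4: AR, MA, SAR and SMA orders in [0, 5],
    # i.e. 6**4 = 1296 candidate models per series.
    sarima_grid = {"p": range(6), "q": range(6), "P": range(6), "Q": range(6)}

    # Hypothetical toy stand-in for the validation-set expected profit.
    def toy_profit(params: Params) -> float:
        return -abs(params["p"] - 1) - abs(params["q"] - 1) - params["P"] - params["Q"]

    print(grid_search(sarima_grid, toy_profit))
```

The SARIMA grid alone already requires 1296 fits per series, which illustrates why exhaustive, profit-driven tuning adds to computation time.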

It is important to comment on how the type of technique affects data preprocessing. Firstly, we normalized all variables to the range [0, 1] for all analyses in this paper. This step is especially important for techniques such as neural networks, which the literature reports benefit greatly from normalized inputs (Sola and Sevilla, 1997). Furthermore, normalization puts all variables on a common scale, so that business users can compare the relative importance of variables when the forecasting technique is transparent and thereby identify the most important drivers of their sales. Secondly, all time series in the analyses display a certain trend and seasonality, which should be incorporated into the forecasting model where possible. The time series techniques that we consider in this paper explicitly include seasonality in their model building, for example by defining seasonal parameters. Other types of techniques, such as regression models or neural networks, lack this ability, which can lead to worse forecasts if trend and seasonality strongly influence sales (Zhang and Qi, 2005). We therefore add two preprocessing steps for these models: trend/seasonal differencing and seasonal dummy variables. In the first step, we check whether the time series contains a trend or a seasonal pattern by means of appropriate unit root tests, namely the Augmented Dickey-Fuller test (Dickey and Fuller, 1979) and the Osborn-Chui-Smith-Birchenhall test (Osborn et al., 1988), respectively. If the results show signs of either characteristic, we apply the corresponding differencing. In the second step, if the time series is seasonal, we also include a set of seasonal dummy variables to further model possible seasonal effects. These dummies are not subject to the feature selection procedure but are always included when the time series has a seasonal component.
Thirdly, when a technique can be used both with and without external variables, past sales values need to be encoded as independent variables. We therefore need to determine how many past values to include in the model. This hyperparameter is selected on the same validation set as the other hyperparameters, with possible values ranging from one to seven months. Furthermore, we define methods with external factors as techniques that use both past sales and external factors as independent variables. In this case, the number of past months used as input to the model is again a hyperparameter.
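The preprocessing steps above can be sketched in plain Python as follows. This is a hypothetical illustration rather than the paper's code, and several choices are our own assumptions: the outcomes of the Augmented Dickey-Fuller and OCSB tests are passed in as booleans (`has_trend`, `has_season`) instead of being computed here, seasonal differencing is applied before first differencing, the first season is dropped as the dummy baseline, and external factors enter the lag matrix with a one-month lag.

```python
from typing import List, Optional, Sequence, Tuple


def min_max_scale(x: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]


def difference(x: Sequence[float], lag: int = 1) -> List[float]:
    """x[t] - x[t - lag]; lag=1 targets a trend, lag=12 monthly seasonality."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def seasonal_dummies(n: int, period: int = 12, start: int = 0) -> List[List[int]]:
    """period - 1 indicator columns per row; seasons are counted from the
    first observation and season 0 is the all-zero baseline."""
    return [
        [1 if (start + t) % period == s else 0 for s in range(1, period)]
        for t in range(n)
    ]


def preprocess(
    y: Sequence[float], has_trend: bool, has_season: bool, period: int = 12
) -> Tuple[List[float], List[List[int]]]:
    """Scale, difference where the (externally run) unit root tests say so,
    and build seasonal dummies aligned with the shortened series."""
    z = min_max_scale(y)
    offset = 0  # observations lost to differencing, keeps dummies aligned
    if has_season:
        z = difference(z, period)
        offset += period
    if has_trend:
        z = difference(z, 1)
        offset += 1
    dummies = seasonal_dummies(len(z), period, start=offset) if has_season else []
    return z, dummies


def make_lagged(
    y: Sequence[float],
    n_lags: int,
    exog: Optional[Sequence[Sequence[float]]] = None,
) -> Tuple[List[List[float]], List[float]]:
    """One-step-ahead design matrix: row t = [y[t-1], ..., y[t-n_lags]]
    (+ external factors of month t-1 if given); target = y[t].
    In the paper the number of lags is tuned over 1..7 months."""
    if not 1 <= n_lags < len(y):
        raise ValueError("n_lags must be between 1 and len(y) - 1")
    X: List[List[float]] = []
    target: List[float] = []
    for t in range(n_lags, len(y)):
        row = [y[t - k] for k in range(1, n_lags + 1)]
        if exog is not None:
            row.extend(exog[t - 1])
        X.append(row)
        target.append(y[t])
    return X, target
```

In a rolling-origin evaluation, the minimum and maximum used for scaling would ideally be computed on the training window only, so that no information from the forecast period leaks into the inputs.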

Finally, we note that the list of forecasting techniques is not exhaustive. Two types of methods are notably under-represented: ensemble methods and deep learning methods. To keep the scope of the paper manageable, we included only one technique from each category, namely Random Forests and Long Short-Term Memory neural networks, respectively. A closer look at these types of methods is an obvious next step for this research.

3.5 Evaluation

Evaluation in forecasting benchmarks is often based entirely on accuracy metrics. There has been much discussion about which metric gives the best overview of performance when comparing techniques, as many commonly used measures can behave unexpectedly (Hyndman and Koehler, 2006; Kolassa, 2016; Tashman, 2000). In this paper, we therefore combine frequently used accuracy metrics with the expected profit function defined above to select the best-performing models. In the first category, we consider the Mean Absolute Percentage Error (MAPE) and the Root Mean Squared Error (RMSE), as defined in Equations 6 and 7. Furthermore, we include the seasonal version of the Mean Absolute Scaled Error (MASE), first defined by Hyndman and Koehler (2006), which takes the seasonality of the time series into account. Its formula is given in Equation 8 and depends on the seasonal period of the time series. This metric compares a technique's performance to the in-sample error of a seasonal naïve model, which makes it well suited to benchmarking techniques. Besides the expected profit function, we also consider the computation time of each forecast as a proxy for model complexity. In total, our analysis therefore includes five quantitative performance metrics.
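Because Equations 6 to 8 are not reproduced in this text, the sketch below uses the standard definitions of MAPE, RMSE and the seasonal MASE of Hyndman and Koehler (2006); the paper's exact formulas may differ in detail, for example in the treatment of zero sales. Here `m` is the seasonal period (12 for monthly data) and `train` is the in-sample series used to scale the seasonal MASE; both names are our own.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined if an actual is 0."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def seasonal_mase(
    actual: Sequence[float], forecast: Sequence[float], train: Sequence[float], m: int = 12
) -> float:
    """Out-of-sample MAE divided by the in-sample MAE of the seasonal naive
    forecast y_hat[t] = y[t - m] on the training series."""
    if len(train) <= m:
        raise ValueError("training series must be longer than the seasonal period m")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    return mae / scale
```

A seasonal MASE below 1 means that the forecast errors are, on average, smaller than those of the seasonal naïve method on the training data.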


4 Results

This section first presents the experimental results, which are based on forecasting the 35 datasets with 17 different forecasting techniques. We then discuss the implications of these results and comment on the limitations of this study.

4.1 Experimental results

Model MAPE RMSE MASE Profit Time
Without external factors
ARMA-GARCH (GARCH) 12.16 (0.00) 12.14 (0.00) 12.16 (0.00) 12.58 (0.00) 19.68 (0.00)
Conditional Inference Regression Tree (CtreeUni) 12.93 (0.00) 12.92 (0.00) 12.88 (0.00) 12.79 (0.00) 8.16 (0.00)
Holt-Winters exponential smoothing (HW) 10.52 (1.00) 10.52 (1.00) 10.54 (1.00) 10.74 (0.00) 5.61 (0.00)
K Nearest Neighbors Regression (KNNUni) 13.10 (0.00) 13.10 (0.00) 13.12 (0.00) 12.95 (0.00) 17.35 (0.00)
Long Short Term Memory RNN (LSTMUni) 17.23 (0.00) 17.26 (0.00) 17.27 (0.00) 16.78 (0.00) 25.05 (0.00)
Multiple Linear Regression (LRUni) 13.49 (0.00) 13.49 (0.00) 13.46 (0.00) 13.93 (0.00) 3.50 (0.61)
Multivariate Adaptive Regression Splines (MARSUni) 13.48 (0.00) 13.48 (0.00) 13.46 (0.00) 13.70 (0.00) 6.68 (0.00)
Random Forests (RFUni) 14.07 (0.00) 14.07 (0.00) 14.05 (0.00) 13.91 (0.00) 12.16 (0.00)
Random walk (RW) 18.43 (0.00) 18.41 (0.00) 18.42 (0.00) 18.11 (0.00) 2.52 (/)
Recursive Partitioning Regression Tree (RpartUni) 13.58 (0.00) 13.59 (0.00) 13.57 (0.00) 13.45 (0.00) 6.15 (0.00)
Seasonal ARIMA (SARIMA) 10.45 (/) 10.47 (/) 10.48 (/) 10.70 (/) 18.40 (0.00)
Seasonal decomposition by Loess model (DM) 11.80 (0.07) 11.80 (0.08) 11.81 (0.07) 12.12 (0.03) 9.67 (0.00)
Seasonal random walk (SRW) 12.40 (0.00) 12.43 (0.00) 12.45 (0.00) 13.14 (0.00) 2.62 (1.00)
Simple Multilayer Perceptron (MLPUni) 12.52 (0.00) 12.61 (0.00) 12.61 (0.00) 12.59 (0.00) 21.95 (0.00)
Support Vector Regression (SVRUni) 12.29 (0.00) 12.29 (0.00) 12.30 (0.00) 12.45 (0.00) 17.55 (0.00)
With external factors
Conditional Inference Regression Tree (CtreeMulti) 12.93 (0.00) 12.93 (0.00) 12.88 (0.00) 12.81 (0.00) 9.76 (0.00)
K Nearest Neighbors Regression (KNNMulti) 15.07 (0.00) 15.07 (0.00) 15.10 (0.00) 14.90 (0.00) 18.53 (0.00)
Long Short Term Memory RNN (LSTMMulti) 17.05 (0.00) 16.93 (0.00) 16.95 (0.00) 16.78 (0.00) 25.02 (0.00)
Multiple Linear Regression (LRMulti) 12.89 (0.00) 12.88 (0.00) 12.86 (0.00) 12.77 (0.00) 6.05 (0.00)
Multivariate Adaptive Regression Splines (MARSMulti) 13.25 (0.00) 13.25 (0.00) 13.24 (0.00) 13.22 (0.00) 11.87 (0.00)
Random Forests (RFMulti) 14.14 (0.00) 14.12 (0.00) 14.10 (0.00) 13.89 (0.00) 15.61 (0.00)
Recursive Partitioning Regression Tree (RpartMulti) 14.28 (0.00) 14.31 (0.00) 14.29 (0.00) 13.98 (0.00) 8.70 (0.00)
Seasonal ARIMAX (SARIMAX) 10.81 (1.00) 10.80 (1.00) 10.82 (1.00) 11.16 (0.00) 20.78 (0.00)
Simple Multilayer Perceptron (MLPMulti) 14.13 (0.00) 14.11 (0.00) 14.12 (0.00) 13.45 (0.00) 24.20 (0.00)
Support Vector Regression (SVRMulti) 14.04 (0.00) 14.01 (0.00) 14.04 (0.00) 14.06 (0.00) 19.13 (0.00)
Vector Autoregression (VAR) 13.97 (0.00) 13.99 (0.00) 14.00 (0.00) 14.05 (0.00) 14.31 (0.00)
Friedman test
Chi-Squared 1299.8 1285.8 1286.7 1059.3 18708
P-value <2.2e-16 <2.2e-16 <2.2e-16 <2.2e-16 <2.2e-16
Table 5: Overview of benchmarking results. Rows contain the forecasting techniques; columns contain their average ranks according to MAPE, RMSE, MASE, expected profit and computation time. The numbers between brackets are the p-values from the pairwise Nemenyi test that compares the given method to the best technique according to the evaluation metric at hand; (/) marks that best technique.

The results of the experiments are based on a total of 21,840 forecasts: we performed 24 one-month-ahead forecasts on each of 35 time series (840 unique forecasts) with 26 different models. We only take into account the results of models that completed both the parameter tuning and the feature selection procedures explained in Section 3; other model set-ups were disregarded in the final analyses.

To compare all of these models with one another, we apply two ranking tests to the 26 forecasting techniques according to five evaluation measures: MAPE, RMSE, MASE, profit and computation time. Concretely, we rank all methods for each of the 840 unique forecasts and then report the average rank over these forecasts. This ensures a fairer comparison between the techniques than, e.g., simply averaging the MAPE over the 840 forecasts. Furthermore, it allows us to verify whether the differences in rank are statistically significant. The Friedman test (Friedman, 1940) is a non-parametric statistical test that verifies whether there is a significant difference among several treatments. In this benchmark, the 26 forecasting techniques constitute the 'treatments' and the 35 time series datasets the 'blocks', which form groups of similar units (see Equation 9). The Friedman test ranks the treatments according to a given evaluation criterion and compares this ranking across blocks. Five average rankings are therefore computed for these experiments, one per evaluation measure. The p-value of the Friedman test then indicates whether a significant difference exists between any of the treatments.
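Equation 9 is not reproduced here, so the following sketch assumes the standard form of the Friedman statistic, with N blocks, k treatments and R_j the average rank of treatment j: chi2_F = 12N / (k(k + 1)) * (sum_j R_j^2 - k(k + 1)^2 / 4). Lower scores receive better ranks (rank 1) and ties share the average rank, so a higher-is-better measure such as expected profit should be negated first. The p-value (chi-square with k - 1 degrees of freedom) is omitted to stay within the standard library; SciPy's `scipy.stats.friedmanchisquare` is a ready-made alternative, which may handle ties slightly differently.

```python
from typing import List, Sequence, Tuple


def rank_block(scores: Sequence[float]) -> List[float]:
    """Rank one block: the lowest score gets rank 1, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied scores starting at position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def friedman(blocks: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Average ranks R_j and Friedman statistic for N blocks x k treatments.

    blocks[b][j] is the score of treatment j in block b (lower = better).
    """
    n, k = len(blocks), len(blocks[0])
    ranked = [rank_block(block) for block in blocks]
    avg = [sum(r[j] for r in ranked) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return avg, chi2
```

For the setting of Table 5, each block would hold one score per model (26 values), and the profit ranking would be computed on negated profits.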


If this test is significant, a post-hoc analysis must follow, as we are interested in which techniques differ from one another. As all of the Friedman tests are indeed significant, we turn to a second step, which consists of a pairwise Nemenyi test (Nemenyi, 1962) for each of the five rankings. Concretely, the test determines whether the average ranks of two models differ by at least the critical difference

$$CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}},$$

where k is the number of treatments, N is the number of blocks and q_\alpha are the critical values, which consist of the Studentized range statistic divided by \sqrt{2}.
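The critical difference of the Nemenyi test, CD = q_alpha * sqrt(k(k + 1) / (6N)), and the resulting group of models that are not significantly different from the best one (the grey boxes in Figure 2) can be sketched as follows. The function names are our own, and `q_alpha` must be looked up in a table of the Studentized range statistic divided by sqrt(2); for example, it equals 1.960 for k = 2 at alpha = 0.05.

```python
import math
from typing import Dict, List


def nemenyi_cd(q_alpha: float, k: int, n: int) -> float:
    """Critical difference CD = q_alpha * sqrt(k (k + 1) / (6 N)) for k
    treatments ranked over N blocks."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def tied_with_best(avg_ranks: Dict[str, float], cd: float) -> List[str]:
    """Models whose average rank lies less than CD above the best (lowest)
    average rank, i.e. not significantly different from the best model."""
    best = min(avg_ranks.values())
    return sorted(name for name, r in avg_ranks.items() if r - best < cd)
```

Two models whose average ranks differ by at least CD are considered significantly different; everything closer to the best model than CD falls into its group.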

The results of these tests can be found in Table 5. The first column contains each method's name, together with the abbreviation used in all following figures. The table then displays the average rank of each forecasting method according to MAPE, RMSE, MASE, expected profit and computation time. The numbers between brackets are the p-values from the pairwise Nemenyi test that compares the given method to the best technique according to the evaluation metric at hand, which is indicated in bold for each evaluation measure. All p-values below the chosen significance level are underlined. Figure 2 displays the relative rankings according to each evaluation metric, from best at the top to worst at the bottom. The grey boxes contain the models that are not significantly different from the best model at the chosen significance level. The lines connect the rankings of the same technique across the different measures.

Figure 2: Rank comparison

We can now turn back to our research questions, the first of which concerns selecting the best type of technique for a tactical sales forecast. In our introduction, we posed three sub-questions, each of which divides the techniques into two categories. Firstly, the techniques can be divided into techniques with and without external features. Furthermore, we contrast seasonal and non-seasonal methods, and distinguish between machine learning and statistical techniques. To ascertain whether one category significantly outperforms the other, we perform a Wilcoxon signed-rank test on the average ranks of each category for each forecast. The null hypothesis of this test states that there is no difference between the two groups. For each of the categorizations, the test rejected the null hypothesis at the chosen significance level. Firstly, univariate techniques outperform those that add external factors, with a p-value smaller than 2.2e-16. Secondly, seasonal models have a significantly lower average rank than non-seasonal ones, with a p-value that is also smaller than 2.2e-16. Finally, the ML techniques consistently lead to worse results than the statistical models, as the Wilcoxon test was highly significant with a p-value of 0.00235. In short, time series models that explicitly model seasonality and do not incorporate external factors still seem to be the best candidates for our set of tactical sales forecasts, which all display a trend and seasonality. To select the top-performing techniques, we return to Table 5 and Figure 2, which show a clear top four in terms of both accuracy and profit: SARIMA, SARIMAX, Holt-Winters and the seasonal decomposition model (DM). The only exception to the seasonal, univariate and non-ML rule is SARIMAX, which also incorporates external drivers into the model.
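The category comparisons can be sketched as a paired Wilcoxon signed-rank test: for every forecast, `x` holds the average rank of one category (e.g. seasonal models) and `y` that of the other. This is a minimal, self-written illustration whose function name is our own; it uses the large-sample normal approximation without tie or continuity corrections, which is reasonable for 840 paired forecasts but can differ slightly from library implementations such as `scipy.stats.wilcoxon`.

```python
import math
from typing import List, Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided paired Wilcoxon signed-rank test, normal approximation.

    Zero differences are dropped and tied |differences| get average ranks;
    no tie or continuity correction is applied. Returns (W+, p-value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        # group tied absolute differences and give them their average rank
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p
```

A small p-value indicates that one category systematically receives lower (better) average ranks than the other across forecasts.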

In Figure 3, we take a closer look at these best-performing models in terms of both accuracy and expected profit. The figure shows the distributions of the pairwise differences between SARIMA, SARIMAX, Holt-Winters and the seasonal decomposition model (DM). Grey boxplots indicate a significant difference between the two models named on the Y-axis. DM consistently appears to perform worse than the other three models, while the remaining three time series models perform equally in terms of both MAPE and expected profit.

Figure 3: Pairwise differences of four best-performing models

The last performance metric that can still make a difference in selecting the best-performing technique is computation time. This measure is indicative of model complexity, but it also affects the final cost of the model, as computational effort leads to additional expenses. The average computation time of the top four models is summarized in Table 6 below. Training Holt-Winters and DM clearly takes by far the least time. However, the average computation times of both SARIMA and SARIMAX are still below 10 seconds per forecast. It is also logical that these last two techniques require more time to train, given that both undergo profit-driven hyperparameter optimization and SARIMAX additionally undergoes profit-driven feature selection. In conclusion, a top three of equally performing time series models remains: Holt-Winters exponential smoothing, Seasonal ARIMA and Seasonal ARIMAX, but Holt-Winters will save considerable computation time when a large number of time series must be forecast.

Model Average computation time (seconds)
Holt-Winters 0.02
Seasonal decomposition model 0.05
Table 6: Average computation time for best-performing models

Finally, we also take a closer look at the interpretability of these top three techniques. As time series models, they are all transparent methods that attribute weights to the autoregressive, trend and seasonal components of the time series. Additionally, SARIMAX reports weights for the added external factors, indicating their impact on sales, which greatly adds to the explanatory power of the model. This gives SARIMAX a considerable advantage in terms of business value. On the other hand, its feature selection procedure increases computation time and effort, so these two aspects need to be weighed against one another. In the end, the univariate time series models perform as well as SARIMAX, but additional information on the external influences on sales may be preferable in a business context. Note that these are two fundamentally different objectives, i.e. predicting versus explaining, so businesses need to clearly define the objective of a forecasting model before selecting the final technique. Regarding variable selection in this paper, Figure 4 shows the average percentage of selected variables for each variable type, for each of the two data sources. From these charts, we conclude that weather and macro-economic variables are selected most often for all datasets. On average, 2 weather variables and 2.5 macro-economic variables were selected for the Coca-Cola Company datasets, while 1.78 weather variables and 3.89 macro-economic variables were chosen for the public datasets.

Figure 4: Average percentage of selected variables

The second research question focused on integrating the expected profit function into the model selection process. Table 5 and especially Figure 2 clearly show that the rankings of the techniques according to MAPE, RMSE and MASE are virtually the same, while the ranking according to the expected profit function looks somewhat different. Although the top methods perform well according to all of these evaluation measures, the changes in the ranking already indicate that it is valuable to also compare models according to profit, as this might lead to a different ranking of the candidate techniques. For example, the p-values in Table 5 show that the DM technique is not significantly different from the top three time series models in terms of the accuracy measures, but it is significantly different from them according to the expected profit function. Figure 5 shows some further pairwise differences according to MAPE and profit. It clearly shows that techniques can differ significantly in terms of profit but not in terms of MAPE, or vice versa. Specifically, we compare the univariate variants of Multiple Linear Regression (LRUni) and Support Vector Regression (SVRUni), and the variant of the Simple Multilayer Perceptron with external factors (MLPMulti). The pairwise differences between SVR and LR, and between SVR and MLP, show a clear difference between the two evaluation measures. Such discrepancies do not occur when only the accuracy metrics are compared with one another.

Figure 5: Domination plots of pairwise Nemenyi differences in p-values

4.2 Discussion

The results of this study have three interesting implications for model selection in sales forecasting from a business perspective. Firstly, we proposed a profit-driven approach that provides a completely automated framework for model building and selection. The expected profit function that we implement is fully adaptable to any sales forecasting situation, as it combines business expertise with traditional accuracy-based evaluation. Furthermore, this profit function can serve as an evaluation criterion that gives a different view of which technique is truly the best in a benchmarking exercise. Although the results in this paper are consistently in line with the accuracy measures, the overall ranking according to profit is still significantly different from the accuracy-based ones. This indicates that ranking according to profit might yield a different outcome in model selection. In this paper, however, the performance of the top three models was consistently very close, while these same models outperformed the others by a significant margin. In other cases, when the performance of competing techniques is closer, the expected profit function can provide an additional perspective on final model selection. Finally, this paper adds to the scarce literature on profit-driven analytics in forecasting and regression analysis.

Secondly, we notice that univariate time series models that explicitly capture seasonality perform best in this benchmarking study, with Seasonal ARIMAX as the only exception to the univariate characteristic. However, this technique only performs on par with the aforementioned univariate methods, which raises the question whether adding external variables is truly useful in this context. While other studies have shown the value of adding external drivers to sales forecasting models at a tactical level (Sagaert et al., 2017), this research shows that we can forecast equally well without any independent variables in the model. Taking into account the additional cost of data collection and model maintenance, we conclude that forecasting sales at the product category level is more easily achieved with univariate models, without compromising on accuracy or profit. Although we recognize the added explanatory value of integrating features, we question whether it is worth the effort when the goal is to achieve the best forecast.

Finally, we compared two categories of forecasting techniques: statistical methods and machine learning techniques. In the case of tactical sales forecasting, the simpler statistical models clearly and significantly outperform the ML techniques on these 35 datasets. This leads us to conclude that the more traditional models still perform best on this type of time series problem. These findings are in line with Makridakis and Hibon (2000) but contradict Crone et al. (2011). To conclude, seasonal time series models tend to outperform other techniques for a tactical sales forecast. From a business perspective, this conclusion is especially positive, as these models are easy to interpret and faster to compute.

5 Conclusion

In this paper, we introduced a new, completely automated and profit-oriented approach to sales forecasting, which integrates an expected profit function into several steps of model selection. This function can be implemented in any sales forecasting context by letting business experts and historical data set the profit margins for every product. Furthermore, our research has shown that simpler time series models tend to outperform more complex techniques on 35 sales datasets. All of the applied ML techniques achieve significantly worse results than the traditional models, in terms of both accuracy and profit. This implies that less complex techniques are still the best type of method for tactical sales forecasting. Finally, we found that univariate time series models that explicitly model the seasonality of a time series perform best. This indicates that adding external variables is unnecessary, especially when we consider the additional costs of maintaining models with external drivers.

We recognize several limitations of this study. Firstly, it is impossible to compile an exhaustive list of forecasting techniques. However, we have attempted to implement common methods from all three categories of techniques that are frequently used for forecasting. Secondly, this research covers 35 monthly time series, considerably fewer than the larger benchmarks and competitions in the field (Athanasopoulos et al., 2011; Crone et al., 2011; Makridakis and Hibon, 2000). However, this paper focuses on one field, i.e. sales forecasting, and is one of the larger benchmarking studies in this specific area. Moreover, we have improved the generalizability and reproducibility of the study by including several publicly available datasets. Finally, apart from Random Forests, we only implemented individual forecasting methods and did not consider ensemble methods. This type of methodology has become extremely popular in forecasting (Lessmann et al., 2015), and it has been shown that this approach can significantly impact the accuracy of forecasts. Potential future research therefore includes an expansion of this study in three respects. Firstly, more sales time series can be included to further support our statements. Secondly, more techniques, including ensemble methods, can be implemented. Finally, this study can be extended to fields other than sales forecasting. However, given the range of techniques and the number of datasets already used in this paper, we can state that simple, seasonal time series models are still the best choice for a high-level tactical sales forecast.


We would like to acknowledge The Coca-Cola Company for funding this research and providing us with the necessary business expertise and data to conduct our experiments.



  • P. Aboagye-Sarfo, Q. Mai, F. M. Sanfilippo, D. B. Preen, L. M. Stewart, and D. M. Fatovich (2015) A comparison of multivariate and univariate time series approaches to modelling and forecasting emergency department demand in western australia. Journal of biomedical informatics 57, pp. 62–73. Cited by: §2.1.
  • M. Akın (2015) A novel approach to model selection in tourism demand modeling. Tourism Management 48, pp. 64–72. Cited by: §2.1.
  • J. S. Armstrong and R. Fildes (2006) Making progress in forecasting. International Journal of Forecasting 22 (3), pp. 433–441. Cited by: §1, §1.
  • J. S. Armstrong (2006) Findings from evidence-based forecasting: methods for reducing forecast error. International Journal of Forecasting 22 (3), pp. 583–598. Cited by: §1.
  • N. S. Arunraj and D. Ahrens (2015)

    A hybrid seasonal autoregressive integrated moving average and quantile regression for daily food sales forecasting

    International Journal of Production Economics 170, pp. 321–335. Cited by: §2.1.
  • G. Athanasopoulos, R. J. Hyndman, H. Song, and D. C. Wu (2011) The tourism forecasting competition. International Journal of Forecasting 27 (3), pp. 822–844. Cited by: §2.1, §2.1, §3.3, §5.
  • G. Bansal, A. P. Sinha, and H. Zhao (2008) Tuning data mining methods for cost-sensitive regression: a study in loan charge-off forecasting. Journal of Management Information Systems 25 (3), pp. 315–336. Cited by: §2.2.
  • J. Bertrand, X. Brusset, and M. Fortin (2015) Assessing and hedging the cost of unseasonal weather: case of the apparel sector. European Journal of Operational Research 244 (1), pp. 261–276. Cited by: §3.1.
  • J. Boivin and S. Ng (2006) Are more data always better for factor analysis?. Journal of Econometrics 132 (1), pp. 169–194. Cited by: §3.3.
  • K. Bozos and K. Nikolopoulos (2011) Forecasting the value effect of seasoned equity offering announcements. European Journal of Operational Research 214 (2), pp. 418–427. Cited by: §2.1, §2.2.
  • S. Cang and H. Yu (2014) A combination selection algorithm on forecasting. European Journal of Operational Research 234 (1), pp. 127–139. Cited by: §1, §2.1.
  • E. Carrizosa, B. Martín-Barragán, and D. R. Morales (2014)

    A nested heuristic for parameter tuning in support vector machines

    Computers & Operations Research 43, pp. 328–334. Cited by: §3.4.
  • S. F. Crone, M. Hibon, and K. Nikolopoulos (2011) Advances in forecasting with neural networks? empirical evidence from the nn3 competition on time series prediction. International Journal of Forecasting 27 (3), pp. 635–660. Cited by: §1, §1, §1, §2.1, §2.1, §4.2, §5.
  • S. F. Crone, S. Lessmann, and R. Stahlbock (2005) Utility based data mining for time series analysis: cost-sensitive learning for neural network predictors. In Proceedings of the 1st international workshop on Utility-based data mining, pp. 59–68. Cited by: §2.2.
  • C. S. Currie and I. T. Rowley (2010) Consumer behaviour and sales forecast accuracy: what’s going on and how should revenue managers respond?. Journal of Revenue and Pricing Management 9 (4), pp. 374–376. Cited by: §3.1.
  • D. A. Dickey and W. A. Fuller (1979) Distribution of the estimators for autoregressive time series with a unit root. Journal of the American statistical association 74 (366a), pp. 427–431. Cited by: §3.4.
  • M. Fagiani, S. Squartini, L. Gabrielli, S. Spinsante, and F. Piazza (2015) A review of datasets and load forecasting techniques for smart natural gas and water grids: analysis and experiments. Neurocomputing 170, pp. 448–465. Cited by: §2.1.
  • R. Fildes (2006) The forecasting journals and their contribution to forecasting research: citation analysis and expert opinion. International Journal of forecasting 22 (3), pp. 415–432. Cited by: §1.
  • P. H. Franses and D. Van Dijk (2005) The forecasting performance of various models for seasonality and nonlinearity for quarterly industrial production. International Journal of Forecasting 21 (1), pp. 87–102. Cited by: §2.1.
  • M. Friedman (1940) A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11 (1), pp. 86–92. Cited by: §4.1.
  • L. A. Gil-Alana, J. Cunado, and F. Perez de Gracia (2008) Tourism in the canary islands: forecasting using several seasonal time series models. Journal of Forecasting 27 (7), pp. 621–636. Cited by: §2.1.
  • U. Gunter and I. Önder (2015) Forecasting international city tourism demand for paris: accuracy of uni-and multivariate models employing monthly data. Tourism Management 46, pp. 123–135. Cited by: §2.1.
  • T. Huang, R. Fildes, and D. Soopramanien (2014) The value of competitive information in forecasting fmcg retail product sales and the variable selection problem. European Journal of Operational Research 237 (2), pp. 738–748. Cited by: §3.1.
  • R. J. Hyndman and A. B. Koehler (2006) Another look at measures of forecast accuracy. International journal of forecasting 22 (4), pp. 679–688. Cited by: §3.5.
  • K. B. Kahn (2003) How to measure the impact of a forecast error on an enterprise?. The Journal of Business Forecasting 22 (1), pp. 21. Cited by: §1, §3.2.
  • S. Kolassa (2016) Evaluating predictive count data distributions in retail sales forecasting. International Journal of Forecasting 32 (3), pp. 788–803. Cited by: §3.5.
  • S. Lessmann, B. Baesens, H. Seow, and L. C. Thomas (2015) Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research. European Journal of Operational Research 247 (1), pp. 124–136. Cited by: §2.1, §5.
  • S. Lessmann and S. Voß (2017) Car resale price forecasting: the impact of regression method, private information, and heterogeneity on forecast accuracy. International Journal of Forecasting 33 (4), pp. 864–877. Cited by: §1.
  • S. Ma, R. Fildes, and T. Huang (2016)

    Demand forecasting with high dimensional data: the case of sku retail sales forecasting with intra-and inter-category promotional information

    European Journal of Operational Research 249 (1), pp. 245–257. Cited by: §2.1, §3.1.
  • S. Makridakis and M. Hibon (2000) The m3-competition: results, conclusions and implications. International journal of forecasting 16 (4), pp. 451–476. Cited by: §1, §2.1, §2.1, §4.2, §5.
  • P. Nemenyi (1962) Distribution-free multiple comparisons. In Biometrics, Vol. 18, pp. 263. Cited by: §4.1.
  • K. Nikolopoulos, P. Goodwin, A. Patelis, and V. Assimakopoulos (2007) Forecasting with cue information: a comparison of multiple regression with alternative forecasting approaches. European Journal of Operational Research 180 (1), pp. 354–368. Cited by: §2.1.
  • D. R. Osborn, A. P. Chui, J. P. Smith, and C. R. Birchenhall (1988) Seasonality and the order of integration for consumption. Oxford Bulletin of Economics and Statistics 50 (4), pp. 361–377. Cited by: §3.4.
  • M. Óskarsdóttir, C. Bravo, W. Verbeke, C. Sarraute, B. Baesens, and J. Vanthienen (2017) Social network analytics for churn prediction in telco: model building, evaluation and network architecture. Expert Systems with Applications 85, pp. 204–220. Cited by: §2.2.
  • H. Peng, F. Long, and C. Ding (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence 27 (8), pp. 1226–1238. Cited by: §3.3.
  • F. Petropoulos, S. Makridakis, V. Assimakopoulos, and K. Nikolopoulos (2014) ‘Horses for courses’ in demand forecasting. European Journal of Operational Research 237 (1), pp. 152–163. Cited by: §1, §1, §2.1.
  • P. Ramos, N. Santos, and R. Rebelo (2015) Performance of state space and ARIMA models for consumer retail sales forecasting. Robotics and Computer-Integrated Manufacturing 34, pp. 151–163. Cited by: §1, §2.1.
  • Y. R. Sagaert, E. Aghezzaf, N. Kourentzes, and B. Desmet (2017) Tactical sales forecasting using a very large set of macroeconomic indicators. European Journal of Operational Research. Cited by: §3.1, §4.2.
  • A. A. Santos, F. J. Nogales, and E. Ruiz (2012) Comparing univariate and multivariate models to forecast portfolio value-at-risk. Journal of Financial Econometrics 11 (2), pp. 400–441. Cited by: §2.1.
  • J. Sola and J. Sevilla (1997) Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science 44 (3), pp. 1464–1468. Cited by: §3.4.
  • E. Stripling, S. vanden Broucke, K. Antonio, B. Baesens, and M. Snoeck (2015) Profit maximizing logistic regression modeling for customer churn prediction. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–10. Cited by: §2.2.
  • A. A. Syntetos, Z. Babai, J. E. Boylan, S. Kolassa, and K. Nikolopoulos (2016) Supply chain forecasting: theory, practice, their gap and the future. European Journal of Operational Research 252 (1), pp. 1–26. Cited by: §3.1.
  • L. J. Tashman (2000) Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting 16 (4), pp. 437–450. Cited by: §3.5.
  • J. W. Taylor, L. M. De Menezes, and P. E. McSharry (2006) A comparison of univariate methods for forecasting electricity demand up to a day ahead. International Journal of Forecasting 22 (1), pp. 1–16. Cited by: §2.1.
  • T. Van Calster, B. Baesens, and W. Lemahieu (2017) ProfARIMA: a profit-driven order identification algorithm for ARIMA models in sales forecasting. Applied Soft Computing. Cited by: §1, §3.2, §3.2.
  • W. Verbeke, B. Baesens, and C. Bravo (2017) Profit driven business analytics: a practitioner’s guide to transforming big data into added value. John Wiley & Sons. Cited by: §2.2.
  • W. Verbeke, K. Dejaeger, D. Martens, J. Hur, and B. Baesens (2012) New insights into churn prediction in the telecommunication sector: a profit driven data mining approach. European Journal of Operational Research 218 (1), pp. 211–229. Cited by: §2.2.
  • T. Verbraken, W. Verbeke, and B. Baesens (2013) A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Transactions on Knowledge and Data Engineering 25 (5), pp. 961–973. Cited by: §2.2.
  • M. Weller and S. Crone (2012) Supply chain forecasting: best practices & benchmarking study. Technical Paper. Lancaster Centre for Forecasting, pp. 1–42. Cited by: §2.1.
  • H. Yang, I. King, and L. Chan (2002) Non-fixed and asymmetrical margin approach to stock market prediction using support vector regression. In Proceedings of the 9th International Conference on Neural Information Processing (ICONIP’02), Vol. 3, pp. 1398–1402. Cited by: §2.2.
  • G. P. Zhang and M. Qi (2005) Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160 (2), pp. 501–514. Cited by: §3.4.
  • H. Zhao, A. P. Sinha, and G. Bansal (2011) An extended tuning method for cost-sensitive regression and forecasting. Decision Support Systems 51 (3), pp. 372–383. Cited by: §2.2.