For2For: Learning to forecast from forecasts

01/14/2020 ∙ by Shi Zhao, et al. ∙ Xi'an Jiaotong-Liverpool University

This paper presents a time series forecasting framework which combines standard forecasting methods and a machine learning model. The inputs to the machine learning model are not lagged values or regular time series features, but instead forecasts produced by standard methods. The machine learning model can be either a convolutional neural network model or a recurrent neural network model. The intuition behind this approach is that forecasts of a time series are themselves good features characterizing the series, especially when the modelling purpose is forecasting. It can also be viewed as a weighted ensemble method. Tested on the M4 competition dataset, this approach outperforms all submissions for quarterly series, and is more accurate than all but the winning algorithm for monthly series.


1 Introduction

The competitiveness of neural network (NN) models and other machine learning (ML) models for time series forecasting, compared to statistical models, has long been questioned by practitioners [1][2]. Although the time series forecasting literature presents a plethora of complex novel models, in practice the performance of ML models is often below expectation [3]. For example, pure ML methods performed poorly in the M4 competition: none of the five submitted pure ML solutions beat the Comb benchmark, the simple arithmetic average of Single, Holt and Damped exponential smoothing [4].

The poor performance of NN and general ML models in forecasting time series is in sharp contrast with the tremendous success of these models in areas like computer vision and natural language processing. NN models are known to be capable of extracting features automatically from images for tasks like classification, while in the pre-deep-learning era, such tasks were usually solved by feeding hand-crafted features to various ML models [5]. The availability of large datasets such as ImageNet, which has more than 14 million images, is believed to be crucial for the success of NN models in computer vision.

The lack of sufficient time series data is often cited as a reason for the underachievement of ML models in forecasting [3]. Although Zhang et al. [6] show that there exist feedforward NNs able to generate many popular time series features, such as minimum, maximum and count-above-mean, without enough data sophisticated models easily overfit and fall short of extracting true patterns. When the search space is so large and the guidance so little, it is not surprising that ML models fail to find parameters that generalize well.

To make ML models work for time series forecasting, we need to provide more guidance, in the form of either more data or better features. It is usually difficult or even impractical to collect enough relevant series, so generating good features seems the more viable option. After all, before deep learning gained popularity, the key to the success of many ML systems was feature engineering. Some even believe that "applied machine learning is basically feature engineering" [7].

This paper presents a time series forecasting framework we call For2For (forecasts to forecast). The framework is composed of standard off-the-shelf forecasting methods and a ML model. The inputs to the ML model are not lagged values or mined features like seasonality strength, but forecasts produced by standard methods such as ARIMA and ETS. The intuition behind this approach is that forecasts of a time series are themselves good features characterizing the series, especially when the modelling purpose is forecasting. The ML model can be either a convolutional neural network (CNN) model or a recurrent neural network (RNN) model. In other words, the guidance we provide to the NN models is the forecasts produced by other models, which we will refer to as base models.

Of course, a different way of viewing this approach is that the NN models are trained to combine the forecasts of base models, so essentially it is an ensemble model. In our opinion, however, it differs from standard combination methods in two ways. First, this method does not try to select a single base model (as Talagala et al. [8] do) or to assign weights to base models (as FFORMA [9] does); instead, it combines the forecasts of base models differently at each forecasting step, which we elaborate on in Section 2. Second, the inputs are forecasts of base models rather than time series features extracted by software packages such as tsfeatures [10].

This method is tested on the M4 competition dataset and it is more accurate than a simple arithmetic combination of the base models. Furthermore, it beats all submissions for quarterly series, and outperforms all but the winning algorithm for monthly series.

The rest of the paper is organized as follows. Section 2 introduces the framework and presents the structures of the CNN model and the RNN model. The implementation details and testing results are discussed in Section 3 and Section 4 concludes the paper.

2 Methodology

Let us first define $\{y_i\}_{i=1}^{N}$ as a collection of time series to forecast, and set the forecasting horizon to $H$. For notational convenience, the first time step to forecast is defined as $t = 1$; therefore $y_{i,t}$ is known up to $t = 0$, with $i = 1, \dots, N$.

2.1 The For2For framework

As illustrated in Figure 1, the For2For framework consists of two parts. In the first part, a group of $M$ base models are used to forecast each time series $y_i$, and the forecast results are

$f^{(m)}_{i,t}, \quad t = 1, \dots, H, \quad m = 1, \dots, M,$

where $f^{(m)}_{i,t}$ denotes the forecast of $y_{i,t}$ produced by base model $m$. This is common practice in various forecast combination methods [11][9][12]. In the second part of the framework, the forecasts produced by the base models are fed into a NN model, which can be either a CNN model or an RNN model. The forecast generated by the NN model is the final forecast result. Standard training and validation processes are used to build the NN model and are not drawn explicitly in the diagram.

Figure 1: For2For (forecasts to forecast) framework

In conventional combination approaches, the final forecast is a linear combination of the forecasts generated by the base models,

$\hat{y}_{i,t} = \sum_{m=1}^{M} w_{i,m} \, f^{(m)}_{i,t},$   (1)

where $w_{i,m}$ is the weight assigned to base model $m$ when forecasting the series $y_i$.

In this framework, we do not try to select the single best-performing base model, nor do we try to assign weights to each base model; in fact, no weights are explicitly calculated at all. The NN model is trained to learn from the forecasts of the base models and then makes its forecasts automatically.

If we insist on viewing this approach as a weighted combination, then the weights are assigned to individual forecast results at each forecasting step,

$\hat{y}_{i,t} = \sum_{m=1}^{M} w_{i,m,t} \, f^{(m)}_{i,t}.$   (2)

This means that the forecasts of a given series by a base model at different $t$ are not necessarily trusted to the same extent in a multi-horizon forecasting task. In other words, the weight is not just a function of $i$ (which denotes the time series) and $m$ (which denotes the base model), but also a function of $t$ (the forecasting step). It is only when $w_{i,m,t}$ is constant across $t$ that Equation 2 reduces to Equation 1.

Any popular forecasting method can be included in the set of base models. After applying the $M$ base models to a time series $y_i$, we obtain an $H \times M$ matrix $F_i$, whose $(t, m)$ element is the forecast result at time $t$ by model $m$. Other relevant information, such as the domain type in the M4 dataset, can be appended side by side to this matrix.
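As a concrete illustration, the following sketch assembles such a matrix in Python/NumPy from a dictionary of base-model forecasts. The model names, shapes and helper function are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def forecast_matrix(base_forecasts, horizon):
    """Stack base-model forecasts of one series into an H x M matrix.

    base_forecasts: dict mapping a base-model name to a length-H
    forecast array, e.g. {"arima": ..., "ets": ..., "theta": ...}.
    """
    models = sorted(base_forecasts)  # fix a column order for the M models
    F = np.column_stack([np.asarray(base_forecasts[m], dtype=float)
                         for m in models])
    assert F.shape == (horizon, len(models))
    return F, models

# hypothetical usage with H = 8 (quarterly horizon) and M = 3 models
H = 8
rng = np.random.default_rng(0)
F, models = forecast_matrix(
    {"arima": rng.normal(size=H), "ets": rng.normal(size=H),
     "theta": rng.normal(size=H)}, horizon=H)
```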

2.2 CNN model

As shown in Figure 2, the CNN model considered in this paper mimics the highly successful ResNet for image recognition [13]. In some sense, the forecast matrix $F_i$ is treated as an image with only one channel.

The linear layer on the left, with an $M \times 1$ parameter vector, simply combines the $M$ base models linearly, and the output is an $H \times 1$ vector. This layer can also be viewed as a convolutional layer with padding and linear activation. Often such a linear model is not adequate, and it is then necessary to include more nonlinear mechanisms in the model; this is exactly what the residual network on the right is intended to do. Intuitively, the four convolutional layers with padding and sigmoid activation are used to capture the differences between the forecasts of different base models at different time steps. A fully connected layer, which outputs an $H \times 1$ vector, is connected to the last convolutional layer. The number of convolutional layers can be increased or decreased to reduce bias or variance when necessary. The fully connected layer could also be replaced by a linear layer if there is an obvious overfitting problem. In the extreme case, all the convolutional layers and the fully connected layer are removed and the model collapses to a linear combination model.

When extra categorical features are available, they may be incorporated into the model by a separate embedding layer, and the final forecast is then the summation of the outputs from the linear part, the residual part and the embedding part.

The CNN model is not fully flexible: to build the computation graph, it is necessary to know the forecasting horizon beforehand. This is a constraint imposed by the fully connected layer. For this reason, when the forecasting horizons differ, as is the case for series of different frequencies in the M4 competition, one model has to be built for each frequency.

Figure 2: CNN model structure
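To make the architecture concrete, here is a minimal Keras sketch of a ResNet-style model of this kind, assuming an input of shape (H, M, 1). The kernel sizes, filter count and the 1 x M kernel used for the linear branch are our own assumptions; the paper specifies only the overall structure (a linear branch plus four sigmoid-activated convolutional layers feeding a fully connected layer).

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn(horizon, n_models, n_filters=32, n_conv=4):
    """ResNet-style sketch: a linear combination branch plus a residual branch."""
    inp = tf.keras.Input(shape=(horizon, n_models, 1))

    # linear branch: a 1 x M convolution combines the M base-model
    # forecasts linearly at each time step, giving an H x 1 vector
    linear = layers.Conv2D(1, (1, n_models), activation="linear")(inp)
    linear = layers.Flatten()(linear)

    # residual branch: stacked convolutions with sigmoid activation,
    # intended to capture differences between base models over time
    x = inp
    for _ in range(n_conv):
        x = layers.Conv2D(n_filters, (3, 3), padding="same",
                          activation="sigmoid")(x)
    x = layers.Flatten()(x)
    residual = layers.Dense(horizon)(x)  # fully connected layer, H outputs

    out = layers.Add()([linear, residual])  # final H-step forecast
    return tf.keras.Model(inp, out)

model = build_cnn(horizon=18, n_models=8)    # e.g. monthly series with M = 8
model.compile(optimizer="adam", loss="mae")  # MAE loss, as in Section 3.3
```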

2.3 RNN model

The RNN model considered here is more flexible than its CNN counterpart and is more interpretable. Figure 3 shows the folded version of the RNN model considered in this paper, while Figure 4 shows the unfolded version, where $x_{i,t}$ is the vector containing the forecasts of series $i$ by all the base models at step $t$, $s_t$ is the state vector of the RNN cell at $t$, and $\hat{y}_{i,t}$ is the final forecast of series $i$ at $t$. The initial state $s_0$ is set to zero. A linear layer converts the output vector of the RNN cell to a scalar. In such a structure, instead of making all the forecasts at the very last step, the forecasts are produced one at a time.

Figure 3: RNN model structure, folded version
Figure 4: RNN model structure, unfolded version

Due to the existence of the state vector $s_t$, $\hat{y}_{i,t}$ is a combination of the forecasts up to step $t$. For example, when the model generates the final forecast at step $t$, it does not only look at the forecasts by the base models at $t$, which is $x_{i,t}$, but also at $x_{i,1}, \dots, x_{i,t-1}$. In such a model, it is not necessary to know the forecasting horizon in advance. It is also possible to make predictions for a horizon $H$ different from the one used for model training.

Extra information available for forecasting can be fed to the model by expanding the input vector to the RNN cell. For example, the domain type of a series in the M4 dataset can be included by first applying an embedding layer to the categorical feature and then appending the embedding result to the base-model forecasts at each time step.
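A minimal Keras sketch of such an RNN is given below, assuming an LSTM cell and leaving the time dimension unspecified so that one model can serve horizons of different lengths; the layer sizes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_rnn(n_models, state_size=6):
    # one input vector of M base-model forecasts per forecasting step;
    # the time dimension is None, so the horizon need not be fixed
    inp = tf.keras.Input(shape=(None, n_models))
    h = layers.LSTM(state_size, return_sequences=True)(inp)  # s_0 = 0 by default
    out = layers.TimeDistributed(layers.Dense(1))(h)  # linear layer -> scalar per step
    return tf.keras.Model(inp, out)

rnn = build_rnn(n_models=8, state_size=6)  # e.g. state size 6 for monthly series
rnn.compile(optimizer="adam", loss="mae")  # MAE loss, as in Section 3.3
```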

3 Implementation and results

To assess the performance of the proposed method, the M4 dataset is used for experiments. Among the 100,000 time series in the dataset, there are 23,000 yearly, 24,000 quarterly, 48,000 monthly, 359 weekly, 4,227 daily and 414 hourly series [4]. The forecasting horizons for these frequencies are six, eight, 18, 13, 14 and 48, respectively. The series were collected from six domains: micro, industry, macro, finance, demographic and other.

3.1 Base models

Inspired by FFORMA, we choose the same base models as Montero-Manso et al. [9], with the exception of the naïve method. The reason we exclude the naïve method is that at the data-preprocessing stage we normalize all series by their last observations and then apply a log transformation. As a result, the forecasts at any time by the naïve method are simply zero, and including an all-zero vector in the inputs to a NN model adds no new information.

For the sake of completeness, the eight base models are listed in Table 1. These methods are implemented in the R package forecast [14] and can be used off the shelf. Following the practice in FFORMA, the default settings of the R functions are used, and the forecast is replaced by the seasonal naïve forecast if an error is reported.

base model                      R function
random walk with drift          rwf with drift = TRUE
seasonal naïve                  snaive
theta method                    thetaf
ARIMA model                     auto.arima
exponential smoothing method    ets
TBATS model                     tbats
STLM-AR                         stlm with modelfunction = ar
feedforward neural network      nnetar
Table 1: List of base models and R functions

3.2 Preprocessing and training

We take the monthly series as an example to show how the modelling process is carried out. First, the last 18 (one horizon's worth of) observations of each of the 48,000 monthly series are removed. Second, the base models are applied to each chopped series to make forecasts. Negative forecasts are clipped at 10, as it is pointed out in [4] that the minimum of all series is 10. The clipped forecasts are normalized by the last observation, and then a log transformation is applied.
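A minimal sketch of this preprocessing, assuming the forecasts for one series are held in an H x M NumPy array:

```python
import numpy as np

def preprocess(forecasts, last_obs, floor=10.0):
    # clip forecasts at 10 (the minimum of all M4 series, per [4]),
    # normalize by the last in-sample observation, then log-transform
    clipped = np.maximum(np.asarray(forecasts, dtype=float), floor)
    return np.log(clipped / last_obs)

# hypothetical example: monthly horizon H = 18, M = 8 base models
rng = np.random.default_rng(0)
features = preprocess(rng.uniform(50.0, 150.0, size=(18, 8)), last_obs=100.0)
```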

At the model training stage, one third of the series, chosen at random, are held out as a validation dataset for tuning model hyperparameters. Once the hyperparameters are determined, all the chopped series are used to re-train the final model, which is used for the actual forecasting. The inputs to the final model are the preprocessed forecasts of the complete series produced by the base models.

For a fair comparison with FFORMA, the domain type of the series is not used in the experiments. Due to the stochastic nature of the initialization of model parameters, and the fact that the training samples are randomly divided into mini-batches in each training epoch, the final model obtained is not deterministic. To reduce this uncertainty, multiple instances of the model are trained concurrently and the average of these instances is taken as the final forecasting result. This model-averaging approach was also employed by Smyl in his winning submission [15].
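The averaging step itself is straightforward; a sketch, assuming each trained instance exposes a Keras-style predict method:

```python
import numpy as np

def ensemble_forecast(instances, inputs):
    # average the forecasts of independently trained model instances;
    # the instances differ only through random initialization and
    # mini-batch shuffling during training
    return np.mean([m.predict(inputs) for m in instances], axis=0)
```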

3.3 Accuracy measure

Mean absolute error (MAE) is used as the loss function. It is possible to improve accuracy with a carefully designed custom loss function, such as the pinball loss in [15]. In the M4 competition, the accuracy of a model is measured by the symmetric mean absolute percentage error (sMAPE),

$\text{sMAPE} = \frac{2}{H} \sum_{t=1}^{H} \frac{|y_t - \hat{y}_t|}{|y_t| + |\hat{y}_t|} \times 100\%,$   (3)

and the mean absolute scaled error (MASE),

$\text{MASE} = \frac{1}{H} \, \frac{\sum_{t=1}^{H} |y_t - \hat{y}_t|}{\frac{1}{n-m} \sum_{t=m+1}^{n} |y_t - y_{t-m}|},$   (4)

where $y_t$ is the true value at point $t$, $\hat{y}_t$ the forecast, $H$ the forecasting horizon, $n$ the number of in-sample data points, and $m$ the time interval between successive data points [4].

The final ranking of a submission in the competition is determined by the overall weighted average (OWA) [4], defined by

$\text{OWA} = \frac{1}{2} \left( \frac{\text{sMAPE}}{\text{sMAPE}_{\text{Naïve 2}}} + \frac{\text{MASE}}{\text{MASE}_{\text{Naïve 2}}} \right),$   (5)

where the subscript denotes the Naïve 2 method, which amounts to the naïve method applied to seasonally adjusted series. The Naïve 2 method was used in the competition as the benchmark to evaluate the performance of submitted results [4].
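These measures translate directly into code; a sketch in NumPy, with m denoting the seasonal period used for the scaled differences:

```python
import numpy as np

def smape(y, yhat):
    # symmetric mean absolute percentage error, Equation (3)
    return 200.0 * np.mean(np.abs(y - yhat) / (np.abs(y) + np.abs(yhat)))

def mase(y, yhat, insample, m):
    # mean absolute scaled error, Equation (4)
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(y - yhat)) / scale

def owa(smape_val, mase_val, smape_naive2, mase_naive2):
    # overall weighted average, Equation (5), relative to Naive 2
    return 0.5 * (smape_val / smape_naive2 + mase_val / mase_naive2)
```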

For ease of comparison, Table 2 lists the accuracy measures of the top ranked submissions for series of each frequency in the M4 competition. Since the proposed method has much in common with FFORMA, the performance of the FFORMA method is given in the last column.

measure  frequency \ rank  1       2       3       4       5       6       FFORMA
sMAPE Yearly 13.176 13.366 13.528 13.669 13.673 13.677 13.528
Quarterly 9.679 9.733 9.796 9.800 9.809 9.816 9.733
Monthly 12.126 12.487 12.639 12.737 12.747 12.770 12.639
Weekly 6.582 6.726 6.728 6.814 6.905 6.919 7.625
Daily 2.452 2.852 2.959 2.980 2.985 2.993 3.097
Hourly 8.913 9.328 9.611 9.765 9.934 11.336 11.506
Total 11.374 11.695 11.720 11.836 11.845 11.887 11.720
MASE Yearly 2.98 3.009 3.038 3.046 3.06 3.075 3.060
Quarterly 1.111 1.118 1.118 1.122 1.125 1.134 1.111
Monthly 0.884 0.893 0.895 0.905 0.907 0.913 0.893
Weekly 2.107 2.108 2.133 2.158 2.180 2.213 2.108
Daily 2.642 3.025 3.194 3.200 3.203 3.223 3.344
Hourly 0.801 0.810 0.810 0.819 0.856 0.861 0.819
Total 1.536 1.547 1.551 1.554 1.565 1.571 1.551
OWA Yearly 0.778 0.788 0.799 0.801 0.802 0.805 0.799
Quarterly 0.847 0.847 0.853 0.855 0.855 0.859 0.847
Monthly 0.836 0.854 0.858 0.867 0.868 0.876 0.858
Weekly 0.739 0.751 0.766 0.775 0.779 0.782 0.796
Daily 0.806 0.930 0.977 0.978 0.984 0.985 1.019
Hourly 0.410 0.440 0.444 0.474 0.477 0.477 0.484
Total 0.821 0.838 0.841 0.842 0.843 0.848 0.838
Table 2: Performance of top submissions

3.4 One model for each frequency

In this subsection, independent models are built for series of different frequencies. In the CNN model, four convolutional layers are used for yearly, quarterly and monthly series. LSTM (long short-term memory) networks are used in the RNN model, and the state sizes are three for yearly series, four for quarterly series and six for monthly series.

We train each model eight times with fresh starts, and in each run the model is trained for 2,000 epochs. The sMAPE, MASE and OWA of each run for each frequency are shown in Table 3. The last column of the table gives the accuracy of the ensemble model. It is clear that model averaging improves forecasting accuracy; in some cases, the ensemble is more accurate than any of the eight individual runs.

The proposed CNN model would rank 3rd, 2nd and 3rd in terms of sMAPE for yearly, quarterly and monthly series, respectively, and 6th, 1st and 3rd in terms of OWA. From Table 3 it can be seen that the RNN model is more accurate than its CNN counterpart in terms of sMAPE, and the benefit of model averaging is even more evident. For yearly, quarterly and monthly series, the RNN model would rank 3rd, 1st and 2nd in terms of sMAPE, and 8th, 1st and 2nd in terms of OWA, respectively.

The CNN and the RNN models for yearly series have very good rankings in terms of sMAPE, but not in terms of MASE. This is due to the difference between the two measures: the numerators in Equation 3 and Equation 4 are the same, while the denominators differ. For series with a large mean value but very small variations, it is possible that the sMAPE of a forecast is very small while the MASE is excessively high.

In our experiments, forecasts of base models are log-transformed after being normalized by the last in-sample observation. Such a combination of preprocessing steps is well suited to reducing sMAPE. But when the purpose is to minimise MASE, it may be better to skip the log transformation and simply normalize the forecasts of base models by the denominator of Equation 4. To strike a balance between the two measures, it is possible to combine forecasts obtained under these two preprocessing settings. We do not do so in this paper, as our purpose is to show the generality of the proposed framework for time series forecasting. It should be mentioned that there are many possible ways to improve the implementation.

model  measure  frequency \ instance  1        2        3        4        5        6        7        8        ensemble
CNN sMAPE Yearly 13.5649 13.4904 13.5597 13.5856 13.5773 13.5090 13.5964 13.5848 13.4984
Quarterly 9.6966 9.7330 9.7050 9.6943 9.6690 9.7116 9.6926 9.6933 9.6802
Monthly 12.5993 12.6282 12.6066 12.6343 12.5937 12.7182 12.5805 12.6811 12.5478
MASE Yearly 3.1325 3.1023 3.1325 3.1587 3.1427 3.1096 3.1309 3.1441 3.1150
Quarterly 1.1060 1.1113 1.1067 1.1055 1.1033 1.1098 1.1040 1.1087 1.1038
Monthly 0.9037 0.9024 0.9003 0.9034 0.9039 0.9088 0.9061 0.9089 0.8964
OWA Yearly 0.8092 0.8031 0.8090 0.8131 0.8108 0.8046 0.8099 0.8112 0.8049
Quarterly 0.8436 0.8472 0.8443 0.8434 0.8414 0.8457 0.8427 0.8445 0.8421
Monthly 0.8617 0.8621 0.8604 0.8628 0.8616 0.8682 0.8622 0.8670 0.8565
RNN sMAPE Yearly 13.5601 13.5879 13.5949 13.5696 13.5649 13.5490 13.5549 13.5580 13.4928
Quarterly 9.6981 9.7016 9.7020 9.6991 9.7048 9.7034 9.7054 9.6825 9.6610
Monthly 12.5217 12.5556 12.5922 12.5097 12.5527 12.5240 12.5028 12.5702 12.4770
MASE Yearly 3.1415 3.1498 3.1470 3.1463 3.1357 3.1416 3.1309 3.1424 3.1254
Quarterly 1.1077 1.1068 1.1076 1.1088 1.1082 1.1076 1.1088 1.1043 1.1051
Monthly 0.8927 0.8930 0.8965 0.8926 0.8931 0.8949 0.8912 0.8942 0.8895
OWA Yearly 0.8101 0.8120 0.8119 0.8110 0.8096 0.8098 0.8086 0.8102 0.8061
Quarterly 0.8443 0.8441 0.8445 0.8448 0.8448 0.8445 0.8450 0.8424 0.8417
Monthly 0.8539 0.8552 0.8581 0.8534 0.8551 0.8550 0.8525 0.8562 0.8508
Table 3: Performance of the CNN and RNN models, with one model built for each frequency

The proposed models have performance very similar to FFORMA for daily series. They do not perform as well for series with far fewer samples, i.e., weekly (359 series) and hourly (414 series) data. In these cases, a simple linear model works better, as it is less prone to overfitting. Once again, we observe that a large sample size is crucial for NN models to work well in time series forecasting.

In our experiments, one training sample is generated for each series, but it is possible to increase the number of samples using a stretching-window scheme. For a given series of length $n$, the one training sample is produced by first taking out the last $H$ observations and then applying the base models to the first $n - H$ observations. The forecasts by the base models are used as the features in the NN model, and the last $H$ observations are used as the label. To increase the number of samples, we can apply the base models to the first $n - H - k$ observations, where $k$ is a positive integer, and use the $H$ observations immediately after as the labels.
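A sketch of the scheme, where base_models stands in for a routine that runs the R-package methods of Table 1 on a history and returns an H x M forecast matrix:

```python
import numpy as np

def stretching_window_samples(series, horizon, base_models, max_shift=3):
    # k = 0 reproduces the single sample used in our experiments;
    # k = 1, ..., max_shift generates the additional samples
    series = np.asarray(series, dtype=float)
    n = len(series)
    samples = []
    for k in range(max_shift + 1):
        history = series[: n - horizon - k]           # first n - H - k points
        label = series[n - horizon - k : n - k]       # the H points after them
        samples.append((base_models(history, horizon), label))
    return samples
```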

3.5 One RNN model for all frequencies

As mentioned in Section 2.2, due to the constraint imposed by the fully connected layer, independent CNN models have to be built for series with different forecasting horizons. By contrast, it is possible to build one single RNN model for series of all frequencies. The only restriction is that at the training stage, series of different frequencies have to be placed in different mini-batches, since the sequence lengths (forecasting horizons) differ; a sketch of such batching follows.
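A sketch of frequency-pure batching, assuming the training data are grouped by frequency beforehand:

```python
import numpy as np

def batches_by_frequency(datasets, batch_size, rng):
    # datasets maps a frequency name to (X, y), where X has shape
    # (n_series, H_freq, M); H_freq differs between frequencies, so
    # series of different frequencies never share a mini-batch
    for freq, (X, y) in datasets.items():
        idx = rng.permutation(len(X))
        for start in range(0, len(idx), batch_size):
            sel = idx[start : start + batch_size]
            yield freq, X[sel], y[sel]
```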

measure  frequency \ instance  1        2        3        4        5        6        7        8        ensemble
sMAPE Yearly 13.4835 13.5163 13.4990 13.4981 13.4544 13.5877 13.4865 13.5747 13.4419
Quarterly 9.7002 9.7302 9.7294 9.7011 9.7082 9.7142 9.7214 9.7143 9.6610
Monthly 12.5797 12.4808 12.5347 12.4990 12.6014 12.5821 12.5759 12.5425 12.4405
Weekly 8.4877 8.7098 8.6716 8.6312 8.5704 8.4801 8.6964 8.6404 8.4936
Daily 3.0959 3.0362 3.0286 3.0815 3.0586 3.0190 3.0491 3.0086 3.0354
Hourly 13.4411 13.3996 13.6902 13.5543 13.8240 13.4234 13.9391 13.4686 12.9756
Total 11.6845 11.6499 11.6723 11.6497 11.6905 11.7096 11.6893 11.6880 11.5942
MASE Yearly 3.1064 3.1159 3.1086 3.1090 3.0899 3.1384 3.0987 3.1385 3.0980
Quarterly 1.1081 1.1119 1.1098 1.1069 1.1098 1.1105 1.1097 1.1079 1.1043
Monthly 0.8999 0.8928 0.8956 0.8933 0.8976 0.8989 0.8994 0.8967 0.8887
Weekly 2.5105 2.4393 2.4114 2.3854 2.4421 2.4160 2.4445 2.4334 2.3772
Daily 3.4150 3.3094 3.2435 3.2838 3.2911 3.2673 3.2930 3.2025 3.2676
Hourly 1.4589 1.7442 1.6219 1.4449 1.5087 1.5389 1.5749 1.5111 1.4236
Total 1.5718 1.5679 1.5637 1.5629 1.5620 1.5730 1.5653 1.5685 1.5567
OWA Yearly 0.8034 0.8056 0.8041 0.8042 0.8004 0.8106 0.8025 0.8102 0.8011
Quarterly 0.8445 0.8473 0.8465 0.8442 0.8455 0.8461 0.8461 0.8451 0.8414
Monthly 0.8593 0.8525 0.8557 0.8534 0.8590 0.8589 0.8589 0.8565 0.8492
Weekly 0.9153 0.9146 0.9075 0.9006 0.9075 0.8978 0.9148 0.9097 0.8916
Daily 1.0293 1.0033 0.9920 1.0069 1.0042 0.9941 1.0030 0.9825 0.9968
Hourly 0.6702 0.7286 0.7110 0.6703 0.6910 0.6864 0.7079 0.6818 0.6501
Total 0.8418 0.8395 0.8392 0.8382 0.8395 0.8430 0.8403 0.8411 0.8345
Table 4: The performance of the RNN model for all frequencies

The full results of such an RNN model, with a state size of nine, are shown in Table 4. Interestingly, this single RNN model built for all series is more accurate than the individual models built for particular frequencies. It achieves slightly better OWA than FFORMA for quarterly, monthly and daily series, but does considerably worse for hourly series. Overall, the RNN model is marginally more accurate than FFORMA and would rank 2nd in terms of OWA among all submissions.

4 Conclusion

We present For2For, a time series forecasting framework in which forecasts produced by standard models are fed to a NN model that learns how to make the final forecast based on forecasts from various sources. This approach can be seen as a combination method: in essence, the NN model is trained to learn how to combine forecasts made by standard models. The NN model can be either a CNN model with a structure similar to ResNet or an RNN model that makes forecasts one step at a time.

To evaluate this approach, we test the method on the M4 competition dataset. Both the CNN and the RNN models perform very well for yearly, quarterly and monthly series. When the data sample size is too small, the NN models tend to overfit and do not generalize well. We also build one single RNN model for all frequencies, and it is more accurate than individual models built specifically for a particular frequency.

In the experiments, we do not use any features other than the forecasts generated by base models; in practice, it is certainly possible to combine such features with the forecasts. Prediction intervals are not discussed in this paper and could be a topic of future investigation.

Acknowledgement

The work of Ying Feng is supported by XJTLU Research Development Fund 18-02-27.

References

  • [1] S. Makridakis, E. Spiliotis, V. Assimakopoulos, Statistical and machine learning forecasting methods: Concerns and ways forward, PloS one 13 (3) (2018) e0194889.
  • [2] S. Makridakis, R. J. Hyndman, F. Petropoulos, Forecasting in social settings: the state of the art, International Journal of Forecasting 36 (1) (2020) 15–28.
  • [3] H. Hewamalage, C. Bergmeir, K. Bandara, Recurrent neural networks for time series forecasting: Current status and future directions, arXiv preprint arXiv:1909.00590.
  • [4] S. Makridakis, E. Spiliotis, V. Assimakopoulos, The M4 competition: Results, findings, conclusion and way forward, International Journal of Forecasting 34 (4) (2018) 802–808.
  • [5] L. Nanni, S. Ghidoni, S. Brahnam, Handcrafted vs. non-handcrafted features for computer vision classification, Pattern Recognition 71 (2017) 158–172.
  • [6] R. Zhang, S. Dong, X. Nie, S. Xiao, Forward neural network for time series anomaly detection, arXiv preprint arXiv:1812.08389.
  • [7] A. Ng, Machine learning and AI via brain simulations, 2013. Accessed: May 3, 2018.
  • [8] T. S. Talagala, R. J. Hyndman, G. Athanasopoulos, Meta-learning how to forecast time series, Tech. rep., Monash University, Department of Econometrics and Business Statistics (2018).
  • [9] P. Montero-Manso, G. Athanasopoulos, R. J. Hyndman, T. S. Talagala, FFORMA: Feature-based forecast model averaging, International Journal of Forecasting 36 (1) (2020) 86–92.
  • [10] R. J. Hyndman, Y. Kang, T. Talagala, E. Wang, Y. Yang, tsfeatures: Time series feature extraction, R package version 1.0.
  • [11] R. J. Hyndman, G. Athanasopoulos, Forecasting: principles and practice, OTexts, 2018.
  • [12] M. Pawlikowski, A. Chorowska, Weighted ensemble of statistical models, International Journal of Forecasting 36 (1) (2020) 93–97.
  • [13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [14] R. J. Hyndman, Y. Khandakar, Automatic time series forecasting: the forecast package for R, Journal of Statistical Software 26 (3) (2008) 1–22.
  • [15] S. Smyl, A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting, International Journal of Forecasting 36 (1) (2020) 75–85.