Forecasting with time series imaging

04/17/2019 ∙ by Xixi Li, et al. ∙ Beihang University

Feature-based time series representation has attracted substantial attention in a wide range of time series analysis methods. Recently, the use of time series features for forecast model selection and model averaging has become an emerging research focus in the forecasting community. Nonetheless, most existing approaches depend on the manual choice of an appropriate set of features. Exploiting machine learning methods to extract features from time series automatically is therefore crucial for state-of-the-art time series analysis. In this paper, we introduce an automated approach to extract time series features based on images. Time series are first transformed into recurrence images, from which local features can be extracted using computer vision algorithms. The extracted features are used for forecast model selection and model averaging. Our experiments show that forecasting based on automatically extracted features, with less human intervention and a more comprehensive view of the raw time series data, performs comparably to the top methods in the largest forecasting competition, M4.




1 Introduction

Feature-based time series representation has attracted remarkable attention in a wide range of time series data mining tasks. Most time series problems, including time series clustering (e.g., Wang et al., 2006), classification (e.g., Fulcher & Jones, 2014; Nanopoulos et al., 2001) and anomaly detection (e.g., Hyndman et al., 2015), ultimately come down to quantifying the similarity among time series using a feature representation. In time series forecasting specifically, the typical procedure is to fit a model to the historical data and simulate future data from the fitted model. Selecting the most appropriate forecast model based on time series features has been a popular alternative over the last decades (e.g., Adam, 1973; Collopy & Armstrong, 1992; Wang et al., 2009; Petropoulos et al., 2014; Kang et al., 2017).

Many attempts have been made at feature-based model selection for univariate time series forecasting. For example, Collopy & Armstrong (1992) provide 99 rules using 18 features to combine four extrapolation methods by examining a rule base to forecast annual economic and demographic time series; Arinze (1994) describes the use of an artificial intelligence technique to improve forecasting accuracy and builds an induction tree to model time series features and the most accurate forecasting method; Shah (1997) constructs several individual selection rules for forecasting using discriminant analysis based on 26 time series features; Meade (2000) uses 25 summary statistics of time series as explanatory variables in predicting the relative performances of nine forecasting methods based on a set of simulated time series with known properties; Petropoulos et al. (2014) propose “horses for courses” and measure the effects of seven time series features on the forecasting performances of 14 popular forecasting methods on the monthly data in the M3 dataset (Makridakis & Hibon, 2000); more recently, Kang et al. (2017) propose to visualize the performances of different forecasting methods in a two-dimensional principal component feature space and provide a preliminary understanding of their relative performances. Talagala et al. (2018) present a general framework for forecast model selection using meta-learning. They use a random forest to select the best forecasting method based on time series features.

Having revisited the literature on feature-based time series forecasting, we find that although researchers have repeatedly highlighted the usefulness of time series features in selecting the best forecasting method, most of the existing approaches depend on the manual choice of an appropriate set of features. That makes the forecast model selection process, which relies on the data and the questions to be asked (Fulcher, 2018), inflexible, even though Fulcher (2018) presents a comprehensive range of features that can be used to represent a time series, such as global features, subsequence features and other hybrid ones. Therefore, automated feature extraction from time series becomes vital. Inspired by the recent work of Hatami et al. (2017) and Wang & Oates (2015), this paper explores time series forecasting based on model selection as well as model averaging with the idea of time series imaging, from which time series features can be automatically extracted using computer vision algorithms. The key contributions of our paper are as follows.

  1. We propose the use of time series imaging for forecasting model selection and model averaging, and demonstrate the proposed model is able to produce accurate forecasts.

  2. The proposed approach enables automated feature extraction. Opening a new window for time series forecasting, it is more flexible than forecasting based on manually selected time series features.

2 Image-based time series feature extraction

This paper extracts time series features based on time series imaging. We first encode time series into images using recurrence plots. Then time series features can be extracted from images using image processing techniques. From two different perspectives, we consider (1) Spatial Bag of Features (SBoF) model; and (2) Convolutional Neural Networks (CNN), for image feature extraction. We describe the details in the following sections.

2.1 Encoding time series to images

We use recurrence plots (RP) to encode time series as images. Recurrence plots provide a way to visualize the periodic nature of a trajectory through a phase space (Eckmann et al., 1987), and are able to contain all relevant dynamical information in the time series (Thiel et al., 2004). A recurrence plot of a time series $x$, showing when the time series revisits a previous state, can be formulated as

$$R(i, j) = \Theta\left(\epsilon - \lVert x_i - x_j \rVert\right),$$

where $R(i, j)$ is the element of the recurrence matrix $R$; $i$ indexes time on the x-axis of the recurrence plot and $j$ indexes time on the y-axis. $\epsilon$ is a predefined threshold, and $\Theta(\cdot)$ is the Heaviside function. In short, one draws a black dot when $x_i$ and $x_j$ are closer than $\epsilon$. An unthresholded RP, in contrast, is not binary, but is difficult to quantify. We use the following modified RP, which balances binary output and the unthresholded RP.

$$R(i, j) = \begin{cases} \epsilon, & \lVert x_i - x_j \rVert > \epsilon, \\ \lVert x_i - x_j \rVert, & \text{otherwise}, \end{cases}$$

which gives more values than the binary RP and results in colored plots. Fig. 1 shows three typical examples of recurrence plots. They reveal different patterns for time series with randomness, periodicity, chaos and trend. We can see that the recurrence plots (right column) visually capture the predefined patterns in the time series (left column).

Fig. 1: Typical examples of recurrence plots (right column) for time series data with different patterns (left column): uncorrelated stochastic data, i.e., white noise (top), time series with periodicity and chaotic data (middle), and time series with periodicity and trend (bottom).
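As a rough illustration of the encoding step, the sketch below builds a recurrence matrix for a univariate series in plain Python. The function name `recurrence_matrix`, the use of the absolute difference as the distance, and the clipped variant (distances capped at the threshold) are assumptions made for this example; they are one plausible reading of the thresholded and modified plots described above, not the authors' implementation.

```python
import math

def recurrence_matrix(x, eps, binary=True):
    """Recurrence matrix of a univariate series x (hypothetical helper).

    binary=True  -> thresholded plot: 1 where |x_i - x_j| <= eps, else 0.
    binary=False -> distances capped at eps (assumed form of the modified plot).
    """
    n = len(x)
    rp = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = abs(x[i] - x[j])
            if binary:
                rp[i][j] = 1.0 if dist <= eps else 0.0
            else:
                rp[i][j] = min(dist, eps)
    return rp

# A sine wave with period 8: recurrences show up as diagonal stripes.
series = [math.sin(2 * math.pi * t / 8) for t in range(32)]
rp_binary = recurrence_matrix(series, eps=0.1)
rp_clipped = recurrence_matrix(series, eps=0.1, binary=False)
```

Rendering `rp_clipped` with a color map would give an image of the kind used as input to the feature extractors; the binary version only keeps the black-dot pattern.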

2.2 Spatial Bag of Features model (SBoF)

The original Bag of Features (BoF) model, which extracts features from one-dimensional signal segments, has achieved great success in time series classification (Baydogan et al., 2013; Wang et al., 2013). Hatami et al. (2017) transform time series into two-dimensional recurrence images with recurrence plots (Eckmann et al., 1987) and then apply the BoF model. Extracting time series features is then equivalent to identifying key points in images, which are called key descriptors. A promising algorithm is the scale-invariant feature transform (SIFT) proposed by Lowe (1999), which identifies the maxima/minima of the difference of Gaussians (DoG) that occur at multiple scales of an image as its key descriptors. Each descriptor can then be projected into its local coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation with the locality-constrained linear coding (LLC) method (Wang et al., 2010). Furthermore, the BoF method tends to ignore the spatial information of the image. We therefore include the spatial pyramid matching (SPM) technique (Lazebnik et al., 2006) in our work to capture the spatial information in an image.

To summarize, the top panel of Fig. 2 shows the framework of our method for image-based time series feature extraction, which consists of four steps: (i) encode time series as images with recurrence plots; (ii) detect key points with SIFT and find basic descriptors with $k$-means; (iii) generate the final representation based on LLC; and (iv) extract spatial information via SPM. We describe each step in detail in the following sections.

Fig. 2: Image-based time series feature extraction with spatial bag-of-features model.


Scale-invariant feature transform (SIFT)

Scale-invariant feature transform (SIFT) is a computer vision algorithm used to detect and describe local features in images. It searches for key points across scale space and extracts their position, scale and rotation invariants. Key points are taken as the maxima/minima of the difference of Gaussians that occur at multiple scales. In our study, we use a 128-dimensional vector to characterize each key descriptor in an image. First, we divide the region around each key point into $4 \times 4$ sub-regions and establish an 8-direction histogram in each sub-region. Then we calculate the magnitude and direction of each pixel’s gradient and accumulate them in the histogram of the corresponding sub-region. In the end, the $4 \times 4 \times 8 = 128$ histogram values form the 128-dimensional descriptor.
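To make the 128-dimensional count concrete, here is a deliberately simplified sketch of how gradient-orientation histograms over a grid of sub-regions produce such a vector. It assumes a 16 × 16 patch split into 4 × 4 cells of 4 × 4 pixels, and it omits the Gaussian weighting, rotation normalization and interpolation of real SIFT; `orientation_descriptor` is a hypothetical helper, not a library function.

```python
import math

def orientation_descriptor(patch):
    """Concatenate 8-bin gradient-orientation histograms over a 4 x 4 grid of
    4 x 4-pixel cells in a 16 x 16 patch, giving 16 * 8 = 128 values."""
    size, cell, bins = 16, 4, 8
    cells_per_side = size // cell
    hist = [0.0] * (cells_per_side * cells_per_side * bins)
    for r in range(size):
        for c in range(size):
            # Finite differences, clamped at the patch border.
            dy = patch[min(r + 1, size - 1)][c] - patch[max(r - 1, 0)][c]
            dx = patch[r][min(c + 1, size - 1)] - patch[r][max(c - 1, 0)]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            cell_index = (r // cell) * cells_per_side + (c // cell)
            # Each pixel votes for its direction bin, weighted by magnitude.
            hist[cell_index * bins + b] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

# A horizontal intensity ramp: every gradient points in the same direction.
patch = [[float(c) for c in range(16)] for _ in range(16)]
descriptor = orientation_descriptor(patch)
```

For the ramp, all the mass lands in the first direction bin of each of the 16 cells, which shows how the histogram layout encodes local gradient structure.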

Locality constrained Linear Coding (LLC)

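Locality-constrained Linear Coding (LLC) (Wang et al., 2010) utilizes locality constraints to project each descriptor into its local coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation as

$$\min_{C} \sum_{i=1}^{N} \lVert x_i - B c_i \rVert^2 + \lambda \lVert d_i \odot c_i \rVert^2, \quad \text{s.t. } \mathbf{1}^\top c_i = 1, \ \forall i,$$

where $X = [x_1, \dots, x_N] \in \mathbb{R}^{D \times N}$, and $x_i$ is the vector of one descriptor. The basic descriptors $B = [b_1, \dots, b_M] \in \mathbb{R}^{D \times M}$ are obtained by $k$-means. The representation parameters $C = [c_1, \dots, c_N]$ are used as the time series representation. The locality adaptor $d_i = \exp\left(\mathrm{dist}(x_i, B) / \sigma\right)$ gives different freedom to each basis vector in proportion to its similarity to the input descriptor $x_i$. We use $\sigma$ for adjusting the weight decay speed of the locality adaptor, and $\lambda$ is the adjustment factor. The LLC incremental codebook optimization is described in Algorithm 1.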
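The coding step can be sketched with the approximated LLC solution of Wang et al. (2010), which restricts each descriptor to its nearest basis vectors and solves a small constrained least-squares problem in closed form. The sketch below uses NumPy; the values `k=5` and `beta=1e-4`, the random data and the function name `llc_code` are illustrative assumptions, not the settings of this paper.

```python
import numpy as np

def llc_code(x, codebook, k=5, beta=1e-4):
    """Approximated LLC code of one descriptor x (shape D) against a codebook
    (shape M x D): reconstruct x from its k nearest basis vectors with weights
    that sum to one, leaving all other weights at zero."""
    dists = np.sum((codebook - x) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    z = codebook[nearest] - x              # shift the neighbours to the origin
    cov = z @ z.T                          # local covariance, shape (k, k)
    # Regularise for numerical stability (tiny constant guards a zero trace).
    cov += (beta * np.trace(cov) + 1e-12) * np.eye(k)
    w = np.linalg.solve(cov, np.ones(k))
    w /= w.sum()                           # enforce the sum-to-one constraint
    code = np.zeros(codebook.shape[0])
    code[nearest] = w
    return code

rng = np.random.default_rng(0)
codebook = rng.normal(size=(200, 128))     # stand-in for 200 k-means centroids
descriptors = rng.normal(size=(30, 128))   # stand-in for 30 SIFT descriptors
codes = np.array([llc_code(d, codebook) for d in descriptors])
image_feature = codes.max(axis=0)          # max pooling over all descriptors
```

Because only `k` entries of each code are non-zero, the pooled vector stays sparse and cheap to compute even with a few hundred basic descriptors.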

Algorithm 1: LLC incremental codebook optimization (Wang et al., 2010). The algorithm loops over the descriptors. For each descriptor, an inner loop over the basis vectors computes the locality constraint parameters; the bias is then removed and the basis is updated.

Spatial Pyramid Matching (SPM) and Max pooling

The BoF model calculates the distribution characteristics of feature points over the whole image and then generates a global histogram. The spatial distribution information of the image is therefore lost, and the image may not be accurately identified.

A spatial pyramid method computes statistics of the image feature points at different resolutions to obtain the spatial information of images. The image is divided into progressively finer grids at each level of the pyramid, and features are derived from each grid cell and combined into one large feature vector. Fig. 3 depicts the SPM and max pooling process.

Fig. 3: Spatial Pyramid Matching and Max pooling.
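The pooling over a spatial pyramid can be sketched as follows, assuming key point positions scaled to the unit square and one code vector per key point. The function name `spm_max_pool`, the toy inputs and the choice to fill empty cells with zeros are assumptions of this example.

```python
def spm_max_pool(positions, codes, levels=(1, 2, 4)):
    """Max-pool codes inside every cell of each pyramid level and concatenate.

    positions: (row, col) of each key point, scaled to [0, 1).
    codes:     one code vector (length M) per key point.
    """
    m = len(codes[0])
    pooled = []
    for g in levels:                         # a g x g grid at this level
        cells = [None] * (g * g)
        for (r, c), code in zip(positions, codes):
            idx = min(int(r * g), g - 1) * g + min(int(c * g), g - 1)
            if cells[idx] is None:
                cells[idx] = list(code)
            else:
                cells[idx] = [max(a, b) for a, b in zip(cells[idx], code)]
        for cell in cells:
            # Cells without key points contribute zeros (an assumption).
            pooled.extend(cell if cell is not None else [0.0] * m)
    return pooled

positions = [(0.1, 0.1), (0.9, 0.9), (0.6, 0.2)]
codes = [[0.2, 0.0, 0.5], [0.7, 0.1, 0.0], [0.0, 0.9, 0.3]]
feature = spm_max_pool(positions, codes)
```

With levels 1, 2 and 4 there are 1 + 4 + 16 = 21 cells, so the output length is 21 times the code length (63 in this toy case).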

2.3 Convolutional Neural Networks (CNN)

An alternative to SBoF for image feature extraction is to apply deep CNNs, which have achieved great breakthroughs in image processing (Krizhevsky et al., 2012). For example, Donahue et al. (2014) propose a feature extraction method called DeCAF, which directly uses deep convolutional neural networks for feature extraction. Their experimental results show that this feature extraction method is more accurate than traditional image features. In addition, Razavian et al. (2014) use the features acquired by a convolutional neural network as the input of a classifier, which significantly improves the accuracy of image classification. In this paper, we feed the raw images to deep networks and rely on the networks to extract richer and more expressive time series features. The benefit is clear: this avoids complicated manual feature extraction and automatically extracts more expressive features.

The question that transfer learning (Pan & Qiang, 2010) seeks to answer is: given a research area and task, how can knowledge from similar areas be transferred to achieve the goal? Transfer learning is attractive for two reasons: (1) data labels are difficult to obtain; and (2) building models from scratch is complex and time consuming.

The fine-tuning of deep networks (Ge & Yu, 2017) is perhaps the easiest way to transfer deep networks. In short, it takes pre-trained networks and adjusts them to the task at hand. In practical applications, we usually do not need to train a neural network from scratch for a new task, for two reasons. (1) Training from scratch is very time consuming, and our training data cannot be as large as ImageNet, which is large enough to train deep neural networks with sufficient generalization ability. (2) Even with that much training data, the computational cost of training from scratch is prohibitive.

With the pre-trained model, we fix the parameters of the earlier layers and fine-tune the last few layers for our task. In general, layers closer to the input extract more general features, while layers closer to the output extract features that are more specific to the classification task. In this way, network training is greatly accelerated, and the performance on our task also improves substantially.

Fig. 4: Transfer learning with fine-tuning. Classic CNN models, such as VGG (Simonyan & Zisserman, 2014), can be pre-trained on ImageNet (Deng et al., 2009).

3 Time series forecasting with image features

Feature-based time series forecasting aims to find the best forecasting method among a pool of candidate forecasting methods, or their best forecast combination. Its essence is to link the knowledge on forecasting errors of different forecasting methods to time series features. Therefore, in this section, we focus on the mapping from time series features to forecasting method performances.

In this paper, following Montero-Manso, Athanasopoulos, Hyndman, Talagala et al. (2018), who won second place in the M4 competition (Makridakis et al., 2018), we use nine of the most popular time series forecasting methods: automated ARIMA algorithm (ARIMA), automated exponential smoothing algorithm (ETS), a feed-forward neural network using autoregressive inputs (NNET-AR), TBATS model (Exponential Smoothing State Space Model With Box-Cox Transformation, ARMA Errors, Trend And Seasonal Components), Seasonal and Trend decomposition using Loess with AR modeling of the seasonally adjusted series (STL-AR), random walk with drift (RW-DRIFT), theta method (THETA), naive (NAIVE), and seasonal naive (SNAIVE).

Fig. 5: Framework of forecast model averaging for the largest time series forecasting competition dataset M4 based on automatic feature extraction.

In the M4 competition, Montero-Manso, Athanasopoulos, Hyndman, Talagala et al. (2018) propose a model averaging method based on 42 manual features. To validate the effectiveness of our image features of time series, we adopt their model averaging method to obtain the weights for forecast combination based on image features. Fig. 5 shows our framework of model averaging. It consists of two parts: training the model to obtain the weights of the nine forecasting methods from image features, and testing the trained model. The Overall Weighted Average (OWA), the accuracy indicator used in the M4 competition, combines two accuracy measures: the Mean Absolute Scaled Error (MASE) and the symmetric Mean Absolute Percentage Error (sMAPE). The individual measures are calculated as follows:

$$\text{sMAPE} = \frac{1}{h} \sum_{t=n+1}^{n+h} \frac{2\,|Y_t - \hat{Y}_t|}{|Y_t| + |\hat{Y}_t|},$$

$$\text{MASE} = \frac{1}{h} \, \frac{\sum_{t=n+1}^{n+h} |Y_t - \hat{Y}_t|}{\frac{1}{n-m} \sum_{t=m+1}^{n} |Y_t - Y_{t-m}|},$$

where $Y_t$ is the real value of the time series at point $t$, $\hat{Y}_t$ is the forecast, $h$ is the forecasting horizon, $n$ is the number of in-sample observations and $m$ is the frequency of the data (e.g., 4 for quarterly series).
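The accuracy measures can be sketched in a few lines of plain Python, assuming the standard M4 definitions: sMAPE and MASE as per-series averages over the horizon, and OWA as the mean of the two measures relative to a benchmark (Naive 2 in M4). The function names and the toy numbers are illustrative; the benchmark below is a plain naive forecast, which coincides with Naive 2 only for non-seasonal series.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error (as a fraction, not percent)."""
    terms = [2 * abs(y - f) / (abs(y) + abs(f)) for y, f in zip(actual, forecast)]
    return sum(terms) / len(terms)

def mase(insample, actual, forecast, m):
    """Mean absolute error scaled by the in-sample seasonal naive error."""
    n = len(insample)
    scale = sum(abs(insample[t] - insample[t - m]) for t in range(m, n)) / (n - m)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / scale

def owa(smape_model, mase_model, smape_bench, mase_bench):
    """Average of sMAPE and MASE, each relative to the benchmark method."""
    return 0.5 * (smape_model / smape_bench + mase_model / mase_bench)

insample = [10, 12, 14, 16]
actual = [18, 20]
forecast = [17, 22]
naive = [16, 16]                       # last in-sample value repeated

smape_value = smape(actual, forecast)
mase_value = mase(insample, actual, forecast, m=1)
owa_value = owa(smape_value, mase_value,
                smape(actual, naive), mase(insample, actual, naive, m=1))
```

An OWA below 1 means the method beats the benchmark on average across the two measures, which is the case for this toy forecast.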

In essence, the model averaging method is a feature-based gradient tree boosting approach in which the loss function to minimize is tailored to the OWA error used in the M4 competition. The implementation of gradient tree boosting is XGBoost, proposed by Chen & Guestrin (2016), a tool that is computationally efficient and allows a high degree of customization.

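Let $f_n$ be the image features extracted from a time series $x_n$. $L_{nm}$ is the contribution to the OWA error measure of method $m$ for the series $x_n$. $p(f_n)_m$ is the output of the XGBoost algorithm corresponding to forecasting method $m$, based on the features extracted from series $x_n$.

In order to get the weight for every method, a softmax transform is applied to the output of XGBoost, following Montero-Manso, Talagala, Hyndman & Athanasopoulos (2018):

$$w(f_n)_m = \frac{\exp\{p(f_n)_m\}}{\sum_{m'=1}^{M} \exp\{p(f_n)_{m'}\}}.$$

The gradient tree boosting approach implemented in XGBoost is trained so that the weighted average loss function over the $N$ training series and $M$ forecasting methods is minimized:

$$\operatorname*{arg\,min}_{w} \sum_{n=1}^{N} \sum_{m=1}^{M} w(f_n)_m L_{nm}.$$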
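The weighting step can be illustrated without XGBoost itself. The sketch below assumes we already have one raw score per forecasting method for each series (standing in for the booster output) and the per-method error contributions; it turns scores into weights with a softmax, evaluates the weighted loss, and combines forecasts. All names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw per-method scores into positive weights that sum to one."""
    peak = max(scores)                          # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_loss(scores_per_series, errors_per_series):
    """Sum over series of the softmax-weighted per-method errors."""
    loss = 0.0
    for scores, errors in zip(scores_per_series, errors_per_series):
        weights = softmax(scores)
        loss += sum(w * e for w, e in zip(weights, errors))
    return loss

def combine(weights, forecasts):
    """Weighted average of the individual methods' forecasts, per horizon step."""
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]

scores = [[0.0, 0.0], [2.0, -1.0]]              # two series, two methods
errors = [[1.0, 3.0], [0.5, 2.0]]
loss = weighted_loss(scores, errors)
combined = combine(softmax(scores[0]), [[10.0, 11.0], [12.0, 13.0]])
```

Pushing a method's score up shifts weight towards it, so the loss falls when the higher scores go to the methods with smaller errors, which is what the boosting objective rewards.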

4 Application to M4 competition

4.1 Training and testing data

To obtain testing data, we divide each original time series in M4 into two parts. The first part is used to fit the forecasting methods, and its length is length(original time series) - forecasting horizon. The second part is the testing data, whose length is the forecasting horizon. To obtain training data, we divide the first part again in the same way. Its first segment, of length length(original time series) - 2 * forecasting horizon, is used to fit the forecasting methods, and its second segment, whose length is the forecasting horizon, is held out to evaluate them. Fig. 6 shows the training and testing data partition strategy.

Fig. 6: Procedure to obtain training and testing data from M4.
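The partition can be written as two nested splits. The sketch below assumes each series is a plain list and the horizon is known; `split_series` and the variable names are hypothetical.

```python
def split_series(series, horizon):
    """Split a series into a fitting part and a hold-out of `horizon` points."""
    return series[:-horizon], series[-horizon:]

series = list(range(1, 21))          # a toy series of length 20
h = 4                                # forecasting horizon

# Testing stage: fit on the first n - h points, evaluate on the last h.
test_fit, test_holdout = split_series(series, h)

# Training stage: repeat the split inside the fitting part, so the methods
# are fitted on n - 2h points and scored on the following h points.
train_fit, train_holdout = split_series(test_fit, h)
```

The training hold-out never overlaps the final test window, so the weights learned from it are evaluated on data the meta-learner has not seen.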

4.2 Time series with different periods in the instance space

We project the time series with different periods into instance space using t-SNE (Maaten, 2014). Yearly, quarterly, monthly, daily and hourly data can be well distinguished in the instance spaces shown in Fig. 7.

Fig. 7: Regions where time series with different periods lie in the instance space. Blue points highlight the areas where time series with the corresponding seasonal pattern lie.

4.3 Forecasting based on automated features

We compare our model selection results with the forecasting performances of the nine single methods. The model averaging results are compared with the top methods in M4 competition.

Table 1 shows the MASE values of our model selection with Lasso, SVM + rbf(10) (an SVM with an rbf kernel on features reduced to 10 dimensions with t-SNE) and SVM + rbf with all the image features. Our results match the accuracy of the best single method on monthly, weekly and all data. For hourly data, the average forecasting accuracy is significantly improved by our model selection method based on the automated features.

Method          Yearly Quarterly   Monthly    Weekly     Daily    Hourly     Total
Single method
auto_arima        3.45      1.17      0.93      2.38      3.35      0.94      1.67
ets               3.44      1.16      0.95      2.53      3.25      1.82      1.68
nnetar            4.05      1.55      1.15      3.84      4.13      1.07      2.05
tbats             3.44      1.19      1.05      2.49      3.28      1.23      1.73
stlm_ar          10.37      2.03      1.33     39.67      31.2      1.49      4.98
rw_drift          3.07      1.33      1.18      2.68      3.25     11.46      1.79
thetaf            3.37      1.23      0.97      2.64      3.26      2.45      1.69
naive             3.97      1.48      1.21      2.78      3.28     11.61      2.04
snaive            3.97       1.6      1.26      2.78      3.28      1.19      2.06
Min               3.07      1.16      0.93      2.38      3.25      0.94      1.67
Model selection + Recurrence plot
                  3.45      1.18      0.93      2.38      3.36      0.94      1.68
                  3.42      1.36      1.00      7.81      6.61      0.84      1.90
                  3.45      1.17      0.93      2.38      3.35      0.94      1.67
Pre-trained CNN model + Classifier
                  3.45      1.18      0.93      2.37      3.35      0.94      1.67
                  3.45      1.17      0.93      2.38      3.35      0.94      1.67
                  3.45      1.17      0.93      2.38      3.35      0.94      1.67
                  3.45      1.18      0.93      2.70      3.52      0.94      1.69
Model selection + Gramian angular field
Pre-trained CNN model + Classifier
                  3.47      1.18      0.94      2.38      3.38      0.94      1.69
                  3.45      1.18      0.93      2.37      3.35      0.93      1.68
                  3.45      1.18      0.93      2.35      3.35      0.93      1.68
                  3.45      1.17      0.95      2.53      3.34      1.81      1.69

Table 1: Model selection results compared with single methods in MASE.

Tables 2, 3 and 4 compare the MASE, sMAPE and OWA values of our model averaging approach with those of the 10 most accurate methods in the M4 competition, respectively. Overall, our model averaging with automated features achieves performance comparable with the top methods in the M4 competition. Specifically, as Table 2 shows, our method outperforms the top-ranked approach on daily and hourly data and performs comparably on yearly data.

Rank            Yearly Quarterly   Monthly    Weekly     Daily    Hourly     Total
M4 competition
1                2.980     1.118     0.884     2.356     3.446     0.893     1.536
2                3.060     1.111     0.893     2.108     3.344     0.819     1.551
3                3.130     1.125     0.905     2.158     2.642     0.873     1.547
4                3.126     1.135     0.895     2.350     3.258     0.976     1.571
5                3.046     1.122     0.907     2.368     3.194     1.203     1.554
6                3.082     1.118     0.913     2.133     3.229     1.458     1.565
7                3.038     1.198     0.929     2.947     3.479     1.372     1.595
8                3.009     1.198     0.966     2.601     3.254     2.557     1.601
9                3.262     1.163     0.931     2.302     3.284     0.801     1.627
10               3.185     1.164     0.943     2.488     3.232     1.049     1.614
Model averaging + Recurrence plot
                 3.143     1.128     0.923     2.706     3.463     0.840     1.597
                 3.135     1.125     0.908     2.266     3.463     0.849     1.579
                 3.124     1.118     0.927     2.363     3.212     0.898     1.580
Pre-trained CNN model + Classifier
                 3.118     1.121     0.942     2.387     3.344     0.861     1.592
                 3.113     1.122     0.919     2.361     3.348     0.845     1.581
                 3.111     1.122     0.955     2.375     3.357     0.854     1.598
                 3.153     1.124     0.940     2.332     3.318     0.858     1.599
Model averaging + Gramian angular field
Pre-trained CNN model + Classifier
                 3.145     1.126     0.911     2.287     3.353     0.846     1.585
                 3.115     1.123     0.948     2.239     3.375     0.861     1.596
                 3.128     1.121     0.957     2.252     3.355     0.857     1.602
                 3.136     1.124     0.950     2.277     3.364     0.868     1.602

Table 2: Model averaging results compared with top 10 methods of M4 competition in MASE.

Rank            Yearly Quarterly   Monthly    Weekly     Daily    Hourly     Total
M4 competition
1               13.176     9.679    12.126     7.817     3.170     9.328    11.374
2               13.528     9.733    12.639     7.625     3.097    11.506    11.720
3               13.943     9.796    12.747     6.919     2.452     9.611    11.845
4               13.712     9.809    12.487     6.814     3.037     9.934    11.695
5               13.673     9.816    12.737     8.627     2.985    15.563    11.836
6               13.669     9.800    12.888     6.726     2.995    13.167    11.897
7               13.679    10.378    12.839     7.818     3.222    13.466    12.020
8               13.366    10.155    13.002     9.148     3.041    17.567    11.986
9               13.910    10.000    12.780     6.728     3.053     8.913    11.924
10              13.821    10.093    13.151     8.989     3.026     9.765    12.114
Model averaging + Recurrence plot
                13.935     9.855    12.656     8.502     3.175    11.913    11.859
                13.896     9.863    12.596     7.899     3.063    11.772    11.816
                13.881     9.858    12.625     8.289     3.017    12.296    11.824
Pre-trained CNN model + Classifier
                13.862     9.835    12.616     8.255     3.117    12.173    11.815
                13.890     9.810    12.566     8.341     3.107    11.772    11.790
                13.847     9.840    12.549     8.033     3.113    11.762    11.778
                13.987     9.838    12.583     8.408     3.077    11.856    11.826
Model averaging + Gramian angular field
Pre-trained CNN model + Classifier
                13.926     9.859    12.639     8.161     3.103    12.077    11.846
                13.861     9.811    12.597     7.851     3.124    11.933    11.798
                13.914     9.808    12.574     7.887     3.067    11.882    11.796
                13.949     9.868    12.617     7.937     3.070    12.078    11.839

Table 3: Model averaging results compared with top 10 methods of M4 competition in sMAPE.

Rank            Yearly Quarterly   Monthly    Weekly     Daily    Hourly     Total
M4 competition
1                0.778     0.847     0.836     0.851     1.046     0.440     0.821
2                0.799     0.847     0.858     0.796     1.019     0.484     0.838
3                0.820     0.855     0.867     0.766     0.806     0.444     0.841
4                0.813     0.859     0.854     0.795     0.996     0.474     0.842
5                0.802     0.855     0.868     0.897     0.977     0.674     0.843
6                0.806     0.853     0.876     0.751     0.984     0.663     0.848
7                0.801     0.908     0.882     0.957     1.060     0.653     0.860
8                0.788     0.898     0.905     0.968     0.996     1.012     0.861
9                0.836     0.878     0.881     0.782     1.002     0.410     0.865
10               0.824     0.883     0.899     0.939     0.990     0.485     0.869
Model averaging + Recurrence plot
                 0.822     0.859     0.873     0.951     1.050     0.499     0.854
                 0.820     0.858     0.863     0.839     1.009     0.498     0.848
                 0.818     0.855     0.874     0.878     0.985     0.522     0.849
Pre-trained CNN model + Classifier
                 0.816     0.856     0.880     0.880     1.022     0.511     0.852
                 0.817     0.855     0.868     0.880     1.021     0.497     0.848
                 0.815     0.856     0.884     0.866     1.023     0.498     0.852
                 0.825     0.857     0.878     0.878     1.011     0.502     0.854
Model averaging + Gramian angular field
Pre-trained CNN model + Classifier
                 0.822     0.859     0.867     0.857     1.021     0.501     0.851
                 0.816     0.855     0.882     0.831     1.028     0.504     0.852
                 0.819     0.855     0.886     0.836     1.016     0.502     0.854
                 0.821     0.858     0.884     0.843     1.017     0.510     0.855

Table 4: Model averaging results compared with top 10 methods of M4 competition in OWA.

5 Conclusion and future work

We propose using image features for forecast model combination. The proposed method enables automated feature extraction, making it more flexible than using manually selected time series features. More importantly, it produces forecast accuracies comparable with the top methods in the largest time series forecasting competition (M4). To the best of our knowledge, this is the first paper that applies imaging to time series forecasting.

In this paper, we employ recurrence plots to encode time series into images, and use the spatial Bag-of-Features model to extract features from the images. Depending on the size of the dataset and the available computational resources, classic convolutional neural networks (CNN) are an alternative for extracting features from images. To further improve forecasting performance based on the automated features, methods for training the optimal weights in model averaging need further study to make them suitable for high-dimensional image features.


Yanfei Kang’s and Feng Li’s research was supported by the National Natural Science Foundation of China (No. 11701022 and No. 11501587, respectively).


Experimental setup

In the traditional image processing pipeline based on SIFT, we need to obtain the basic descriptors before linear coding. We choose 200 as the number of $k$-means clusters, and the 200 centroid coordinates are used as the coordinates of the basic descriptors. For each descriptor, we select its closest basic descriptors with K-nearest neighbors (KNN) and set the adjustment factor in LLC. We choose 1, 2 and 4 as the parameters of SPM, that is, we split the image by 1 × 1, 2 × 2 and 4 × 4, respectively.

The parameters for Recurrence Plot are set as follows:

  • Parameter of eps: 0.1.

  • Parameter of steps: 5.

The parameters for SIFT are set as follows:

  • Number of basic descriptors: 200. Basic descriptors are obtained with k-means.

  • Parameter of LLC: in KNN. The adjustment factor .

  • Parameter of SPM: 1, 2 and 4. We split the images by 1 × 1, 2 × 2 and 4 × 4, respectively.

  • Number of the extracted features from each image: $(1 + 4 + 16) \times 200 = 4200$.

The parameters for pre trained CNN models are set as follows:

  • Dimension of the output of the pre trained Inception-v1: 1024.

  • Dimension of the output of the pre trained resnet-v1-101: 2048.

  • Dimension of the output of the pre trained resnet-v1-50: 2048.

  • Dimension of the output of the pre trained VGG: 1000.

In model averaging, we need to set parameters for XGBoost. We have performed a search in a subset of the hyper-parameter spaces, measuring OWA via a 10-fold cross validation of the training data.

The hyper-parameters are set as follows:

  • The maximum depth of a tree ranges from 6 to 50.

  • The learning rate, i.e., the scale of the contribution of each tree, ranges from 0.001 to 1.

  • The proportion of the training set used to build the trees in each iteration ranges from 0.5 to 1.

  • The proportion of the features used to build the trees in each iteration ranges from 0.5 to 1.

  • The number of iterations of the algorithm ranges from 1 to 250.
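The ranges listed above can be written down as a search space and sampled. The sketch below uses only the standard library; the keys follow common XGBoost parameter names, which is an assumption since the source does not name them, and uniform random sampling is likewise an assumed search strategy (the source only says a subset of the space was searched).

```python
import random

# Ranges taken from the list above; key names are assumed XGBoost parameters.
SEARCH_SPACE = {
    "max_depth": (6, 50),            # maximum depth of a tree
    "eta": (0.001, 1.0),             # learning rate
    "subsample": (0.5, 1.0),         # share of training rows per iteration
    "colsample_bytree": (0.5, 1.0),  # share of features per iteration
    "nrounds": (1, 250),             # number of boosting iterations
}
INTEGER_PARAMS = {"max_depth", "nrounds"}

def sample_config(rng):
    """Draw one candidate configuration uniformly from the search space."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if name in INTEGER_PARAMS:
            config[name] = rng.randint(low, high)
        else:
            config[name] = rng.uniform(low, high)
    return config

rng = random.Random(0)
candidates = [sample_config(rng) for _ in range(20)]
```

Each candidate would then be scored by its cross-validated OWA on the training data, and the best one kept.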


  • Adam (1973) Adam, E. E. (1973), ‘Individual item forecasting model evaluation’, Decision Sciences 4(4), 458–470.
  • Arinze (1994) Arinze, B. (1994), ‘Selecting appropriate forecasting models using rule induction’, Omega-international Journal of Management Science 22(6), 647–658.
  • Baydogan et al. (2013) Baydogan, M. G., Runger, G. & Tuv, E. (2013), ‘A bag-of-features framework to classify time series’, IEEE transactions on pattern analysis and machine intelligence 35(11), 2796–2802.
  • Chen & Guestrin (2016) Chen, T. & Guestrin, C. (2016), XGBoost: A scalable tree boosting system, in ‘ACM SIGKDD International Conference on Knowledge Discovery and Data Mining’, pp. 785–794.
  • Collopy & Armstrong (1992) Collopy, F. & Armstrong, J. S. (1992), ‘Rule-based forecasting: development and validation of an expert systems approach to combining time series extrapolations’, Management Science 38(10), 1394–1414.
  • Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L. J., Li, K. & Li, F. F. (2009), Imagenet: A large-scale hierarchical image database, in ‘IEEE Conference on Computer Vision and Pattern Recognition’.

  • Donahue et al. (2014) Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E. & Darrell, T. (2014), Decaf: A deep convolutional activation feature for generic visual recognition, in ‘International Conference on Machine Learning’.
  • Eckmann et al. (1987) Eckmann, J.-P., Kamphorst, S. O. & Ruelle, D. (1987), ‘Recurrence plots of dynamical systems’, EPL (Europhysics Letters) 4(9), 973.
  • Fulcher (2018) Fulcher, B. D. (2018), Feature-based time-series analysis, in ‘Feature engineering for machine learning and data analytics’, CRC Press, pp. 87–116.
  • Fulcher & Jones (2014) Fulcher, B. & Jones, N. (2014), ‘Highly comparative feature-based time-series classification’, IEEE Transactions on Knowledge and Data Engineering 26(12), 3026–3037.
  • Ge & Yu (2017) Ge, W. & Yu, Y. (2017), Borrowing treasures from the wealthy: Deep transfer learning through selective joint fine-tuning, in ‘Computer Vision and Pattern Recognition’.
  • Hatami et al. (2017) Hatami, N., Gavet, Y. & Debayle, J. (2017), ‘Bag of recurrence patterns representation for time-series classification’, Pattern Analysis and Applications pp. 1–11.
  • Hyndman et al. (2015) Hyndman, R. J., Wang, E. & Laptev, N. (2015), Large-scale unusual time series detection, in ‘Proceedings of the IEEE International Conference on Data Mining’, Atlantic City, NJ, USA. 14–17 November 2015.
  • Kang et al. (2017) Kang, Y., Hyndman, R. J. & Smith-Miles, K. (2017), ‘Visualising forecasting algorithm performance using time series instance spaces’, International Journal of Forecasting 33(2), 345–358.
  • Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012), ‘Imagenet classification with deep convolutional neural networks’, Neural Information Processing Systems 25.
  • Lazebnik et al. (2006) Lazebnik, S., Schmid, C. & Ponce, J. (2006), Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in ‘2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)’, Vol. 2, IEEE, pp. 2169–2178.
  • Lowe (1999) Lowe, D. G. (1999), Object recognition from local scale-invariant features, in ‘Computer vision, 1999. The proceedings of the seventh IEEE international conference on’, Vol. 2, IEEE, pp. 1150–1157.
  • Maaten (2014) Maaten, L. v. d. (2014), ‘Accelerating t-SNE using tree-based algorithms’, The Journal of Machine Learning Research 15(1), 3221–3245.
  • Makridakis & Hibon (2000) Makridakis, S. & Hibon, M. (2000), ‘The M3-Competition: results, conclusions and implications’, International Journal of Forecasting 16(4), 451–476.
  • Makridakis et al. (2018) Makridakis, S., Spiliotis, E. & Assimakopoulos, V. (2018), ‘The m4 competition: Results, findings, conclusion and way forward’, International Journal of Forecasting .
  • Meade (2000) Meade, N. (2000), ‘Evidence for the selection of forecasting methods’, Journal of Forecasting 19(6), 515–535.
  • Montero-Manso, Athanasopoulos, Hyndman, Talagala et al. (2018) Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., Talagala, T. S. et al. (2018), ‘Fforma: Feature-based forecast model averaging’, Monash Econometrics and Business Statistics Working Papers 19(18), 2018–19.
  • Montero-Manso, Talagala, Hyndman & Athanasopoulos (2018) Montero-Manso, P., Talagala, T. S., Hyndman, R. & Athanasopoulos, G. (2018), ‘M4metalearning’, GitHub repository .
  • Nanopoulos et al. (2001) Nanopoulos, A., Alcock, R. & Manolopoulos, Y. (2001), ‘Feature-based classification of time-series data’, International Journal of Computer Research 10(3).
  • Pan & Qiang (2010) Pan, S. J. & Qiang, Y. (2010), ‘A survey on transfer learning’, IEEE Transactions on Knowledge and Data Engineering 22(10), 1345–1359.
  • Petropoulos et al. (2014) Petropoulos, F., Makridakis, S., Assimakopoulos, V. & Nikolopoulos, K. (2014), “Horses for courses’ in demand forecasting’, European Journal of Operational Research 237(1), 152–163.
  • Razavian et al. (2014) Razavian, A. S., Azizpour, H., Sullivan, J. & Carlsson, S. (2014), ‘Cnn features off-the-shelf: An astounding baseline for recognition’.
  • Shah (1997) Shah, C. (1997), ‘Model selection in univariate time series forecasting using discriminant analysis’, International Journal of Forecasting 13(4), 489–500.
  • Simonyan & Zisserman (2014) Simonyan, K. & Zisserman, A. (2014), ‘Very deep convolutional networks for large-scale image recognition’, Computer Science .
  • Talagala et al. (2018) Talagala, T. S., Hyndman, R. J. & Athanasopoulos, G. (2018), Meta-learning how to forecast time series, Working paper 6/18, Monash University, Department of Econometrics and Business Statistics.
  • Thiel et al. (2004) Thiel, M., Romano, M. C. & Kurths, J. (2004), ‘How much information is contained in a recurrence plot?’, Physics Letters A 330(5), 343–349.
  • Wang et al. (2013) Wang, J., Liu, P., She, M. F., Nahavandi, S. & Kouzani, A. (2013), ‘Bag-of-words representation for biomedical time series classification’, Biomedical Signal Processing and Control 8(6), 634–644.
  • Wang et al. (2010) Wang, J., Yang, J., Yu, K., Lv, F., Huang, T. & Gong, Y. (2010), Locality-constrained linear coding for image classification, in ‘Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on’, IEEE, pp. 3360–3367.
  • Wang et al. (2006) Wang, X., Smith, K. A. & Hyndman, R. J. (2006), ‘Characteristic-based clustering for time series data’, Data Mining and Knowledge Discovery 13(3), 335–364.
  • Wang et al. (2009) Wang, X., Smith-Miles, K. A. & Hyndman, R. J. (2009), ‘Rule induction for forecasting method selection: meta-learning the characteristics of univariate time series’, Neurocomputing 72(10-12), 2581–2594.
  • Wang & Oates (2015) Wang, Z. & Oates, T. (2015), Imaging time-series to improve classification and imputation, in ‘Proceedings of the 24th International Conference on Artificial Intelligence’, AAAI Press, pp. 3939–3945.