1 Introduction
Feature-based time series representation has attracted remarkable attention in a wide range of time series data mining tasks. Most time series problems, including time series clustering (e.g., Wang et al., 2006), classification (e.g., Fulcher & Jones, 2014; Nanopoulos et al., 2001) and anomaly detection (e.g., Hyndman et al., 2015), eventually come down to quantifying the similarity among time series using feature representations. Specifically, in time series forecasting, instead of the typical procedure of fitting a model to the historical data and simulating future data from the fitted model, selecting the most appropriate forecasting model based on time series features has been a popular alternative approach over the last decades (e.g., Adam, 1973; Collopy & Armstrong, 1992; Wang et al., 2009; Petropoulos et al., 2014; Kang et al., 2017).

Many attempts have been made at feature-based model selection for univariate time series forecasting. For example, Collopy & Armstrong (1992) provide 99 rules based on 18 features to combine four extrapolation methods for forecasting annual economic and demographic time series; Arinze (1994) uses an artificial intelligence technique to improve forecasting accuracy, building an induction tree that links time series features to the most accurate forecasting method; Shah (1997) constructs individual selection rules for forecasting using discriminant analysis based on 26 time series features; Meade (2000) uses 25 summary statistics of time series as explanatory variables to predict the relative performance of nine forecasting methods on a set of simulated time series with known properties; Petropoulos et al. (2014) propose "horses for courses" and measure the effects of seven time series features on the forecasting performance of 14 popular forecasting methods on the monthly data of the M3 dataset (Makridakis & Hibon, 2000); more recently, Kang et al. (2017) visualize the performance of different forecasting methods in a two-dimensional principal component feature space, providing a preliminary understanding of their relative performance. Talagala et al. (2018) present a general framework for forecast model selection using meta-learning, in which a random forest selects the best forecasting method based on time series features.
Having revisited the literature on feature-based time series forecasting, we find that although researchers repeatedly highlight the usefulness of time series features in selecting the best forecasting method, most existing approaches depend on a manual choice of an appropriate set of features. Because the appropriate features depend on the data and the questions to be asked (Fulcher, 2018), this makes the forecast model selection process inflexible, even though Fulcher (2018) presents a comprehensive range of features that can be used to represent a time series, such as global features, subsequence features and other hybrid ones. Therefore, automated feature extraction from time series becomes vital. Inspired by the recent work of Hatami et al. (2017) and Wang & Oates (2015), this paper explores time series forecasting based on model selection as well as model averaging with the idea of time series imaging, from which time series features can be automatically extracted using computer vision algorithms. The key contributions of our paper are as follows.

We propose the use of time series imaging for forecast model selection and model averaging, and demonstrate that the proposed approach produces accurate forecasts.

The proposed approach enables automated feature extraction. Opening a new window for time series forecasting, it is more flexible than forecasting based on manually selected time series features.
2 Image-based time series feature extraction
This paper extracts time series features based on time series imaging. We first encode time series into images using recurrence plots. Time series features can then be extracted from the images using image processing techniques. We consider two different approaches to image feature extraction: (1) the Spatial Bag-of-Features (SBoF) model; and (2) convolutional neural networks (CNNs). We describe the details in the following sections.
2.1 Encoding time series into images
We use recurrence plots (RPs) to encode time series into images. Recurrence plots provide a way to visualize the periodic nature of a trajectory through a phase space (Eckmann et al., 1987), and are able to contain all relevant dynamical information in the time series (Thiel et al., 2004). A recurrence plot of a time series $\{x_1, \ldots, x_n\}$, showing when the time series revisits a previous state, can be formulated as
$$ R_{i,j} = \Theta\left(\epsilon - \| x_i - x_j \|\right), \qquad i, j = 1, \ldots, n, $$
where $R_{i,j}$ is the $(i,j)$ element of the recurrence matrix $R$; $i$ indexes time on the x-axis of the recurrence plot and $j$ indexes time on the y-axis; $\epsilon$ is a predefined threshold; and $\Theta(\cdot)$ is the Heaviside step function. In short, one draws a black dot when $x_i$ and $x_j$ are closer than $\epsilon$. An unthresholded RP avoids the binary output but is difficult to quantify. We use the following modified RP, which balances the binary output and the unthresholded RP:
$$ R_{i,j} = \min\left( \left\lfloor \frac{\| x_i - x_j \|}{\epsilon} \right\rfloor,\; s \right), $$
where $s$ is the number of discretization steps. This gives more values than the binary RP and results in colored plots. Fig. 1 shows three typical examples of recurrence plots. They reveal different patterns for time series with randomness, periodicity, chaos and trend. We can see that the recurrence plots (shown in the right column) visually contain the predefined patterns in the time series (shown in the left column).
Fig. 1. Typical examples of recurrence plots (right column) for time series data with different patterns (left column): uncorrelated stochastic data, i.e., white noise (top); time series with periodicity and chaos (middle); and time series with periodicity and trend (bottom).
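As an illustration, the modified recurrence plot can be sketched in a few lines of numpy. This is a minimal sketch that assumes the distance-discretization reading of the modified RP, with the eps and steps values reported in the Appendix (0.1 and 5); the function name is ours:

```python
import numpy as np

def modified_recurrence_plot(x, eps=0.1, steps=5):
    """Modified RP: pairwise distances binned into `steps` levels of width `eps`."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])   # pairwise distances |x_i - x_j|
    levels = np.floor(d / eps)            # discretize distances into bins
    return np.minimum(levels, steps)      # cap at `steps` (saturated recurrences)

# A periodic series yields the characteristic diagonal-band pattern.
rp = modified_recurrence_plot(np.sin(np.linspace(0, 8 * np.pi, 100)))
```

Rendering `rp` with an image viewer (e.g., `matplotlib.pyplot.imshow`) reproduces colored recurrence plots like those in Fig. 1.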
2.2 The Spatial Bag-of-Features (SBoF) model
The original Bag-of-Features (BoF) model, which extracts features from one-dimensional signal segments, has achieved great success in time series classification (Baydogan et al., 2013; Wang et al., 2013). Hatami et al. (2017) transform time series into two-dimensional recurrence images with recurrence plots (Eckmann et al., 1987) and then apply the BoF model. Extracting time series features is then equivalent to identifying key points in the images, which are called key descriptors. A promising algorithm is the scale-invariant feature transform (SIFT) proposed by Lowe (1999), which identifies the maxima/minima of the difference of Gaussians (DoG) occurring at multiple scales of an image as its key descriptors. Each descriptor can then be projected into its local coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation with the locality-constrained linear coding (LLC) method (Wang et al., 2010). Furthermore, since the BoF method tends to ignore the spatial information of the image, we include the spatial pyramid matching (SPM) technique (Lazebnik et al., 2006) to capture it.

To summarize, the top panel of Fig. 2 shows the framework of our image-based time series feature extraction, which consists of four steps: (i) encode the time series as an image with recurrence plots; (ii) detect key points with SIFT and find basic descriptors with $k$-means; (iii) generate the final representation based on LLC; and (iv) extract spatial information via SPM. We describe each step in the following sections.

Scale-invariant feature transform (SIFT)

Scale-invariant feature transform (SIFT) is a computer vision algorithm used to detect and describe local features in images. It finds key points across spatial scales and extracts their position, scale, and rotation invariants. Key points are taken as the maxima/minima of the difference of Gaussians occurring at multiple scales. In our study, we use a 128-dimensional vector to characterize the key descriptors in an image. First, an 8-direction gradient histogram is established in each subregion, with a total of $4 \times 4 = 16$ subregions in the region around each key point. Then the gradient magnitude and direction of each pixel are calculated and added to the histogram of its subregion. In the end, a $16 \times 8 = 128$-dimensional descriptor based on these histograms is generated.

Locality-constrained linear coding (LLC)

Locality-constrained linear coding (LLC) (Wang et al., 2010) utilizes locality constraints to project each descriptor into its local coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation as
$$ \min_{C} \sum_{i=1}^{N} \| x_i - B c_i \|^2 + \lambda \| d_i \odot c_i \|^2, \qquad \text{s.t. } \mathbf{1}^\top c_i = 1, \; \forall i, $$
where $\odot$ denotes element-wise multiplication and $x_i$ is the vector of one descriptor. The basic descriptors $B$ are obtained by $k$-means. The representation parameters $C = [c_1, \ldots, c_N]$ are used as the time series representation. The locality adaptor $d_i$ gives a different degree of freedom to each basis vector, proportional to its similarity to the input descriptor $x_i$. We use $\sigma$ to adjust the weight decay speed of the locality adaptor, and $\lambda$ is the adjustment factor. The LLC incremental codebook optimization is described in Algorithm 1.
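The approximated LLC solution, which solves a small constrained least-squares problem on the $k$ nearest bases (as proposed by Wang et al., 2010), can be sketched as follows. The codebook here is random placeholder data standing in for $k$-means centroids:

```python
import numpy as np

def llc_code(x, B, k=5):
    """Approximated LLC: reconstruct descriptor x from its k nearest codebook
    bases in B, with the code coefficients constrained to sum to one."""
    d2 = np.sum((B - x) ** 2, axis=1)
    idx = np.argsort(d2)[:k]                # k nearest bases (locality constraint)
    z = B[idx] - x                          # shift bases to the descriptor
    C = z @ z.T                             # local covariance
    C += np.eye(k) * 1e-4 * np.trace(C)     # regularization for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                            # enforce the sum-to-one constraint
    code = np.zeros(len(B))
    code[idx] = w                           # sparse code over the whole codebook
    return code

rng = np.random.default_rng(0)
B = rng.standard_normal((200, 128))         # 200 basic descriptors (placeholder)
code = llc_code(rng.standard_normal(128), B)
```

Each descriptor thus yields a 200-dimensional code with only $k$ nonzero entries, which are later max-pooled into the image representation.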
Spatial pyramid matching (SPM) and max pooling

The BoF model calculates the distribution of feature points over the whole image and generates a global histogram; the spatial distribution information of the image is therefore lost, and the image may not be accurately identified. The spatial pyramid method counts image feature points at different resolutions to obtain the spatial information of images. The image is divided into progressively finer grids at each level of the pyramid, and features are derived from each grid cell and combined into one large feature vector. Fig. 3 depicts the SPM and max pooling process.
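A minimal sketch of SPM with max pooling, assuming key-point coordinates normalized to the unit square and the pyramid levels 1, 2 and 4 used in this paper; the variable names are ours and the codes are random placeholders:

```python
import numpy as np

def spm_max_pool(codes, xy, levels=(1, 2, 4)):
    """Split the unit square into l x l grids per pyramid level, max-pool the
    LLC codes of the key points falling in each cell, and concatenate."""
    pooled = []
    for l in levels:
        cell = np.minimum((xy * l).astype(int), l - 1)  # grid cell of each key point
        for i in range(l):
            for j in range(l):
                m = (cell[:, 0] == i) & (cell[:, 1] == j)
                pooled.append(codes[m].max(axis=0) if m.any()
                              else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)

rng = np.random.default_rng(1)
# 50 key points, each with a 200-dim LLC code and a normalized (x, y) position
feat = spm_max_pool(rng.random((50, 200)), rng.random((50, 2)))
# (1 + 4 + 16) cells x 200-dim codes = 4200-dim image feature
```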
2.3 Convolutional Neural Networks (CNN)
An alternative to SBoF for image feature extraction is a deep CNN, which has achieved great breakthroughs in image processing (Krizhevsky et al., 2012). For example, Donahue et al. (2014) propose a feature extraction method called DeCAF, which uses deep convolutional neural networks directly for feature extraction; their experimental results show that this approach has clear accuracy advantages over traditional image features. In addition, Razavian et al. (2014) use the features acquired by a convolutional neural network as the input of a classifier, which significantly improves the accuracy of image classification. In this paper, we feed raw data into deep networks and rely on the network to extract richer and more expressive time series features. The benefit is obvious: it avoids complicated manual feature extraction and automatically extracts more expressive features.
The question answered by transfer learning (Pan & Qiang, 2010) is: given a research area and task, how can knowledge from a similar area be transferred to achieve the goals of the target task? Transfer learning is attractive for two reasons: (1) data labels are difficult to obtain; and (2) building models from scratch is complex and time consuming. Fine-tuning a deep network (Ge & Yu, 2017) is perhaps the easiest form of transfer: it takes a pre-trained network and adjusts it to the task at hand. In practical applications, we usually do not need to train a neural network from scratch for a new task: (1) doing so is very time consuming, and in particular our training data can rarely be as large as ImageNet, which is what it takes to train deep neural networks with sufficient generalization ability; and (2) even with enough training data, the computational cost of training from scratch is unbearable. With a pre-trained model, we fix the parameters of the earlier layers and fine-tune the last few layers for our task. In general, layers nearer the front extract more general features, while layers nearer the back extract features more specific to the classification task. In this way, network training is greatly accelerated, and the performance on our task also improves.
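The idea of freezing the earlier layers and re-training only the final layer can be illustrated with a deliberately tiny numpy toy model: a random frozen "feature extractor" plus a retrained logistic-regression head. This is a conceptual sketch only, not an actual deep network:

```python
import numpy as np

rng = np.random.default_rng(0)
W_pretrained = rng.standard_normal((10, 4))   # frozen "early layers" (never updated)

def features(x):
    """Frozen feature extractor: ReLU(Wx) with fixed pre-trained weights."""
    return np.maximum(W_pretrained @ x, 0)

# New task: labels depend on the first input dimension.
X = rng.standard_normal((200, 4))
y = (X[:, 0] > 0).astype(float)
F = np.array([features(x) for x in X])        # features from the frozen layers

# Fine-tune only the new head (logistic regression) by gradient descent.
w = np.zeros(10)
for _ in range(500):
    p = 1 / (1 + np.exp(-F @ w))
    w -= 0.1 * F.T @ (p - y) / len(y)

acc = np.mean((F @ w > 0) == (y == 1))        # training accuracy of the new head
```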
3 Time series forecasting with image features
Feature-based time series forecasting aims to find the best forecasting method among a pool of candidate methods, or their best forecast combination. Its essence is to link knowledge about the forecasting errors of different methods to time series features. Therefore, in this section we focus on the mapping from time series features to the performance of forecasting methods.
In this paper, following Montero-Manso, Athanasopoulos, Hyndman, Talagala et al. (2018), who won second place in the M4 competition (Makridakis et al., 2018), we use nine of the most popular time series forecasting methods: the automated ARIMA algorithm (ARIMA), the automated exponential smoothing algorithm (ETS), a feed-forward neural network with autoregressive inputs (NNETAR), the TBATS model (exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend and seasonal components), seasonal and trend decomposition using Loess with AR modeling of the seasonally adjusted series (STLM-AR), random walk with drift (RW-DRIFT), the theta method (THETA), naive (NAIVE), and seasonal naive (SNAIVE).
In the M4 competition, Montero-Manso, Athanasopoulos, Hyndman, Talagala et al. (2018) propose a model averaging method based on 42 manually selected features. To validate the effectiveness of our image features of time series, we adopt their model averaging method to obtain the weights for forecast combination based on the image features. Fig. 5 shows our framework of model averaging. It consists of two parts: training the model to obtain the weights of the nine forecasting methods from image features, and testing the trained model. The Overall Weighted Average (OWA), used in the M4 competition, combines two accuracy measures: the Mean Absolute Scaled Error (MASE) and the symmetric Mean Absolute Percentage Error (sMAPE). The individual measures are calculated as
$$ \mathrm{sMAPE} = \frac{1}{h} \sum_{t=n+1}^{n+h} \frac{200\,| y_t - \hat{y}_t |}{| y_t | + | \hat{y}_t |}, \qquad \mathrm{MASE} = \frac{\frac{1}{h}\sum_{t=n+1}^{n+h} | y_t - \hat{y}_t |}{\frac{1}{n-m} \sum_{t=m+1}^{n} | y_t - y_{t-m} |}, $$
where $y_t$ is the real value of the time series at point $t$, $\hat{y}_t$ is the forecast, $h$ is the forecasting horizon and $m$ is the frequency of the data (e.g., 4 for quarterly series).
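The two accuracy measures can be computed directly from their M4 definitions; a straightforward sketch:

```python
import numpy as np

def smape(y, f):
    """symmetric MAPE as in M4: mean of 200*|y - f| / (|y| + |f|)."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return np.mean(200 * np.abs(y - f) / (np.abs(y) + np.abs(f)))

def mase(insample, y, f, m=1):
    """MASE: out-of-sample MAE scaled by the in-sample seasonal-naive MAE
    (m is the seasonal frequency, e.g. 4 for quarterly data)."""
    insample = np.asarray(insample, float)
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(np.asarray(y, float) - np.asarray(f, float))) / scale

err = smape([110.0], [100.0])
```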
In essence, this is a feature-based gradient tree boosting approach in which the loss function to be minimized is tailored to the OWA error used in the M4 competition. The gradient tree boosting implementation is XGBoost, proposed by Chen & Guestrin (2016), a computationally efficient tool that allows a high degree of customization. Let $F_n$ be the image features extracted from time series $n$, let $\mathrm{OWA}_{nm}$ be the contribution to the OWA error measure of method $m$ for series $n$, and let $p(F_n)_m$ be the output of the XGBoost algorithm corresponding to forecasting method $m$, based on the features extracted from series $n$. To obtain a weight for every method, a softmax transform is applied to the XGBoost outputs by Montero-Manso, Talagala, Hyndman & Athanasopoulos (2018):
$$ w(F_n)_m = \frac{\exp\left(p(F_n)_m\right)}{\sum_{k} \exp\left(p(F_n)_k\right)}. $$
The gradient tree boosting approach implemented in XGBoost then minimizes the weighted average loss function
$$ \sum_{n} \sum_{m} w(F_n)_m \, \mathrm{OWA}_{nm}. $$
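The softmax weighting and the resulting forecast combination can be sketched in numpy; the nine per-method scores and the forecast matrix below are made-up placeholders standing in for the XGBoost outputs and the nine methods' forecasts:

```python
import numpy as np

def combination_weights(scores):
    """Softmax over the per-method scores: larger score -> larger weight."""
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical XGBoost outputs for the nine methods on one series:
scores = np.array([1.2, 0.3, -0.5, 0.8, -2.0, 0.1, 0.9, -1.0, -0.7])
w = combination_weights(scores)

# Combine the nine methods' forecasts (nine methods x horizon 18, placeholder data).
forecast_matrix = np.random.default_rng(2).random((9, 18))
combined = w @ forecast_matrix          # weighted-average forecast
```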
4 Application to M4 competition
4.1 Training and testing data
To obtain the testing data, we divide each original M4 time series into two parts: the first part is the training part, whose length is the length of the original series minus the forecasting horizon; the second part is the testing part, whose length equals the forecasting horizon. To obtain the training data for the meta-model, we divide the training part again in the same way: its first part, of length equal to the length of the original series minus twice the forecasting horizon, is used for fitting, and its last part, of length equal to the forecasting horizon, is used for validation. Fig. 6 shows the training and testing data partition strategy.
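The partition strategy can be sketched as follows (the function name is ours):

```python
def split_series(x, h):
    """M4-style partition: hold out the last h points for testing, and inside the
    remaining training part hold out another h points for validation."""
    train_full, test = x[:-h], x[-h:]                # final evaluation split
    train, valid = train_full[:-h], train_full[-h:]  # split used to fit the meta-model
    return train, valid, train_full, test

train, valid, train_full, test = split_series(list(range(100)), 6)
```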
4.2 Time series with different periods in the instance space
We project the time series with different periods into an instance space using t-SNE (Maaten, 2014). Yearly, quarterly, monthly, daily and hourly data can be well distinguished in the instance spaces shown in Fig. 7.
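A sketch of the projection step with scikit-learn's t-SNE, run here on made-up feature vectors (two synthetic groups standing in for series of different periods):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)
# Two clusters of 16-dim "image features" as placeholders for two period classes.
features = np.vstack([rng.normal(loc=c, size=(30, 16)) for c in (0.0, 5.0)])

# Project into a 2-D instance space.
xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(features)
```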
4.3 Forecasting based on automated features
We compare our model selection results with the forecasting performance of the nine single methods, and the model averaging results with the top methods in the M4 competition.

Table 1 shows the MASE values of our model selection with Lasso, SVM + RBF (10) (an SVM with the RBF kernel on 10-dimensional features obtained by dimensionality reduction with t-SNE), and SVM + RBF with all the image features. Our results achieve accuracies equal to the best single method on monthly, weekly and all data. For hourly data, the average forecasting accuracy is significantly improved by our model selection method based on the automated features.
| method | Yearly | Quarterly | Monthly | Weekly | Daily | Hourly | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *Single method* |  |  |  |  |  |  |  |
| auto_arima | 3.45 | 1.17 | 0.93 | 2.38 | 3.35 | 0.94 | 1.67 |
| ets | 3.44 | 1.16 | 0.95 | 2.53 | 3.25 | 1.82 | 1.68 |
| nnetar | 4.05 | 1.55 | 1.15 | 3.84 | 4.13 | 1.07 | 2.05 |
| tbats | 3.44 | 1.19 | 1.05 | 2.49 | 3.28 | 1.23 | 1.73 |
| stlm_ar | 10.37 | 2.03 | 1.33 | 39.67 | 31.2 | 1.49 | 4.98 |
| rw_drift | 3.07 | 1.33 | 1.18 | 2.68 | 3.25 | 11.46 | 1.79 |
| thetaf | 3.37 | 1.23 | 0.97 | 2.64 | 3.26 | 2.45 | 1.69 |
| naive | 3.97 | 1.48 | 1.21 | 2.78 | 3.28 | 11.61 | 2.04 |
| snaive | 3.97 | 1.6 | 1.26 | 2.78 | 3.28 | 1.19 | 2.06 |
| Min | 3.07 | 1.16 | 0.93 | 2.38 | 3.25 | 0.94 | 1.67 |
| *Model selection + recurrence plot* |  |  |  |  |  |  |  |
|  | 3.45 | 1.18 | 0.93 | 2.38 | 3.36 | 0.94 | 1.68 |
|  | 3.42 | 1.36 | 1.00 | 7.81 | 6.61 | 0.84 | 1.90 |
|  | 3.45 | 1.17 | 0.93 | 2.38 | 3.35 | 0.94 | 1.67 |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 3.45 | 1.18 | 0.93 | 2.37 | 3.35 | 0.94 | 1.67 |
|  | 3.45 | 1.17 | 0.93 | 2.38 | 3.35 | 0.94 | 1.67 |
|  | 3.45 | 1.17 | 0.93 | 2.38 | 3.35 | 0.94 | 1.67 |
|  | 3.45 | 1.18 | 0.93 | 2.70 | 3.52 | 0.94 | 1.69 |
| *Model selection + Gramian angular field* |  |  |  |  |  |  |  |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 3.47 | 1.18 | 0.94 | 2.38 | 3.38 | 0.94 | 1.69 |
|  | 3.45 | 1.18 | 0.93 | 2.37 | 3.35 | 0.93 | 1.68 |
|  | 3.45 | 1.18 | 0.93 | 2.35 | 3.35 | 0.93 | 1.68 |
|  | 3.45 | 1.17 | 0.95 | 2.53 | 3.34 | 1.81 | 1.69 |
Tables 2, 3 and 4 compare the MASE, sMAPE and OWA values of our model averaging with the top 10 most accurate methods in the M4 competition, respectively. Overall, our model averaging with automated features achieves performance comparable to the top methods in the M4 competition. Specifically, as can be seen from Table 2, our method outperforms the best approach for daily and hourly data and performs equally well on yearly data.
| rank | Yearly | Quarterly | Monthly | Weekly | Daily | Hourly | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *M4 competition* |  |  |  |  |  |  |  |
| 1 | 2.980 | 1.118 | 0.884 | 2.356 | 3.446 | 0.893 | 1.536 |
| 2 | 3.060 | 1.111 | 0.893 | 2.108 | 3.344 | 0.819 | 1.551 |
| 3 | 3.130 | 1.125 | 0.905 | 2.158 | 2.642 | 0.873 | 1.547 |
| 4 | 3.126 | 1.135 | 0.895 | 2.350 | 3.258 | 0.976 | 1.571 |
| 5 | 3.046 | 1.122 | 0.907 | 2.368 | 3.194 | 1.203 | 1.554 |
| 6 | 3.082 | 1.118 | 0.913 | 2.133 | 3.229 | 1.458 | 1.565 |
| 7 | 3.038 | 1.198 | 0.929 | 2.947 | 3.479 | 1.372 | 1.595 |
| 8 | 3.009 | 1.198 | 0.966 | 2.601 | 3.254 | 2.557 | 1.601 |
| 9 | 3.262 | 1.163 | 0.931 | 2.302 | 3.284 | 0.801 | 1.627 |
| 10 | 3.185 | 1.164 | 0.943 | 2.488 | 3.232 | 1.049 | 1.614 |
| *Model averaging + recurrence plot* |  |  |  |  |  |  |  |
|  | 3.143 | 1.128 | 0.923 | 2.706 | 3.463 | 0.840 | 1.597 |
|  | 3.135 | 1.125 | 0.908 | 2.266 | 3.463 | 0.849 | 1.579 |
|  | 3.124 | 1.118 | 0.927 | 2.363 | 3.212 | 0.898 | 1.580 |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 3.118 | 1.121 | 0.942 | 2.387 | 3.344 | 0.861 | 1.592 |
|  | 3.113 | 1.122 | 0.919 | 2.361 | 3.348 | 0.845 | 1.581 |
|  | 3.111 | 1.122 | 0.955 | 2.375 | 3.357 | 0.854 | 1.598 |
|  | 3.153 | 1.124 | 0.940 | 2.332 | 3.318 | 0.858 | 1.599 |
| *Model averaging + Gramian angular field* |  |  |  |  |  |  |  |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 3.145 | 1.126 | 0.911 | 2.287 | 3.353 | 0.846 | 1.585 |
|  | 3.115 | 1.123 | 0.948 | 2.239 | 3.375 | 0.861 | 1.596 |
|  | 3.128 | 1.121 | 0.957 | 2.252 | 3.355 | 0.857 | 1.602 |
|  | 3.136 | 1.124 | 0.950 | 2.277 | 3.364 | 0.868 | 1.602 |
| rank | Yearly | Quarterly | Monthly | Weekly | Daily | Hourly | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *M4 competition* |  |  |  |  |  |  |  |
| 1 | 13.176 | 9.679 | 12.126 | 7.817 | 3.170 | 9.328 | 11.374 |
| 2 | 13.528 | 9.733 | 12.639 | 7.625 | 3.097 | 11.506 | 11.720 |
| 3 | 13.943 | 9.796 | 12.747 | 6.919 | 2.452 | 9.611 | 11.845 |
| 4 | 13.712 | 9.809 | 12.487 | 6.814 | 3.037 | 9.934 | 11.695 |
| 5 | 13.673 | 9.816 | 12.737 | 8.627 | 2.985 | 15.563 | 11.836 |
| 6 | 13.669 | 9.800 | 12.888 | 6.726 | 2.995 | 13.167 | 11.897 |
| 7 | 13.679 | 10.378 | 12.839 | 7.818 | 3.222 | 13.466 | 12.020 |
| 8 | 13.366 | 10.155 | 13.002 | 9.148 | 3.041 | 17.567 | 11.986 |
| 9 | 13.910 | 10.000 | 12.780 | 6.728 | 3.053 | 8.913 | 11.924 |
| 10 | 13.821 | 10.093 | 13.151 | 8.989 | 3.026 | 9.765 | 12.114 |
| *Model averaging + recurrence plot* |  |  |  |  |  |  |  |
|  | 13.935 | 9.855 | 12.656 | 8.502 | 3.175 | 11.913 | 11.859 |
|  | 13.896 | 9.863 | 12.596 | 7.899 | 3.063 | 11.772 | 11.816 |
|  | 13.881 | 9.858 | 12.625 | 8.289 | 3.017 | 12.296 | 11.824 |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 13.862 | 9.835 | 12.616 | 8.255 | 3.117 | 12.173 | 11.815 |
|  | 13.890 | 9.810 | 12.566 | 8.341 | 3.107 | 11.772 | 11.790 |
|  | 13.847 | 9.840 | 12.549 | 8.033 | 3.113 | 11.762 | 11.778 |
|  | 13.987 | 9.838 | 12.583 | 8.408 | 3.077 | 11.856 | 11.826 |
| *Model averaging + Gramian angular field* |  |  |  |  |  |  |  |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 13.926 | 9.859 | 12.639 | 8.161 | 3.103 | 12.077 | 11.846 |
|  | 13.861 | 9.811 | 12.597 | 7.851 | 3.124 | 11.933 | 11.798 |
|  | 13.914 | 9.808 | 12.574 | 7.887 | 3.067 | 11.882 | 11.796 |
|  | 13.949 | 9.868 | 12.617 | 7.937 | 3.070 | 12.078 | 11.839 |
| rank | Yearly | Quarterly | Monthly | Weekly | Daily | Hourly | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *M4 competition* |  |  |  |  |  |  |  |
| 1 | 0.778 | 0.847 | 0.836 | 0.851 | 1.046 | 0.440 | 0.821 |
| 2 | 0.799 | 0.847 | 0.858 | 0.796 | 1.019 | 0.484 | 0.838 |
| 3 | 0.820 | 0.855 | 0.867 | 0.766 | 0.806 | 0.444 | 0.841 |
| 4 | 0.813 | 0.859 | 0.854 | 0.795 | 0.996 | 0.474 | 0.842 |
| 5 | 0.802 | 0.855 | 0.868 | 0.897 | 0.977 | 0.674 | 0.843 |
| 6 | 0.806 | 0.853 | 0.876 | 0.751 | 0.984 | 0.663 | 0.848 |
| 7 | 0.801 | 0.908 | 0.882 | 0.957 | 1.060 | 0.653 | 0.860 |
| 8 | 0.788 | 0.898 | 0.905 | 0.968 | 0.996 | 1.012 | 0.861 |
| 9 | 0.836 | 0.878 | 0.881 | 0.782 | 1.002 | 0.410 | 0.865 |
| 10 | 0.824 | 0.883 | 0.899 | 0.939 | 0.990 | 0.485 | 0.869 |
| *Model averaging + recurrence plot* |  |  |  |  |  |  |  |
|  | 0.822 | 0.859 | 0.873 | 0.951 | 1.050 | 0.499 | 0.854 |
|  | 0.820 | 0.858 | 0.863 | 0.839 | 1.009 | 0.498 | 0.848 |
|  | 0.818 | 0.855 | 0.874 | 0.878 | 0.985 | 0.522 | 0.849 |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 0.816 | 0.856 | 0.880 | 0.880 | 1.022 | 0.511 | 0.852 |
|  | 0.817 | 0.855 | 0.868 | 0.880 | 1.021 | 0.497 | 0.848 |
|  | 0.815 | 0.856 | 0.884 | 0.866 | 1.023 | 0.498 | 0.852 |
|  | 0.825 | 0.857 | 0.878 | 0.878 | 1.011 | 0.502 | 0.854 |
| *Model averaging + Gramian angular field* |  |  |  |  |  |  |  |
| *Pre-trained CNN model + classifier* |  |  |  |  |  |  |  |
|  | 0.822 | 0.859 | 0.867 | 0.857 | 1.021 | 0.501 | 0.851 |
|  | 0.816 | 0.855 | 0.882 | 0.831 | 1.028 | 0.504 | 0.852 |
|  | 0.819 | 0.855 | 0.886 | 0.836 | 1.016 | 0.502 | 0.854 |
|  | 0.821 | 0.858 | 0.884 | 0.843 | 1.017 | 0.510 | 0.855 |
5 Conclusion and future work
This paper proposes the use of image features for forecast model combination. The proposed method enables automated feature extraction, making it more flexible than using manually selected time series features. More importantly, it produces forecast accuracies comparable to the top methods in the largest time series forecasting competition (M4). To the best of our knowledge, this is the first paper that applies time series imaging to forecasting.

In this paper, we employ recurrence plots to encode time series into images, and use the spatial Bag-of-Features model to extract features from the images. Depending on the size of the dataset and the available computational resources, a classic convolutional neural network (CNN) is an alternative for feature extraction from images. To further improve the forecasting performance based on the automated features, the training of optimal weights in model averaging needs further study, making it suitable for high-dimensional image features.
Acknowledgements
Yanfei Kang and Feng Li’s research were supported by the National Natural Science Foundation of China (No. 11701022 and No. 11501587, respectively).
Appendix
Experimental setup
In the traditional image processing pipeline (SIFT), before linear coding we need to obtain the basic descriptors: $k$-means clustering is applied, the number of clusters is chosen as the number of basic descriptors, and the centroid coordinates are used as the coordinates of the basic descriptors. For each descriptor, we then select its closest basic descriptors with $k$-nearest neighbors (KNN) and set the adjustment factor in LLC. For SPM, the image is split into $1 \times 1$, $2 \times 2$ and $4 \times 4$ grids, respectively.

The parameters for the recurrence plot are set as follows:

Parameter of eps: 0.1.

Parameter of steps: 5.
The parameters for SIFT are set as follows:

Number of basic descriptors: 200. Basic descriptors are obtained with $k$-means.

Parameters of LLC: the number of nearest neighbors $k$ in KNN and the adjustment factor $\lambda$.

Parameters of SPM: 1, 2 and 4. We split the images into $1 \times 1$, $2 \times 2$ and $4 \times 4$ grids, respectively.

Number of extracted features from each image: $(1 + 4 + 16) \times 200 = 4200$.
The parameters for the pre-trained CNN models are set as follows:

Dimension of the output of the pre-trained Inception-v1: 1024.

Dimension of the output of the pre-trained ResNet-v1-101: 2048.

Dimension of the output of the pre-trained ResNet-v1-50: 2048.

Dimension of the output of the pre-trained VGG: 1000.
In model averaging, we need to set parameters for XGBoost. We have performed a search in a subset of the hyperparameter spaces, measuring OWA via a 10fold cross validation of the training data.
The hyperparameters are set as follows:

The maximum depth of a tree is from 6 to 50.

The learning rate (the scale of the contribution of each tree) is from 0.001 to 1.

The proportion of the training set used to calculate the trees in each iteration is from 0.5 to 1.

The proportion of the features used to calculate the trees in each iteration is from 0.5 to 1.

The number of iterations of the algorithm is from 1 to 250.
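The search over these ranges can be sketched as a random search; the parameter names below follow common XGBoost conventions (eta, subsample, colsample_bytree) and are our assumption, and the 10-fold CV OWA scoring step is omitted:

```python
import random

random.seed(0)

def sample_config():
    """Draw one candidate configuration from the ranges listed above."""
    return {
        "max_depth": random.randint(6, 50),
        "eta": 10 ** random.uniform(-3, 0),        # learning rate in [0.001, 1]
        "subsample": random.uniform(0.5, 1.0),     # fraction of training rows per tree
        "colsample_bytree": random.uniform(0.5, 1.0),  # fraction of features per tree
        "n_rounds": random.randint(1, 250),        # number of boosting iterations
    }

# Each candidate would then be scored by 10-fold cross-validated OWA.
candidates = [sample_config() for _ in range(20)]
```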
References
 Adam (1973) Adam, E. E. (1973), ‘Individual item forecasting model evaluation’, Decision Sciences 4(4), 458–470.
 Arinze (1994) Arinze, B. (1994), ‘Selecting appropriate forecasting models using rule induction’, Omegainternational Journal of Management Science 22(6), 647–658.
 Baydogan et al. (2013) Baydogan, M. G., Runger, G. & Tuv, E. (2013), ‘A bagoffeatures framework to classify time series’, IEEE transactions on pattern analysis and machine intelligence 35(11), 2796–2802.
 Chen & Guestrin (2016) Chen, T. & Guestrin, C. (2016), XGBoost: a scalable tree boosting system, in ‘ACM SIGKDD International Conference on Knowledge Discovery and Data Mining’, pp. 785–794.
 Collopy & Armstrong (1992) Collopy, F. & Armstrong, J. S. (1992), ‘Rulebased forecasting: development and validation of an expert systems approach to combining time series extrapolations’, Management Science 38(10), 1394–1414.

 Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L. J., Li, K. & Li, F. F. (2009), Imagenet: A large-scale hierarchical image database, in ‘IEEE Conference on Computer Vision and Pattern Recognition’.
 Donahue et al. (2014) Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Ning, Z., Tzeng, E., Darrell, T., Donahue, J., Jia, Y. & Vinyals, O. (2014), Decaf: A deep convolutional activation feature for generic visual recognition, in ‘International Conference on International Conference on Machine Learning’.
 Eckmann et al. (1987) Eckmann, J.P., Kamphorst, S. O. & Ruelle, D. (1987), ‘Recurrence plots of dynamical systems’, EPL (Europhysics Letters) 4(9), 973.
 Fulcher (2018) Fulcher, B. D. (2018), Featurebased timeseries analysis, in ‘Feature engineering for machine learning and data analytics’, CRC Press, pp. 87–116.
 Fulcher & Jones (2014) Fulcher, B. & Jones, N. (2014), ‘Highly comparative featurebased timeseries classification’, IEEE Transactions on Knowledge and Data Engineering 26(12), 3026–3037.
 Ge & Yu (2017) Ge, W. & Yu, Y. (2017), Borrowing treasures from the wealthy: Deep transfer learning through selective joint finetuning, in ‘Computer Vision and Pattern Recognition’.
 Hatami et al. (2017) Hatami, N., Gavet, Y. & Debayle, J. (2017), ‘Bag of recurrence patterns representation for timeseries classification’, Pattern Analysis and Applications pp. 1–11.
 Hyndman et al. (2015) Hyndman, R. J., Wang, E. & Laptev, N. (2015), Largescale unusual time series detection, in ‘Proceedings of the IEEE International Conference on Data Mining’, Atlantic City, NJ, USA. 14–17 November 2015.
 Kang et al. (2017) Kang, Y., Hyndman, R. J. & SmithMiles, K. (2017), ‘Visualising forecasting algorithm performance using time series instance spaces’, International Journal of Forecasting 33(2), 345–358.
 Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I. & E. Hinton, G. (2012), ‘Imagenet classification with deep convolutional neural networks’, Neural Information Processing Systems 25.
 Lazebnik et al. (2006) Lazebnik, S., Schmid, C. & Ponce, J. (2006), Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in ‘2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)’, Vol. 2, IEEE, pp. 2169–2178.
 Lowe (1999) Lowe, D. G. (1999), Object recognition from local scaleinvariant features, in ‘Computer vision, 1999. The proceedings of the seventh IEEE international conference on’, Vol. 2, IEEE, pp. 1150–1157.
 Maaten (2014) Maaten, L. v. d. (2014), ‘Accelerating tSNE using treebased algorithms’, The Journal of Machine Learning Research 15(1), 3221–3245.
 Makridakis & Hibon (2000) Makridakis, S. & Hibon, M. (2000), ‘The M3Competition: results, conclusions and implications’, International Journal of Forecasting 16(4), 451–476.
 Makridakis et al. (2018) Makridakis, S., Spiliotis, E. & Assimakopoulos, V. (2018), ‘The m4 competition: Results, findings, conclusion and way forward’, International Journal of Forecasting .
 Meade (2000) Meade, N. (2000), ‘Evidence for the selection of forecasting methods’, Journal of Forecasting 19(6), 515–535.
 Montero-Manso, Athanasopoulos, Hyndman, Talagala et al. (2018) Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., Talagala, T. S. et al. (2018), ‘FFORMA: Feature-based forecast model averaging’, Monash Econometrics and Business Statistics Working Papers 19(18), 2018–19.
 MonteroManso, Talagala, Hyndman & Athanasopoulos (2018) MonteroManso, P., Talagala, T. S., Hyndman, R. & Athanasopoulos, G. (2018), ‘M4metalearning’, GitHub repository .
 Nanopoulos et al. (2001) Nanopoulos, A., Alcock, R. & Manolopoulos, Y. (2001), ‘Featurebased classification of timeseries data’, International Journal of Computer Research 10(3).
 Pan & Qiang (2010) Pan, S. J. & Qiang, Y. (2010), ‘A survey on transfer learning’, IEEE Transactions on Knowledge and Data Engineering 22(10), 1345–1359.
 Petropoulos et al. (2014) Petropoulos, F., Makridakis, S., Assimakopoulos, V. & Nikolopoulos, K. (2014), “Horses for courses’ in demand forecasting’, European Journal of Operational Research 237(1), 152–163.
 Razavian et al. (2014) Razavian, A. S., Azizpour, H., Sullivan, J. & Carlsson, S. (2014), ‘Cnn features offtheshelf: An astounding baseline for recognition’.
 Shah (1997) Shah, C. (1997), ‘Model selection in univariate time series forecasting using discriminant analysis’, International Journal of Forecasting 13(4), 489–500.
 Simonyan & Zisserman (2014) Simonyan, K. & Zisserman, A. (2014), ‘Very deep convolutional networks for largescale image recognition’, Computer Science .
 Talagala et al. (2018) Talagala, T. S., Hyndman, R. J. & Athanasopoulos, G. (2018), Metalearning how to forecast time series, Working paper 6/18, Monash University, Department of Econometrics and Business Statistics.
 Thiel et al. (2004) Thiel, M., Romano, M. C. & Kurths, J. (2004), ‘How much information is contained in a recurrence plot?’, Physics Letters A 330(5), 343–349.
 Wang et al. (2013) Wang, J., Liu, P., She, M. F., Nahavandi, S. & Kouzani, A. (2013), ‘Bagofwords representation for biomedical time series classification’, Biomedical Signal Processing and Control 8(6), 634–644.
 Wang et al. (2010) Wang, J., Yang, J., Yu, K., Lv, F., Huang, T. & Gong, Y. (2010), Localityconstrained linear coding for image classification, in ‘Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on’, IEEE, pp. 3360–3367.
 Wang et al. (2006) Wang, X., Smith, K. A. & Hyndman, R. J. (2006), ‘Characteristicbased clustering for time series data’, Data Mining and Knowledge Discovery 13(3), 335–364.
 Wang et al. (2009) Wang, X., SmithMiles, K. A. & Hyndman, R. J. (2009), ‘Rule induction for forecasting method selection: metalearning the characteristics of univariate time series’, Neurocomputing 72(1012), 2581–2594.

 Wang & Oates (2015) Wang, Z. & Oates, T. (2015), Imaging time-series to improve classification and imputation, in ‘Proceedings of the 24th International Conference on Artificial Intelligence’, AAAI Press, pp. 3939–3945.