1 Introduction
The problem of time series forecasting attracts the attention of many researchers because of its great practical importance for predicting various economic, social and physical phenomena. There are several approaches to this problem: autoregression models [1], exponential smoothing [2], neural networks [3, 4] and many others. Nevertheless, improving forecasting accuracy remains a relevant problem. In this paper we develop a time series forecasting algorithm based on data compression techniques. A detailed description of the compression-based approach to time series forecasting can be found in [5]. Nowadays there are many efficient lossless data compressors (archivers) that are widely used in information technology. They are based on different ideas and approaches, among which we note the PPM universal code [6], the Burrows-Wheeler transform [7], dictionary-based compression algorithms [8, 9] and grammar-based codes [10].
The main contribution of this study is to show how standard, well-known file compression programs can be applied to forecast real-world time series. Modern data compression algorithms use a variety of heuristics to improve the compression ratio, so our approach makes it possible to use algorithms of proven efficiency. Moreover, our method can use a set of algorithms and "automatically" select the most accurate among them. In addition, the approach makes it possible to apply to time series forecasting some methods of artificial intelligence that modern data compression algorithms use to find various kinds of latent regularities.
The rest of the paper is organized as follows. In section 2 we briefly describe the relationship between the prediction of finite-alphabet sequences and data compression, and section 3 discusses the generalization of that approach to real-valued series. Section 4 presents the results of our experimental investigation of the method, and section 5 contains the conclusion.
2 Forecasting of finite alphabet sequences
We begin the description of the proposed algorithm by explaining the relationship between data compression and sequence prediction. Denote a finite set of symbols (an alphabet) as $A$, and the set of all words (sequences) of length $n$ over $A$ as $A^n$. A lossless code is a mapping $\varphi: A^n \to \{0,1\}^*$ such that for any sequence of words $u_1, u_2, \ldots, u_m$, $u_i \in A^n$, the concatenation $\varphi(u_1)\varphi(u_2)\ldots\varphi(u_m)$ can be uniquely decoded as $u_1, u_2, \ldots, u_m$. In the paper [5] the following formula was proposed to obtain a probability distribution over $A^n$ using a data compression method $\varphi$:

$$P_\varphi(x) = \frac{2^{-|\varphi(x)|}}{\sum_{y \in A^n} 2^{-|\varphi(y)|}}, \tag{1}$$

where $x \in A^n$ and $|\varphi(x)|$ is the length of the encoded representation of $x$.
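The conditional probability that the next symbol $x_{t+1}$ of a sequence $x_1 x_2 \ldots x_t$ equals $a \in A$ can be estimated by the following formula:

$$P_\varphi(x_{t+1} = a \mid x_1 \ldots x_t) = \frac{2^{-|\varphi(x_1 \ldots x_t a)|}}{\sum_{b \in A} 2^{-|\varphi(x_1 \ldots x_t b)|}}. \tag{2}$$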
The strength of our approach is the ability to use a set of algorithms and "automatically" select the most accurate among them. Suppose that we have several data compression methods $\varphi_1, \ldots, \varphi_s$, each of which works well on a particular type of sequences. Then we can combine them into a single method whose accuracy on each type of sequence is almost the same as that of the most accurate method for that type. This can be done by the following formula:

$$P(x_{t+1} = a \mid x_1 \ldots x_t) = \frac{\sum_{k=1}^{s} \omega_k\, 2^{-|\varphi_k(x_1 \ldots x_t a)|}}{\sum_{b \in A} \sum_{k=1}^{s} \omega_k\, 2^{-|\varphi_k(x_1 \ldots x_t b)|}}, \tag{3}$$

where the nonnegative weight coefficients $\omega_1, \ldots, \omega_s$ sum to 1.
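Formulas 1 to 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: ppmd and Re-Pair are not in the Python standard library, so the stdlib zlib, bz2 and lzma codecs stand in for the compressors $\varphi_k$; code lengths are measured as compressed bytes times eight; and the function names and weight values are hypothetical. The function scores every continuation of length `horizon`, so with `horizon=1` it is formula 3, and with a longer horizon it gives the joint distribution over continuations used in the example below.

```python
import bz2
import lzma
import zlib
from itertools import product

# Standard-library stand-ins for the compressors phi_k (assumption: the paper
# itself uses zlib, ppmd and Re-Pair; only zlib is available in the stdlib).
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def code_length_bits(name: str, data: bytes) -> int:
    """|phi(x)|: size in bits of the compressed representation of data."""
    return 8 * len(COMPRESSORS[name](data))


def continuation_probabilities(history: str, alphabet: str, horizon: int,
                               weights: dict) -> dict:
    """Formula (3) over all continuations u of the given length.

    Each continuation u gets the score sum_k w_k * 2^-|phi_k(history + u)|;
    the scores are then normalised to sum to 1.
    """
    active = {name: w for name, w in weights.items() if w > 0}
    conts = ["".join(u) for u in product(alphabet, repeat=horizon)]
    lengths = {u: {name: code_length_bits(name, (history + u).encode("ascii"))
                   for name in active}
               for u in conts}
    # Subtract the smallest length before exponentiating to avoid float
    # underflow; the common factor 2^shift cancels in the normalisation.
    shift = min(min(per_u.values()) for per_u in lengths.values())
    scores = {u: sum(w * 2.0 ** (shift - lengths[u][name])
                     for name, w in active.items())
              for u in conts}
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}


if __name__ == "__main__":
    probs = continuation_probabilities("01" * 40, "01", 2,
                                       {"zlib": 0.5, "bz2": 0.25, "lzma": 0.25})
    for u, p in sorted(probs.items(), key=lambda item: -item[1]):
        print(u, round(p, 4))
```

Note that each real compressor adds a fixed container header to every output; since that overhead is the same for all continuations of one compressor, it acts like a rescaling of that compressor's weight $\omega_k$ rather than changing its own distribution.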
We illustrate how the algorithm works with a simple example. Consider how to predict the next two values of a sequence $x$ of 17 characters that follows a simple pattern. We append each of the four possible two-symbol continuations over the binary alphabet to the end of $x$ in turn and obtain four sequences of length 19. Then we compress these "extended" sequences, encoded in ASCII, with the zlib and ppmd compressors. The results are shown in table 1; the second and third columns give the sizes of the compressed files.
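Table 1. Compressed sizes of the extended sequences and their probabilities by formula 3 with equal weights
Extended sequence  zlib, bits  ppmd, bits  Probability
1  144  120  0.0039
2  144  120  0.0039
3  128  112  0.9884
4  136  120  0.0039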
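For instance, let us compute by formula 3 the probability of the "correct" continuation (the third row of table 1), using equal weights $\omega_1 = \omega_2 = 1/2$:

$$P = \frac{\tfrac12 2^{-128} + \tfrac12 2^{-112}}{\tfrac12\left(2^{-144} + 2^{-120}\right) + \tfrac12\left(2^{-144} + 2^{-120}\right) + \tfrac12\left(2^{-128} + 2^{-112}\right) + \tfrac12\left(2^{-136} + 2^{-120}\right)} \approx 0.9884.$$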
The probability that the next symbol takes a given value can be obtained by summing the probabilities of the two extended sequences whose first appended symbol is that value (because of rounding, the probabilities in table 1 sum to 1.0001 rather than 1).
The mean value of the resulting distribution can be used as a one-step forecast.
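The arithmetic of the example can be checked with a few lines of Python. The sketch below applies formula 3 to the compressed sizes listed in table 1 with equal weights; it only re-computes the published numbers, and the function name is hypothetical.

```python
def mixture_probabilities(sizes_bits, weights):
    """Formula (3) applied to precomputed code lengths.

    sizes_bits[i][k] is the compressed size, in bits, of the i-th extended
    sequence under the k-th compressor; weights[k] is that compressor's weight.
    """
    shift = min(min(row) for row in sizes_bits)  # avoids underflow; cancels out
    scores = [sum(w * 2.0 ** (shift - size) for w, size in zip(weights, row))
              for row in sizes_bits]
    total = sum(scores)
    return [s / total for s in scores]


# (zlib, ppmd) sizes in bits of the four extended sequences from table 1.
table1 = [[144, 120], [144, 120], [128, 112], [136, 120]]
probs = mixture_probabilities(table1, [0.5, 0.5])
print([round(p, 4) for p in probs])  # [0.0039, 0.0039, 0.9884, 0.0039]
```

The weighted terms $2^{-|\varphi_k|}$ are mixed before normalization. Averaging the two per-compressor distributions after normalizing each of them would instead give about 0.9922 for the third row, which does not match table 1.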
3 Forecasting of real-valued sequences
The described algorithm can be used to forecast real-valued time series. Suppose we have a time series $X = x_1, x_2, \ldots, x_t$, $x_i \in \mathbb{R}$, and we want to predict the next $h$ values $x_{t+1}, \ldots, x_{t+h}$. The basic idea is to transform the original real-valued series into a finite-alphabet sequence. Denote as $I$ an interval containing all $x_i$, $i = 1, \ldots, t$. We divide this interval into $n$ disjoint subintervals of equal length with numbers $0, 1, \ldots, n-1$ (we denote the subinterval with number $a$ as $s_a$). Then we replace each value $x_i$ of the original time series with the number of the subinterval containing it and obtain the new series of subinterval numbers $X^n = x^n_1, x^n_2, \ldots, x^n_t$, $x^n_i \in \{0, 1, \ldots, n-1\}$. Using $X^n$, we forecast the subintervals that contain the next values of the series. The probability that at time $t + j$, $1 \le j \le h$, the value of the series falls into the subinterval with number $a$ is obtained from the marginal probability distribution of the subinterval numbers:

$$P(x^n_{t+j} = a \mid X^n) = \sum_{\substack{y_1 \ldots y_h \in \{0, \ldots, n-1\}^h \\ y_j = a}} P(x^n_{t+1} \ldots x^n_{t+h} = y_1 \ldots y_h \mid X^n). \tag{4}$$
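A minimal sketch of the real-valued procedure with a single compressor follows. It assumes zlib as $\varphi$, one byte per subinterval number (so $n \le 256$), equal-length subintervals of $[\min_i x_i, \max_i x_i]$, and, as an assumption not stated in the text, a point forecast equal to the expected midpoint of the subinterval under the marginal distribution (4). All function names are hypothetical.

```python
import zlib
from itertools import product


def quantize(series: list[float], n: int) -> tuple[bytes, float, float]:
    """Replace each value by the number (0..n-1) of its equal-length subinterval."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n or 1.0  # a constant series falls entirely into subinterval 0
    symbols = bytes(min(int((x - lo) / width), n - 1) for x in series)  # needs n <= 256
    return symbols, lo, width


def marginal_forecast(series: list[float], n: int, horizon: int) -> list[float]:
    """Point forecasts for x_{t+1}..x_{t+horizon} via the marginals of formula (4)."""
    history, lo, width = quantize(series, n)
    conts = list(product(range(n), repeat=horizon))  # all n**horizon continuations
    lengths = [8 * len(zlib.compress(history + bytes(u), 9)) for u in conts]
    shift = min(lengths)  # avoids float underflow; cancels in the normalisation
    joint = [2.0 ** (shift - length) for length in lengths]
    total = sum(joint)
    forecasts = []
    for j in range(horizon):
        marginal = [0.0] * n
        for u, w in zip(conts, joint):
            marginal[u[j]] += w / total  # formula (4): sum over continuations with u_j = a
        # Assumption: forecast = expected midpoint of the predicted subinterval.
        forecasts.append(sum(p * (lo + (a + 0.5) * width) for a, p in enumerate(marginal)))
    return forecasts


print(marginal_forecast([float(i % 4) for i in range(60)], 4, 2))
```

The loop over `conts` makes the $n^h$ cost of the method explicit: doubling $n$ multiplies the number of compressions by $2^h$.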
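Next we consider the problem of selecting the number of subintervals $n$. This parameter has a great influence on the accuracy of the method. If $n$ is too small, the accuracy may be low because of the coarse rounding; if $n$ is too large, noise in the data can decrease the accuracy. Moreover, the computational cost grows rapidly with $n$, since an $h$-step forecast requires compressing $n^h$ extended sequences. In this paper we use the approach described in [12]. We partition an interval containing all values of the time series into $2^k$ subintervals, $k = 1, \ldots, K$, make forecasts for each partition independently and combine them with weight coefficients. Denote as $x_i$ the $i$-th term of the original time series and as $x^{(k)}_i$ the number of the subinterval containing it in the partition into $2^k$ subintervals. All partitions can be used together with weight coefficients by formula 5:

$$P\left(x_{t+1} \in s^{(K)}_a\right) = \frac{\sum_{k=1}^{K} \omega_k\, 2^{-|\varphi(x^{(k)}_1 \ldots x^{(k)}_t\, a_k)|}}{\sum_{b=0}^{2^K - 1} \sum_{k=1}^{K} \omega_k\, 2^{-|\varphi(x^{(k)}_1 \ldots x^{(k)}_t\, b_k)|}}, \tag{5}$$

where $s^{(K)}_a$ is the subinterval with number $a$ in the finest partition, $a \in \{0, 1, \ldots, 2^K - 1\}$ ranges over the alphabet of subinterval numbers of that partition, $a_k = \lfloor a / 2^{K-k} \rfloor$ is the number of the subinterval of the partition into $2^k$ subintervals that contains $s^{(K)}_a$, and the nonnegative weights $\omega_1, \ldots, \omega_K$ sum to 1.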
In the calculations by formula 5, the probability distribution obtained from the most compressible series dominates the final result, and series obtained by partitioning into fewer subintervals are usually more compressible. For a fairer comparison of codeword lengths we therefore add $K-k$ bits per value of the series to the length of each codeword obtained from the partition into $2^k$ subintervals. Consider a simple example. Suppose that an interval containing all values of the time series is partitioned into $2^k$ and $2^{k+1}$ subintervals (we call them partitions 1 and 2, respectively). Each subinterval of partition 1 corresponds to two subintervals of partition 2. To specify into which subinterval of partition 2 a value from partition 1 falls, a single bit is required (we can encode the lower half of the partition-1 subinterval as 0 and the upper half as 1). If there are $t$ values in the time series, $t$ bits are required for the whole series. Likewise, if the maximal partition has $2^K$ subintervals, $(K-k)t$ bits are needed to specify, for each value given in the partition into $2^k$ subintervals, the subinterval of the maximal partition that contains it.
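One possible reading of formula 5 together with the $(K-k)$-bits-per-value correction is sketched below for a one-step forecast. It assumes zlib as the compressor and one byte per symbol, maps fine subinterval $b$ to coarse subinterval $\lfloor b / 2^{K-k} \rfloor$, and charges $K-k$ extra bits for every symbol of the extended word, i.e. $(K-k)(t+1)$ bits for $t$ observed values plus the appended symbol. It is an illustration under these assumptions, not a verified reproduction of [12]; the function names are hypothetical.

```python
import zlib


def partition_symbols(series: list[float], lo: float, hi: float, n: int) -> bytes:
    """Number (0..n-1) of the equal-length subinterval of [lo, hi] containing each value."""
    width = (hi - lo) / n or 1.0  # a constant series falls entirely into subinterval 0
    return bytes(min(int((x - lo) / width), n - 1) for x in series)


def combined_one_step(series: list[float], K: int, weights: list[float]) -> list[float]:
    """Probabilities of the next value over the 2**K finest subintervals.

    weights[k - 1] is the weight of the partition into 2**k subintervals.
    """
    lo, hi, t = min(series), max(series), len(series)
    lengths = {}
    for k in range(1, K + 1):
        history = partition_symbols(series, lo, hi, 2 ** k)
        for a in range(2 ** k):
            bits = 8 * len(zlib.compress(history + bytes([a]), 9))
            # Penalty: K - k bits per symbol of the extended word of length t + 1.
            lengths[(k, a)] = bits + (K - k) * (t + 1)
    shift = min(lengths.values())  # avoids float underflow; cancels in normalisation
    scores = []
    for b in range(2 ** K):
        score = 0.0
        for k in range(1, K + 1):
            coarse = b >> (K - k)  # subinterval of the 2**k partition containing b
            score += weights[k - 1] * 2.0 ** (shift - lengths[(k, coarse)])
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


print([round(p, 3) for p in combined_one_step([float(i % 8) for i in range(64)], 3, [1/3] * 3)])
```

A coarse symbol is shared by $2^{K-k}$ fine subintervals, so without the penalty on the appended symbol its mass would be counted $2^{K-k}$ times; the extra $K-k$ bits on that symbol spread it evenly instead.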
4 Experimental investigation
In this section we present the results of forecasting real-world data with the described method. The experiments were performed on two time series: the T-index and the planetary K-index (Kp).
We used three data compression programs to obtain all presented forecasts: zlib (https://zlib.net, version 1.2.11), ppmd (https://github.com/Shelwien/ppmd_sh, an implementation of the Prediction by Partial Matching data compression algorithm) and the grammar-based compressor Re-Pair [13] (denoted rp in tables 2 and 3).
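To evaluate the accuracy of a forecast we used the Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{x}_i - x_i|,$$

where $\hat{x}_i$ is the predicted value, $x_i$ is the observed value and $N$ is the number of forecasts.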
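A minimal sketch of the evaluation: `mae` implements the MAE definition, and `average_mae` takes the mean of per-horizon MAE values, which is how the Average columns of tables 2 and 3 appear to be obtained (for example, the SWS values for horizons 1 to 4 in table 2 average to 13.925, reported as 13.9). Both function names are hypothetical.

```python
def mae(predicted, observed):
    """Mean absolute error of paired forecasts and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def average_mae(mae_by_horizon, last):
    """Mean of the per-horizon MAE values for horizons 1..last."""
    return sum(mae_by_horizon[h] for h in range(1, last + 1)) / last


sws = {1: 12.2, 2: 13.4, 3: 14.5, 4: 15.6}  # SWS row of table 2, horizons 1-4
print(round(average_mae(sws, 4), 1))  # 13.9
```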
In our method the number of sequences to compress depends exponentially on the forecasting horizon. To reduce this number we used a procedure that can be illustrated by a simple example. Consider a series $x_1, x_2, \ldots, x_t$. To forecast the next four values, we can split it into two series, $\ldots, x_{t-5}, x_{t-3}, x_{t-1}$ and $\ldots, x_{t-4}, x_{t-2}, x_t$. Then, instead of making one forecast four values ahead, we make two forecasts, each only two values ahead: using the first series we predict $x_{t+1}$ and $x_{t+3}$, and using the second series we predict $x_{t+2}$ and $x_{t+4}$. Without this technique we would have to compress $n^4$ sequences; the described approach reduces that number to $2n^2$ and generalizes in the obvious way.
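The splitting procedure can be sketched as follows. The helpers are hypothetical: `split_series` builds $m$ interleaved subseries aligned at the end of the series (subseries $r$ ends with $x_{t-m+1+r}$ and therefore forecasts $x_{t+1+r}, x_{t+1+r+m}, \ldots$), `merge_forecasts` interleaves their forecasts back into one sequence, and `sequences_to_compress` counts the extended sequences, $m \cdot n^{h/m}$ instead of $n^h$, assuming $m$ divides $h$.

```python
def split_series(series: list, m: int) -> list[list]:
    """Split x_1..x_t into m interleaved subseries aligned at the end.

    Subseries r (0 <= r < m) ends with x_{t-m+1+r}, so its next values are
    x_{t+1+r}, x_{t+1+r+m}, ...
    """
    t = len(series)
    return [series[(t - m + r) % m::m] for r in range(m)]


def merge_forecasts(sub_forecasts: list[list], m: int) -> list:
    """Interleave per-subseries forecasts back into x_{t+1}, x_{t+2}, ..."""
    steps = len(sub_forecasts[0])
    return [sub_forecasts[j % m][j // m] for j in range(m * steps)]


def sequences_to_compress(n: int, h: int, m: int = 1) -> int:
    """Extended sequences needed for horizon h, alphabet size n and m subseries."""
    if h % m:
        raise ValueError("this sketch assumes m divides h")
    return m * n ** (h // m)


x = list(range(1, 11))  # x_1, ..., x_10
print(split_series(x, 2))               # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
print(sequences_to_compress(16, 4))     # 65536
print(sequences_to_compress(16, 4, 2))  # 512
```

With 16 subintervals, the T-index setting below (18 values ahead split into 6 series) needs $6 \cdot 16^3 = 24576$ compressions per partition instead of $16^{18}$.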
We begin with the forecasting of the monthly T-index time series. The T-index is an indicator of the highest frequencies that can be refracted by regions of the ionosphere (https://www.sws.bom.gov.au/HF_Systems/1/6). The data are available from the Space Weather Services (SWS) of the Australian Bureau of Meteorology at http://listserver.ips.gov.au/mailman/listinfo/ipstindexpredictions. Every month since November 2000, a file has been published containing all observed values from January 1938 up to the previous month, together with forecasts for several years ahead. Our computations were carried out as follows. For each month from January 2011 to July 2017 we made our own forecasts 18 values ahead using the values observed up to that month. Then, using the file published 18 months later, we compared the MAE of our forecasts with that of the SWS forecasts. We used the following time series preprocessing techniques:

To remove the seasonal component from the series we used seasonal-trend decomposition (STL) [14]. The period of the seasonal component was set to 11 years, i.e. 132 months (a solar cycle typically lasts 10 to 11 years);

We applied a smoothing function: ;

We split the time series into 6 series as described above (so instead of making one forecast 18 values ahead, we made 6 forecasts 3 values ahead each);

We took the first difference, i.e. instead of forecasting the series $x_1, x_2, \ldots, x_t$ we considered the series $x_2 - x_1, x_3 - x_2, \ldots, x_t - x_{t-1}$;

Although all values of this time series are integers, we treated it as a real-valued sequence and used partitioning into subintervals to reduce the size of the alphabet. The maximal number of subintervals was 16 (so we considered partitions into 2, 4, 8 and 16 subintervals). When computing the MAE, however, we rounded our forecasts to integers.
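The differencing step and its inverse can be sketched as follows: forecasts of the differences are accumulated from the last observed value to obtain forecasts of the series itself, which are then rounded to integers for the MAE computation. Rounding half up is an assumption (the text does not say how ties were handled), the helper names are hypothetical, and STL and the smoothing function are not reproduced here.

```python
import math


def first_difference(series):
    """x_2 - x_1, x_3 - x_2, ..., x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(last_value, diff_forecasts):
    """Accumulate forecast differences starting from the last observed value x_t."""
    levels, current = [], last_value
    for d in diff_forecasts:
        current += d
        levels.append(current)
    return levels


def round_half_up(values):
    """Round forecasts to integers (ties rounded up; an assumption)."""
    return [math.floor(v + 0.5) for v in values]


x = [10, 12, 11, 15]
print(first_difference(x))               # [2, -1, 4]
print(undifference(x[-1], [1.0, -2.5]))  # [16.0, 13.5]
print(round_half_up([16.0, 13.5]))       # [16, 14]
```

`math.floor(v + 0.5)` is used instead of Python's built-in `round`, which rounds ties to the nearest even integer.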
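The results are presented in table 2. As can be seen, when forecasting one value ahead the zlib, ppmd and combined forecasts are more accurate than the SWS forecast, whereas on average over longer horizons the SWS forecast is more accurate.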
Table 2. MAE of the T-index forecasts for different forecasting horizons and averaged over horizons 1-4, 1-8 and 1-18
Method  1  2  3  4  5  6  8  12  15  18  Avg 1-4  Avg 1-8  Avg 1-18
SWS forecast  12.2  13.4  14.5  15.6  16.3  17.7  20.1  21.5  22.8  24.1  13.9  16.1  19.5
zlib  11.3  13.9  15.8  16.3  15.6  18.6  20.4  26.8  27.0  38.1  14.3  16.21  20.8
ppmd  11.2  14.2  16.7  17.9  17.2  19.0  24.2  26.9  31.5  28.3  15.0  18.0  22.8
rp  13.5  17.0  22.4  26.9  23.8  17.3  28.8  24.7  33.5  36.8  20.0  22.6  28.9
zlib + ppmd + rp  11.2  14.2  16.7  17.9  17.3  19.0  26.3  26.8  31.6  28.3  15.0  18.3  23.1
Let us proceed to the forecasting of the Kp-index time series. The planetary K-index is used to characterize the magnitude of geomagnetic storms (https://www.swpc.noaa.gov/products/planetarykindex). It takes integer values from 0 to 9. The Space Weather Prediction Center (SWPC) publishes 3-hour K-index data (8 values per day) and forecasts for three days ahead. We made 24-step forecasts for each day from 4 February 2018 to 28 March 2018 and compared their accuracy with that of the SWPC forecasts. We used the following time series preprocessing technique:

We split the time series into 8 series as described above (so instead of making one forecast 24 values ahead, we made 8 forecasts 3 values ahead each);
The results are summarized in table 3.
Table 3. MAE of the Kp-index forecasts for different forecasting horizons and averaged over horizons 1-4, 1-8, 1-18 and 1-24
Method  1  2  3  4  5  6  8  12  15  18  24  Avg 1-4  Avg 1-8  Avg 1-18  Avg 1-24
SWPC forecast  1.11  1.26  0.98  1.06  1.09  1.11  1.30  1.11  1.25  1.40  1.21  1.10  1.16  1.16  1.13
zlib  0.70  1.00  1.15  0.91  1.00  1.09  1.10  1.02  1.25  1.57  1.45  0.94  1.00  1.16  1.19
ppmd  0.77  1.11  1.08  0.91  1.08  1.15  1.09  1.00  1.15  1.45  1.45  0.97  1.05  1.13  1.14
rp  2.04  2.06  2.28  1.83  2.02  2.08  2.06  1.58  1.70  2.62  2.57  2.05  2.09  2.04  2.09
zlib + ppmd + rp  0.77  1.11  1.08  0.91  1.08  1.15  1.09  1.00  1.15  1.45  1.45  0.97  1.04  1.27  1.16
As can be seen from table 3, the overall accuracy of the SWPC, zlib, ppmd and combined forecasts is very similar, while rp is noticeably less accurate. As in the previous case, our method is more accurate when forecasting one value ahead.
5 Conclusion
In this paper we described how well-known data compression programs can be used to forecast time series. We showed that this technique is competitive with the forecasts published by the SWS and SWPC services and, in our opinion, can be used in practice.
Acknowledgment
This work was supported by the Russian Foundation for Basic Research (grant 182903005).
References
[1] Box G.E., Jenkins G.M., Reinsel G.C., Ljung G.M. Time series analysis: forecasting and control. John Wiley & Sons; 2015 May 29.
[2] Hyndman R., Koehler A.B., Ord J.K., Snyder R.D. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media; 2008 Jun 19.
[3] Kaastra I., Boyd M. Designing a neural network for forecasting financial and economic time series. Neurocomputing. 1996 Apr 1;10(3):215-36.
[4] Zhang G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing. 2003 Jan 1;50:159-75.
[5] Ryabko B.Y. Prediction of random sequences and universal coding. Problems of Information Transmission. 1988 Apr;24(2):87-96.
[6] Cleary J., Witten I. Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications. 1984 Apr;32(4):396-402.
[7] Burrows M., Wheeler D.J. A block-sorting lossless data compression algorithm.
[8] Ziv J., Lempel A. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory. 1977 May;23(3):337-43.
[9] Ziv J., Lempel A. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory. 1978 Sep;24(5):530-6.
[10] Kieffer J.C., Yang E.H. Grammar-based codes: a new class of universal lossless source codes. IEEE Transactions on Information Theory. 2000 May;46(3):737-54.
[11] Ryabko B., Astola J., Malyutov M. Compression-based methods of statistical analysis and prediction of time series. Springer International Publishing, Switzerland; 2016.
[12] Ryabko B. Compression-based methods for nonparametric prediction and estimation of some characteristics of time series. IEEE Transactions on Information Theory. 2009;55(9):4309-4315.
[13] Bille P., Gørtz I.L., Prezza N. Space-Efficient Re-Pair Compression. In: Data Compression Conference (DCC), 2017. IEEE; 2017. p. 171-180.
[14] Cleveland R.B., Cleveland W.S., McRae J.E., Terpenning I. STL: A seasonal-trend decomposition. Journal of Official Statistics. 1990;6(1):3-73.