With the rapid growth of the Internet of Things (IoT) network and many other connected data sources, there is an enormous increase of time series data. Compared with traditional time series data, it comes in big volume and typically has a long seasonality period. One of the fundamental problems in managing and utilizing these time series data is the seasonal-trend decomposition. A good seasonal-trend decomposition can reveal the underlying insights of a time series, and can be useful in further analysis such as anomaly detection and forecasting [Hochenbaum, Vallis, and Kejariwal2017, Laptev, Amizadeh, and Flint2015, Hyndman and Khandakar2008, Wu et al.2016]. For example, in anomaly detection a local anomaly could be a spike during an idle period. Without seasonal-trend decomposition, it would be missed as its value is still much lower than the unusually high values during a busy period. In addition, different types of anomalies correspond to different patterns in different components after decomposition. Specifically, the spike & dip anomalies correspond to abrupt change of remainder, and the change of mean anomaly corresponds to abrupt change of trend. Similarly, revealing the trend of time series can help to build more robust forecasting models.
As the seasonal adjustment is a crucial step for time series where seasonal variation is observed, many seasonal-trend decomposition methods have been proposed. The most classical and widely used decomposition method is the STL (Seasonal-Trend decomposition using Loess) [Cleveland et al.1990]
. STL estimates the trend and the seasonality in an iterative way. However, STL still suffers from less flexibility when seasonality period is long and high noises are observed. In practice, it often fails to extract the seasonality component accurately when seasonality shift and fluctuation exist. Another direction of decomposition is X-13-ARIMA-SEATS[Bell and Hillmer1984] and its variants such as X-11-ARIMA, X-12-ARIMA [Hylleberg1998], which are popular in statistics and economics. These methods incorporate more features such as calendar effects, external regressors, and ARIMA extensions to make them more applicable and robust in real-world applications, especially in economics. However, these algorithms can only scale to small or medium size data. Specifically, they can handle monthly and quarterly data, which limits their usage in many areas where long seasonality period is observed. Recently, more decomposition algorithms have been proposed [Dokumentov, Hyndman, and others2015, Livera, Hyndman, and Snyder2011, Verbesselt et al.2010]. For example, in [Dokumentov, Hyndman, and others2015], a two-dimensional representation of the seasonality component is utilized to learn the slowly changing seasonality component. Unfortunately, it cannot scale to time series with long seasonality period due to the cost to learn the huge two-dimensional structure.
Although numerous methods have been proposed, there are still many time series characteristics exhibiting in real-world data which are not addressed properly. First of all, seasonality fluctuation and shift are quite common in real-world time series data. We take a time series data whose seasonality period is one day as an example. The seasonality component at 1:00 pm today may correspond to 12:30 pm yesterday, or 1:30 pm the day before yesterday. Secondly, most algorithms cannot handle the abrupt change of trend and remainder, which is crucial in anomaly detection for time series data where the abrupt change of trend corresponds to the change of mean anomaly and the abrupt change of residual corresponds to the spike & dip anomaly. Thirdly, most methods are not applicable to time series with long seasonality period, and some of them can only handle quarterly or monthly data. Let the length of seasonality period be , we need to estimate parameters to extract the seasonality component. However, in many IoT anomaly detection applications, the typical seasonality period is one day. If a data point is collected every one minute, then , which is not solvable by many existing methods.
In this paper, we propose a robust and generic seasonal-trend decomposition method. Compared with existing algorithms, our method can extract seasonality from data with long seasonality period and high noises accurately and effectively. In particular, it allows fractional, flexible, and shifted seasonality component over time. Abrupt changes in trend and remainder can also be handled properly. Specifically, we extract the trend component robustly by solving a regression problem using the least absolute deviations (LAD) loss [Wang, Li, and Jiang2007] with sparse regularizations. After the trend is extracted, we apply the non-local seasonal filtering to extract the seasonality component robustly. This process is iterated until accurate estimates of trend and seasonality components are extracted. As a result, the proposed seasonal-trend decomposition method is an ideal tool to extract insights from time series data for the purpose of anomaly detection. To validate the performance of our method, we compare our method with other state-of-the-art seasonal-trend decomposition algorithms on both synthetic and real-world time series data. Our experimental results show that our method can successfully extract different components on various complicated datasets while other algorithms fail.
Note that time series decomposition approaches can be either additive or multiplicative. In this paper we focus on the additive decomposition, and the multiplicative decomposition can be obtained similarly. The remainder of this paper is organized as follows: Section 2 briefly introduces the related work of seasonal-trend decomposition; In Section 3 we discuss our proposed seasonal-trend decomposition method in detail; the empirical studies are investigated in comparison with several state-of-the-art algorithms in Section 4; and we conclude the paper in Section 5.
As we discussed in Section 1, the “classical” method [Macaulay1931] continues to evolve from X-11, X-11-ARIMA to X-13-ARIMA-SEATS and improve their capabilities to handle seasonality with better robustness. A recent algorithm TBATS (Trigonometric Exponential Smoothing State Space model with Box-Cox transformation, ARMA errors, Trend and Seasonal Components) [Livera, Hyndman, and Snyder2011] is introduced to handle complex, non-integer seasonality. However, they all can be described by the state space model with a lot of hidden parameters when the period is long. They are suitable to process time series with short seasonality period (e.g., tens of points) but suffer from the high computational cost for time series with long seasonality period and they cannot handle slowly changing seasonality.
The Hodrick-Prescott filter [Hodrick and Prescott1997]
, which is similar to Ridge regression, was introduced to decompose slow-changing trend and fast-changing residual. While the formulation is simple and the computational cost is low, it cannot decompose trend and long-period seasonality. And it only regularizes the second derivative of the fitted curve for smoothness. This makes it prone to spike and dip and cannot catch up with the abrupt trend changes.
Based on the Hodrick-Prescott filter, STR (Seasonal-Trend decomposition procedure based on Regression) [Dokumentov, Hyndman, and others2015]
which explores the joint extraction of trend, seasonality and residual without iteration is proposed recently. STR is flexible to seasonal shift, and it can not only deal with multiple seasonalities but also provide confidence intervals for the predicted components. To get a better tolerance to spikes and dips, robust STR using
-norm regularization is proposed. The robust STR works well with outliers. But still, as the authors mentioned in the conclusion, STR and robust STR only regularize the second difference of fitted trend curve for smoothness so it cannot follow abrupt change on the trend.
is a model-free approach that performs well on short time series. It transforms a time series into a group of sliding arrays, folds them into a matrix and then performs SVD decomposition on this matrix. After that, it picks the major components to reconstruct the time series components. It is very similar to principal component analysis, but its strong assumption makes it not applicable on some real-world datasets.
As a summary, when comparing different time series decomposition methods, we usually consider their ability in the following aspects: outlier robustness, seasonality shift, long period in seasonality, and abrupt trend change. The complete comparison of different algorithms is shown in Table 1.
Robust STL Decomposition
Similar to STL, we consider the following time series model with trend and seasonality
where denotes the observation at time , is the trend in time series, is the seasonal signal with period , and the denotes the remainder signal. In seasonal-trend decomposition, seasonality typically describes periodic patterns which fluctuates near a baseline, and trend describes the continuous increase or decrease. Thus, usually it is assumed that the seasonal component has a repeated pattern which changes slowly or even stays constant over time, whereas the trend component is considered to change faster than the seasonal component [Dokumentov, Hyndman, and others2015]. Also note that in this decomposition we assume the remainder
contains all signals other than trend and seasonality. Thus, in most cases it contains more than white noise as usually assumed. As one of our interests is anomaly detection, we assumecan be further decomposed into two terms:
where denotes spike or dip, and denotes the white noise.
In the following discussion, we present our RobustSTL algorithm to decompose time series which takes the aforementioned challenges into account. The RobustSTL algorithm can be divided into four steps: 1) Denoise time series by applying bilateral filtering [Paris, Kornprobst, and Tumblin2009]; 2) Extract trend robustly by solving a LAD regression with sparse regularizations; 3) Calculate the seasonality component by applying a non-local seasonal filtering to overcome seasonality fluctuation and shift; 4) Adjust extracted components. These steps are repeated until convergence.
In real-world applications when time series are collected, the observations may be contaminated by various types of errors or noises. In order to extract trend and seasonality components robustly from the raw data, noise removal is indispensable. Commonly used denoising techniques include low-pass filtering [Baxter and King1999, Christiano and Fitzgerald2003], moving/median average [Osborn1995, Wen and Zeng1999], and Gaussian filter [Blinchikoff and Krause2001]. Unfortunately, those filtering/smoothing techniques destruct some underlying structure in and in noise removal. For example, Gaussian filter destructs the abrupt change of , which may lead to a missed detection in anomaly detection.
Here we adopt bilateral filtering [Paris, Kornprobst, and Tumblin2009] to remove noise, which is an edge-preserving filter in image processing. The basic idea of bilateral filtering is to use neighbors with similar values to smooth the time series . When applied to time series data, the abrupt change of trend , and spike & dip in can be fully preserved.
Formally, we use to denote the filtered time series after applying bilateral filtering:
where denotes the filter window with length , and the filter weights are given by two Gaussian functions as
where is a normalization factor, and are two parameters which control how smooth the output time series will be.
After denoising, the decomposition model in Eq. (1) is updated as
where the is the filtered noise.
The joint learning of and in Eq. (5) is challenging. As the seasonality component is assumed to change slowly, we first perform seasonal difference operation for the denoised signal to mitigate the seasonal effects, i.e.,
where is the seasonal difference operation, and is the first order difference operation.
Note that in Eq. (7), dominates as we assume the seasonality difference operator on and leads to significantly smaller values. Thus, we propose to recover the first order difference of trend signal from by minimizing the following weighted sum objective function
instead of the commonly used sum-of-squares loss function due to the its well-known robustness to outliers. Note that here we assume that. This assumption may not be true in the beginning for some , but since the proposed framework employs an alternating algorithm to update trend and seasonality iteratively, later we can remove from to make it hold. The second and third terms are first-order and second-order difference operator constraints for the trend component , respectively. The second term assumes that the trend difference usually changes slowly but can also exhibit some abrupt level shifts; the third term assumes that the trends are smooth and piecewise linear such that are sparse [Kim et al.2009]. Thus, we expect the trend component can capture both abrupt level shift and gradual change.
The objective function (8) can be rewritten in an equivalent matrix form as
where denotes the
-norm of the vector, and are the corresponding vector forms as
and are and Toeplitz matrix, respectively, with the following forms
To facilitate the process of solving the above optimization problem, we further formulate the three -norms in Eq. (9) as a single -norm, i.e.,
where the matrix and vector are
The minimization of the single -norm in Eq. (12
) is equivalent to the linear program as follows
where is auxiliary vector variable. Let denote the output of the above optimization is with , and further assume (which will be be estimated later). Then, we can get the relative trend output based on as
Once obtaining the relative trend from the denoised time series, the decomposition model is updated as
After removing the relative trend component, can be considered as a “contaminated seasonality”. In addition, seasonality shift makes the estimation of even more difficult. Traditional seasonality extraction methods only considers a subsequence (or associated sequences) to estimate , where is the period length. However, this approach fails when seasonality shift happens.
In order to accurately estimate the seasonality component, we propose a non-local seasonal filtering. In the non-local seasonal filtering, instead of only considering , we consider neighborhoods centered at , respectively. Specifically, for , its neighborhood consists of neighbors . Furthermore, we model the seasonality component as a weighted linear combination of where is in the neighborhood defined above. The weight between and depends not only in how far they are in the time dimension (i.e., the difference of their indices and ), but also depends on how close and are. Intuitively, the points in the neighborhood with similar seasonality to will be given a larger weight. In this way, we automatically find the points with most similar seasonality and solve the seasonality shift problem. In addition, abnormal points will be given smaller weights in our definition and makes the non-local seasonal filtering robust to outliers.
Mathematically, the operation of seasonality extraction by non-local seasonal filtering is formulated as
where the and are defined as
by considering previous seasonal neighborhoods where each neighborhood contains points.
To illustrate the robustness of our non-local seasonal filtering to outliers, we give an example in Figure 1(a). Whether there is outlier at current time point (dip), or at historical time point around (spike), the filtered output (red curve) would not be affected as show in Figure 1(a). Figure 1(b) illustrates why seasonality shift is overcome by the non-local seasonal filtering. As shown in the red curve in Figure 1(b), when there is shift in the season pattern, as long as the previous seasonal neighborhoods used in the non-local seasonal filter satisfy , the extracted season can follow this shift.
After removing the season signal, the remainder signal is
In order to make the seasonal-trend decomposition unique, we need to ensure that all seasonality components in a period sums to zero, i.e., . To this end, we adjust the obtained seasonality component from Eq. (17) by removing its mean value, which also corresponds to the estimation of the trend point . Formally, we have
Therefore, the estimates of trend and seasonal components are updated as follows:
And the estimate of remainder can be obtained as
Furthermore, we can derive different components of the estimated , i.e.,
Eq. (24) indicates that the remainder signal may contain residual seasonal component and trend component . Also in the trend estimation step, we assume can be approximated using . Similar to alternating algorithm, after we obtained better estimates of , , and , we can repeat the above four steps to get more accurate estimates of the trend, seasonality, and remainder components. Formally, the procedure of the RobustSTL algorithm is summarized in Algorithm 1.
We conduct experiments to demonstrate the effectiveness of the proposed RobustSTL algorithm on both synthetic and real-world datasets.
We use the following three state-of-the-art baseline algorithms for comparison purpose:
Standard STL: It decomposes the signal into seasonality, trend, and remainder based on Loess in an iterative manner.
TBATS: It decomposes the signal into trend, level, seasonality, and remainder. The trend and level are jointly together to represent the real trend.
STR: It assumes the continuity in both trend and seasonality signal.
To test the above three algorithms, we use R functions stl, tbats, AutoSTR from R packages forecast111https://cran.r-project.org/web/packages/forecast/index.html and stR222https://cran.r-project.org/web/packages/stR/index.html. We implement our own RobustSTL algorithm in Python, where the linear program (see Eqs. (12) and (13)) in trend extraction is solved using CVXOPT -norm approximation333https://cvxopt.org/examples/mlbook/l1.html.
Experiments on Synthetic Data
To generate the synthetic dataset, we incorporate complex season/trend shifts, anomalies, and noise to simulate real-world scenarios as shown in Figure 2. We first generate seasonal signal using a square wave with minor random seasonal shifts in the horizontal axis. The seasonal period is set to 50 and a total of 15 periods are generated. Then, we add trend signal with 10 random abrupt level changes and 14 spikes and dips as anomalies. The noise is added by zero mean Gaussian with variance.
For the three baseline algorithms STL, TBATS, and STR, their parameters are optimized using cross-validation. For our proposed RobustSTL algorithm, we set the regularization coefficients to control the signal smoothness in the trend extraction, and set the neighborhood parameters in the seasonality extraction to handle the seasonality shift.
Figure 3 shows the decomposition results for four algorithms. The standard STL is affected by anomalies leading to rough patterns in the seasonality component. The trend component also tends to be smooth and loses the abrupt patterns. TBATS is able to respond to the abrupt changes and level shifts in trend, however, the trend is affected by the anomaly signals. It also obtains almost the same seasonality pattern for all periods. Meanwhile, STR does not assume strong repeated patterns for seasonality and is robust to outliers. However, it cannot handle the abrupt changes of trend. By contrast, the proposed RobustSTL algorithm is able to separate trend, seasonality, and remainders successfully, which are all very close to the original synthetic signals.
To evaluate the performance quantitatively, we also compare mean squared error (MSE) and mean absolute error (MAE) between the true trend/season in the synthetic dataset and the extracted trend/season from four decomposition algorithms, which is summarized in Table 2. It can be observed that our RobustSTL algorithm achieves much better results than STL, STR, and TBATS algorithms.
Experiments on Real-World Data
The real-world datasets to be evaluated include two time series. One is the supermarket and grocery stores turnover from 2000 to 2009 [Alexandrov et al.2012], which has a total of 120 observations with the period . We apply the log transform on the data and inject the trend changes and anomalies to demonstrate the robustness (suggested from [Alexandrov et al.2012]). The input data can be seen in the top subplot of Figure 4 (a). We denote it as real dataset 1
. Another time series is the file exchange count number in a computer cluster, which has a total of 4032 observations with the period being 288. We apply the linear transform to convert the data to the range offor the purpose of data anonymization. The data can be seen in the top subplot of Figure 5 (a). We denote it as real dataset 2.
For all four algorithms, period parameter is used for real-world dataset 1, and is used for dataset 2. The rest parameters are optimized using cross-validation for the standard STL, TBATS, and STR. For our proposed RobustSTL algorithm, we set the neighbour window parameters , the regularization coefficients for real data 1, and for real data 2. Notice that as STR is not scalable to time series with long seasonality period, we do not report the decomposition result of dataset 2 for STR.
Figure 4 and Figure 5 show the decomposition results on both real-world datasets. RobustSTL typically extracts smooth seasonal signals on real-world data. The seasonal component can adapt to the changing patterns and the seasonality shifts, as seen in Figure 4 (a). The extracted trend signal recovers the abrupt changes and level shifts promptly and is robust to the existence of anomalies. The spike anomaly is well preserved in the remainder signal. For the standard STL and STR, the trend does not follow the abrupt change and the seasonal component is highly affected by the level shifts and spike anomalies in the original signal. TBATS can decompose the trend signal that follows the abrupt change, however, the trend is affected by the spike anomalies.
During the experiment, it is also observed that the computation speed of RobustSTL is significantly faster than TBATS and STR (note the result for STR is not available in Figure 5 due to its long computation time), as the algorithm can be formulated as an optimization problem with -norm regularization and solved efficiently. While the standard STL seems computational efficient, its sensitivity to anomalies and incapability to capture the trend change and level shifts make it difficult to be used on enormous real-world complex time series data.
Based on the decomposition results from both the synthetic and the real-world datasets, we conclude that the proposed RobustSTL algorithm outperforms existing solutions in terms of handling abrupt and level changes in the trend signal, the irregular seasonality component and the seasonality shifts, the spike & dip anomalies effectively and efficiently.
In this paper, we focus on how to decompose complex long time series into trend, seasonality, and remainder components with the capability to handle the existence of anomalies, respond promptly to the abrupt trend changes shifts, deal with seasonality shifts & fluctuations and noises, and compute efficiently for long time series. We propose a robustSTL algorithm using LAD with -norm regularizations and non-local seasonal filtering to address the aforementioned challenges. Experimental results on both synthetic data and real-world data have demonstrated the effectiveness and especially the practical usefulness of our algorithm. In the future we will work on how to integrate the seasonal-trend decomposition with anomaly detection directly to provide more robust and accurate detection results on various complicated time series data.
- [Alexandrov et al.2012] Alexandrov, T.; Bianconcini, S.; Dagum, E. B.; Maass, P.; and McElroy, T. S. 2012. A review of some modern approaches to the problem of trend extraction. Econometric Reviews 31(6):593–624.
- [Baxter and King1999] Baxter, M., and King, R. G. 1999. Measuring business cycles: approximate band-pass filters for economic time series. Review of Economics and Statistics 81(4):575–593.
- [Bell and Hillmer1984] Bell, W. R., and Hillmer, S. C. 1984. Issues involved with the seasonal adjustment of economic time series. Journal of Business & Economic Statistics 2(4):291–320.
- [Blinchikoff and Krause2001] Blinchikoff, H., and Krause, H. 2001. Filtering in the time and frequency domains. Atlanta, GA: Noble Pub.
- [Christiano and Fitzgerald2003] Christiano, L. J., and Fitzgerald, T. J. 2003. The band pass filter. International Economic Review 44(2):435–465.
- [Cleveland et al.1990] Cleveland, R. B.; Cleveland, W. S.; McRae, J. E.; and Terpenning, I. 1990. STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics 6(1):3–73.
- [Danilov1997] Danilov, D. 1997. The caterpillar method for time series forecasting. Principal Components of Time Series: The Caterpillar Method 73–104.
- [Dokumentov, Hyndman, and others2015] Dokumentov, A.; Hyndman, R. J.; et al. 2015. STR: A seasonal-trend decomposition procedure based on regression. Technical report, Monash University, Department of Econometrics and Business Statistics.
- [Golyandina, Nekrutkin, and Zhigljavsky2001] Golyandina, N.; Nekrutkin, V.; and Zhigljavsky, A. A. 2001. Analysis of time series structure: SSA and related techniques. Boca Raton, Florida: Chapman and Hall/CRC.
- [Hochenbaum, Vallis, and Kejariwal2017] Hochenbaum, J.; Vallis, O. S.; and Kejariwal, A. 2017. Automatic anomaly detection in the cloud via statistical learning. CoRR abs/1704.07706.
- [Hodrick and Prescott1997] Hodrick, R. J., and Prescott, E. C. 1997. Postwar US business cycles: An empirical investigation. Journal of Money, credit, and Banking 29(1):1–16.
- [Hylleberg1998] Hylleberg, S. 1998. New capabilities and methods of the x-12-arima seasonal-adjustment program: Comment. Journal of Business & Economic Statistics 16(2):167–68.
- [Hyndman and Khandakar2008] Hyndman, R. J., and Khandakar, Y. 2008. Automatic time series forecasting: the forecast package for R. Journal of Statistical Software 26(3):1–22.
- [Kim et al.2009] Kim, S.-J.; Koh, K.; Boyd, S.; and Gorinevsky, D. 2009. trend filtering. SIAM Review 51(2):339–360.
- [Laptev, Amizadeh, and Flint2015] Laptev, N.; Amizadeh, S.; and Flint, I. 2015. Generic and scalable framework for automated time-series anomaly detection. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’15, 1939–1947. ACM.
- [Livera, Hyndman, and Snyder2011] Livera, A. M. D.; Hyndman, R. J.; and Snyder, R. D. 2011. Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association 106(496):1513–1527.
- [Macaulay1931] Macaulay, F. R. 1931. Introduction to ”The Smoothing of Time Series”. NBER. 17–30.
- [Osborn1995] Osborn, D. R. 1995. Moving average detrending and the analysis of business cycles. Oxford Bulletin of Economics and Statistics 57(4):547–558.
- [Paris, Kornprobst, and Tumblin2009] Paris, S.; Kornprobst, P.; and Tumblin, J. 2009. Bilateral Filtering. Hanover, MA: Now Publishers Inc.
- [Verbesselt et al.2010] Verbesselt, J.; Hyndman, R.; Newnham, G.; and Culvenor, D. 2010. Detecting trend and seasonal changes in satellite image time series. Remote sensing of Environment 114(1):106–115.
- [Wang, Li, and Jiang2007] Wang, H.; Li, G.; and Jiang, G. 2007. Robust regression shrinkage and consistent variable selection through the lad-lasso. Journal of Business & Economic Statistics 25(3):347–355.
[Wen and Zeng1999]
Wen, Y., and Zeng, B.
A simple nonlinear filter for economic time series analysis.Economics Letters 64(2):151–160.
[Wu et al.2016]
Wu, B.; Mei, T.; Cheng, W.-H.; and Zhang, Y.
Unfolding temporal dynamics: Predicting social media popularity using
multi-scale temporal decomposition.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, 272–278. AAAI Press.