Description: Develop functional singular spectrum analysis tools and plots
In this paper, we introduce a new extension of the Singular Spectrum Analysis (SSA) called functional SSA to analyze functional time series. The new methodology is developed by integrating ideas from functional data analysis and univariate SSA. We explore the advantages of the functional SSA in terms of simulation results and with an application to a call center data. We compare the proposed approach with Multivariate SSA (MSSA) and Functional Principal Component Analysis (FPCA). The results suggest that further improvement to MSSA is possible and the new method provides an attractive alternative to the novel extensions of the FPCA for correlated functions. We have also developed an efficient and user-friendly R package and a shiny web application to allow interactive exploration of the results.READ FULL TEXT VIEW PDF
Description: Develop functional singular spectrum analysis tools and plots
One of the popular approaches in the decomposition of time series is accomplished using the rates of change. In this approach, the observed time series is partitioned (decomposed) into informative trends plus potential seasonal (cyclical) and noise (irregular) components. Aligned with this principle, Singular Spectrum Analysis (SSA) is a model-free procedure that is commonly used as a nonparametric technique in analyzing the time series. SSA is intrinsically motivated as an exploratory and model building tool rather than a confirmatory procedure (Golyandina et al., 2001). SSA does not require restrictive assumptions such as stationarity, linearity, and normality. It can be used for a wide range of purposes such as trend and periodic component detection and extraction, smoothing, forecasting, change-point detection, gap filling, causality and so on; (see, e.g. Golyandina et al., 2001; Moskvina and Zhigljavsky, 2003; Kondrashov et al., 2010; Golyandina and Osipov, 2007; Mohammad and Nishida, 2011; Mahmoudvand and Rodrigues, 2016; Rodrigues and Mahmoudvand, 2016).
The implementation of SSA over time series is similar to that of Principal Components Analysis (PCA) of multivariate data. In contrast to PCA, which is applied to a data matrix with independent rows, SSA is applied to a time series. It provides a representation of the given time series in terms of eigenvalues and eigenvectors of a so-called trajectory matrix(Alexandrov, 2009).
Up to this day, many studies have been published with extensions and applications of SSA. Extensions to a multivariate model as well as to a two-dimensional setting can be found, e.g., in Golyandina and Zhigljavsky (2013); Golyandina et al. (2018); Hassani and Mahmoudvand (2018)
and references therein. In the regular SSA, we assume that the observation at each time point is scalar, vector or array. As a matter of interest, one may consider a series of curves observed over time, and use the basics of Hilbert space in the functional data analysis (FDA) framework to introduce the concept of functional SSA (FSSA).
While the research in FDA has grown extensively in recent years, there have been relatively few contributions dealing with functional time series (FTS); see, e.g., Hörmann and Kokoszka (2012) and Bosq (2000). Most of the current FTS approaches focus on a parametric fit for inferences and forecasting. However, it would be of interest to non-parametrically decompose a FTS to reveal the respective trends plus seasonal and irregular components in an appropriate manner. Consistent with this approach, and as a first step, Fraiman et al. (2014) introduced a new concept of trends for the FTS and developed a nonparametric procedure to test the existence of a trend. Further, Hörmann et al. (2018) considered the periodic components for the FTS and derived several procedures to test the periodicity using frequency domain analysis. To the best of our knowledge, existing studies mainly focus on detecting rather than extracting interpretable components.
Since one of the primary missions of SSA is to extract trends and periodic components of a regular (non-functional) time series, it would be rational to establish a similar elegant nonparametric procedure to extract such components in FTS. In this paper we use the basics of SSA and multivariate functional PCA (MFPCA), introduced in Happ and Greven (2016); Chiou et al. (2014), to develop FSSA. In a nutshell, the core of SSA is to use PCA on the variables being lagged versions of a single time series. Since a lagged vectors of FTS forms a multivariate functional variable, we use the theory of MFPCA to develop the FSSA procedure. The new methodology, FSSA, not only can serve as a nonparametric dimension reduction tool to decompose the functional time series; it can also be used as a visualization tool to illustrate the concept of seasonality and periodicity in the functional space over time.
In order to depict the idea of our approach and to show its utility, consider the following motivating example involving a real dataset which is described in detail in subsection 5.2. This data provides the intraday number of calls to a call center, during different times of the days for one year. The associated curves is represented in an overlapping pattern in Figure 1 (left). In Figure 1 (right) we investigate a clustering pattern among weekdays and weekend days. As we can see, the intraday patterns of weekends (Friday and Saturday) are significantly different from workdays while workdays seem to have similar patterns. Investigators used variants of FPCA to analyze the call center data in literature (Shen and Huang, 2005; Huang et al., 2008; Maadooliat et al., 2015). For illustration purpose, we compare the results of the proposed method (FSSA) and FPCA for this clustering task. Figure 2(top) presents the projection of the data into the first four FPCs obtained from the fda package in R (Ramsay et al., 2018). We used seven different colors to differentiate between different days of a week. As one may observe, visually, there is no clear separation in either one of the FPC graphs in the top row. Although this may not be surprising, as the purpose of PCA of any type is to reduce the dimensionality, and not necessary decompose the data into regular trends, periodic and irregular components. In contrast, the grouping results that we obtained using the FSSA on the call center data are given in Figure 2(bottom). It can be seen that the functional behavior of seven days of a week can be well-distinguished, visually, using either one of the last two groups (groups 3 and 4).
The rest of the paper is organized as follows. Section 2 reviews the core of SSA for completeness. Section 3 presents the theoretical foundations and some properties of the proposed method (FSSA), and Section 4 provides implementation details. Section 5.1 reports simulation results to illustrate the use of the proposed approach in analyzing FTS, and to compare it with MSSA and FPCA. Application to a real data example on the number of calls that a call center received is given in Section 5.2. Section 6 provides some discussions and concluding remarks.
As we mentioned in Section 1, SSA can be used for many purposes. However, as we intend to introduce the functional version of SSA for decomposing FTS, we review a general scheme of SSA to perform time series decomposition in this section.
Throughout this section, we consider ’s are elements of Euclidean space . Suppose that is a realization of size from a time series. The basic SSA algorithm consists of four steps: Embedding, Decomposition, Grouping, and Reconstruction.
This step generates a multivariate object by tracking a moving window of size over the original time series, where L is called window length parameter and . Embedding can be regarded as a mapping operator that transfers the series into a so-called trajectory matrix of dimension , defines by
where and , for , are called lagged vectors. Note that the trajectory matrix , is a Hankel matrix, which means that all the elements along the anti-diagonals are equal. Indeed, the embedding operator is a one-to-one mapping from into , where is the set of all Hankel matrices.
In this step, the singular value decomposition (SVD) for the trajectory matrix is computed as:
where is the singular value of and are the associated (orthonormal) left and right singular vectors, and is called the respective elementary matrix. Note that is an eigenvector of corresponding to the eigenvalue . Moreover, it yields that
where, in this section,
denotes the outer (tensor) product of two vectors.
Consider a partition of the set of indices , where is the rank of the matrix , into disjoint subsets . For any positive integer , i.e. , the matrix is defined as . Thus, by the expansion (2) we have the grouped matrix decomposition
Each group in (4) should correspond to a component in time series decomposition. These components can be considered as trend, cycle, seasonal, noise, etc.
Finally, the resulting matrices in (4), are transformed back into the form of the original series by an inverse operator . In order to do this, first, it is necessary that each matrix to be approximated by a matrix in . This approximation is performed optimally in the sense of orthogonal projection of on with respect to the Frobenius norm. Denote this projection by . It is shown that the projection is the averaging of the matrix elements over the antidiagonals . By combining the results of this step and (4), we obtain the final decomposition of the series in the form of
The above algorithm can be extended to perform Multivariate SSA (MSSA) for analyzing multivariate time series. The only difference is in defining the trajectory matrix which can be defined by stacking univariate trajectory matrices horizontally or vertically (Hassani and Mahmoudvand, 2018).
It is well known that SSA does not require restrictive assumptions; however, it is ideal to have a time series with separable components. Therefore, we present tools to measure the separability of components in the next subsection.
Let , for be two time series and consider an additive model as . The series and are called separable when each lagged vector of is orthogonal to the lagged vectors of . To measure the degree of separability between two time series and , Golyandina et al. (2001) introduced the so-called w-correlation
where, for and . We call two series and , are w-orthogonal if for approriate values of (see the next subsection for more details). Note that , , is the reconstructed component produced by the group , and the matrix of is called w-correlation matrix.
There are two basic parameters in SSA procedure; window length () and grouping parameters. Choosing improper values for these parameters yields an incomplete reconstruction and misleading results in subsequent analysis. In spite of the importance of choosing and grouping parameters for SSA, no ideal solution has been yet proposed. A thorough review of the problem shows that the optimal choice of the parameters depends on the intrinsic structure of the data and the purposes of the study (Golyandina et al., 2001; Golyandina and Zhigljavsky, 2013). However, there are several recommendations and rules that work well for a wide range of scenarios. It is recommended to select the window length parameter, , to be a large integer that is multiple of the periodicities of the time series, but not larger than .
In addition, there are several utilities for effective grouping. These tools include analyzing the periodogram, paired plot of the singular vectors, scree plot of the singular values, and w-correlation plot; see Golyandina et al. (2001) for more details.
We start this section with the mathematical foundations that are used to develop the functional SSA procedure. From hereafter, we consider is a FTS of length . This means that each elements belongs to , the space of square integrable real functions defined on the interval . Here, the space is a Hilbert space, equipped with inner product . For a given positive integer , the space denotes the Cartesian product of copies of ; i.e. for an element , it has the form , where , and . Then is a Hilbert space equipped with the inner product . The norms will be denoted by and in the spaces and , respectively. Given , then the tensor(outer) product corresponds to the operator , is given by (.
For positive integers , we denote as the linear space spanned by operators , specified by where
We call an operator Hankel if , for some , where . The space of such Hankel operators will be denoted . For two given operators and in , define
It follows immediately that , defines an inner product on . We will call it Frobenius inner product of two operators in . The associated Frobenius norm is . Before discussing the FSSA algorithm, here we present a lemma that will be used in the last step of the proposed algorithm. Note that the Proofs for all lemmas, theorems and propositions are given in the supplementary materials.
Let be elements of the Hilbert space . If , then
for all .
For an integer , let and define a set of multivariate functional vectors in by
where ’s denote the functional lagged vectors. The following algorithm provides the FSSA results in four steps.
Define the operator with
We call the trajectory operator. It is easy to see that , where is an operator from to . Evaluating at a given point is same as the matrix product , where is an Hankel matrix given by
The operator is a bounded linear operator. If we define , given by
then is an adjoint operator for .
Define the operator by . Therefore, for given it implies that
Here, the operator can be also considered as an matrix with the operator entries , given by , where . Note that, the operator defines an integral operator on , associated to the kernel
Let us define to be a kernel matrix with the elements . Note that . It is easy to show that the associated integral operator of is , i.e,
The operator defined in (12) is a linear, self-adjoint, positive definite, bounded, continuous and compact operator.
Furthermore, using the Spectral Theorem (e.g. Werner (2006), Thm. VI.3.2.) implies
Since the kernel is continuous, it admits the expansion
This result is known as multivariate Mercer’s Theorem, (see e.g. Happ and Greven (2016) Prop. 3). For any positive , define an operator , given by
We call an elementary operator. Note that . Evaluating at a given point is equivalent to the matrix product , where is an matrix given by
Note that, ’s can be considered as functional extension of the elementary matrices defined in (3), where is projecting columns of into a spaced spanned by .
The elementary operators ’s are bounded operators of rank one. Furthermore ’s decompose the trajectory operator as
The grouping step is the procedure of rearranging and partitioning the elementary operators ’s in (20). To do this, we mimic the approaches used in step 3 of Section 2 for the univariate SSA and implement the equivalent functional version of those in Haghbin and Najibi (2019). Note that, in practice, we use a finite set of elementary operators, and one can consider a partition of the set of indices such that we have the expansion
At this step, for any given (), we would like to use to transform back each operator in (21) to , and hence construct a functional version of the decomposition given in (5). But since , first it is necessary to project to . Note that is a closed subspace of , therefore by Projection Theorem, there exist a unique such that
To specify , we denote the elements of and by and , respectively. Using Lemma 3.1, it is easy to extend the diagonal averaging approach given by Golyandina et al. (2001) to and obtain ’s as following:
where and stands for the number of pairs such that . Denote this projection by , and set . Now we can define , and reconstruct the functional time series.
Let , where , , are FTS. Using a fixed window length , for each series , denote as a sequence of functional lagged vectors, and as the linear space spanned by . Analogous to univariate SSA, separability of the series and is equivalent to , which is same as , for all . Furthermore, a necessary condition for separability can be defined based on w-correlation measure. To do this, consider the weighted inner product of two series and as
where . We call the series and w-orthogonal if
If the series and are separable, then they are w-orthogonal.
In practice, functional data are being recorded discretely and then converted to functional objects using proper smoothing techniques. We refer to Ramsay and Silverman (2007) for more details on preprocessing the raw data. Let be a known basis system (not necessarily orthogonal) of the space . Each functional observation in can be projected into subspace , where can be determined by variety of techniques (e.g. cross-validation). Therefore, each is uniquely represented by
Let us define quotient sequence, , and reminder sequence, , by
Note that for any given (), one may use (26) to determine and uniquely, so these sequences are well defined. Now, consider the objects , as a vector of length with all coordinates are zero except -th, which is .
The sequence is a basis system for , where is the Cartesian product of copies of .
Using Lemma 4.1, each element admits a unique representation
where corresponds to , belongs to , and is the dual basis of . Note that, in the special case, when ’s are orthonormal (so ’s are), (see Christensen, 1995, for more details). Applying the linear operator , defined in (12), on (27) implies
where . We call the corresponding matrix of .
Let be a functional object in , then if and only if .
Suppose the Gram matrix . Then the following holds:
is a symmetric matrix.
Now we have the recipes to proceed with the following algorithm and obtain the eigenfunctions of, ’s, used in the decomposition step. For a given set of basis , and a FTS, :
Use Theorem 4.1 to compute the matrices , and .
Use the eigendecomposition of to obtain eigenpairs for .
Now, one can use ’s to decompose the FTS to elementary operators ’s. Note that, in practice, we represent the elementary operators in matrix form. Therefore, one may observe that the equivalent functional elementary matrix , given in (3.1), is just a projection of the functional lagged vectors , given in (8), onto . Furthermore, in the diagonal averaging step, we can incorporate the averaging over the associate basis coefficients of in (22) to obtain the respective basis coefficients for . For more details see Haghbin and Najibi (2019).
In this section, first, we present a simulation study to elaborate the use of the FSSA compared with FPCA and MSSA under different scenarios. To do so, we utilize the implementation of the proposed model that is available as an R package named Rfssa in the CRAN repository (Haghbin and Najibi, 2019). We also use the fda (Ramsay et al., 2018) and Rssa (Golyandina et al., 2015) packages to obtain the FPCA and MSSA results. In the second subsection, we analyze the call center data using Rfssa and provide some visualization tools that come handy in the grouping step.
We also developed a shiny app, which is available at https://fssa.shinyapps.io/fssa/, to demonstrate and reproduce different aspects of the simulation setup. Furthermore, it can be used to compare the results of FPCA, MSSA, and FSSA on the call center dataset or any other FTS, provided by the end-user.
For the simulation setup, consider the functional time series of lengths and which are observed in fixed equidistant discrete points on from the following model:
A cubic B-spline basis functions with 15 degrees of freedom is used to convert’s into smooth (continuous) functional curves. In this model, is considered to be a periodic component defined as
The in (29) is a stochastic term that is generated under four different settings with an increasing trend in complexity. In the first setting, we consider. It is expected to obtain an acceptable performance from FPCA for reconstructing the FTS in the first setting as intuitively FPCA outperforms under this ideal framework (see Maadooliat et al., 2015, for more details).
In the remaining three settings, the
processes are simulated from a functional autoregressive model of order 1,, defined by
where is an integral operator with a parabolic kernel as follow
The constant is chosen such that the Hilbert-Schmidt norm defined by
acquires the values , and , for the remaining three settings, respectively. In these settings, the white noise terms are considered as independent trajectories of the standard Brownian motion over the interval . It is worth to note that as we increase the Hilbert-Schmidt norm, , in the FAR(1) models, the dependency structure of consecutive FTS gets more twisted, and we expect it would be more challenging to reconstruct the true structures, .
To compare the performance of FSSA and MSSA, we further consider three window length parameters ( and ) in our simulation setup. For the sake of consistency in all of the reconstruction procedures (FPCA, MSSA and FSSA), we use the first two leading eigen-components. As a measure of goodness of fit, we use the Root Mean Square Error (RMSE) defined as:
where is the FTS reconstructed by each method. We repeat each setting times and report the mean of the RMSE’s in Table 1.
By comparing the results in Table 1, it can be seen that FSSA outperform FPCA in different scenarios. This may not be surprising, as the main task of FPCA is dimension reduction, and it is not expected to perform exceptionally well in the reconstruction procedure, especially in the presence of complex noise structures. Except for the first setting, MSSA also outperforms FPCA significantly. Furthermore, FSSA performs better than MSSA in most of the cases except the case where the length of the FTS is small () and the window size, , is getting closer to . However, it is clear that FSSA is the optimal method for reconstructing the longer FTS ().
For all methods, RMSE decreases as the length of the series increases. For two smaller frequencies (), the average of RMSE increases as the noise structure becomes more complex in settings 1 through 4 while it decreases for . This might be happening due to the unpredicted cross-correlation of the functional noise structures and the periodic form of FTS.
The efficiency of MSSA and FSSA for different window lengths (), time series lengths () and frequencies (), the ratio of RMSE of MSSA to FSSA is examined in Figure 4. The overall pattern confirms the improvement in RMSE for FSSA as the time series get longer (larger ). Overall, as is increasing, the pattern of ratio of RMSE’s remains unchanged. Although as the window length becomes larger, either the improvement diminishes for longer FTS, or disappears (or reverses) for smaller . It is also worth to note that in setting 1 (GWN), based on the right panel of Figure 4 and Table 1, the FSSA dominates the other two methods in all combinations of parameters with a better efficiency scale.
|Setting 1 GWN||0.00||50||0.018||0.026||0.022||0.019||0.010||0.011||0.014|
To illustrate the advantages of FSSA, especially its main capability in extracting different functional components (i.e. trend, harmonic and noise), we explore the call center dataset analyzed in Maadooliat et al. (2015). This dataset provides the number of calls to a call center per 6 minutes intervals, between January 1 through December 31, 1999. Suppose , is the square root of number of calls during the time interval on day . Figure 1 (left) shows the projection of the ’s (vectors of length ) into a functional space spanned by a cubic B-spline using GCV criterion.
An important goal of analyzing the call center data is to investigate the existence of periodic behaviors (e.g., weekly or monthly). Figure 1 (right) visually confirms the existence of a strong weekly pattern in the dataset. Since one cannot visually confirm the presence of a monthly behavior using similar graphs, it would be interesting to show that FSSA can provide tools and machinery to extract such weaker signals.
In order to capture both monthly and weekly pattern by FSSA, first, we choose window length as multiple of and close to , i.e., . Then, we provide several plots using Rfssa package for grouping the components (Figure 5). These plots are the functional form (analogy) of the ones commonly used in the SSA literature (Golyandina et al., 2001). As it can be seen in the plot of leading singular values (scree plot), the first singular value is relatively large, and there exists three evident pairs with almost equal leading singular values correspond to the three components. The w-correlation plot suggests partitioning the eigentriples into five groups: , , , , and the remainder that does not seem to contain any strong signal. Considering the remaining plots (eigenfunctions and paired eigenfunctions), one can see the eigentriple pairs , and are related to a one-week periodicity with frequencies , and , while the last group, eigentriple pair , describes a weak monthly cycle. These groups can reproduce the reconstructed FTS. The functional components associated with the first four groups are presented in Figure 2 (bottom) from left to right. Furthermore, some creative visualization tools that are implemented in the Rfssa
package can be used to extract within/between days patterns for the call center data by employing the estimated multivariate eigenfunctions (Figure6). It is worth to mention that in Figure 6 (right), there are 28 curves associated with all nine eigenfunctions (graphs). One may note that for the first eigenfunction, all curves resemble a similar pattern respective to the main trend. Furthermore, eigenfunctions contain seven distinguish patterns that each consists of four curves whereas distinct curves construct eigenfunctions .
For further clarification, we provide the multivariate trace periodicity test of Hörmann et al. (2018) on six sets of FTS (original signal , , , , , ), where represents residual curves obtained via removing the reconstructed FTS by the first eigentriples, from the original signal . Table 2 provides the p-values of the test for the periods of length and days (p-values for testing the weekly and monthly patterns). It is clear that periodicity test captures the weekly pattern for , , and that contain either all or part of the weekly components (p-value=0). After subtracting the functional mean of all curves (first eigentriple) and the weekly components (eigentriple 2-7), the monthly pattern in is not weak anymore, and the periodicity test can capture the monthly cycle in (p-value=0). Finally, is the remainder of the signal after removing all of the weekly and monthly components, and that’s why the associated p-values in the last row are not significant anymore.