Many dynamic processes involve abrupt changes, and change point analysis aims to identify the locations of change points in time series data. There exists abundant research on change point analysis for univariate time series data. Examples include Sen and Srivastava (1975), Inclán and Tiao (1994), Chen and Gupta (1997), Kokoszka and Leipus (2000), Lavielle and Moulines (2000), Ombao et al. (2001), Davis et al. (2006), Davis et al. (2008), and Shao and Zhang (2010). Change point analysis for classical multivariate time series data has also been extensively studied. Examples include Srivastava and Worsley (1986), James et al. (1992), Desobry et al. (2005), Harchaoui et al. (2009), Zhang et al. (2010), Siegmund et al. (2011), and Matteson and James (2014).
With the explosive development of high-throughput technologies, high-dimensional time series data are commonly observed in many fields, including medical, environmental, financial, engineering, and geographical studies. Change point analysis for high-dimensional data has received a lot of attention in recent years. For instance, Bai (2010) considered estimating the location of a change point in high-dimensional panel data under the assumption that the change has occurred a priori. Chen and Zhang (2015) proposed a graph-based approach to test and estimate change points under the assumption that the observations in a sequence are independent.
In this paper, we propose a new nonparametric procedure to detect change points in the mean of high-dimensional time series data. Let $X_1, \ldots, X_n$ be a sequence of $p$-dimensional observations and $\mu_t$ be the mean of $X_t$ for $t = 1, \ldots, n$, where the dimension $p$ can be much larger than the sample size $n$. We first test
$$H_0: \mu_1 = \cdots = \mu_n \quad \text{versus} \quad H_a: \mu_1 = \cdots = \mu_{\tau_1} \neq \mu_{\tau_1+1} = \cdots = \mu_{\tau_2} \neq \cdots \neq \mu_{\tau_m+1} = \cdots = \mu_n, \tag{1.1}$$
where $\tau_1 < \cdots < \tau_m$ are some unknown change points. When $H_0$ is rejected, we further estimate the locations of the change points. Different from Chen and Zhang (2015), which assumed a sequence of independent observations, our procedure incorporates both spatial and temporal dependence, namely spatial dependence among the $p$ components of $X_t$ at each $t$ and temporal dependence between any $X_s$ and $X_t$ for $s \neq t$. Different from Bai (2010), which imposed a growth rate restriction on the dimension $p$ with respect to the sample size $n$, our procedure allows the dimension $p$ to be much larger than the number of observations $n$. Most importantly, our procedure is able to detect a change point on the boundary when data dependence is present. This feature distinguishes our procedure from other existing methods. The implementation of the proposed procedure is provided in the R package HDcpDetect (Okamoto et al., 2018).
2. Main Results
2.1 Test statistic
For any candidate location , we consider a bias-corrected statistic
Here, and . With defined in Condition 1 of Section 2.2, is an
-dimensional vector with and for ,
The element at th row and th column of the matrix is
The th component of the -dimensional random vector is
Imposing in (2.1) leads to (2.6) in Proposition 1, which excludes the interference of data dependence in testing the hypothesis and estimating the locations of change points in (1.1). The proposed statistic depends on a lag parameter that separates the dominant dependence from the remainder. How to choose it properly in practice will be addressed in Section 3. From here to the end of Section 2, we simply assume it to be known in order to present the theoretical results of our methods.
For the two-sample testing problem of means,
can be reduced to the test statistic in Bai and Saranadasa (1996) for a temporally independent sequence, and to the test statistic in Ayyala et al. (2017) for an $M$-dependent Gaussian process. Their asymptotic testing procedures require the candidate location to be bounded away from the boundary, and thus cannot test the hypothesis in (1.1) if a change point occurs near the boundary, especially at the first or last time point. Unlike Bai and Saranadasa (1996) and Ayyala et al. (2017), we establish the asymptotic normality of the proposed statistic at any candidate location, so the testing procedure can be applied regardless of the locations of change points. Moreover, there is no need to estimate a change point in the two-sample testing problem, as the two samples have been pre-specified before testing. In addition to hypothesis testing, we establish an estimating procedure based on the proposed statistic for the locations of change points, again regardless of where they occur.
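For intuition, the temporally independent special case can be sketched with a bias-corrected two-sample statistic in the spirit of Bai and Saranadasa (1996). This is an illustrative analogue only; the function name and the use of the pooled sample covariance are our choices, not the paper's implementation.

```python
import numpy as np

def bs_two_sample_stat(X, Y):
    """Bias-corrected two-sample mean statistic in the spirit of
    Bai and Saranadasa (1996), assuming temporal independence.
    X: (n1, p) array for sample 1; Y: (n2, p) array for sample 2."""
    n1, p = X.shape
    n2, _ = Y.shape
    diff = X.mean(axis=0) - Y.mean(axis=0)
    # Pooled sample covariance; subtracting a multiple of its trace
    # removes the bias of ||diff||^2 under the null of equal means.
    S = ((n1 - 1) * np.cov(X, rowvar=False) +
         (n2 - 1) * np.cov(Y, rowvar=False)) / (n1 + n2 - 2)
    return diff @ diff - (1.0 / n1 + 1.0 / n2) * np.trace(S)
```

Under the null of equal means the statistic has expectation zero, while under a mean shift its expectation equals the squared Euclidean norm of the shift.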
2.2 Hypothesis testing
To study the asymptotic properties of the proposed statistic, we model the sequence of $p$-dimensional random vectors by
where is the -dimensional population mean, is a matrix with , and so that are mutually independent and satisfy , and for some finite constant .
By allowing to depend on , each has its own covariance described by , and each pair of and has its own temporal dependence described by for . Model (2.5) is thus flexible for many applications. We require to guarantee the positive definiteness of . It also ensures the existence of ’s under special structural assumptions. For example, if all ’s are temporally independent, the condition guarantees the existence of ’s so that if . Another advantage of (2.5) is that it does not assume a Gaussian distribution of beyond the existence of fourth moments.
Let , and define a weight function . Moreover, for any matrix , we let .
Condition 1 (Spatial and temporal dependence assumption). We assume that for . Moreover, as , there exists such that
Condition 2 (Covariance assumption). For , , , with ,
Condition 1 assumes stationarity of , which can be relaxed to local stationarity. Condition 1 is trivially true for a temporally independent or $M$-dependent sequence, but it is more general as the sequence need not be $M$-dependent. Moreover, Condition 1 does not impose any structural assumption on the dependence within a critical value , but only requires that the spatial dependence beyond the critical value be not too strong, so that the two equations are satisfied. Finally, compared with the commonly assumed mixing conditions, Condition 1 is advantageous because mixing conditions are hard to verify for real data and usually require additional smoothness or restrictive moment assumptions (Carrasco and Chen, 2002).
Condition 2 is imposed on the covariance matrix of the entire sequence of . To see this, let and from (2.5). The covariance matrix of is , where each diagonal block of describes the spatial dependence among the components of each , and each off-diagonal block measures the spatio-temporal dependence of and for . To impose a condition on , we may consider , which is satisfied if all the eigenvalues of are bounded or the dependence of is not too strong. However, it is more desirable to impose the condition on the spatial and temporal dependence through . By the relationship , it can be shown that Condition 2 is a sufficient condition for . Another advantage of Condition 2 is that we do not require any explicit relationship between the dimension $p$ and the number of observations $n$.
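To make the block structure concrete, the covariance matrix of the stacked vector can be assembled from lag-wise autocovariance matrices. A minimal sketch, assuming dependence vanishes beyond a finite lag (function and variable names are illustrative):

```python
import numpy as np

def stacked_covariance(gammas, n):
    """Assemble the np x np covariance of (X_1', ..., X_n')' from
    autocovariance matrices gammas[h] = Cov(X_t, X_{t+h}), h = 0, 1, ...
    Lags beyond len(gammas) - 1 are treated as zero (weak dependence),
    and Gamma_{-h} = Gamma_h' is used for the lower blocks."""
    p = gammas[0].shape[0]
    cov = np.zeros((n * p, n * p))
    for s in range(n):
        for t in range(n):
            h = t - s
            if abs(h) < len(gammas):
                block = gammas[h] if h >= 0 else gammas[-h].T
                cov[s * p:(s + 1) * p, t * p:(t + 1) * p] = block
    return cov
```

The diagonal blocks carry the spatial dependence within each observation, while the off-diagonal blocks carry the spatio-temporal dependence between observations at different lags.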
The mean and variance of the proposed statistic are given by the following proposition.
Under (2.5) and Condition 1, and for ,
where , and with .
where the set and the matrix satisfies
Now we are ready to present the asymptotic normality of .
To implement a testing procedure based on Theorem 1, we need to estimate
under the null hypothesis. The only unknown terms are for and from . Similar to Li and Chen (2012), we estimate them by
where represents the sum over indices that are at least apart, and with are the corresponding numbers of indices. As a result, the estimator of is
Assume the same conditions in Theorem 1 and of (1.1). As and for any , converges in distribution to the standard normal .
One of the contributions of this paper is establishing the asymptotic normality of for any . This enables us to test the hypothesis in (1.1) even when a change point is on the boundary of a sequence. We conduct some simulations for a visual inspection. Figure 1 illustrates histograms of based on iterations for and , respectively. The data were generated based on the setups in Section 4.1. Clearly, as and increase, the histograms converge to the standard normal curve even when equals 1.
From Theorem 2, our testing procedure rejects of (1.1) if with a nominal significance level , where is the upper -quantile of . The testing procedure relies on the chosen candidate location and may lose power if that location is very different from the location of a change point. For example, suppose there exists only one change point, located near the boundary. If we choose a candidate location near the middle to break the entire sequence into two subsequences, the small piece with the mean change falls into a long subsequence, and its contribution to change point detection is diluted by averaging over all observations in the subsequence. In order to circumvent the difficulty of choosing a single location and, most importantly, to retain the power of the test, we accumulate all the marginal statistics and consider
Assume the same conditions in Theorem 1. As , converges to the standard normal in distribution. Especially under of (1.1), converges to the standard normal in distribution.
Based on Theorem 3, we reject of (1.1) with a nominal significance level if . Free of the tuning parameter and retaining power, the testing procedure based on is thus chosen for testing the existence of any change point.
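The aggregation idea can be sketched as follows. For illustration we sum a bias-corrected split statistic, computed under temporal independence, over all candidate locations; this is a simplified stand-in for the paper's statistic, and all names are ours.

```python
import numpy as np

def aggregated_scan(X):
    """Bias-corrected split statistics over all candidate change
    locations k = 1, ..., n-1, to be aggregated by summation.
    An illustrative construction assuming temporal independence,
    not the paper's exact bias-corrected statistic."""
    n, p = X.shape
    tr_S = np.trace(np.cov(X, rowvar=False))  # pooled estimate under H0
    csum = np.cumsum(X, axis=0)
    total = csum[-1]
    stats = []
    for k in range(1, n):
        left = csum[k - 1] / k
        right = (total - csum[k - 1]) / (n - k)
        d = left - right
        # Subtract the null expectation of ||d||^2 as a bias correction.
        stats.append(d @ d - (1.0 / k + 1.0 / (n - k)) * tr_S)
    return np.array(stats)
```

The aggregated quantity `aggregated_scan(X).sum()` would then be studentized before being compared with a standard normal quantile.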
Remark 1. Alternatively, one may consider the max-norm statistic . If there are only a few time points where the difference between and is large, the max-norm based test is expected to be more powerful than our proposed test. If small differences occur at many time points, our test can dominate the max-norm based test by aggregating all the small differences. Furthermore, establishing the extreme value distribution of requires stringent conditions, and its rate of convergence is known to be slow (Liu and Shao, 2013).
2.3 Estimating one and multiple change points
If the null hypothesis is rejected, we further estimate the change points. We first consider the case of one change point. The location of a change point is estimated by
where is given by (2.1). The rationale of proposing is demonstrated by Lemma 1.
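The argmax principle behind the estimator can be illustrated with a weighted CUSUM criterion under temporal independence; this is a stand-in for the bias-corrected criterion in (2.1), not the paper's exact estimator, and the function name is ours.

```python
import numpy as np

def estimate_changepoint(X):
    """Estimate a single change point as the argmax of a weighted
    CUSUM criterion, an illustrative stand-in for maximizing the
    bias-corrected statistic (2.1) over candidate locations."""
    n, _ = X.shape
    csum = np.cumsum(X, axis=0)
    total = csum[-1]
    best_k, best_val = 1, -np.inf
    for k in range(1, n):
        d = csum[k - 1] / k - (total - csum[k - 1]) / (n - k)
        # The weight k(n-k)/n standardizes the variance of the
        # mean difference across candidate locations.
        val = (k * (n - k) / n) * (d @ d)
        if val > best_val:
            best_k, best_val = k, val
    return best_k
```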
Let and where is given in Proposition 1. Here and measure signal strength and maximal noise, respectively. The following theorem establishes the convergence rate of .
Assume that the change-point satisfies with . Under the same conditions in Theorem 1, as ,
Remark 2. Under cross-sectional dependence but temporal independence, we can derive
Moreover, under the local alternative that the change in tends to zero, the leading order if all the eigenvalues of are bounded, and thus
Remark 3. In the change point literature, it is commonly assumed that the change point is of the form with , that is, with in terms of our notation. The corresponding convergence rate is . This excludes the case that the change point is near or on the boundary. Theorem 4 is general as can vary within [0, 1]. In particular, when the change point is near or on the boundary, the convergence rate is , which is times slower than the convergence rate when .
To estimate the locations of multiple change points , we can iteratively apply a binary segmentation method similar to that in Vostrikova (1981). Suppose that we have already estimated change points as , which partition the original data into segments. Define and . Let , and be the statistics calculated based on the data from the th segment . For each of the segments, we first conduct hypothesis testing by checking whether , where is a chosen nominal significance level. If so, no change point is estimated from . Otherwise, one change point is estimated as , which further partitions into and . We repeat the above procedure iteratively until no more change points can be estimated from any segment.
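The iterative procedure above can be summarized in a generic binary segmentation skeleton, in which `test(segment)` and `locate(segment)` are placeholders for the paper's testing and estimation steps:

```python
import numpy as np

def binary_segmentation(X, test, locate, start=0, end=None, found=None):
    """Generic binary segmentation skeleton.  `test(segment)` returns
    True when a change is detected in the segment (an array of rows of
    X), and `locate(segment)` returns the within-segment split index
    (1-based).  Both callables stand in for the paper's test and
    estimator."""
    if end is None:
        end = len(X)
    if found is None:
        found = []
    if end - start < 2:
        return found
    seg = X[start:end]
    if test(seg):
        k = locate(seg)            # split after row k of the segment
        found.append(start + k)    # convert to a global index
        binary_segmentation(X, test, locate, start, start + k, found)
        binary_segmentation(X, test, locate, start + k, end, found)
    return sorted(found)
```

Any per-segment test and location estimator can be plugged in; the recursion stops once no remaining segment rejects.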
Let be the set of all change points and be the set of estimated change points. Letting and , we define
to be the minimal signal-to-noise ratio over all segments, each of which has starting point for and ending point for . We establish the consistency of under the following Condition 3, in addition to Conditions 1–2.
Condition 3 (Minimal signal-to-noise ratio assumption). As , and diverges such that . Furthermore, in Theorem 4, for all segments that contain at least one change point.
Assume (2.5) and Conditions 1–3. As , converges to in probability.
Remark 4. The binary segmentation can control the family-wise error rate (FWER), as we set in Condition 3. In particular, one can choose so that the FWER is controlled even if the other conditions in Theorem 5 are not satisfied.
Remark 5. The defined provides a quantitative measure of the efficiency of binary segmentation. To appreciate this, we consider a configuration of two change points and . Let , . The piecewise constant signals are zero in and and positive in . Then is the smallest signal-to-noise ratio from , , . In particular, if is short and buried in the middle of a large segment (Olshen and Venkatraman, 2004), is close to zero. Binary segmentation is well known to be inefficient under this configuration. To improve its performance, we may consider , where is the test statistic (2.1) defined on a randomly generated interval with . The rationale is that when happens to be or , change point detection will be more powerful than that based on the entire sequence. Based on , circular binary segmentation or wild binary segmentation can be implemented accordingly.
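The random-interval idea can be sketched as follows, with an illustrative weighted CUSUM standing in for the statistic (2.1); the interval-drawing scheme and all names are our choices.

```python
import numpy as np

def max_over_random_intervals(X, num_intervals=100, min_len=10, rng=None):
    """Maximize a CUSUM-type statistic over randomly drawn intervals,
    in the spirit of wild binary segmentation.  The per-interval
    statistic is an illustrative weighted CUSUM, not the paper's (2.1)."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(X)
    best = (-np.inf, None, None)
    for _ in range(num_intervals):
        s = rng.integers(0, n - min_len + 1)
        e = rng.integers(s + min_len, n + 1)
        seg = X[s:e]
        m = len(seg)
        cs = np.cumsum(seg, axis=0)
        tot = cs[-1]
        for k in range(1, m):
            d = cs[k - 1] / k - (tot - cs[k - 1]) / (m - k)
            val = (k * (m - k) / m) * (d @ d)
            if val > best[0]:
                best = (val, int(s), int(s + k))
    return best  # (statistic value, interval start, split location)
```

A short elevated segment buried inside a long sequence is more likely to dominate within a narrow random interval than within the full sequence, which is exactly the motivation given above.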
3. Elbow Method for Dependence
The proposed procedure relies on the choice of the lag parameter, which is unknown in practice. From Condition 1, it separates the dominant temporal dependence from the remainder. As demonstrated in the simulation studies of Section 4, if data are dependent, wrongly applying the procedure under the assumption of temporal independence can cause severe type I error inflation and thus produce many false positives when estimating the locations of change points. On the other hand, choosing a value larger than the actual one will reduce the power of the test and thus generate more false negatives. Here we propose a quite simple way to determine it.
Condition 1 states that is relatively small if , or equivalently, is small if . The unknown can be consistently estimated by (2.8) with under the null hypothesis, according to the proof of Theorem 2. Even under the alternative hypothesis, the effect of mean heterogeneity on the estimation is of small order as long as the heterogeneity is not too strong. We thus determine the lag by calculating (2.8) for each integer starting from , and terminate the process once a small value appears. Visually, we can plot (2.8) versus the candidate lag, and use the elbow in the plot to determine it.
To demonstrate the idea above, we generated the random sample for using (4.1) in Section 4 with and . We considered and , respectively. Figure 2 illustrates (2.8) versus the candidate lag based on iterations. When the actual value was , the elbow occurred at under both the null and alternative hypotheses. We thus estimated it by . Similarly, when , the elbow occurred at , which suggested estimating it by .
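The elbow heuristic can be mimicked numerically: compute a lag-wise dependence measure and stop at the first sharp drop. Here we use the squared Frobenius norm of the lag-h sample autocovariance as an illustrative analogue of (2.8), and the cutoff rule in `elbow_lag` is a crude automatic stand-in for visual inspection; both names are ours.

```python
import numpy as np

def lag_dependence_profile(X, max_lag=8):
    """Squared Frobenius norm of the lag-h sample autocovariance
    matrix for h = 0, ..., max_lag: an illustrative analogue of
    plotting (2.8) against the candidate lag."""
    n, _ = X.shape
    Xc = X - X.mean(axis=0)
    prof = []
    for h in range(max_lag + 1):
        G = Xc[:n - h].T @ Xc[h:] / (n - h)
        prof.append(np.sum(G ** 2))
    return np.array(prof)

def elbow_lag(profile, frac=0.1):
    """Pick the smallest lag whose value drops below `frac` of the
    lag-0 value: a crude automatic stand-in for eyeballing the elbow."""
    return int(np.argmax(profile <= frac * profile[0]))
```

For a 1-dependent sequence the profile stays large at lags 0 and 1 and collapses from lag 2 onward, matching the elbow behavior described above.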
4. Simulation Studies
4.1 Empirical performance of the testing procedure
The first part of the simulation studies investigates the empirical performance of the test statistic whose asymptotic normality is established in Theorem 3. The random sample for was generated from the following multivariate linear process
where is the $p$-dimensional population mean vector at time point , is a matrix for , and is a -variate random vector with mean and identity covariance . In the simulation, we set for , and . For and , we considered two different scenarios. If , we simply chose so that became an independent sequence. If , we chose , and each row of them had only non-zero elements that were randomly chosen from with magnitudes generated from Unif . By doing so, the dependence was dominated by for , plus perturbations contributed by and .
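A generator for this type of multivariate linear process can be sketched as follows; the dimensions and coefficient matrices supplied by the caller are illustrative, not the paper's exact choices, and the function name is ours.

```python
import numpy as np

def simulate_linear_process(mu, Q_list, rng=None):
    """Generate X_t = mu_t + sum_l Q_l z_{t-l} for t = 1, ..., n with
    iid standard-normal innovations z.  mu: (n, p) matrix of means;
    Q_list: coefficient matrices, each p x p.  The caller's choices of
    dimensions and coefficients are illustrative."""
    if rng is None:
        rng = np.random.default_rng()
    n, p = mu.shape
    L = len(Q_list)
    # Extra leading innovations supply the pre-sample lags.
    Z = rng.normal(size=(n + L - 1, p))
    X = np.empty((n, p))
    for t in range(n):
        acc = np.zeros(p)
        for l, Q in enumerate(Q_list):
            acc += Q @ Z[t + L - 1 - l]
        X[t] = mu[t] + acc
    return X
```

With a single identity coefficient matrix this reduces to an independent Gaussian sequence; adding further coefficient matrices introduces the temporal dependence described above.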
Without loss of generality, we chose for under of (1.1). Under the alternative hypothesis, we considered one change point, such that for and for . The non-zero mean vector had non-zero components, which were uniformly and randomly drawn from coordinates . Here, denotes the integer part of . The magnitude of each non-zero entry of was controlled by a constant multiplied by a random sign. The nominal significance level was chosen to be . All the simulation results were obtained based on 1000 replications.
We also considered two competitors. One is the E-div test proposed by Matteson and James (2014), and the other is the CQ test proposed by Chen and Qin (2010). Both testing procedures assume independence of the observations. The CQ test was originally designed for the two-sample problem, requiring the change point to be known. To implement the CQ test, we used the true change point to divide the sequence into two samples under the alternative hypothesis. Under the null hypothesis, we obtained the two samples by splitting at the same time point used for the alternative.
Table 1 demonstrates the empirical sizes and powers of the three tests with Gaussian in (4.1). To obtain power, we chose the location of the change point and magnitude . Under temporal independence (), the sizes of all three tests were well controlled around the nominal significance level . Under dependence (), the CQ and E-div tests suffered severe size distortion. Unlike those two tests, the proposed test still had sizes well controlled around the nominal significance level . Due to the severe size distortion of the CQ and E-div tests, it is not meaningful to compare the power of the three tests under dependence. We thus only conducted the power comparison when . The empirical powers of all three tests increased as and increased. The reason the CQ test had the best power among the three is that it utilized the information about the location of the change point. In real applications, such information is unavailable. The proposed test always enjoyed greater power than the E-div test across different and .
Under spatial and temporal dependence, we also studied the power of the proposed test under different combinations of the sample size , the dimension , and the location of the change point. The results are included in the supplementary material. The supplementary material also includes the results of the proposed test when follows a -distribution. The patterns of sizes and powers were quite similar to those when follows a Gaussian distribution, demonstrating the nonparametric nature of the proposed test.
4.2 Empirical performance of the estimating procedure
The second part of the simulation studies investigates the empirical performance of the change point estimator in (2.11). We first considered the situation with one change point, such that for and for . The non-zero mean vector had non-zero components, which were uniformly and randomly drawn from coordinates . The magnitude of each non-zero entry of was controlled by a constant multiplied by a random sign. Figure 3 demonstrates the proportion of iterations detecting the change point located at time point and , respectively. First, the probability of detecting the change point increased as the dimension increased. Second, comparing the right panel with the left panel, the probability of detecting the change point became lower as dependence increased from to . Finally, comparing the lower panel with the upper panel, a stronger signal was needed when the change point was at in order to attain a detection probability similar to that when the change point was located at . Our empirical results are consistent with the theoretical results in Theorem 4.
We also compared the proposed change point estimator with the one proposed in Bai (2010). Since the estimating procedure in Bai (2010) assumes that a change point exists a priori, we implemented both methods without conducting hypothesis testing. Figure 4 illustrates that the change point estimator in Bai (2010) failed to identify the change point at under temporal dependence (). Unlike the estimator in Bai (2010), the proposed change point estimator performed well under both temporal independence and dependence.