1 Introduction
Change point detection and localization is a classical problem in time series analysis, in which we record a series of measurements and wish to determine whether and at what time(s) the underlying generative model has changed. Due to its flexibility, the model of a time series with multiple structural changes has a wide range of applications including econometrics [Bai and Perron (1998)], stock price analysis [Chen and Gupta (1997)], Internet security monitoring [Peng et al. (2004)], and genetics [Castro et al. (2018)].
In some settings, change point detection is quite well understood. For instance, in the mean change point model, we usually assume we observe a collection of high-dimensional time series vectors $\{y_t\}_{t=1}^n \subset \mathbb{R}^p$ such that $y_t = \mu_t + \epsilon_t$. Here the $\epsilon_t$ are independent measurement noise vectors and the $\mu_t$ are the population quantities that change over time in a piecewise constant manner. In this setting the most important task is to determine the time points at which the structural changes of $\{\mu_t\}$ take place. There is a vast literature dealing with the problem of finding the change points in piecewise constant mean models in low and high dimensions. Frick et al. (2014)
proposed a computationally efficient method called SMUCE that can simultaneously estimate the locations and the confidence intervals of the change points in the mean change point detection problem. Cho et al. (2016), Cho and Fryzlewicz (2015) and Wang and Samworth (2018) studied high-dimensional mean change point detection problems. More recently, Pein et al. (2017) introduced a method that can handle mean and variance changes simultaneously.
Cribben and Yu (2017) and Wang et al. (2018a), among others, inspected mean change point problems for dynamic Bernoulli network models. However, in other applications, we can only obtain indirect measurements of the high-dimensional vectors of interest. Specifically, consider the scenario in which we observe the time series $\{(x_t, y_t)\}_{t=1}^n$, where the $x_t$ are $p$-dimensional covariates, the $y_t$ are univariate responses satisfying $y_t = x_t^\top \beta_t^* + \epsilon_t$, and the $\{\beta_t^*\}_{t=1}^n$ are the unobserved regression coefficients that change over time. We formally summarize this model as
Model 1 (Change Point Model in the Regression Setting).
For any $t \in \{1, \dots, n\}$, suppose the covariate $x_t \in \mathbb{R}^p$ and the response $y_t \in \mathbb{R}$ satisfy
(1) $y_t = x_t^\top \beta_t^* + \epsilon_t$,
where the $\epsilon_t$ are i.i.d. noise variables with unit variance.^{1} In addition, there exists a collection of change points $0 = \eta_0 < \eta_1 < \dots < \eta_K < \eta_{K+1} = n$ such that $\beta_t^* \neq \beta_{t+1}^*$ if and only if $t \in \{\eta_1, \dots, \eta_K\}$.

^{1}We assume for convenience that the variance of $\epsilon_t$ is $1$. However, our results remain valid as long as the $\epsilon_t$ are i.i.d. sub-Gaussian random variables with bounded variance.
Our main task is to develop computationally efficient algorithms that can consistently estimate the unknown time points at which the unobserved parameters change.
Any change point estimators $\{\hat{\eta}_k\}_{k=1}^{\hat{K}}$ are deemed to be consistent if, with probability approaching 1, $\hat{K} = K$ and the sup-norm error satisfies
(2) $\max_{1 \le k \le K} |\hat{\eta}_k - \eta_k| \le \epsilon_n$
for all sufficiently large $n$, where $\epsilon_n = o(n)$.
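To make the model and the consistency criterion concrete, here is a small self-contained simulation; the dimensions, sparsity level, noise level and change locations are arbitrary illustrative choices, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 300, 50, 5  # sample size, dimension, sparsity (illustrative values)

# Piecewise-constant coefficients with change points at t = 100 and t = 200.
beta_a = np.zeros(p); beta_a[:s] = 1.0
beta_b = np.zeros(p); beta_b[:s] = -1.0
betas = np.vstack([beta_a] * 100 + [beta_b] * 100 + [beta_a] * 100)  # (n, p)

# Model 1: y_t = x_t' beta_t + eps_t with unit-variance noise.
X = rng.normal(size=(n, p))
y = np.einsum("tj,tj->t", X, betas) + rng.normal(size=n)

def sup_norm_error(est_cps, true_cps):
    """Distance from each true change point to its nearest estimate, maximized."""
    return max(min(abs(e - t) for e in est_cps) for t in true_cps)

# An estimator is consistent if it recovers K and this error is o(n).
print(sup_norm_error([103, 196], [100, 200]))  # -> 4
```

The localization error here plays the role of the sup-norm error in (2): estimates $\{103, 196\}$ for true change points $\{100, 200\}$ have error $4$.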
The low-dimensional change point linear regression model has been extensively considered by many authors, including Bai and Perron (1998), Qu and Perron (2007), and most recently Zhang et al. (2015b). Most of the existing works in this setting focus on the case where the number of change points, $K$, is a fixed constant. In the high-dimensional regime, where the dimension $p$ may exceed $n$, the change point regression model has also received recent attention. In particular, Lee et al. (2016) extended the Lasso to the high-dimensional single change point setting and showed that both the change point and the regression parameters can be consistently estimated. Later, Lee et al. (2018) extended their results to the high-dimensional quantile regression model. Kaul et al. (2018) proposed a highly efficient algorithm for the setting of exactly one change point. Both Lee et al. (2016) and Kaul et al. (2018) showed that in the one change point setting, the change point can be consistently estimated in sup-norm error. Zhang et al. (2015a) studied the Sparse Group Lasso (SGL) algorithm for the multiple change points setting. The authors showed that SGL returns consistent change point estimators when the number of change points is bounded. Leonardi and Bühlmann (2016) showed that, by using a binary search algorithm, consistent estimation can be achieved even when the number of change points diverges with $n$.

Contributions
In Section 2, we introduce a new two-stage procedure called Binary Segmentation through Estimated CUSUM statistics (BSE) to estimate the change points in the high-dimensional regression model. Given the time series data $\{(x_t, y_t)\}_{t=1}^n$, in Stage 1 of BSE, we estimate the coefficients $\{\beta_t^*\}_{t=1}^n$ using a novel algorithm developed for the high-dimensional change point setting. Then in Stage 2, we perform a novel multidimensional binary segmentation search to localize the significant change points using the estimated sequence $\{\hat{\beta}_t\}_{t=1}^n$. Our careful analysis shows that the BSE algorithm can consistently estimate the change points in the regression time series even when the number of change points diverges with $n$. Furthermore, the sup-norm error (defined in (2)) of the BSE change point estimators is, up to log factors, of smaller order than the best previously known bounds. To the best of our knowledge, this is a significant improvement in the high-dimensional multiple change points regression setting. See Table 1 for detailed comparisons. A key step of our methodology in BSE is estimating the time series of $\beta_t^*$'s. We establish mean squared error bounds (defined in (6)) for our regression parameter estimators that may be of independent interest.
While the BSE algorithm is proven to attain a sharp statistical error bound, a key drawback of the procedure is that its computational cost in high dimensions grows exponentially in the number of change points. To overcome this limitation, in Section 3 we introduce another new algorithm called Binary Segmentation through Lasso Estimators (BSLE). We show that BSLE can also consistently localize the change points, and that its sup-norm error bound is worse by at most a bounded factor compared with BSE, where
(3) 
While BSLE achieves a slightly worse statistical bound, its computational cost is substantially lower; see Table 1, in which Lasso(n) denotes the computational cost of the Lasso with $n$ samples. For comparing and contrasting computational complexities, we assume throughout that the Lasso, the Group Lasso and the Sparse Group Lasso have the same computational complexity.
Finally, we consider the scenario in which we are given a collection of consistent change point estimators from an existing algorithm whose localization error bound may not be optimal. By leveraging the insights gained from BSE, we propose a new method called the "Local Screening Algorithm" (LSA) in Section 4. We show that, together with the observed data, LSA can efficiently refine the given estimators and thus improve their practical performance.
We summarize the localization error bounds and the computational costs of our methods and of the existing competitors in Table 1. See Remark 9 for a more detailed discussion. We also provide extensive numerical evidence complementing our theoretical findings in Section 5.
Table 1:

Method | Localization error bound | Computational complexity
BSE (Algorithm 1)
BSLE (Algorithm 2)
EBSA^{2}
LSA (Algorithm 3) + EBSA
SGL^{3}

^{2}The EBSA algorithm is due to Leonardi and Bühlmann (2016). The corresponding localization error bound is based on our communication with the authors and is not explicit in the most recent version of their paper.

^{3}The SGL algorithm was first introduced in Simon et al. (2013); Zhang et al. (2015b) later showed that it can be used to detect change points in the high-dimensional regression setting.
Notation
For any matrix $B \in \mathbb{R}^{p \times n}$, we can write $B = [b_1, \dots, b_n]$, where $b_t$ is the $t$-th column of $B$. We can therefore regard $B$ as a function that changes over time. Let $B^*$ be such that the $t$-th column of $B^*$ equals $\beta_t^*$. With a slight abuse of notation, let $\beta^*(t) = \beta_t^*$, so that $\beta^*(\cdot)$ is a piecewise constant vector-valued function of time under Model 1. If $I$ is any interval within $[1, n]$ and $\beta^*$ is constant over $I$, then we write $\beta_I^* = \beta_t^*$ for any $t \in I$. We use $v(i)$ to denote the $i$-th entry of a vector $v$. For any two real sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \lesssim b_n$ if there exists $C > 0$ such that $a_n \le C b_n$, and write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$. Let $\{Z_n\}$ be a sequence of random variables. We write $Z_n = O_p(1)$ if there exists a constant $C$ such that the probability of $\{|Z_n| > C\}$ vanishes as $n \to \infty$.
Throughout each individual section, we use subscripted and superscripted constants (e.g. $c_1, C_1$) to indicate absolute constants independent of $n$ and $p$. However, these constants can have different meanings in different proofs and theorems.
2 Binary Segmentation through Estimated CUSUM statistics
In this section, we introduce a new algorithm for the regression change point model. Algorithm 1 details our approach. Overall, there are two main stages: (a) computing the approximated CUSUM statistics using the regression coefficient estimators in (8), and (b) detecting the change points by running a multidimensional binary segmentation algorithm on the approximated CUSUM statistics. We start by defining the multidimensional CUSUM statistic of $\{\beta_t^*\}$ on any interval $[s, e]$ and at any time point $t \in (s, e)$ as
(4) $A^{s,e}(t) := \sqrt{\frac{(t-s)(e-t)}{e-s}} \left( \frac{1}{e-t} \sum_{r=t+1}^{e} \beta_r^* - \frac{1}{t-s} \sum_{r=s+1}^{t} \beta_r^* \right)$,

and its estimated counterpart, obtained by replacing $\beta_r^*$ with the estimators $\hat{\beta}_r$,

(5) $\hat{A}^{s,e}(t) := \sqrt{\frac{(t-s)(e-t)}{e-s}} \left( \frac{1}{e-t} \sum_{r=t+1}^{e} \hat{\beta}_r - \frac{1}{t-s} \sum_{r=s+1}^{t} \hat{\beta}_r \right)$.
The multidimensional CUSUM statistic is a direct generalization of the univariate CUSUM statistic first introduced in Scott and Knott (1974). It has also been shown in Venkatraman (1993) that for the univariate mean change point model, the CUSUM statistic is an effective tool for detecting change points. With this in mind, we propose a new strategy: we first estimate the multidimensional CUSUM statistics of $\{\beta_t^*\}$ and then apply our newly developed multidimensional binary segmentation algorithm to these estimates. We summarize this strategy in Algorithm 1.
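For illustration, here is a minimal sketch of the multidimensional CUSUM statistic, using the standard WBS-style normalization (the exact scaling in (4) may differ); function and variable names are our own:

```python
import numpy as np

def cusum(B, s, e, t):
    """Multidimensional CUSUM of the columns B[:, s:e] at split point t (s < t < e).

    B is a (p, n) array whose t-th column holds beta_t (or an estimate of it).
    Returns a p-vector whose Euclidean norm is large when a change sits near t.
    """
    scale = np.sqrt((t - s) * (e - t) / (e - s))
    return scale * (B[:, t:e].mean(axis=1) - B[:, s:t].mean(axis=1))

# For a noiseless piecewise-constant sequence with a single change at t = 50,
# the norm of the CUSUM is maximized exactly at the change point:
B = np.hstack([np.zeros((3, 50)), np.ones((3, 50))])
scores = [np.linalg.norm(cusum(B, 0, 100, t)) for t in range(1, 100)]
print(1 + int(np.argmax(scores)))  # -> 50
```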
The main intuition behind BSE is that the nonstationary regression coefficients are the source of the structural changes; therefore, if we can estimate them consistently, in the sense that we can control
(6) $\frac{1}{n} \sum_{t=1}^{n} \| \hat{\beta}_t - \beta_t^* \|_2^2$,
then we can also quantify how well we estimate the population multidimensional CUSUM statistics, and this, as we will show, validates the use of BSE.
As described in Algorithm 1, BSE makes use of a collection of pre-generated intervals as inputs. Such a feature was first used in the Wild Binary Segmentation (WBS) algorithm by Fryzlewicz (2014) and later in INSPECT by Wang and Samworth (2018), where the end points of the intervals are selected independently and uniformly at random. The use of random intervals is one of the key ideas of WBS, INSPECT and BSE because, if the number of intervals is large enough, then with high probability there exists at least one random interval containing one and only one change point. In other words, the good event
(7) 
holds with high probability. Indeed, in Lemma 4, we will show that this event holds with high probability provided the number of random intervals is sufficiently large; this requirement is mild when the number of change points is fixed. By focusing on the interval with the maximal CUSUM statistic among all the random intervals, one can develop sharp statistical guarantees for the change point estimators.
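The random-interval device and the good event can be sketched as follows; the functions and the endpoint conventions are our own illustrative choices:

```python
import numpy as np

def draw_intervals(n, M, rng):
    """Draw M intervals whose endpoints are selected independently and
    uniformly at random from {1, ..., n}."""
    ends = rng.integers(1, n + 1, size=(M, 2))
    return [(min(a, b), max(a, b)) for a, b in ends]

def good_event(intervals, cps):
    """True if every change point is isolated by some interval, i.e. some
    interval contains it and no other change point."""
    def isolates(I, c):
        s, e = I
        return s < c <= e and all(not (s < d <= e) for d in cps if d != c)
    return all(any(isolates(I, c) for I in intervals) for c in cps)

# Deterministic illustration: each change point gets its own covering interval,
# while a single wide interval covering both changes does not isolate either.
print(good_event([(80, 120), (180, 220)], [100, 200]))  # -> True
print(good_event([(50, 250)], [100, 200]))              # -> False
```

Drawing sufficiently many intervals via `draw_intervals` makes the good event hold with high probability, which is exactly the role of Lemma 4.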
In order to establish the consistency of BSE, we first need to show that our proposed estimators in (8) enjoy low MSE bounds, and then show that BSE can consistently locate the change points using the estimated time series. For the rest of this section, we will therefore first establish the efficiency of the estimators returned by (8) in high dimensions. We will then show that the BSE algorithm is consistent, with a sharp bound on the sup-norm error defined in (2).
2.1 Coefficient estimation in change point linear regression
We begin by studying coefficient estimation in the change point linear regression model. In high dimensions, the estimation of $\{\beta_t^*\}$ has been previously studied by several authors, including Lee et al. (2016), Kaul et al. (2018) and Leonardi and Bühlmann (2016). However, in all of these existing results, the statistical guarantees for the estimators rely on the existence of accurate estimates of the change points. In contrast, we present the following novel algorithm and show that the coefficients can be consistently estimated without using any pre-estimated change points, and that the MSE bounds for the coefficient estimation remain valid even when the number of change points grows as $n$ increases.
Our analysis will cover the standard high-dimensional regime in which the dimension $p$ may grow much faster than the sample size $n$. In order to establish consistency in this setting, we need to adopt the following assumptions.
Assumption 1.
Consider the model defined in Model 1.

(a) Suppose there exists a subset $S \subseteq \{1, \dots, p\}$ such that every coefficient vector $\beta_t^*$ is supported on $S$, and that $|S| \le s$ for some constant $s$.

(b) The covariates $x_t$ are $p$-dimensional sub-Gaussian random vectors whose sub-Gaussian norms are bounded by some absolute constant.

(c) Suppose that, for every time point and every vector in the relevant restricted cone, the design satisfies a restricted eigenvalue bound with some positive constants.

(d) A standard sample size condition for sparse estimation holds.
In Assumption 1, (a) and (b) are both standard assumptions for the high-dimensional time series linear regression model. In particular, (a) is used in Lounici et al. (2011) to analyze the multiple regression model. Both (a) and (b) are used in Kaul et al. (2018) and Leonardi and Bühlmann (2016). (c) is a standard restricted eigenvalue condition. In fact, it has been shown in Raskutti et al. (2010) and in Zhou (2009) that if the covariates are i.i.d. (sub-)Gaussian random vectors with a positive definite covariance matrix, then (c) holds with high probability. Finally, we note that when we estimate the coefficients on a given time segment, we need the effective sample size of that segment to be large (see for instance Negahban et al. (2012)). Therefore Assumption 1(d) can be thought of as a standard sample size condition commonly found in the high-dimensional regression literature. Consider the following algorithm.
(8)  
The following proposition provides the MSE bound for the coefficient estimators in (8).
Proposition 1.
Suppose in addition that there exists a sufficiently large constant such that the tuning parameters satisfy the stated conditions. Then, under Assumption 1, with probability at least ,
for some absolute constant.
Remark 1.
As suggested by Proposition 1, we note that in order to ensure that the true coefficient sequence is admissible for the program defined in (8), we need to choose the tuning parameters appropriately. However, as long as we can further ensure the conditions above, then
We remark that, up to log factors, this is a tight upper bound. See Section 6.2 for a discussion of lower bounds in this high-dimensional setting.
Remark 2.
We also note that the tuning parameter in Algorithm 1 is required only for mathematical convenience. In fact, consider the following additional assumption: suppose that
for some constant independent of $n$ and $p$. Then it can be shown that the solution
(9)  
satisfies
for some absolute constant. The proof of this result follows immediately from those of Proposition 1 and Lemma 1 and is omitted for brevity.
As we will discuss in Section 6.1, the computational cost of solving (8) grows exponentially in the number of change points; here GroupLasso(n) denotes the computational cost of the Group Lasso with $n$ samples. However, our main result in this section has at least two useful implications.
1) When the number of change points is small, it is computationally feasible to solve (8), and it provides a sharp MSE bound for coefficient estimation in the high-dimensional change point problem. Previous papers such as Lee et al. (2016) and Kaul et al. (2018) also study coefficient estimation in the single change point setting, but we provide a slightly more computationally costly approach for small numbers of change points with sharper statistical rates in a slightly more general setting.^{4}

^{4}Consider the case of a single change point. Compared with Lee et al. (2016), our MSE bounds are obtained in a more general setting: we have not made any assumptions on the size of the structural change, and our analysis covers a broader range of regimes. Our MSE bounds are also sharper than those obtained in Kaul et al. (2018), as the latter are off by polynomial factors compared with the lower bounds discussed in Section 6.2.
2) When the number of change points is large, our approach in (8) is no longer computationally feasible. However, the bound obtained in Proposition 1 can be viewed as a baseline and shows that there exists a polynomial-time algorithm that can achieve sharp MSE bounds as long as the stated conditions hold.
2.2 The consistency of BSE
In the previous section, we have shown that the coefficients of the change point regression model can be estimated in high dimensions using the approach described in (8). In this section, we show that the BSE algorithm can consistently estimate the change points with sharp error bounds.
Recall that we use $K$ to denote the number of change points in Model 1. In order to state the main result of this section, we also need to introduce two additional positive parameters, the minimal spacing and the minimal change size, which satisfy
(10) 
and that
(11) 
In the change point literature, the minimal spacing and the minimal change size are commonly used to characterize the signal size of the change point model. See, e.g., Wang et al. (2018b) and Wang and Samworth (2018). It makes intuitive sense that if the time between changes or the size of the changes is too small, then it is not possible to find the change points in the time series.
We show that the change point estimates returned by the BSE algorithm are consistent by demonstrating a more general result in Proposition 2.
Proposition 2.
Let the minimal spacing and the minimal change size satisfy (10) and (11). Suppose the estimators used in (5) are any estimators of the regression coefficients such that
for some parameter and that there exists a sufficiently large constant such that
Suppose in addition that the good event (7) holds. Let the input parameters of BSE satisfy
(12) 
for some sufficiently large constant and some sufficiently small constant. Then the estimators returned by BSE with the data and the input parameters above are such that, with probability at least ,
(13) 
where the constant is absolute and independent of $n$ and $p$.
In the BSE algorithm, we use the estimators in (8) to estimate the multidimensional CUSUM statistics. We note that, as suggested by Proposition 2, in general any consistent estimators can potentially work, as long as the corresponding MSE bound is sufficiently small. Indeed, in Section 5, our empirical results suggest that the change points can still be consistently localized even if the estimators in Algorithm 1 are computed using the SGL algorithm instead of (8).
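To illustrate this plug-in principle, the following sketch runs a wild-binary-segmentation recursion directly on a (here noiseless) coefficient matrix; the threshold, interval count and names are our own illustrative choices, and this is not the full Algorithm 1:

```python
import numpy as np

def cusum_norm(B, s, e, t):
    """Norm of the multidimensional CUSUM of B[:, s:e] at split point t."""
    scale = np.sqrt((t - s) * (e - t) / (e - s))
    return scale * np.linalg.norm(B[:, t:e].mean(axis=1) - B[:, s:t].mean(axis=1))

def bse_stage2(B, intervals, s, e, tau, found):
    """Recursively report the best split among random intervals clipped to (s, e]."""
    best_val, best_t = 0.0, None
    for a, b in intervals:
        a, b = max(a, s), min(b, e)
        for t in range(a + 1, b):
            val = cusum_norm(B, a, b, t)
            if val > best_val:
                best_val, best_t = val, t
    if best_t is not None and best_val > tau:
        found.append(best_t)
        bse_stage2(B, intervals, s, best_t, tau, found)
        bse_stage2(B, intervals, best_t, e, tau, found)
    return sorted(found)

# Noiseless "estimated" coefficients with changes at t = 60 and t = 140:
B = np.hstack([np.zeros((4, 60)), np.ones((4, 80)), np.zeros((4, 40))])
rng = np.random.default_rng(3)
ivals = [tuple(sorted(rng.integers(0, 181, size=2))) for _ in range(40)]
print(bse_stage2(B, ivals, 0, 180, tau=2.0, found=[]))
```

In BSE proper, `B` would hold the Stage 1 estimates (from (8) or, per the remark above, from SGL), and the threshold would be chosen as in Theorem 1.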
For completeness, we formally summarize our main result about BSE in the following theorem, which is a direct corollary of Proposition 2.
Theorem 1.
Suppose Assumption 1 holds. Let the tuning parameters be chosen as in Proposition 1 and, in addition, let the remaining input parameter be chosen so that
(14) 
for some sufficiently small constant and some sufficiently large constant. In addition, suppose that the good event (7) holds. Then the estimates returned by BSE with the above inputs are such that, with probability at least ,
(15) 
where the constant is absolute and independent of $n$ and $p$.
Remark 3.
Remark 4.
In Theorem 1, we also require a condition ensuring that the random intervals cannot contain too many change points. We note that under suitable conditions this requirement can be met by a simple choice of the inputs. In addition, if we make such a simple choice in Theorem 1, then the error bound is inflated to
(16) 
The localization error bound is then inflated by an extra factor compared to (15). We remark that in the high-dimensional change point literature, sup-norm error bounds of this inflated form are commonly found. For example, the order of the error bound in (16) agrees with that derived in Kaul et al. (2018) in the single change point setting. In addition, while Wang and Samworth (2018) focused on the high-dimensional mean change point problem (i.e., they assume the mean vectors are observed directly up to additive noise), the sup-norm error derived in their main result (Theorem 3 of their paper) is, up to log factors, the same as (16) as long as the number of change points is a fixed constant.
Remark 5.
The second stage of BSE in Algorithm 1 can be considered as a high-dimensional version of the univariate Wild Binary Segmentation (WBS) algorithm first introduced in Fryzlewicz (2014). We note that there are two other important contributions that also successfully extend WBS to high-dimensional settings: the Sparsified Binary Segmentation (SBS) algorithm introduced by Cho and Fryzlewicz (2015) and the INSPECT algorithm by Wang and Samworth (2018). However, neither INSPECT nor SBS is designed for the change point regression setting, and therefore they may not solve our problem directly.
To be more precise, we first remark that while INSPECT is proven to be effective and efficient for high-dimensional mean change point models, it is a nontrivial open problem to establish the consistency of INSPECT in our setting. The theoretical properties of INSPECT rely heavily on sample splitting and semidefinite programming. It is therefore unclear whether one can easily establish the same statistical guarantees using the estimated samples as the input of INSPECT.
Also, the SBS algorithm can be viewed as a hard-thresholding of the CUSUM vectors followed by an aggregation. It therefore characterizes the change size using the $\ell_\infty$ norm instead of the norm we use in (10), which means that SBS necessarily assumes that the largest coordinate-wise change must be sufficiently large. We remark that if we used the $\ell_\infty$ norm to characterize the change size, then we would need to quantify the accuracy of the estimators using the mean-sup error, namely the average over time of the $\ell_\infty$ estimation error. In sparse regression problems, such bounds typically require stronger assumptions than we need in Theorem 1.
3 Binary Segmentation through Lasso Estimators
In the previous section, we showed that the BSE algorithm can accurately locate the change points with sharp localization errors. However, as discussed in Section 6.1, the computational cost of the coefficient estimation step (8) grows exponentially in the number of change points, which makes BSE infeasible when that number is large. To overcome this limitation, in this section we consider an alternative approach that is much more computationally efficient. To motivate this approach, observe that the multidimensional CUSUM statistic in (4) can be rewritten as
(17) $A^{s,e}(t) = \sqrt{\frac{(t-s)(e-t)}{e-s}} \left( \beta^*_{(t,e]} - \beta^*_{(s,t]} \right)$, where $\beta^*_I := \frac{1}{|I|} \sum_{r \in I} \beta^*_r$ for any interval $I$.
Therefore, instead of estimating each individual coefficient vector, we can alternatively consider directly estimating its average over any interval that is used in the binary search procedure.
We remark that, given any generic interval, which possibly contains multiple change points, the corresponding population quantity can also be thought of as the best linear predictor given the data on that interval:
The best linear predictor in high-dimensional misspecified models has been intensively studied by many authors, including Bunea et al. (2007), Van De Geer et al. (2009) and the references therein. As a result, given any generic interval $I$, we will consider the estimator
(18) $\hat{\beta}_I \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \; \frac{1}{|I|} \sum_{t \in I} \left( y_t - x_t^\top \beta \right)^2 + \lambda \|\beta\|_1,$
where the penalty level is a user-specified tuning parameter. In the following lemma, we show that the estimator in (18) well-approximates its population counterpart.
Lemma 1.
Suppose Assumption 1 holds. Denote
(19) 
Let the estimator $\hat{\beta}_I$ be defined as in (18). Suppose in addition that the interval length and the tuning parameter satisfy the stated conditions. Then, with probability at least ,
for some constant depending only on the parameter defined in (19).
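Since (18) is an interval-wise Lasso program, it can be sketched with a plain coordinate-descent solver; the penalty level and the data-generating choices below are illustrative, not the tuned values from the theory:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_w 0.5 * ||y - X w||_2^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]                  # remove coordinate j from the fit
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]                  # add the updated coordinate back
    return w

def interval_lasso(X, y, s, e, lam):
    """Lasso fit restricted to the time interval (s, e]; cf. (18)."""
    return lasso_cd(X[s:e], y[s:e], lam)

rng = np.random.default_rng(2)
n, p = 200, 100
beta = np.zeros(p); beta[:5] = 1.0
X = rng.normal(size=(n, p))
y = X @ beta + 0.5 * rng.normal(size=n)

bhat = interval_lasso(X, y, 0, n, lam=20.0)  # lam of order sigma * sqrt(n log p)
print(np.flatnonzero(np.abs(bhat) > 0.5))    # support estimate
```

In BSLE, such interval-wise fits on the two halves of a candidate split play the role of the column-wise estimates used by BSE, which is what makes the procedure so much cheaper.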
Remark 6.
We note that under Assumption 1, it is easy to see that the parameter defined in (19) of Lemma 1 is bounded. However, under a mild additional condition involving some absolute constant, it can be bounded by an absolute constant independent of $n$ and $p$.
We are ready to describe the main algorithm in this section, which we call Binary Segmentation through Lasso Estimators (BSLE).
With this additional parameter, the statistical consistency of Algorithm 2 can be established as follows.
Theorem 2.
Suppose Assumption 1 holds. Let the input parameters in Algorithm 2 satisfy
(20) 
for some sufficiently small constant and some sufficiently large constants. In addition, suppose that the good event (7) holds and that the remaining condition holds for an absolute constant. Then the estimators returned by BSLE with these input parameters satisfy
(21) 
where the constant is independent of $n$ and $p$.
Remark 7.
In Algorithm 2, one tuning parameter serves as the threshold for the binary search, a device commonly found in much of the change point literature. A second is the standard tuning parameter that ensures the Lasso estimators are sparse in high dimensions. The third tuning parameter is needed to ensure that the restricted eigenvalue condition is valid on every interval used in the binary search procedure. Similar requirements used to ensure the validity of the restricted eigenvalue condition can also be found in Leonardi and Bühlmann (2016) and Kaul et al. (2018).
Remark 8.
As discussed in Remark 6, without any additional assumptions the parameter in (19) is bounded. Therefore, if in Theorem 2 we can further guarantee the conditions above, then the localization error bound for BSLE satisfies
where the quantity on the left is the sup-norm error defined in (2). Therefore, in the setting where we only have finitely many change points, the statistical error bound for BSLE is worse by at most the factor defined in (3) compared to BSE.
Remark 9.
In Section 2.2 of their paper, Leonardi and Bühlmann (2016) proposed an algorithm called the Efficient Binary Segmentation Algorithm (EBSA) to detect change points in regression models. We remark that our BSLE is inspired by and closely related to EBSA, and it is clear that the computational costs of BSLE and EBSA are of the same order when the number of change points is a bounded constant. While EBSA is proven to be efficient and effective in the high-dimensional change point regression model, we remark that BSLE also has a few advantages. 1) The way we characterize the structural changes in (10) and (11) is more interpretable than that in Assumption 4^{5} of Leonardi and Bühlmann (2016). 2) As summarized in Table 1, the localization error bound we can prove for BSLE is sharper than that of EBSA in Leonardi and Bühlmann (2016). We note that in our simulation study, we do observe that BSLE and EBSA have very similar empirical performance. Due to these observations, we conjecture that the EBSA algorithm of Leonardi and Bühlmann (2016) can also achieve a localization error bound of the same order, but we do not have a proof.

^{5}In addition to interpretability, the change size defined in Assumption 4 of Leonardi and Bühlmann (2016) depends on several model quantities, especially when we allow the number of change points to diverge. Therefore we choose to make comparisons when the number of change points is a fixed constant.
4 Local Screening
In this section, we consider a local screening algorithm that can potentially sharpen the localization error rate given a collection of change point estimators that are consistent but not necessarily optimal. Consider the following algorithm.
(22)  
(23) 