Statistically and Computationally Efficient Change Point Localization in Regression Settings

06/26/2019
by   Daren Wang, et al.

Detecting when the underlying distribution changes from the observed time series is a fundamental problem arising in a broad spectrum of applications. Change point localization is particularly challenging when we only observe low-dimensional projections of high-dimensional random variables. Specifically, we assume we observe $\{ x_t, y_t\}_{t=1}^n$ where $ \{ x_t\}_{t=1}^n$ are $p$-dimensional covariates, $\{y_t\}_{t=1}^n $ are the univariate responses satisfying $E(y_t)=x_t^\top \beta_t^*$ for all $1\le t\le n$, and $\{\beta_t^*\}_{t=1}^n $ are the unobserved regression parameters that change over time in a piecewise constant manner. We first propose a novel algorithm called Binary Segmentation through Estimated CUSUM statistics (BSE), which computes the change points through direct estimates of the CUSUM statistics of $\{\beta_t^*\}_{t=1}^n $. We show that BSE can consistently estimate the unknown locations of the change points, achieving error bounds of order $O (\log(p)/n) $. To the best of our knowledge, this is a significant improvement, as the state-of-the-art methods are only shown to achieve error bounds of order $O(\log(p)/\sqrt n)$ in the multiple change point setting. However, BSE can be computationally costly. To overcome this limitation, we introduce another algorithm called Binary Segmentation through Lasso Estimators (BSLE). We show that BSLE can consistently localize change points with a slightly worse localization error rate compared to BSE, but BSLE is much more computationally efficient. Finally, we leverage the insights gained from BSE to develop a novel "local screening" algorithm that can take a coarse estimate of the change point locations together with the observed data and efficiently refine that estimate, allowing us to improve the practical performance of past estimators. We also justify our theoretical findings in simulated experiments.


1 Introduction

Change point detection and localization is a classical problem in time series analysis, in which we record a series of measurements and wish to determine whether and at what time(s) the underlying generative model has changed. Due to its flexibility, the model of a time series with multiple structural changes has a wide range of applications including econometrics [Bai and Perron (1998)], stock price analysis [Chen and Gupta (1997)], Internet security monitoring [Peng et al. (2004)], and genetics [Castro et al. (2018)].

In some settings, change point detection is quite well understood. For instance, in the mean change point model, we usually assume we observe a collection of high-dimensional time series vectors $\{y_t\}_{t=1}^n \subset \mathbb{R}^p$ such that

$y_t = \mu_t + \epsilon_t, \qquad 1 \le t \le n.$

Here $\{\epsilon_t\}_{t=1}^n$ are independent measurement noise vectors and $\{\mu_t\}_{t=1}^n$ are the population quantities that change over time in a piecewise constant manner. In this setting, the most important task is to determine the time points at which the structural changes of $\{\mu_t\}_{t=1}^n$ take place. There is a vast literature dealing with the problem of finding the change points in piecewise constant mean models in low and high dimensions. Frick et al. (2014) proposed a computationally efficient method called SMUCE that can simultaneously estimate the locations and the confidence intervals of the change points in the mean change point detection problem.

Cho et al. (2016), Cho and Fryzlewicz (2015) and Wang and Samworth (2018) studied high-dimensional mean change point detection problems. More recently, Pein et al. (2017) introduced a method that can handle mean and variance changes simultaneously. Cribben and Yu (2017) and Wang et al. (2018a), among others, studied mean change point problems for dynamic Bernoulli network models.

However, in other applications, we can only obtain indirect measurements of the high-dimensional vectors. Specifically, consider the scenario in which we observe the time series $\{x_t, y_t\}_{t=1}^n$, where $\{x_t\}_{t=1}^n$ are $p$-dimensional covariates, $\{y_t\}_{t=1}^n$ are the univariate responses satisfying $E(y_t) = x_t^\top \beta_t^*$, and $\{\beta_t^*\}_{t=1}^n$ are the unobserved regression coefficients that change over time. We formally summarize this model as

Model 1 (Change Point Model in the Regression Setting).

For any $1 \le t \le n$, suppose $x_t \in \mathbb{R}^p$ and $y_t \in \mathbb{R}$ satisfy

$y_t = x_t^\top \beta_t^* + \epsilon_t,$ (1)

where the $\epsilon_t$ are i.i.d. $N(0,1)$ random variables. (We assume for convenience that the variance of $\epsilon_t$ is $1$; our results remain valid as long as $\{\epsilon_t\}_{t=1}^n$ are i.i.d. sub-Gaussian random variables with bounded variance.) In addition, there exists a collection of change points $\{\eta_k\}_{k=1}^{K}$ with $1 = \eta_0 < \eta_1 < \dots < \eta_K \le n < \eta_{K+1} = n+1$ such that $\beta_t^* = \beta_{\eta_k}^*$ for all $\eta_k \le t < \eta_{k+1}$ and $0 \le k \le K$.
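
To make the data-generating mechanism concrete, the following minimal sketch (Python with NumPy) simulates a draw from Model 1 with piecewise constant, sparse regression coefficients; the sample size, dimension, sparsity level, and change point locations below are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def simulate_model_1(n=300, p=50, s=5, change_points=(100, 200), seed=0):
    """Simulate y_t = x_t' beta_t^* + eps_t with piecewise constant, s-sparse beta_t^*."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))             # covariates x_t
    eps = rng.normal(size=n)                # i.i.d. N(0, 1) noise
    B = np.zeros((n, p))                    # row t holds beta_t^*
    boundaries = [0, *change_points, n]
    for k in range(len(boundaries) - 1):
        beta = np.zeros(p)
        support = rng.choice(p, size=s, replace=False)
        beta[support] = rng.normal(size=s)  # new s-sparse coefficients on each segment
        B[boundaries[k]:boundaries[k + 1]] = beta
    y = np.sum(X * B, axis=1) + eps         # y_t = x_t' beta_t^* + eps_t
    return X, y, B

X, y, B = simulate_model_1()
```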

Our main task is to develop computationally efficient algorithms that can consistently estimate the unknown time points at which the unobserved parameters change.

Any change point estimators $\{\hat{\eta}_k\}_{k=1}^{\hat{K}}$ are deemed to be consistent if, with probability approaching 1,

$\hat{K} = K$

and the sup-norm error satisfies

$\epsilon := \frac{1}{n} \max_{1 \le k \le K} |\hat{\eta}_k - \eta_k| \le \delta_n$ (2)

for all sufficiently large $n$, where $\delta_n$ is a vanishing sequence.
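
For concreteness, the small sketch below (Python with NumPy) computes a sup-norm localization error in the spirit of (2) for a given set of estimated and true change points; normalizing by $n$ is an assumption consistent with how the rates are stated here.

```python
import numpy as np

def sup_norm_error(est_cps, true_cps, n):
    """Normalized sup-norm localization error: worst distance between matched true
    and estimated change points, divided by n. Returns inf if the number of
    estimated change points is wrong (the estimator is then inconsistent)."""
    est, true = np.sort(est_cps), np.sort(true_cps)
    if len(est) != len(true):
        return np.inf
    return float(np.max(np.abs(est - true))) / n

print(sup_norm_error([103, 198], [100, 200], n=300))  # 0.01
```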

The low-dimensional change point linear regression model has been extensively considered by many authors, including Bai and Perron (1998), Qu and Perron (2007), and most recently Zhang et al. (2015b). Most of the existing works in this setting focus on the case where the number of change points is a fixed constant.

In the high-dimensional regime where $p$ may be much larger than $n$, the change point regression model has also received recent attention. In particular, Lee et al. (2016) extended the Lasso to the high-dimensional single-change-point setting and showed that both the change point and the regression parameters can be consistently estimated. Later, Lee et al. (2018) extended their results to the high-dimensional quantile regression model. Kaul et al. (2018) proposed a highly efficient algorithm for the setting of exactly one change point. Both Lee et al. (2016) and Kaul et al. (2018) showed that in the one change point setting, the change point can be estimated consistently in sup-norm error. Zhang et al. (2015a) studied the Sparse Group Lasso (SGL) algorithm for the multiple change point setting and showed that SGL returns consistent change point estimators when the number of change points is bounded. Leonardi and Bühlmann (2016) showed that, by using a binary search algorithm, consistent estimation can be achieved even when the number of change points diverges with $n$.

Contributions

In Section 2, we introduce a new two-stage procedure called Binary Segmentation through Estimated CUSUM statistics (BSE) to estimate the change points in the high-dimensional regression model. Given the time series data $\{x_t, y_t\}_{t=1}^n$, in Stage 1 of BSE we estimate the coefficients $\{\beta_t^*\}_{t=1}^n$ using a novel algorithm developed for the high-dimensional change point setting. Then in Stage 2, we perform a novel multidimensional binary segmentation search to localize the significant change points using the estimated coefficient sequence. Our careful analysis shows that the BSE algorithm can consistently estimate the change points in the regression time series even when the number of change points diverges with $n$. Furthermore, the sup-norm error (defined in (2)) of the BSE change point estimators is, up to log factors, of order $1/n$. To the best of our knowledge, this is a significant improvement in the high-dimensional multiple change point regression setting, as previously known methods in this setting have only been shown to achieve error bounds of order $O(\log(p)/\sqrt{n})$ at best. See Table 1 for detailed comparisons. A key step of BSE in our methodology is estimating the time series of $\beta_t^*$'s. We establish mean squared error bounds (defined in (6)) for our regression parameter estimators that may be of independent interest.

While the BSE algorithm is proven to attain a sharp statistical error bound, a key drawback of the procedure is that the computational cost of BSE in high dimensions grows exponentially in the number of change points. To overcome this limitation, in Section 3 we introduce another new algorithm called Binary Segmentation through Lasso Estimators (BSLE). We show that BSLE can also consistently localize the change points, with a sup-norm error bound that is worse than that of BSE by at most a multiplicative factor defined by

(3)

While BSLE achieves a slightly worse statistical bound, its computational cost is much lower; here and in Table 1, $\mathrm{Lasso}(n)$ denotes the computational cost of the Lasso with $n$ samples. For comparing and contrasting computational complexities, we assume throughout that the Lasso, the Group Lasso and the Sparse Group Lasso have the same computational complexity.

Finally, we consider the scenario in which we are given a collection of consistent change point estimators from an existing algorithm whose localization error bound may not be optimal. By leveraging the insights gained from BSE, we propose a new method called the "Local Screening Algorithm" (LSA) in Section 4. We show that, together with the observed data, LSA can efficiently refine the given estimates, allowing us to improve the practical performance of the given estimators.

We summarize the localization error bounds and the computational costs of our methods and other existing competitors in Table 1. See Remark 9 for a more detailed discussion. We also provide extensive numerical evidence to complement our theoretical findings in Section 5.

Localization error bound | Computational complexity
BSE (Algorithm 1)
BSLE (Algorithm 2)
EBSA (Leonardi and Bühlmann, 2016)
LSA (Algorithm 3) + EBSA
SGL (Simon et al., 2013)
Table 1: In the above table, $s$ denotes the sparsity level and $K$ denotes the number of change points, which is assumed to be a fixed constant in this table. The parameter defined in (3) can be of different orders depending on the assumptions. The localization error bound reported for EBSA is based on our communication with the authors of Leonardi and Bühlmann (2016) and is not explicit in the most recent version of their paper. The SGL algorithm was first introduced in Simon et al. (2013); Zhang et al. (2015b) later showed that it can be used to detect change points in the high-dimensional regression setting.

Notation

For any matrix $B \in \mathbb{R}^{p \times n}$, we write $B = [\beta_1, \dots, \beta_n]$, where $\beta_t$ is the $t$-th column of $B$. We can therefore regard $B$ as a vector-valued function of time. Let $B^*$ be such that the $t$-th column of $B^*$ equals $\beta_t^*$. With a slight abuse of notation, we let $B^*(t) = \beta_t^*$. This means that $B^*$ is a $(K+1)$-piecewise constant vector-valued function of time if there are $K$ change points. If $I$ is any interval within $[1, n]$ and $B^*$ is constant over $I$, then we write $B^*(I)$ for $\beta_t^*$ with any $t \in I$. We use $\beta(i)$ to denote the $i$-th entry of a vector $\beta$. For any two real sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \lesssim b_n$ if there exists a constant $C > 0$ such that $a_n \le C b_n$, and we write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$. Let $\{Z_n\}$ be a sequence of random variables. We write $Z_n = O_p(a_n)$ if there exists a constant $C > 0$ such that the probability of the event $\{|Z_n| > C a_n\}$ vanishes as $n \to \infty$.

Throughout each individual section, we use subscripted and superscripted constants (such as $C_1$ and $c'$) to indicate absolute constants independent of $n$ and $p$. However, these constants can have different meanings in different proofs and theorems.

2 Binary Segmentation through Estimated CUSUM statistics

In this section, we introduce a new algorithm for the regression change point model. Algorithm 1 details our approach. Overall, there are two main stages: (a) compute the approximated CUSUM statistics using the regression coefficient estimators in (8), and (b) detect change points by running a multidimensional binary segmentation algorithm on the approximated CUSUM statistics. We start by defining the multidimensional CUSUM statistic of $\{\beta_t^*\}$ on any interval $(s, e] \subset (0, n]$ and any time point $s < t < e$ as

$\tilde{\beta}_{t}^{(s,e]} = \sqrt{\frac{e-t}{(e-s)(t-s)}} \sum_{r=s+1}^{t} \beta_r^* - \sqrt{\frac{t-s}{(e-s)(e-t)}} \sum_{r=t+1}^{e} \beta_r^*.$ (4)
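
The following sketch (Python with NumPy) computes a multidimensional CUSUM statistic of a coefficient sequence on an interval and scans for the split point with the largest Euclidean CUSUM norm. The weighting follows the standard CUSUM convention shown in (4); treat the exact normalization as an assumption of this sketch.

```python
import numpy as np

def cusum(B, s, e, t):
    """CUSUM of the rows of B (time on axis 0) over the interval (s, e] at split t:
    a weighted difference between the sums of the left block and the right block."""
    left = np.sqrt((e - t) / ((e - s) * (t - s))) * B[s:t].sum(axis=0)
    right = np.sqrt((t - s) / ((e - s) * (e - t))) * B[t:e].sum(axis=0)
    return left - right

def best_split(B, s, e):
    """Split point in (s, e) maximizing the Euclidean norm of the CUSUM statistic."""
    norms = [np.linalg.norm(cusum(B, s, e, t)) for t in range(s + 1, e)]
    t_star = s + 1 + int(np.argmax(norms))
    return t_star, float(np.max(norms))
```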
, , and .
Compute using (8) with parameters and .
for  do
     
     if  then
         , where
(5)
         
     else
         
     end if
end for
if  then
     Add to the set of estimated change points
     Repeat BSE on and on with
end if
The set of estimated change points.
Algorithm 1 Binary Segmentation through Estimated CUSUM statistics (BSE).

The multidimensional CUSUM statistic is a direct generalization of the univariate CUSUM statistic first introduced in Scott and Knott (1974). It has also been shown in Venkatraman (1993) that, for the univariate mean change point model, the CUSUM statistic is an effective tool for detecting change points. Having this in mind, we propose a new strategy in which we first estimate the multidimensional CUSUM statistics of $\{\beta_t^*\}_{t=1}^n$ and then apply our newly developed multidimensional binary segmentation algorithm to these estimates. We summarize this strategy in Algorithm 1.

The main intuition of BSE is that the non-stationary regression coefficients $\{\beta_t^*\}_{t=1}^n$ are the source of the structural changes. Therefore, if we can estimate $\{\beta_t^*\}_{t=1}^n$ consistently, in the sense that we can control the mean squared error

$\frac{1}{n} \sum_{t=1}^{n} \|\hat{\beta}_t - \beta_t^*\|_2^2,$ (6)

then we can also quantify how well we estimate the population multidimensional CUSUM statistics, and this, as we will show, validates the usage of BSE.

As described in Algorithm 1, BSE makes use of a collection of pre-generated random intervals as inputs. Such a feature was first used in the Wild Binary Segmentation (WBS) algorithm of Fryzlewicz (2014) and later in INSPECT by Wang and Samworth (2018), where the endpoints of the intervals are selected independently and uniformly at random from $\{1, \dots, n\}$. The use of random intervals is one of the key ideas of WBS, INSPECT and BSE, because if the number of random intervals is large enough, then with high probability there exists at least one random interval containing one and only one change point. In other words, the good event described in

(7)

holds with high probability. Indeed, in Lemma 4 we will show that this good event holds with high probability provided that the number of random intervals is sufficiently large. By focusing on the interval with maximum CUSUM statistic among all the random intervals, one can develop sharp statistical guarantees for the change point estimators.
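
As an illustration of this device, the sketch below (Python with NumPy) draws interval endpoints independently and uniformly at random and checks whether some interval contains exactly one change point; the number of intervals used is a hypothetical value.

```python
import numpy as np

def random_intervals(n, num_intervals, rng):
    """Draw intervals whose endpoints are chosen independently and uniformly from {1, ..., n}."""
    a = rng.integers(1, n + 1, size=num_intervals)
    b = rng.integers(1, n + 1, size=num_intervals)
    return [(min(x, y), max(x, y)) for x, y in zip(a, b) if x != y]

def contains_exactly_one(intervals, change_points):
    """True if at least one interval contains exactly one true change point
    (a proxy for the good event described around (7))."""
    cps = np.asarray(change_points)
    return any(np.sum((cps > s) & (cps <= e)) == 1 for s, e in intervals)

rng = np.random.default_rng(0)
intervals = random_intervals(n=300, num_intervals=100, rng=rng)
print(contains_exactly_one(intervals, change_points=[100, 200]))  # typically True
```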

In order to establish the consistency of BSE, we first need to show that our proposed estimators in (8) have low MSE bounds, and then show that BSE can consistently locate the change points using the estimated time series. For the rest of this section, we therefore first establish the efficiency of the estimators returned by (8) in high dimensions. Then we show that the BSE algorithm is consistent and that its sup-norm error attains the sharp rate discussed in the Introduction.

2.1 Coefficient estimation in change point linear regression

We begin by studying coefficient estimation in the change point linear regression model. In high dimensions, the estimation of $\{\beta_t^*\}_{t=1}^n$ has previously been studied by several authors, including Lee et al. (2016), Kaul et al. (2018) and Leonardi and Bühlmann (2016). However, in all these existing results, the statistical guarantees of the estimators rely on the existence of accurate estimates of the change points. In contrast, we present the following novel algorithm and, as we will show, the coefficients can be consistently estimated without using any pre-estimated change points, and the MSE bounds for the coefficient estimation remain valid even when the number of change points grows as $n$ increases.

Our analysis covers the standard high-dimensional regime in which $p$ may be much larger than $n$ but the regression coefficients are sparse. In order to establish consistency in this setting, we need to adopt the following assumptions.

Assumption 1.

Consider the model defined in Model 1.

  • Suppose there exists a subset such that

    and that for some constant .

  • are -dimensional covariates such that for some absolute constant .

  • Suppose for all , and all vectors ,

    for some positive constant and .

  • .

In Assumption 1, (a) and (b) are both standard assumptions for the high-dimensional time series linear regression model. In particular, (a) is used in Lounici et al. (2011) to analyze the multiple regression model. Both (a) and (b) are used in Kaul et al. (2018) and Leonardi and Bühlmann (2016). (c) is a standard restricted eigenvalue condition. In fact, it has been shown in Raskutti et al. (2010) and in Zhou (2009) that if the covariates are i.i.d. (sub-)Gaussian random vectors with a positive definite covariance matrix, then (c) holds with high probability. Finally, we note that when we estimate the coefficients on a given time segment, we need the effective sample size of that segment to be large, usually of order $s \log(p)$ (see for instance Negahban et al. (2012)). Therefore Assumption 1(d) can be thought of as a standard sample size condition commonly found in the high-dimensional regression literature.

Denote and . Consider the following algorithm.

(8)

The following proposition provides the MSE bound for the coefficient estimators in (8).

Proposition 1.

Suppose and in addition there exists a sufficiently large constant such that and . Then under Assumption 1, with probability at least ,

for some absolute constant .

Remark 1.

As suggested by Proposition 1, we note that in order to ensure that is admissible to defined in (8), we need to choose and . However, as long as we can further ensure that , and , then

We remark that, up to log factors, this is a tight upper bound. See Section 6.2 for a discussion of lower bounds in this high-dimensional setting.

Remark 2.

We also note that we require the tuning parameter in Algorithm 1 only for mathematical convenience. In fact, consider the following additional assumption: suppose that

for some that is a constant independent of and . Then it can be shown that the solution

(9)

satisfies

for some absolute constant . The proof of this result follows immediately from that of Proposition 1 and Lemma 1 and will be omitted for brevity.

As we will discuss in Section 6.1, the computational cost of solving (8) is substantial; writing $\mathrm{GroupLasso}(n)$ for the computational cost of the Group Lasso with $n$ samples, this cost grows exponentially in the number of change points. However, our main result in this section has at least two useful implications.

1) When the number of change points is small, it is computationally feasible to solve (8), and it provides a sharp MSE bound for coefficient estimation in the high-dimensional change point problem. Previous papers such as Lee et al. (2016) and Kaul et al. (2018) also study coefficient estimation in the single change point setting, but we provide a slightly more computationally costly approach with sharper statistical rates in a slightly more general setting. (Compared with Lee et al. (2016), our MSE bounds are obtained in a more general setting: we have not made any assumptions on the size of the structural change. Our MSE bounds are also sharper than those obtained in Kaul et al. (2018), as the latter are off by polynomial factors compared with the lower bounds discussed in Section 6.2.)

2) When the number of change points is large, our approach in (8) is no longer computationally feasible. However, the bound obtained in Proposition 1 can be viewed as a baseline and shows that, when the number of change points is bounded, there exists a polynomial-time algorithm that can achieve sharp MSE bounds.

2.2 The consistency of BSE

In the previous section, we have shown that the coefficients of the change point regression model can be estimated in high dimensions using the approach described in (8). In this section, we show that the BSE algorithm can consistently estimate the change points with sharp error bounds.

Recall that we use $K$ to denote the number of change points in Model 1. In order to state the main result of this section, we also need to introduce the following additional parameters. Let $\Delta$ and $\kappa$ be two positive parameters such that

$\Delta \le \min_{1 \le k \le K+1} (\eta_k - \eta_{k-1})$ (10)

and that

$\kappa \le \min_{1 \le k \le K} \| \beta_{\eta_k}^* - \beta_{\eta_k - 1}^* \|_2.$ (11)

In the change point literature, the minimal spacing $\Delta$ and the minimal change size $\kappa$ are commonly used to characterize the signal strength of the change point model; see, e.g., Wang et al. (2018b) and Wang and Samworth (2018). It intuitively makes sense that if the time between changes or the size of the changes is too small, then it is not possible to find the change points in the time series.
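
As a concrete reading of these two quantities, the sketch below (Python with NumPy) computes the change point locations, the minimal spacing, and the minimal Euclidean jump size from a piecewise constant coefficient matrix; the computation is purely illustrative of the quantities in (10) and (11).

```python
import numpy as np

def spacing_and_jump(B):
    """Given the (n, p) matrix whose row t is beta_t^*, return the change point
    locations, the minimal spacing Delta, and the minimal ell_2 jump size kappa."""
    diffs = np.linalg.norm(np.diff(B, axis=0), axis=1)   # ||beta_{t+1}^* - beta_t^*||_2
    cps = np.flatnonzero(diffs > 0) + 1                  # first index of each new segment
    boundaries = np.concatenate(([0], cps, [B.shape[0]]))
    delta = int(np.min(np.diff(boundaries)))             # minimal segment length
    kappa = float(np.min(diffs[cps - 1])) if len(cps) else np.inf
    return cps, delta, kappa
```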

We show that the change point estimates returned by the BSE algorithm are consistent by demonstrating a more general result in Proposition 2.

Proposition 2.

Let and satisfy (10) and (11). Suppose used in (5) are any estimators of such that

for some parameter and that there exists a sufficiently large constant such that

Suppose in addition that event holds and that . Let the input parameters of BSE satisfy

(12)

for some sufficiently large constant and some sufficiently small constant . Then the estimators returned by BSE with data , input parameters and , , and are such that with probability at least ,

(13)

where the constant is absolute and independent of $n$ and $p$.

In the BSE algorithm, we use the estimators from (8) to estimate the multidimensional CUSUM statistics. We note that, as suggested by Proposition 2, in general any consistent estimator can potentially work, as long as the corresponding MSE bound is sufficiently small. Indeed, in Section 5 our empirical results suggest that the change points can still be consistently localized even if the estimators in Algorithm 1 are computed using the SGL algorithm instead of (8).

For completeness, we formally summarize our main result about BSE in the following theorem, which is a direct corollary of Proposition 2.

Theorem 1.

Suppose Assumption 1 holds. Let , and be chosen as in Proposition 1 and in addition let be chosen so that

(14)

for some sufficiently small constant and some sufficiently large constant . In addition, suppose the event holds and that . Then the estimates returned by BSE with inputs , and and are such that, with probability at least ,

(15)

where the constant is absolute and independent of $n$ and $p$.

Remark 3.

We note that the optimal choice for is of order and for is of order . Under these choices, the only assumption (14) made in Theorem 1 requires the regression model parameters to satisfy

and the corresponding sup-norm error for the change point estimators returned by BSE satisfies

Remark 4.

In Theorem 1, we also require that , which means that all the random intervals cannot contain too many change points. We note that if , then we can simply let . In addition, if we simply let in Theorem 1, then the error bound is inflated to

(16)

The localization error bound is inflated by a factor of compared to (15). We remark that in the high-dimensional change point literature, sup-norm error bounds are commonly found to depend on additional model parameters. For example, the order of the error bound in (16) agrees with that derived in Kaul et al. (2018) in the single change point setting. In addition, while Wang and Samworth (2018) focused on the high-dimensional mean change point problem (i.e., they assume the time series is observed directly, up to additive noise), the sup-norm error derived in their main result (Theorem 3 of their paper) is, up to log factors, the same as (16) as long as the number of change points is a fixed constant.

Remark 5.

The second stage of BSE in Algorithm 1 can be considered as a high-dimensional version of the univariate Wild Binary Segmentation (WBS) algorithm first introduced in Fryzlewicz (2014). We note that there are two other important contributions that also successfully extend WBS to high-dimensional settings: the Sparsified Binary Segmentation (SBS) algorithm introduced by Cho and Fryzlewicz (2015) and the INSPECT algorithm by Wang and Samworth (2018). However, neither INSPECT nor SBS is designed for the change point regression setting, and therefore they may not solve our problem.

To be more precise, we first remark that while INSPECT is proven to be effective and efficient for high-dimensional mean change point models, it is a non-trivial open problem to establish the consistency of INSPECT under our setting. The theoretical properties of INSPECT are heavily based on the use of sample splitting and semidefinite programming. It is therefore unclear whether one can easily establish the same statistical guarantees when the estimated coefficient sequence is used as the input of INSPECT.

Also, the SBS algorithm can be viewed as hard-thresholding of the CUSUM vectors followed by an aggregation. Therefore it characterizes the change size using the $\ell_\infty$ norm instead of the $\ell_2$ norm we use in (11), which means that SBS necessarily assumes that the largest coordinate-wise change must be sufficiently large. We remark that if we used the $\ell_\infty$ norm to characterize the change size, then we would need to quantify the accuracy of our estimators using a mean-sup error, and in sparse regression problems such bounds typically require stronger assumptions than we need in Theorem 1.

3 Binary Segmentation through Lasso Estimators

In the previous section, we showed that the BSE algorithm can accurately locate the change points with sharp localization error. However, as discussed in Section 6.1, the computational cost of the coefficient estimation in (8) grows exponentially in the number of change points, which makes BSE infeasible when that number is large. To overcome this limitation, in this section we consider an alternative approach which is much more computationally efficient. To motivate such an approach, observe that the multidimensional CUSUM statistic in (4) can be rewritten as

(17)

Therefore, instead of estimating $\{\beta_t^*\}_{t=1}^n$ directly, we can alternatively consider estimating the interval-wise population quantities appearing in (17) for any interval that is used in the binary search procedure.

We remark that, given any generic interval $I$, which possibly contains multiple change points, the corresponding population quantity can also be thought of as the coefficient vector of the best linear predictor given the data on the time interval $I$:

The best linear predictor in high-dimensional misspecified models has been intensively studied by many authors, including Bunea et al. (2007), Van De Geer et al. (2009) and references therein. As a result, given any generic interval $I$, we will consider the estimator

$\hat{\beta}_I \in \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \frac{1}{|I|} \sum_{t \in I} (y_t - x_t^\top \beta)^2 + \lambda \|\beta\|_1 \Big\},$ (18)

where $\lambda > 0$ is a user-specified tuning parameter. In the following lemma, we show that $\hat{\beta}_I$ well-approximates the corresponding population quantity.
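
To give a sense of this building block, the sketch below (Python with scikit-learn) fits a Lasso on the observations belonging to a time interval and forms a simple contrast between the fits on the two sides of a candidate split; the penalty value and the weighting are illustrative assumptions, not the tuning prescribed by the theory.

```python
import numpy as np
from sklearn.linear_model import Lasso

def interval_lasso(X, y, s, e, lam=0.1):
    """Lasso fit using only the observations with time indices in (s, e]."""
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    model.fit(X[s:e], y[s:e])
    return model.coef_

def lasso_contrast(X, y, s, e, t, lam=0.1):
    """Weighted difference between the Lasso fits on (s, t] and (t, e]; BSLE-type
    procedures compare such quantities over candidate split points t."""
    b_left = interval_lasso(X, y, s, t, lam)
    b_right = interval_lasso(X, y, t, e, lam)
    weight = np.sqrt((t - s) * (e - t) / (e - s))
    return weight * float(np.linalg.norm(b_left - b_right))
```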

Lemma 1.

Suppose Assumption 1 holds. Denote

(19)

Let be defined as in (18). Suppose in addition and . Then with probability at least ,

for some constant only depending on .

Remark 6.

We note that under Assumption 1, it is easy to see that the parameter defined in (19) of Lemma 1 satisfies . However, if for example we assume that for some absolute constant , then

which implies that it can be bounded by an absolute constant independent of $n$ and $p$ under this mild condition.

We are ready to describe the main algorithm in this section, which we call Binary Segmentation through Lasso Estimators (BSLE).

, , , , ,
for  do
     
     if  ,  then
         , where
         
     else
         
     end if
end for
if   then
Add to the set of estimated change points
     Repeat BSLE on and on with tuning parameters , and
else Continue
end if
The set of estimated change points.
Algorithm 2 Binary Segmentation through Lasso Estimators (BSLE).

With the additional parameter , the statistical consistency of Algorithm 2 can be established as follows.

Theorem 2.

Suppose Assumption 1 holds. Let the input parameters , and in Algorithm 2 satisfy

(20)

for some sufficiently small constant and some sufficiently large constants . In addition, suppose that event holds and that for an absolute constant . Then the estimators returned by BSLE with input parameters , and satisfy

(21)

where the constant is independent of $n$ and $p$.

Remark 7.

In Algorithm 2, the tuning parameter is the threshold for the binary search algorithm, which is commonly found in much of the change point literature. The second tuning parameter is the standard one used to ensure that the estimators are sparse in high dimensions. The last tuning parameter is needed to ensure that the restricted eigenvalue condition is valid for any interval used in the binary search procedure. Similar requirements used to ensure the validity of the restricted eigenvalue condition can also be found in Leonardi and Bühlmann (2016) and Kaul et al. (2018).

Remark 8.

As discussed in Remark 6, without any additional assumptions, . Therefore, if in Theorem 2 we can further guarantee that , then the localization error bound for BSLE satisfies

where is the sup-norm error defined in (2). Therefore in the setting where we only have finitely many change points, the statistical error bound for BSLE is worse by a factor of compared to BSE.

Remark 9.

In Section 2.2 of their paper, Leonardi and Bühlmann (2016) proposed an algorithm, called the Efficient Binary Segmentation Algorithm (EBSA), to detect change points in regression models. We remark that our BSLE is inspired by and closely related to EBSA, and it is clear that the computational costs of BSLE and EBSA are of the same order when the number of change points is a bounded constant. While EBSA is proven to be efficient and effective in the high-dimensional change point regression model, we remark that BSLE also has a few advantages. 1) The way we characterize the structural changes in (10) and (11) is more interpretable than that in Assumption 4 of Leonardi and Bühlmann (2016). (In addition to interpretability, the change size defined in their Assumption 4 depends on several other model parameters, especially when the number of change points is allowed to diverge; therefore we choose to make comparisons when it is a fixed constant.) 2) As summarized in Table 1, the localization error bound of BSLE that we can prove is sharper than that of EBSA in Leonardi and Bühlmann (2016). We note that in our simulation study we do observe that BSLE and EBSA have very similar empirical performance. Due to these observations, we conjecture that the EBSA algorithm of Leonardi and Bühlmann (2016) can also achieve a localization error bound of the same order, but we do not have a proof.

4 Local Screening

In this section, we consider a local screening algorithm that can potentially sharpen the localization error rate given a collection of change point estimators that are consistent but not necessarily optimal. Consider the following algorithm.

, , and .
for  do
     Let and .
     if  then
         Compute through
(22)
         , where
(23)
     end if
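
Although the remainder of Algorithm 3 is not reproduced here, the idea of local screening can be sketched as follows (Python with NumPy): within a window around each coarse change point estimate, re-locate the change at the split point maximizing a CUSUM-type norm computed from an estimated coefficient sequence. The window radius and the refinement criterion below are assumptions made for illustration, not the paper's exact screening rule.

```python
import numpy as np

def local_screening(B_hat, coarse_cps, radius):
    """Refine coarse change point estimates using the estimated coefficients B_hat
    (rows indexed by time): search a local window around each coarse estimate and
    keep the split with the largest CUSUM-type norm."""
    n = B_hat.shape[0]
    refined = []
    for cp in coarse_cps:
        s, e = max(0, cp - radius), min(n, cp + radius)
        best_t, best_val = cp, -np.inf
        for t in range(s + 1, e):
            left = np.sqrt((e - t) / ((e - s) * (t - s))) * B_hat[s:t].sum(axis=0)
            right = np.sqrt((t - s) / ((e - s) * (e - t))) * B_hat[t:e].sum(axis=0)
            val = np.linalg.norm(left - right)
            if val > best_val:
                best_t, best_val = t, val
        refined.append(int(best_t))
    return sorted(refined)
```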