1 Introduction
Nowadays, nearly all fields of application require sophisticated statistical modelling and inference to draw scientific conclusions from observed data.
In many cases the data are time dependent, and the model parameters, or the model itself, may not be stable. In such situations it is of particular importance to detect changes in the processed data as soon as possible and to adapt the statistical analysis accordingly.
These changes are usually called change points or structural breaks in the literature.
Due to their universality, methods for change point analysis have a vast field of possible applications, ranging from the natural sciences [for example, biology and meteorology] to the humanities [economics, finance, social sciences].
Since the seminal papers of Page (1954, 1955) the problem of detecting change points in time series has received substantial attention in the statistical literature.
The contributions to this field can be roughly divided into the areas of retrospective and sequential change point analysis.
In the retrospective case, historical data sets are examined with the aim of testing for changes and identifying their positions within the data.
In this setup, the data are assumed to be completely available before the statistical analysis starts (a posteriori analysis).
A comprehensive overview of retrospective change point analysis can be found in Aue and Horváth (2013).
In many practical applications, however, data arrives consecutively and breaks can occur at any new data point.
In such cases the statistical analysis of the processed data has to start immediately, with the aim of detecting changes as soon as possible.
This field of statistics is called sequential change point detection [sometimes also: online change point detection].
In the major part of the 20th century the problem of sequential change point detection was tackled using procedures [mostly called control charts; see Lai (1995, 2001) for comprehensive reviews] which are optimized to have a minimal detection delay but do not control the probability of a false alarm (type I error). A new paradigm was then introduced by Chu et al. (1996), who use initial data sets and employ invariance principles to also control the type I error. The methods developed under this paradigm [see below] can again be subdivided into closed-end and open-end approaches. In closed-end scenarios monitoring is stopped at a fixed, predefined point of time, while in open-end scenarios monitoring can, in principle, continue forever if no change point is detected.

In the paper at hand we develop a new approach for sequential change point detection in an open-end scenario. To be more precise, let $(X_t)_{t \in \mathbb{N}}$ denote a $d$-dimensional time series and let $F_t$ be the distribution function of the random variable $X_t$ at time $t$. We study monitoring procedures for detecting changes of a parameter $\theta_t = \theta(F_t)$, where $\theta$ is a $p$-dimensional parameter of a distribution function on $\mathbb{R}^d$ (such as the mean, variance, correlation etc.). In particular, we will develop a decision rule for the hypothesis of a constant parameter, that is

(1.1)  $H_0 : \theta_t = \theta_1$ for all $t \in \mathbb{N}$,

against the alternative that the parameter changes (once) at some time $m + k^*$ with $k^* \geq 0$, that is

(1.2)  $H_1 : \theta_1 = \ldots = \theta_{m+k^*} \neq \theta_{m+k^*+1} = \theta_{m+k^*+2} = \ldots$
In this setup, which was originally introduced by Chu et al. (1996), the first $m$ observations are assumed to be stable and serve as an initial training set. The problem of sequential change point detection in the hypothesis-testing framework pictured above has received substantial interest in the literature. Since the seminal paper of Chu et al. (1996) several authors have worked in this area. Aue et al. (2006), Aue et al. (2009), Fremdt (2014b) and Aue et al. (2014) developed methodology for detecting changes in the coefficients of a linear model, while Wied and Galeano (2013) and Pape et al. (2016) considered sequential monitoring schemes for changes in special functionals such as the correlation or the variance. A MOSUM approach was employed by Leisch et al. (2000), Horváth et al. (2008) and Chen and Tian (2010) to monitor the mean and linear models, respectively. Recently, Hoga (2017) used a norm-based approach to detect changes in the mean and variance of a multivariate time series, Kirch and Weber (2018) defined a unifying framework for detecting changes in different parameters with the help of several statistics, and Otto and Breitung (2019) considered a backward CUSUM, which monitors changes based on recursive residuals in a linear model. A helpful but not exhaustive overview of different sequential procedures can be found in Section 1, in particular Table 1, of Anatolyev and Kosenok (2018). A common feature of all procedures in the cited literature is the comparison of estimators from different subsamples of the data. To be precise, let $X_1, \ldots, X_m$ denote an initial training sample and let $X_1, \ldots, X_{m+k}$ be the data available at time $m + k$. Several authors propose to investigate the differences
(1.3)  $\hat\theta_{1:m} - \hat\theta_{m+1:m+k}$
(in dependence of $k$), where $\hat\theta_{i:j}$ denotes the estimator of the parameter $\theta$ from the sample $X_i, \ldots, X_j$. In the sequential change point literature, monitoring schemes based on the differences (1.3) are usually called (ordinary) CUSUM procedures and have been considered by Horváth et al. (2004), Aue et al. (2006, 2009, 2014), Schmitz and Steinebach (2010) and Hoga (2017). Other authors suggest using a function of the differences
(1.4)  $\hat\theta_{1:m} - \hat\theta_{m+j+1:m+k}, \qquad j = 0, 1, \ldots, k-1$
(in dependence of $k$), and the corresponding procedures are usually called Page-CUSUM tests [see Fremdt (2014b), Aue et al. (2015), or Kirch and Weber (2018) among others]. As an alternative we propose, following ideas of Dette and Gösmann (2018), a monitoring scheme based on a function of the differences
(1.5)  $\hat\theta_{1:m+j} - \hat\theta_{m+j+1:m+k}, \qquad j = 0, 1, \ldots, k-1.$
The intuitive advantage of (1.5) over (1.3) is the screening of all possible positions of the change point, which takes into account that the change does not necessarily occur with observation $m+1$, so that the estimator from the monitoring period may be 'corrupted' by pre-change observations. This issue is also partially addressed by (1.4), where different positions are examined but always compared with the estimator of the parameter from the training sample. We will demonstrate in Section 4 that sequential monitoring schemes based on the differences (1.5) yield a substantial improvement in power compared to the commonly used methods based on (1.3) and (1.4). To avoid misunderstandings, the reader should note that a (total) comparison based on differences of the form (1.5) is typically also called a CUSUM approach in retrospective change point analysis [see Aue and Horváth (2013) for a comprehensive overview].
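To make the distinction between the three families of differences concrete, the following sketch computes them for the mean functional on a toy sample. All concrete choices (the mean as the monitored parameter, the data, the absolute value as norm) are our own illustration, not the paper's definitions:

```python
import numpy as np

def mean_est(x, i, j):
    """Estimator of the mean from the subsample x_i, ..., x_j (1-based indices)."""
    return np.mean(x[i - 1:j])

def ordinary_cusum(x, m, k):
    # (1.3): compare the training-sample estimator with the post-sample estimator
    return abs(mean_est(x, 1, m) - mean_est(x, m + 1, m + k))

def page_cusum(x, m, k):
    # (1.4): screen all candidate positions, but always use the training estimator
    return max(abs(mean_est(x, 1, m) - mean_est(x, m + j + 1, m + k))
               for j in range(k))

def total_cusum(x, m, k):
    # (1.5): screen all candidate positions and re-estimate the pre-change parameter
    return max(abs(mean_est(x, 1, m + j) - mean_est(x, m + j + 1, m + k))
               for j in range(k))

rng = np.random.default_rng(0)
m, k = 50, 40
x = rng.normal(size=m + k)
x[m + 20:] += 2.0  # a late change, 'corrupting' the post-sample estimator in (1.3)

d13, d14, d15 = ordinary_cusum(x, m, k), page_cusum(x, m, k), total_cusum(x, m, k)
# Both maxima contain the ordinary difference as the special case j = 0:
assert d14 >= d13 and d15 >= d13
```

Since the term for $j = 0$ in (1.4) and (1.5) coincides with (1.3), the screening statistics can never be smaller than the ordinary CUSUM contrast.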
The present paper is devoted to a rigorous statistical analysis of sequential monitoring based on the differences defined in (1.5) in the context of an open-end scenario. In Section 2 we introduce the new procedure and develop a corresponding asymptotic theory to obtain critical values such that monitoring can be performed with a controlled type I error. The theory is broadly applicable to detect changes in a general parameter of a multivariate time series. As with all monitoring schemes in this context, the method depends on a threshold function, and we also discuss the choice of this function. In particular, we establish an interesting result regarding this choice and a connection to corresponding ideas of Horváth et al. (2004) and Fremdt (2014b), which may also be of interest in closed-end scenarios. In Section 3 we discuss several special cases and demonstrate that the new methodology is applicable to detect changes in the mean and in the parameters of a linear model. Finally, we present a small simulation study in Section 4, where we compare our approach to those developed by Horváth et al. (2004) and Fremdt (2014b). In particular, we demonstrate that the monitoring scheme based on the differences (1.5) yields a test with a controlled type I error and a smaller type II error than the procedures in the cited references.
2 Asymptotic properties
Throughout this paper let $F$ denote a $d$-dimensional distribution function and $\theta = \theta(F)$ a $p$-dimensional parameter of $F$. We will denote by

(2.1)  $\hat F_{i:j}(x) = \frac{1}{j - i + 1} \sum_{t=i}^{j} I\{X_t \leq x\}$

the empirical distribution function of the observations $X_i, \ldots, X_j$ (here the inequality is understood componentwise) and consider the canonical estimator $\hat\theta_{i:j} = \theta(\hat F_{i:j})$ of the parameter $\theta$ from the sample $X_i, \ldots, X_j$.
To test the hypotheses (1.1) against (1.2) in the described online setting in an open-end scenario, we propose a monitoring scheme defined by

(2.2)

where the statistic appearing in (2.2) involves an estimator of the long-run variance matrix (defined in Assumption 2.2) and a weighted norm induced by a positive definite matrix. The monitoring is then performed as follows. When a new observation arrives, one computes the statistic in (2.2) and compares it to an appropriate threshold function, which is also called a weighting function in the literature. If

(2.3)

holds, monitoring is stopped and the null hypothesis (1.1) is rejected in favor of the alternative (1.2). If the inequality (2.3) does not hold, monitoring is continued with the next observation. We will derive the limiting distribution of the monitoring statistic in Theorem 2.6 below to determine the constant involved in (2.3), such that the test keeps a prescribed nominal level (asymptotically, as the initial sample size tends to infinity).

Remark 2.1
The statistic (2.2) is related to a detection scheme which was recently proposed by Dette and Gösmann (2018) for the closed-end case, where monitoring ends after a fixed number of observations. These authors considered the statistic
(2.4) 
and showed
(2.5) 
where $W$ denotes a Brownian motion and throughout this paper the symbol '$\Rightarrow$' denotes weak convergence (in the space under consideration). However, this statistic cannot be used in an open-end scenario with the typical threshold functions considered in the literature, since in this case the limit on the right-hand side of (2.5) would be almost surely infinite. As threshold functions avoiding this degeneracy will cause a loss in power, as demonstrated in an unpublished simulation study, we propose to replace the normalizing factor in (2.4) by the size of the initial sample, which leads to the monitoring scheme defined by (2.2).
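The sequential decision rule sketched in (2.2) and (2.3) can be illustrated as a simple loop. The detector, threshold function, and constants below are placeholders of our own choosing, not the paper's exact definitions:

```python
import numpy as np

def monitor(x_init, stream, detector, w, c):
    """Open-end monitoring: after each new observation, compare the detector
    with the scaled threshold c * w(k / m) and stop at the first exceedance.
    Returns the detection time k, or None if the stream ends undetected."""
    m = len(x_init)
    data = list(x_init)
    for k, x_new in enumerate(stream, start=1):
        data.append(x_new)
        if detector(np.asarray(data), m, k) > c * w(k / m):
            return k
    return None

# Placeholder detector: a CUSUM-type contrast of pre- and post-sample means.
def detector(x, m, k):
    return np.sqrt(m) * abs(x[:m].mean() - x[m:m + k].mean())

# Placeholder threshold function, bounded away from zero.
w = lambda t: max((1 + t) * (t / (1 + t)) ** 0.25, 1e-10)

rng = np.random.default_rng(1)
m = 100
x_init = rng.normal(size=m)
stream = rng.normal(size=500) + 3.0  # a large mean shift right after the training sample
k_detect = monitor(x_init, stream, detector, w, c=3.0)
assert k_detect is not None          # the shift is eventually detected
```

The point of the loop is structural: the detector is recomputed with every incoming observation and compared with a time-varying boundary, so the procedure has a random stopping time rather than a fixed sample size.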
To discuss the asymptotic properties of our approach, we require the following notation. The symbol denotes convergence in probability. The process will usually represent a standard dimensional Brownian motion. For a vector , we denote by its Euclidean norm. For the sake of a clear distinction we will employ
for discrete indexing (with integer arguments) and
for continuous indexing (with arguments taken from the interval or another subset of ).
Next, we define the influence function of the functional $\theta$ (assuming its existence) by

(2.6)  $\mathrm{IF}(x, \theta, F) = \lim_{\varepsilon \searrow 0} \frac{\theta\big((1 - \varepsilon) F + \varepsilon \delta_x\big) - \theta(F)}{\varepsilon},$

where $\delta_x(y) = I\{x \leq y\}$ is the distribution function of the Dirac measure at the point $x$, and the inequality in the indicator is again understood componentwise. We will focus on functionals that allow for an asymptotic linearization in terms of the influence function, that is

(2.7)  $\hat\theta_{i:j} = \theta(F) + \frac{1}{j - i + 1} \sum_{t=i}^{j} \mathrm{IF}(X_t, \theta, F) + R_{i:j},$

with asymptotically negligible remainder terms $R_{i:j}$. Finally, for the sake of readability we introduce the abbreviation $\mathrm{IF}(x) := \mathrm{IF}(x, \theta, F)$, where $F$ is again the distribution function of $X_t$. Under the null hypothesis (1.1) we will impose the following assumptions on the underlying time series.
Assumption 2.2 (Approximation)
The time series $(X_t)_{t \in \mathbb{N}}$ is (strictly) stationary, so that $F_t = F$ for all $t$. Further, for each initial sample size there exist two independent standard Brownian motions of appropriate dimension, such that for some positive constant the following approximations hold
(2.8) 
and
(2.9) 
as the initial sample size tends to infinity, where the matrix above denotes the long-run variance matrix of the process, which we assume to exist and to be nonsingular.
Assumption 2.3 (Threshold function)
The threshold function is uniformly continuous, has a positive lower bound, and satisfies
Assumption 2.4 (Linearization)
Remark 2.5
Let us give a brief explanation of the assumptions stated above.

Assumption 2.2 is a uniform invariance principle and is frequently used in the (sequential) change point literature [see for example Aue et al. (2006) or Fremdt (2014b) among others]. In the one-dimensional case, assumption (2.8) was verified by Aue and Horváth (2004) for different classes of time series, including GARCH and strongly mixing processes, and it can easily be extended to the multivariate case considered here. Assumption 2.2 is stronger than a functional central limit theorem, which is usually sufficient in a closed-end setup [see for example Wied and Galeano (2013), Pape et al. (2016) or Dette and Gösmann (2018)].
Assumption 2.3 imposes necessary restrictions on the feasible set of threshold functions, which are required for the existence of the (weak) limit derived in Theorem 2.6. It is also worth mentioning that the assumption of a threshold bounded away from zero can be relaxed, in which case the assumption on the remainders in (2.10) has to be strengthened accordingly. For the sake of a transparent presentation we use the assumption of a lower bound here, as this also simplifies the technical arguments in the proofs later on.

Assumption 2.4 is crucial for the proof of our main theorem.
Note that in the location model the remainder terms vanish and (2.10) obviously holds. In general, however, Assumption 2.4 is highly non-trivial and depends crucially on the structure of the functional and of the time series. For a comprehensive discussion the reader is referred to Dette and Gösmann (2018), where the estimate (2.10) has been verified in probability for different functionals, including quantiles and the variance.
The following result is the main theorem of this section.
Theorem 2.6
For the sake of completeness, the reader should note that, due to Assumption 2.3, the asymptotic behaviour of the threshold guarantees that the random variable on the right-hand side of (2.11) is finite (with probability one).
In light of Theorem 2.6, one can choose a constant such that

(2.12)

The following corollary then states that our approach leads to a detection scheme with asymptotically controlled level.
The limit distribution obtained in Theorem 2.6 strongly depends on the chosen threshold. A special family of thresholds that has received considerable attention in the literature [see Horváth et al. (2004), Fremdt (2014b), Kirch and Weber (2018) among many others] is given by

(2.13)  $w_\gamma(t) = \max\Big\{ (1 + t) \Big( \frac{t}{1 + t} \Big)^{\gamma}, \, \varepsilon \Big\}, \qquad \gamma \in [0, 1/2),$

where the cutoff $\varepsilon > 0$ can be chosen arbitrarily small in applications and only serves to reduce the assumptions and technical arguments in the proofs [see also Wied and Galeano (2013)]. With these functions the limit distribution in (2.11), with the threshold function in the denominator, can be simplified to an expression that is more easily tractable via simulations. Straightforward calculations show that Assumption 2.3 is satisfied by the functions in (2.13), and the limit distribution in Theorem 2.6 simplifies as follows.
Corollary 2.8
For a dimensional Brownian motion with independent components it holds that
For the investigation of the consistency of the monitoring scheme (2.2) we require the following assumption.
Assumption 2.9
Remark 2.10
The assumptions stated above are substantially weaker than those used to investigate the asymptotic properties under the null hypothesis. Basically, we only assume reasonable behaviour of the time series before and after the change point, and we can drop the uniform approximation in Assumption 2.2 and the uniform negligibility of the remainders in Assumption 2.4. It is easy to see that (2.14) is already satisfied if both the phase before and the phase after the change fulfill a central limit theorem. Finally, it is worth mentioning that the subsequent results can also be derived under a slightly weaker condition; however, for the sake of better readability, we will work with this (minimally stricter) assumption.
The next theorem yields consistency under the alternative hypothesis.
3 Some specific change point problems
In this section we briefly illustrate how the theory developed in Section 2 can be employed to construct monitoring schemes for a specific parameter of the distribution function. For the sake of brevity we restrict ourselves to the mean and the parameters in a linear model. Other examples such as the variance or quantiles can be found in Dette and Gösmann (2018).
3.1 Changes in the mean
The sequential detection of changes in the mean
has been extensively discussed in the literature [see Aue and Horváth (2004), Fremdt (2014b) or Hoga (2017) among many others].
It is easy to verify (and well known) that the influence function for the mean is given by $\mathrm{IF}(x, \theta, F) = x - \mu$, where $\mu = \int x \, dF(x)$,
and Assumption 2.4 and the second part of Assumption 2.9 are obviously satisfied in this case, since the remainder terms in (2.7) vanish for all samples. For the remaining assumptions in Section 2 it now suffices that the centered time series fulfills Assumption 2.2, which also implies the remaining part of Assumption 2.9 [see also the discussion in Remark 2.5]. In this situation both Theorem 2.6 and Theorem 2.11 are valid, provided that the chosen threshold fulfills Assumption 2.3.
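For the mean the linearization (2.7) is in fact exact, which can be checked numerically. The small script below is our own illustration; it verifies that the estimation error equals the average of the influence functions, with zero remainder:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 1.5                       # true mean of the simulated distribution
x = rng.normal(loc=mu, size=200)

# Influence function of the mean functional: IF(x) = x - mu.
influence = x - mu

# Linearization (2.7): estimation error = average influence + remainder.
remainder = (x.mean() - mu) - influence.mean()
assert abs(remainder) < 1e-12  # the remainder vanishes identically for the mean
```

This exactness is what makes the mean the simplest special case of the general theory: Assumption 2.4 holds trivially, without any probabilistic argument.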
3.2 Changes in linear models
Consider the time-dependent linear model

(3.1)  $Y_t = X_t^\top \beta_t + \varepsilon_t,$

where the random variables $X_t$ are the predictors, $\beta_t$ is a finite-dimensional parameter vector and $(\varepsilon_t)_{t \in \mathbb{N}}$ is a centered random sequence independent of the predictors.
The identification of changes in the vector of parameters of a linear model represents a prototype problem in sequential change point detection and has been extensively studied in the literature [see Chu et al. (1996), Horváth et al. (2004), Aue et al. (2009), Fremdt (2014b), among many others].
This situation is covered by the general theory developed in Sections 2 and 3.
To be precise, let

(3.2)  $Z_t = (X_t^\top, Y_t)^\top$

be the joint vector of predictor and response with (joint) distribution function $F_t$, where we assume that the predictor sequence $(X_t)_{t \in \mathbb{N}}$ is stationary. In a first step we consider the case where the moment matrix $C = E[X_t X_t^\top]$ is known (we will discuss later on why this assumption is non-restrictive) and nonsingular. In this setup, the parameter $\beta_t$ can be represented as a functional of the distribution function $F_t$, that is $\beta_t = C^{-1} E[X_t Y_t]$,
which leads to the estimators

(3.3)  $\hat\beta_{i:j} = C^{-1} \, \frac{1}{j - i + 1} \sum_{t=i}^{j} X_t Y_t$

from the sample $Z_i, \ldots, Z_j$. To compute the influence function, let $z = (x^\top, y)^\top$; a direct calculation then yields $\mathrm{IF}(z, \beta, F) = C^{-1} x \, (y - x^\top \beta)$, which is the influence function (for $\beta$) in the linear model stated above [see for example Hampel et al. (1986) for a comprehensive discussion of influence functions]. In the following we will again use the abbreviated notation $\mathrm{IF}(z)$. Note that

(3.4)  $\mathrm{IF}(Z_t) = C^{-1} X_t \varepsilon_t,$

which directly gives $E[\mathrm{IF}(Z_t)] = 0$. Under the null hypothesis the random sequence $(X_t \varepsilon_t)_{t \in \mathbb{N}}$ is stationary and the linearization defined in (2.7) simplifies to
(3.5)  $\hat\beta_{i:j} - \beta = \frac{1}{j - i + 1} \sum_{t=i}^{j} C^{-1} X_t \varepsilon_t.$
Consequently, the remainders in (2.7) vanish and Assumption 2.4 is obviously satisfied. Next, note that the long-run variance matrix is given by

(3.6)  $\Sigma = C^{-1} D \, C^{-1},$

where $D$ denotes the long-run variance matrix of the sequence $(X_t \varepsilon_t)_{t \in \mathbb{N}}$, which can be estimated by $\hat\Sigma = C^{-1} \hat D \, C^{-1}$, where $\hat D$ is an estimator for $D$. Observing (3.5), it is now easy to see that the matrix $C^{-1}$ cancels out in the resulting statistic, that is

(3.7)

and the statistic therefore does not depend on the matrix $C$. We thus obtain the following result, which describes the asymptotic properties of the monitoring scheme for a change in the parameter $\beta$ in the linear regression model (3.1). The proof is a direct consequence of Theorems 2.6 and 2.11.

Corollary 3.1
Assume that the predictor sequence is strictly stationary with a nonsingular second moment matrix $C$. Let $\hat D$ denote a nonsingular, consistent estimator of the long-run variance matrix $D$ defined in (4.8). Further suppose that the predictor sequence and the error sequence are independent, and let the threshold function under consideration fulfill Assumption 2.3.
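A minimal numerical sketch of the comparison underlying this corollary, using ordinary least squares on subsamples in the spirit of the differences (1.5). All concrete choices (design, coefficients, change size) are our own illustration:

```python
import numpy as np

def ls_est(X, y, i, j):
    """Least-squares estimator of beta from observations i, ..., j (1-based)."""
    Xs, ys = X[i - 1:j], y[i - 1:j]
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

def detector_lm(X, y, m, k):
    # Maximal contrast between pre- and post-position estimators, cf. (1.5);
    # we leave at least two observations on each side of the candidate split.
    return max(np.linalg.norm(ls_est(X, y, 1, m + j) - ls_est(X, y, m + j + 1, m + k))
               for j in range(k - 2))

rng = np.random.default_rng(3)
m, k = 100, 80
X = np.column_stack([np.ones(m + k), rng.normal(size=m + k)])  # intercept + one regressor
beta = np.array([1.0, 2.0])
y0 = X @ beta + rng.normal(size=m + k)   # no change
y1 = y0.copy()
y1[m + 40:] += 5.0 * X[m + 40:, 1]       # change in the second coefficient at time m + 40

assert detector_lm(X, y1, m, k) > detector_lm(X, y0, m, k)
```

Under a coefficient change the maximal contrast between subsample estimators is markedly larger than under stability, which is exactly what the monitoring scheme exploits.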
4 Finite sample properties
In this section we investigate the finite sample properties of our monitoring procedure and demonstrate its superiority over the available methodology. We choose the following two statistics as benchmarks:

(4.1)

The procedure based on the first statistic was originally proposed by Horváth et al. (2004) for detecting changes in the parameters of linear models and has since been reconsidered, for example, by Aue et al. (2012), Wied and Galeano (2013) and Pape et al. (2016) for the detection of changes in the CAPM model, in correlations and in variances, respectively.
A statistic of the second type was recently proposed by Fremdt (2014b) and has already been reconsidered by Kirch and Weber (2018).
In the simulation study we restrict ourselves to the commonly used class of threshold functions defined in (2.13), where the involved technical cutoff constant is set to a small value.
Under the assumptions made in Section 2, it can be shown by arguments similar to those given in Appendix A that
(4.2) 
and
(4.3) 
where the limit process is a Brownian motion of appropriate dimension.
For detailed proofs (under slightly different assumptions) of (4.2) and (4.3), the reader is referred to Horváth et al. (2004) and Fremdt (2014b), where procedures of these types are considered in the special case of a linear model.
Recall the notation introduced in Corollary 2.8.
By (4.2), (4.3) and Corollary 2.7, the necessary critical values for the three procedures combined with the threshold function are given as the quantiles of the respective limit distributions and can easily be obtained by Monte Carlo simulation.
The quantiles are listed in Table 1 and have been calculated by simulating the corresponding distributions, where the underlying Brownian motions have been approximated on a grid of points.
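Critical values of this type are obtained by straightforward Monte Carlo simulation. The following sketch uses our own (much smaller) run and grid sizes and a generic sup-type functional of a Brownian motion as a stand-in for the actual limit distribution, to illustrate the procedure rather than reproduce Table 1:

```python
import numpy as np

def simulate_limit(n_runs=2000, grid=500, gamma=0.25, seed=4):
    """Simulate a sup-type functional sup_t |B(t)| / (t / (1 + t))^gamma of a
    Brownian motion approximated on a grid (a stand-in for the limit law)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, grid + 1) / grid
    draws = np.empty(n_runs)
    for r in range(n_runs):
        # Brownian motion on [0, 1] as cumulated Gaussian increments.
        B = np.cumsum(rng.normal(scale=np.sqrt(1.0 / grid), size=grid))
        draws[r] = np.max(np.abs(B) / (t / (1 + t)) ** gamma)
    return draws

draws = simulate_limit()
q90, q95, q99 = np.quantile(draws, [0.90, 0.95, 0.99])
assert q90 <= q95 <= q99  # quantiles increase with the level
```

The empirical quantiles of such simulated draws then serve as the critical values against which the monitoring statistic is compared.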
In Sections 4.1 and 4.2 below, we will examine the finite sample properties of the three statistics for the detection of changes in the mean and in the regression coefficients of a linear model, respectively.
All subsequent results presented in these sections are based on 1000 independent simulation runs and a fixed nominal test level of 5%.
Table 1: Simulated quantiles of the limit distributions of the three monitoring procedures (one block of three columns per procedure) for dimensions p = 1, 2, different threshold parameters and levels 0.01, 0.05, 0.1.

p    param.   0.01    0.05    0.1      0.01    0.05    0.1      0.01    0.05    0.1
1    0        2.9762  2.4721  2.2175   2.8262  2.2599  1.9914   2.7912  2.2365  1.9497
     0.25     3.1050  2.5975  2.3542   2.9638  2.4296  2.1758   2.9445  2.3860  2.1060
     0.45     3.4269  2.9701  2.7398   3.3817  2.9241  2.7002   3.3015  2.7992  2.5437
2    0        3.4022  2.8943  2.6562   3.2272  2.6794  2.4008   3.2461  2.6957  2.4266
     0.25     3.5279  3.0948  2.7781   3.3322  2.7981  2.5481   3.3630  2.8433  2.5911
     0.45     3.8502  3.3912  3.1509   3.7010  3.2046  2.9543   3.7467  3.2966  3.0620
4.1 Changes in the mean
In this section we compare the finite sample properties of the three procedures for changes in the mean, as outlined in Section 3.1. Here we test the null hypothesis of no change, which is given by
(4.4) 
while the alternative, that the parameter changes beyond the initial data set, is defined as
(4.5) 
We consider two different data generating models, a white noise process and an autoregressive process:

(M1)  i.i.d. observations,

(M2)  an AR(1) process with i.i.d. innovations.
For the AR(1) process specified in model (M2), we first create a burn-in sample of 100 observations. To simulate the alternative hypotheses, changes of a prescribed amount are added to the mean of the data.
For the necessary covariance estimation we employ the well-known quadratic spectral estimator [see Andrews (1991)] with its implementation in the R package 'sandwich' [see Zeileis (2004)].
To take the possible appearance of changes into account, only the initial stable segment is used for this estimate, which is standard in the literature [see for example Horváth et al. (2004), Wied and Galeano (2013), or Dette and Gösmann (2018) among many others].
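A long-run variance estimator of this kernel type can be sketched as follows for the univariate case with a fixed bandwidth. The quadratic spectral kernel formula is from Andrews (1991); the rest of the script is our own simplified illustration of the estimator applied to the stable segment:

```python
import numpy as np

def qs_kernel(x):
    """Quadratic spectral kernel of Andrews (1991); k(0) = 1 by continuity."""
    if x == 0.0:
        return 1.0
    z = 6.0 * np.pi * x / 5.0
    return 25.0 / (12.0 * np.pi**2 * x**2) * (np.sin(z) / z - np.cos(z))

def long_run_variance(x, bandwidth):
    """Kernel estimator of the long-run variance from the stable segment x."""
    n = len(x)
    xc = x - x.mean()
    lrv = np.dot(xc, xc) / n                 # lag-0 autocovariance
    for h in range(1, n):
        gamma_h = np.dot(xc[h:], xc[:-h]) / n  # biased autocovariance at lag h
        lrv += 2.0 * qs_kernel(h / bandwidth) * gamma_h
    return lrv

rng = np.random.default_rng(5)
x = rng.normal(size=500)  # stand-in for the initial stable segment
assert long_run_variance(x, bandwidth=5.0) > 0.0
```

For i.i.d. data the long-run variance coincides with the ordinary variance; the kernel weighting matters for dependent models such as (M2), where autocovariances at positive lags contribute. In practice the bandwidth is chosen data-dependently, as in the 'sandwich' implementation.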
In Table 2 we display the type I error for both time series models and different choices of the parameter in the threshold function.
The principal observation is that all three procedures offer a reasonable approximation of the desired nominal level of 5%.
The results for the dependent model (M2) are slightly worse than those for the white noise model (M1).
This effect may be caused by a less precise estimation of the long-run variance for small sample sizes; accordingly, it is weaker for the larger initial sample size.
In Figures 1, 2, 3 and 4 we illustrate the power of the procedures under the alternative hypothesis for increasing values of the change and different change positions.
The basic tendency in all four plots is similar: while the procedures behave similarly for a change close to the initial data set (first row), the method based on the new statistic is clearly superior to the others the further the change point lies from the initial data set.
To give an example, consider the left plot of the last row in Figure 1.
Here the test based on the new statistic already has a power of 32.9% for a change for which the tests based on the two benchmark statistics have powers of 24.4% and 22.7%, respectively.
The superior performance can most likely be explained by the more accurate estimation of the pre-change parameter, while the other statistics only involve the estimator from the initial sample [see formulas (2.2) and (4.1)].
For an appropriate understanding of our findings, the reader should be aware that, although we consider open-end procedures here, simulations have to be stopped eventually.
We chose a fixed stopping point, and it is to be expected that the power of all procedures increases with a later stopping point.
The observed superiority of the new procedure therefore refers to the type II error up to the specified stopping point.
Table 2: Empirical type I error of the three procedures for models (M1) and (M2), initial sample sizes m = 50, 100 and different threshold parameters.

m     param.   (M1)                   (M2)
50    0        5.4%   5.2%   5.5%     8.1%   7.1%   8.2%
      0.25     5.0%   4.9%   5.4%     8.3%   7.0%   9.5%
      0.45     4.5%   3.6%   4.8%     7.6%   5.6%   9.2%
100   0        4.2%   4.3%   4.9%     6.9%   6.5%   6.9%
      0.25     5.0%   4.9%   5.9%     7.6%   6.5%   7.0%
      0.45     6.0%   4.9%   7.0%     6.5%   4.8%   7.7%
[Figures 1-4: empirical power curves of the three procedures for models (M1) and (M2) under increasing change amounts and different change positions.]
4.2 Changes in linear models
In this section we present some simulation results for the detection of changes in the linear model (3.1). We aim to detect changes in the unknown parameter vector by testing the null hypothesis
(4.6) 
against the alternative that the parameter changes beyond the initial data set, that is
(4.7) 
To be precise, we consider the model (3.1) with the following choices of predictors:

(LM1)  …,

(LM2)  …,

where the innovations form an i.i.d. sequence of random variables in both models. The parameter vector is fixed under the null hypothesis and, to examine the alternative hypothesis, changes are added to its second component. For both scenarios we simulated the residuals in model (3.1) as i.i.d. sequences. Note that the models specified above have already been considered by Fremdt (2014b). As pointed out in Section 3.2, the asymptotic variance that needs to be estimated within our procedure is given by
(4.8) 
We estimate this quantity from the stable segment of initial observations using the well-known quadratic spectral estimator [see Andrews (1991)] with its implementation in the R package 'sandwich' [see Zeileis (2004)].
The problem of detecting changes in the parameter of the linear model has also been addressed using partial sums of residuals in statistics similar to (4.1), where the residuals are computed with an initial estimate of the parameter obtained from the initial stable segment [see for example Chu et al. (1996), Horváth et al. (2004), Fremdt (2014a) among many others].
Our approach directly compares estimators of the vector of parameters, which are derived using the general methodology introduced in Sections 2 and 3.
The resulting benchmark statistics are obtained by replacing the estimators of the mean by the estimators of the regression parameter in equation (4.1).
We also refer to Leisch et al. (2000) for a comparison of residual-based methods with methods using the estimators directly.
In Table 3 we display the approximation of the nominal level for the three statistics with different values of the parameter in the threshold function, where monitoring was stopped after a fixed number of observations.
We observe a reasonable approximation of the nominal level of 5% for the smallest parameter value, while the rejection probabilities for the larger values slightly exceed the desired level of 5%.
The fact that larger values of this parameter can lead to a worse approximation of the desired type I error has also been observed by other authors [see, for example, Wied and Galeano (2013)] and can be explained by a more sensitive threshold function at the start of the monitoring if the parameter is chosen close to its upper limit.
Overall, the approximation is slightly better for the independent case in model (LM1).
In Figure 5 we compare the power with respect to the change amount for different change positions, where we restrict ourselves to a single configuration for the sake of brevity.
The results are very similar to those provided for the mean functional in Section 4.1.
Again the monitoring scheme based on the new statistic outperforms the two benchmark procedures, and the superiority increases for later changes. We omit a detailed discussion and summarize that the empirical findings indicate the superiority (with respect to testing power) of the proposed monitoring scheme.

Table 3: Empirical type I error of the three procedures for models (LM1) and (LM2) and different threshold parameters.

param.   (LM1)                    (LM2)
0        6.4%   6.5%   6.7%      7.2%   6.7%   7.2%
0.25     7.6%   8.8%   9.1%      8.5%   9.6%   9.5%
0.45     12.0%  12.2%  12.1%     12.6%  12.2%  12.6%
[Figure 5: empirical power of the three procedures for models (LM1) and (LM2) under increasing change amounts and different change positions.]
5 Closedend scenarios
It is worthwhile to mention that the theory developed so far also covers the case of closed-end scenarios [sometimes also called finite time horizon scenarios in the literature]. In this section we briefly discuss this situation and present a small batch of simulation results, which also indicate the superiority of the proposed statistic in closed-end scenarios. Note that the null hypothesis in this setup is given by
(5.1) 
which is tested against the alternative that the parameter changes (once) at some time within the monitoring horizon, that is
(5.2) 
Here the involved factor controls the length of the monitoring period relative to the size of the initial data set. Under the assumptions stated in Section 2, we can prove corresponding statements of Theorem 2.6 and Corollary 2.8.