# A new approach for open-end sequential change point monitoring

We propose a new sequential monitoring scheme for changes in the parameters of a multivariate time series. In contrast to procedures proposed in the literature, which compare an estimator from the training sample with an estimator calculated from the remaining data, we suggest dividing the sample at each time point after the training sample. Estimators from the samples before and after each separation point are then continuously compared by calculating the maximum of the norms of their differences. For open-end scenarios our approach yields an asymptotic level $\alpha$ procedure, which is consistent under the alternative of a change in the parameter.

## 1 Introduction

Nowadays, nearly all fields of application require sophisticated statistical modelling and statistical inference to draw scientific conclusions from the observed data. In many cases the data is time dependent, and the involved model parameters, or the model itself, may not necessarily be stable. In such situations it is of particular importance to detect changes in the processed data as soon as possible and to adapt the statistical analysis accordingly. These changes are usually called change points or structural breaks in the literature. Due to their universality, methods for change point analysis have a vast field of possible applications, ranging from the natural sciences [for example, biology and meteorology] to the humanities [economics, finance, the social sciences]. Since the seminal papers of Page (1954, 1955) the problem of detecting change points in time series has received substantial attention in the statistical literature. The contributions to this field can be roughly divided into the areas of retrospective and sequential change point analysis.
In the retrospective case, historical data sets are examined with the aim of testing for changes and identifying their position within the data. In this setup, the data is assumed to be completely available before the statistical analysis is started (a-posteriori analysis). A comprehensive overview of retrospective change point analysis can be found in Aue and Horváth (2013). In many practical applications, however, data arrives consecutively and breaks can occur at any new data point. In such cases the statistical analysis for changes in the processed data has to start immediately, with the target of detecting changes as soon as possible. This field of statistics is called sequential change point detection [sometimes also: online change point detection].
For the major part of the 20th century the problem of sequential change point detection was tackled using procedures [mostly called control charts; see Lai (1995, 2001) for comprehensive reviews] which are optimized to have a minimal detection delay but do not control the probability of a false alarm (type I error). A new paradigm was then introduced by Chu et al. (1996), who use initial data sets and therefrom employ invariance principles to also control the type I error. The methods developed under this paradigm [see below] can again be subdivided into closed-end and open-end approaches. In closed-end scenarios monitoring is stopped at a fixed pre-defined point of time, while in open-end scenarios monitoring can, in principle, continue forever if no change point is detected.

In the paper at hand we develop a new approach for sequential change point detection in an open-end scenario. To be more precise, let $\{X_t\}_{t \in \mathbb{N}}$ denote a $d$-dimensional time series and let $F_t$ be the distribution function of the random variable $X_t$ at time $t$. We are studying monitoring procedures for detecting changes of a parameter $\theta_t = \theta(F_t)$, where $\theta$ is a $p$-dimensional parameter of a distribution function on $\mathbb{R}^d$ (such as the mean, variance, correlation, etc.). In particular we will develop a decision rule for the hypothesis of a constant parameter, that is

$$H_0:\ \theta_1 = \dots = \theta_m = \theta_{m+1} = \theta_{m+2} = \dots\ , \qquad (1.1)$$

against the alternative that the parameter changes (once) at some time $m + k^\star$ with $k^\star \in \mathbb{N}$, that is

$$H_1:\ \exists\, k^\star \in \mathbb{N}:\ \theta_1 = \dots = \theta_{m+k^\star-1} \neq \theta_{m+k^\star} = \theta_{m+k^\star+1} = \dots\ . \qquad (1.2)$$

In this setup, which was originally introduced by Chu et al. (1996), the first $m$ observations are assumed to be stable and will serve as an initial training set. The problem of sequential change point detection in the hypotheses paradigm as pictured above has received substantial interest in the literature. Since the seminal paper of Chu et al. (1996) several authors have worked in this area. Aue et al. (2006), Aue et al. (2009), Fremdt (2014b) and Aue et al. (2014) developed methodology for detecting changes in the coefficients of a linear model, while Wied and Galeano (2013) and Pape et al. (2016) considered sequential monitoring schemes for changes in special functionals such as the correlation or variance. A MOSUM approach was employed by Leisch et al. (2000), Horváth et al. (2008) and Chen and Tian (2010) to monitor the mean and linear models, respectively. Recently, Hoga (2017) used norm-based statistics to detect changes in the mean and variance of a multivariate time series, Kirch and Weber (2018) defined a unifying framework for detecting changes in different parameters with the help of several statistics, and Otto and Breitung (2019) considered a backward CUSUM, which monitors changes based on recursive residuals in a linear model. A helpful but not exhaustive overview of different sequential procedures can be found in Section 1, in particular Table 1, of Anatolyev and Kosenok (2018). A common feature of all procedures in the cited literature consists in the comparison of estimators from different subsamples of the data. To be precise, let $X_1, \dots, X_m$ denote an initial training sample and $X_1, \dots, X_{m+k}$ the available data at time $m+k$. Several authors propose to investigate the differences

$$\hat\theta_1^{m} - \hat\theta_{m+1}^{m+k}\ , \qquad (1.3)$$

(in dependence of $k$), where $\hat\theta_i^j$ denotes the estimator of the parameter from the sample $X_i, \dots, X_j$. In the sequential change point literature monitoring schemes based on the differences (1.3) are usually called (ordinary) CUSUM procedures and have been considered by Horváth et al. (2004), Aue et al. (2006, 2009, 2014), Schmitz and Steinebach (2010) and Hoga (2017). Other authors suggest using a function of the differences

$$\big\{\hat\theta_1^{m} - \hat\theta_{m+j+1}^{m+k}\big\}_{j=0,\dots,k-1} \qquad (1.4)$$

(in dependence of $k$), and the corresponding procedures are usually called Page-CUSUM tests [see Fremdt (2014b), Aue et al. (2015), or Kirch and Weber (2018) among others]. As an alternative we propose, following ideas of Dette and Gösmann (2018), a monitoring scheme based on a function of the differences

$$\big\{\hat\theta_1^{m+j} - \hat\theta_{m+j+1}^{m+k}\big\}_{j=0,\dots,k-1}\ . \qquad (1.5)$$

The intuitive advantage of (1.5) over (1.3) is the screening for all possible positions of the change point, which takes into account that the change does not necessarily come with observation $m+1$, so that the estimator $\hat\theta_{m+1}^{m+k}$ may be 'corrupted' by pre-change observations. This issue is also partially addressed by (1.4), where different positions are examined and compared with the estimator of the parameter from the training sample. We will demonstrate in Section 4 that sequential monitoring schemes based on the differences (1.5) yield a substantial improvement in power compared to the commonly used methods based on (1.3) and (1.4). To avoid misunderstandings, the reader should note that a (total) comparison based on differences of the form (1.5) is typically also called a CUSUM approach in retrospective change point analysis [see Aue and Horváth (2013) for a comprehensive overview of (retrospective) change point analysis].

The present paper is devoted to a rigorous statistical analysis of sequential monitoring based on the differences defined in (1.5) in the context of an open-end scenario. In Section 2 we introduce the new procedure and develop a corresponding asymptotic theory to obtain critical values such that monitoring can be performed at a controlled type I error. The theory is broadly applicable to detect changes in a general parameter of a multivariate time series. As all monitoring schemes in this context, the method depends on a threshold function, and we also discuss the choice of this function. In particular we establish an interesting result regarding this choice and draw a connection to corresponding proposals made by Horváth et al. (2004) and Fremdt (2014b), which may also be of interest in closed-end scenarios. In Section 3 we discuss several special cases and demonstrate that the new methodology is applicable to detect changes in the mean and the parameters of a linear model. Finally, we present a small simulation study in Section 4, where we compare our approach to those developed by Horváth et al. (2004) and Fremdt (2014b). In particular we demonstrate that the monitoring scheme based on the differences (1.5) yields a test with a controlled type I error and a smaller type II error than the procedures in the cited references.

## 2 Asymptotic properties

Throughout this paper let $F$ denote a $d$-dimensional distribution function and $\theta = \theta(F)$ a $p$-dimensional parameter of $F$. We will denote by

$$\hat F_i^j(z) = \frac{1}{j-i+1} \sum_{t=i}^{j} I\{X_t \le z\} \qquad (2.1)$$

the empirical distribution function of the observations $X_i, \dots, X_j$ (here the inequality is understood component-wise) and consider the canonical estimator $\hat\theta_i^j = \theta(\hat F_i^j)$ of the parameter from the sample $X_i, \dots, X_j$.
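The plug-in construction can be sketched in a few lines of Python (a sketch under the assumption of NumPy; the function names are ours, not the paper's):

```python
import numpy as np

def empirical_cdf(X, z):
    """Empirical distribution function, cf. (2.1).

    X : array of shape (n, d) -- the observations X_i, ..., X_j
    z : array of shape (d,)   -- evaluation point
    The inequality X_t <= z is understood component-wise.
    """
    return np.mean(np.all(X <= z, axis=1))

def plug_in_mean(X):
    """Canonical (plug-in) estimator for the mean functional:
    applying theta to the empirical distribution function yields
    the sample analogue, here simply the sample mean."""
    return X.mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
print(empirical_cdf(X, np.array([0.0, 0.0])))  # close to P(X1<=0, X2<=0) = 0.25
print(plug_in_mean(X))                          # close to (0, 0)
```

For functionals without a closed-form sample analogue (quantiles, say), the same principle applies: evaluate the functional at the empirical distribution function.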

To test the hypotheses (1.1) and (1.2) in the described online setting in an open-end scenario, we propose a monitoring scheme defined by

$$\hat E_m(k) = m^{-1/2} \max_{j=0}^{k-1}\, (k-j)\, \big\|\hat\theta_1^{m+j} - \hat\theta_{m+j+1}^{m+k}\big\|_{\hat\Sigma^{-1}}\ , \qquad (2.2)$$

where the statistic $\hat\Sigma$ denotes an estimator of the long-run variance matrix $\Sigma$ (defined in Assumption 2.2) and the symbol $\|v\|_A = (v^\top A v)^{1/2}$ denotes the weighted norm of the vector $v$ induced by the positive definite matrix $A$. The monitoring is then performed as follows. With observation $X_{m+k}$ arriving, one computes $\hat E_m(k)$ and compares it to an appropriate threshold function (also called weighting function in the literature), say $w$. If

$$\hat E_m(k) > c_\alpha\, w(k/m)\ , \qquad (2.3)$$

monitoring is stopped and the null hypothesis (1.1) is rejected in favor of the alternative (1.2). If the inequality (2.3) does not hold, monitoring is continued with the next observation $X_{m+k+1}$. We will derive the limiting distribution of the monitoring scheme in Theorem 2.6 below to determine the constant $c_\alpha$ involved in (2.3), such that the test keeps a nominal level of $\alpha$ (asymptotically as $m \to \infty$).

###### Remark 2.1

The statistic (2.2) is related to a detection scheme which was recently proposed by Dette and Gösmann (2018) for the closed-end case, where monitoring ends with observation $X_{m+\lfloor Tm \rfloor}$ for some fixed constant $T > 0$. These authors considered the statistic

$$\hat D_m(k) = m^{-3/2} \max_{j=0}^{k-1}\, (m+j)(k-j)\, \big\|\hat\theta_1^{m+j} - \hat\theta_{m+j+1}^{m+k}\big\|_{\hat\Sigma^{-1}}\ , \qquad (2.4)$$

and showed

$$\max_{k=1}^{\lfloor mT \rfloor} \frac{\hat D_m(k)}{w(k/m)} \;\overset{\mathcal D}{\Longrightarrow}\; \sup_{t \in [0,T]}\ \sup_{s \in [0,t]}\ \frac{\big\|(s+1)\,W(t+1) - (t+1)\,W(s+1)\big\|}{w(t)}\ , \qquad (2.5)$$

where $W$ denotes a $p$-dimensional Brownian motion and throughout this paper the symbol $\overset{\mathcal D}{\Longrightarrow}$ denotes weak convergence (in the space under consideration). However, this statistic cannot be considered in an open-end scenario for the typical threshold functions considered in the literature, which grow at most linearly (in this case the limit on the right-hand side of (2.5) would be almost surely infinite for $T = \infty$). As faster-growing threshold functions will cause a loss in power, as demonstrated in an unpublished simulation study, we propose to replace the factor $m+j$ in (2.4) by the size $m$ of the initial sample, which leads to the monitoring scheme defined by (2.2).

To discuss the asymptotic properties of our approach, we require the following notation. The symbol $\overset{\mathbb P}{\longrightarrow}$ denotes convergence in probability. The process $W$ will usually represent a standard $p$-dimensional Brownian motion. For a vector $v$, we denote by $\|v\|$ its Euclidean norm. For the sake of a clear distinction we will employ

$$\sup_{i=1}^{n}\, a(i)$$

for discrete indexing (with integer arguments) and

$$\sup_{0 \le x \le 1} a(x)$$

for continuous indexing (with arguments taken from the interval $[0,1]$ or another subset of $\mathbb R$).
Next, we define the influence function (assuming its existence) by

$$IF(x, F, \theta) = \lim_{\varepsilon \searrow 0} \frac{\theta\big((1-\varepsilon)F + \varepsilon\,\delta_x\big) - \theta(F)}{\varepsilon}\ , \qquad (2.6)$$

where $\delta_x$ is the distribution function of the Dirac measure at the point $x \in \mathbb R^d$ (the inequality in the corresponding indicator is again understood component-wise). We will focus on functionals that allow for an asymptotic linearization in terms of the influence function, that is

$$\hat\theta_i^j - \theta = \theta(\hat F_i^j) - \theta(F) = \frac{1}{j-i+1} \sum_{t=i}^{j} IF(X_t, F, \theta) + R_{i,j} \qquad (2.7)$$

with asymptotically negligible remainder terms $R_{i,j}$. Finally, for the sake of readability we introduce the abbreviation

$$IF_t = IF(X_t, F_t, \theta)\ ,$$

where $F_t$ is again the distribution function of $X_t$. Under the null hypothesis (1.1) we will impose the following assumptions on the underlying time series.
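The limit in (2.6) is easy to check numerically for the mean functional, where $\theta\big((1-\varepsilon)F + \varepsilon\delta_x\big) = (1-\varepsilon)\,\mathbb E_F[X] + \varepsilon x$, so that $IF(x, F, \mu) = x - \mathbb E_F[X]$. A small Python sketch (all numbers are arbitrary illustrative choices):

```python
# Numerical check of the influence-function definition (2.6) for the mean:
# the mean of the contaminated distribution (1-eps) F + eps delta_x is
# (1-eps) * E_F[X] + eps * x, hence the difference quotient tends to x - E_F[X].
def mean_of_mixture(mu_F, x, eps):
    """Mean of the contaminated distribution (1-eps) F + eps delta_x."""
    return (1 - eps) * mu_F + eps * x

mu_F = 0.7      # assumed mean of F (any value works)
x = 3.0         # contamination point
eps = 1e-8
if_numeric = (mean_of_mixture(mu_F, x, eps) - mu_F) / eps
if_exact = x - mu_F
print(if_numeric, if_exact)   # both approximately 2.3
```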

###### Assumption 2.2 (Approximation)

The time series $\{X_t\}_{t \in \mathbb N}$ is (strictly) stationary, such that $F_t = F$ for all $t \in \mathbb N$. Further, for each $m \in \mathbb N$ there exist two independent, $p$-dimensional standard Brownian motions $W_{m,1}$ and $W_{m,2}$, such that for some constant $\xi \in (0, 1/2)$ the following approximations hold

$$\sup_{k=1}^{\infty} \frac{1}{k^{\xi}} \Big\| \sum_{t=m+1}^{m+k} IF_t - \sqrt{\Sigma}\, W_{m,1}(k) \Big\| = O_{\mathbb P}(1) \qquad (2.8)$$

and

$$\frac{1}{m^{\xi}} \Big\| \sum_{t=1}^{m} IF_t - \sqrt{\Sigma}\, W_{m,2}(m) \Big\| = O_{\mathbb P}(1) \qquad (2.9)$$

as $m \to \infty$, where $\Sigma$ denotes the long-run variance matrix of the process $\{IF_t\}_{t \in \mathbb N}$, which we assume to exist and to be non-singular.

###### Assumption 2.3 (Threshold function)

The threshold function $w$ is uniformly continuous, has a positive lower bound, say $\inf_{t \ge 0} w(t) > 0$, and satisfies

$$\limsup_{t \to \infty}\, \frac{t}{w(t)} < \infty\ .$$
###### Assumption 2.4 (Linearization)

The remainder terms in the linearization (2.7) satisfy

$$\max_{\substack{i,j = 1, \dots, k \\ i \le j}}\ \frac{j-i+1}{\sqrt m}\, \big\| R_{i,j} \big\| \longrightarrow 0\ , \qquad (2.10)$$

uniformly in $k \in \mathbb N$, as $m \to \infty$ with probability one.

###### Remark 2.5

Let us give some brief explanations of the assumptions stated above.

1. Assumption 2.2 is a uniform invariance principle and frequently used in the (sequential) change point literature [see for example Aue et al. (2006) or Fremdt (2014b) among others]. In the one-dimensional case, the approximation (2.8) was verified by Aue and Horváth (2004) for different classes of time series, including GARCH and strongly mixing processes, and can easily be extended to the multivariate case considered here. Assumption 2.2 is stronger than a functional central limit theorem, which is usually sufficient to work in a closed-end setup [see for example Wied and Galeano (2013), Pape et al. (2016) or Dette and Gösmann (2018)].

2. Assumption 2.3 gives necessary restrictions on the feasible set of threshold functions, which are required for the existence of the (weak) limit derived in Theorem 2.6. It is also worth mentioning that the assumption of a lower-bounded threshold can be relaxed to

$$\lim_{t \searrow 0} \frac{t^{\gamma}}{w(t)} = 0$$

for a constant $\gamma \in [0, 1/2)$. In this case, the assumption on the remainders in (2.10) has to be replaced by a correspondingly strengthened uniform bound depending on $\gamma$. For the sake of a transparent presentation we use the assumption of a lower bound here, as this also simplifies the technical arguments in the proofs later on.

3. Assumption 2.4 is crucial for the proof of our main theorem and directly implies

$$\sup_{k=1}^{\infty}\ \max_{\substack{i,j = 1, \dots, m+k \\ i \le j}}\ \frac{j-i+1}{\sqrt m}\, \big\| R_{i,j} \big\| \longrightarrow 0\ .$$

Note that in the location model we have $R_{i,j} = 0$, so (2.10) obviously holds. In general, however, Assumption 2.4 is highly non-trivial and crucially depends on the structure of the functional $\theta$ and the time series $\{X_t\}_{t \in \mathbb N}$. For a comprehensive discussion the reader is referred to Dette and Gösmann (2018), where the estimate (2.10) has been verified in probability for different functionals including quantiles and the variance.

The following result is the main theorem of this section.

###### Theorem 2.6

Assume that the null hypothesis (1.1) and Assumptions 2.2 - 2.4 hold. If further $\hat\Sigma$ is a consistent and non-singular estimator of the long-run variance matrix $\Sigma$, it holds that

$$\sup_{k=1}^{\infty} \frac{\hat E_m(k)}{w(k/m)} \;\overset{\mathcal D}{\Longrightarrow}\; \sup_{0 \le t < \infty}\ \max_{0 \le s \le t}\ \frac{t+1}{w(t)}\, \Big\| W\Big(\frac{s}{s+1}\Big) - W\Big(\frac{t}{t+1}\Big) \Big\|\ , \qquad (2.11)$$

where $W$ is a $p$-dimensional Brownian motion with independent components.

For the sake of completeness, the reader should note that the asymptotic behaviour of the threshold imposed in Assumption 2.3 guarantees that the random variable on the right-hand side of (2.11) is finite (with probability one).

In light of Theorem 2.6 one can choose a constant $c(\alpha)$ such that

$$\mathbb P\bigg( \sup_{0 \le t < \infty}\ \max_{0 \le s \le t}\ \frac{t+1}{w(t)}\, \Big\| W\Big(\frac{s}{s+1}\Big) - W\Big(\frac{t}{t+1}\Big) \Big\| > c(\alpha) \bigg) \le \alpha\ . \qquad (2.12)$$

The following corollary then states that our approach leads to a level $\alpha$ detection scheme.

###### Corollary 2.7

Grant the assumptions of Theorem 2.6 and further let $c(\alpha)$ satisfy inequality (2.12); then

$$\limsup_{m \to \infty}\ \mathbb P\bigg( \sup_{k=1}^{\infty} \frac{\hat E_m(k)}{w(k/m)} > c(\alpha) \bigg) \le \alpha\ .$$

The limit distribution obtained in Theorem 2.6 strongly depends on the threshold under consideration. A special family of thresholds that has received considerable attention in the literature [see Horváth et al. (2004), Fremdt (2014b), Kirch and Weber (2018) among many others] is given by

$$w_\gamma(t) = (1+t)\, \max\Big\{ \Big(\frac{t}{1+t}\Big)^{\gamma},\, \varepsilon \Big\} \quad \text{with} \quad 0 \le \gamma < 1/2\ , \qquad (2.13)$$

where the cutoff $\varepsilon > 0$ can be chosen arbitrarily small in applications and only serves to reduce the assumptions and technical arguments in the proof [see also Wied and Galeano (2013)]. With these functions the limit distribution in (2.11), with the threshold function $w_\gamma$ as the denominator, can be simplified to an expression that is more easily tractable via simulations. Straightforward calculations show that Assumption 2.3 is satisfied by the function $w_\gamma$, and the limit distribution in Theorem 2.6 simplifies as follows.

###### Corollary 2.8

For a $p$-dimensional Brownian motion $W$ with independent components it holds that

$$\sup_{0 \le t < \infty}\ \max_{0 \le s \le t}\ \frac{t+1}{w_\gamma(t)}\, \Big\| W\Big(\frac{s}{s+1}\Big) - W\Big(\frac{t}{t+1}\Big) \Big\| \;\overset{\mathcal D}{=}\; \sup_{0 \le t < 1}\ \max_{0 \le s \le t}\ \frac{\big\| W(t) - W(s) \big\|}{\max\{t^{\gamma}, \varepsilon\}} \;=:\; L_{1,\gamma}\ .$$
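The distribution of $L_{1,\gamma}$ is easily approximated by Monte Carlo simulation. The following Python sketch does this in one dimension, exploiting that $\max_{0 \le s \le t} |W(t) - W(s)|$ can be computed from running extrema of the path; grid size, number of replications and the seed are illustrative choices of ours, not those used for the paper's Table 1:

```python
import numpy as np

def simulate_L1(gamma=0.0, eps=1e-10, n_grid=500, n_rep=2000, seed=2):
    """Monte Carlo draws of L_{1,gamma} from Corollary 2.8 (one dimension).

    L_{1,gamma} = sup_{0<=t<1} max_{0<=s<=t} |W(t) - W(s)| / max{t^gamma, eps}.
    In one dimension, max_{s<=t} |W(t) - W(s)| equals
    max(W(t) - min_{s<=t} W(s), max_{s<=t} W(s) - W(t)),
    so each path costs O(n_grid) via running extrema.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_grid + 1) / n_grid
    out = np.empty(n_rep)
    for r in range(n_rep):
        steps = rng.normal(scale=np.sqrt(1 / n_grid), size=n_grid)
        W = np.concatenate(([0.0], np.cumsum(steps)))   # Brownian path on a grid
        run_min = np.minimum.accumulate(W)
        run_max = np.maximum.accumulate(W)
        dev = np.maximum(W[1:] - run_min[1:], run_max[1:] - W[1:])
        out[r] = np.max(dev / np.maximum(t ** gamma, eps))
    return out

draws = simulate_L1()
print(np.quantile(draws, 0.95))   # rough 95% critical value for gamma = 0
```

For $\gamma = 0$ the supremum reduces to the range of the Brownian motion on $[0,1]$, which gives a quick sanity check on the simulated draws.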

For the investigation of the consistency of the monitoring scheme (2.2) we require the following assumption.

###### Assumption 2.9

Under the alternative defined in (1.2) let

$$\theta^{(1)} := \theta(F_1) = \theta(F_2) = \dots = \theta(F_{m+k^*}) \;\neq\; \theta^{(2)} := \theta(F_{m+k^*+1}) = \theta(F_{m+k^*+2}) = \dots$$

Further assume that $k^*$ is independent of $m$ and that the process $\{IF_t\}_{t \in \mathbb N}$ is of the following order before and after the change, respectively:

$$\frac{1}{\sqrt m} \Big\| \sum_{t=1}^{m+k^*} IF_t \Big\| = O_{\mathbb P}(1) \quad \text{and} \quad \frac{1}{\sqrt m} \Big\| \sum_{t=m+k^*+1}^{2m} IF_t \Big\| = O_{\mathbb P}(1)\ . \qquad (2.14)$$

Additionally assume that the remainders defined in (2.7) satisfy

$$\max_{1 \le i \le j \le m+k^*} \frac{j-i+1}{\sqrt m}\, \big\| R_{i,j} \big\| = o_{\mathbb P}(1) \quad \text{and} \quad \max_{m+k^* < i \le j \le 2m} \frac{j-i+1}{\sqrt m}\, \big\| R_{i,j} \big\| = o_{\mathbb P}(1)\ . \qquad (2.15)$$
###### Remark 2.10

The assumptions stated above are substantially weaker than those used to investigate the asymptotic properties of $\hat E_m$ under the null hypothesis. Basically, we only assume reasonable behavior of the time series before and after the change point, and we can drop the uniform approximation in Assumption 2.2 and the uniform negligibility of the remainders in Assumption 2.4. It is easy to see that (2.14) is already satisfied if both the phase before and the phase after the change fulfill a central limit theorem. Finally, it is worth mentioning that one can also derive the subsequent results when replacing the upper summation limit $2m$ by $\lceil Cm \rceil$ for an arbitrary constant $C > 1$; however, for the sake of better readability, we will work with this (minimally stricter) assumption.

The next theorem yields consistency under the alternative hypothesis.

###### Theorem 2.11

Assume that the alternative hypothesis (1.2) and Assumptions 2.3 and 2.9 hold. If further $\hat\Sigma$ is a consistent and non-singular estimator of the long-run variance matrix $\Sigma$, it holds that

$$\sup_{k=1}^{\infty} \frac{\hat E_m(k)}{w(k/m)} \;\overset{\mathbb P}{\longrightarrow}\; \infty\ .$$

Consequently,

$$\lim_{m \to \infty} \mathbb P\bigg( \sup_{k=1}^{\infty} \frac{\hat E_m(k)}{w(k/m)} > c \bigg) = 1$$

for any constant $c > 0$.

## 3 Some specific change point problems

In this section we briefly illustrate how the theory developed in Section 2 can be employed to construct monitoring schemes for a specific parameter of the distribution function. For the sake of brevity we restrict ourselves to the mean and the parameters in a linear model. Other examples such as the variance or quantiles can be found in Dette and Gösmann (2018).

### 3.1 Changes in the mean

The sequential detection of changes in the mean

$$\mu(F) = \mathbb E_F[X] = \int_{\mathbb R^d} x\, dF(x)$$

has been extensively discussed in the literature [see Aue and Horváth (2004), Fremdt (2014b) or Hoga (2017) among many others].

It is easy to verify (and well known) that the influence function for the mean is given by

$$IF(x, F, \mu) = x - \mathbb E_F[X]\ ,$$

and Assumption 2.4 and the second part of Assumption 2.9 are obviously satisfied in this case, since we have $R_{i,j} = 0$ for all $i \le j$. For the remaining assumptions in Section 2 it now suffices that the centered time series $\{X_t - \mathbb E[X_t]\}_{t \in \mathbb N}$ fulfills Assumption 2.2, which also implies the remaining part of Assumption 2.9 [see also the discussion in Remark 2.5]. In this situation both Theorem 2.6 and Theorem 2.11 are valid, provided that the chosen threshold fulfills Assumption 2.3.

### 3.2 Changes in linear models

Consider the time-dependent linear model

$$Y_t = P_t^{\top} \beta_t + \varepsilon_t\ , \qquad (3.1)$$

where the random variables $P_t$ are the $\mathbb R^p$-valued predictors, $\beta_t$ is a $p$-dimensional parameter and $\{\varepsilon_t\}_{t \in \mathbb N}$ is a centered random sequence independent of $\{P_t\}_{t \in \mathbb N}$. The identification of changes in the vector of parameters in the linear model represents the prototype problem in sequential change point detection, as it has been extensively studied in the literature [see Chu et al. (1996), Horváth et al. (2004), Aue et al. (2009), Fremdt (2014b), among many others].
This situation is covered by the general theory developed in Sections 2 and 3. To be precise, let

$$X_t = (P_t^{\top}, Y_t)^{\top} \in \mathbb R^d\ , \quad d = p+1\ , \quad t = 1, 2, \dots \qquad (3.2)$$

be the joint vectors of predictor and response with (joint) distribution function $F_t$, such that the marginal distributions of $Y_t$ and $P_t$ are given by

$$F_{t,Y} = F_t(\infty, \dots, \infty, \cdot) \quad \text{and} \quad F_{t,P} = F_t(\cdot, \dots, \cdot, \infty)\ ,$$

respectively, where we will assume that the predictor sequence is stationary, that is $F_{t,P} = F_P$ for all $t \in \mathbb N$. In a first step we will consider the case where the moment matrix

$$M := \mathbb E\big[P_1 P_1^{\top}\big] = \int_{\mathbb R^p} p \cdot p^{\top}\, dF_P(p)$$

is known (we will discuss later on why this assumption is non-restrictive) and non-singular. In this setup, the parameter $\beta_t$ can be represented as a functional of the distribution function $F_t$, that is

$$\beta_t = \beta(F_t) := M^{-1} \cdot \int_{\mathbb R^d} p \cdot y\, dF_t(p, y) = M^{-1} \cdot \mathbb E[P_t Y_t]\ ,$$

which suggests the canonical estimator

$$\hat\beta_i^j = \beta(\hat F_i^j) = \frac{M^{-1}}{j-i+1} \sum_{t=i}^{j} P_t Y_t \qquad (3.3)$$

from the sample $X_i, \dots, X_j$. To compute the influence function, let $x = (p^{\top}, y)^{\top} \in \mathbb R^d$; then

$$IF\big((p,y), F_t, \beta\big) = \lim_{\eta \searrow 0} \frac{\beta\big((1-\eta)F_t + \eta\, \delta_{(p,y)}\big) - \beta(F_t)}{\eta} = \lim_{\eta \searrow 0} \frac{M^{-1}\big[(1-\eta)\, \mathbb E[P_t Y_t] + \eta\, p\, y\big] - \beta_t}{\eta} = M^{-1}\big(p\, y - \mathbb E[P_t Y_t]\big)\ ,$$

which is the influence function in the linear model stated above [see for example Hampel et al. (1986) for a comprehensive discussion of influence functions]. In the following, we will again use the notation $IF_t = IF\big((P_t^{\top}, Y_t), F_t, \beta\big)$. Note that

$$IF_t = M^{-1}\big(P_t Y_t - \mathbb E[P_t Y_t]\big) = M^{-1} P_t Y_t - \beta_t\ , \qquad (3.4)$$

which directly gives $\mathbb E[IF_t] = 0$. Under the null hypothesis the random sequence $\{P_t Y_t\}_{t \in \mathbb N}$ is stationary and the linearization defined in (2.7) simplifies to

$$\hat\beta_i^j - \beta_1 = \beta(\hat F_i^j) - \beta_1 = \frac{M^{-1}}{j-i+1} \sum_{t=i}^{j} P_t Y_t - \beta_1 = \frac{1}{j-i+1} \sum_{t=i}^{j} \big( M^{-1} P_t Y_t - \beta_1 \big) = \frac{1}{j-i+1} \sum_{t=i}^{j} IF_t\ . \qquad (3.5)$$

Consequently, the remainders in (2.7) vanish and Assumption 2.4 is obviously satisfied. Next, note that the long-run variance matrix is given by

$$\Sigma = \sum_{t \in \mathbb Z} \operatorname{Cov}(IF_0,\, IF_t) = M^{-1} \Gamma M^{-1} \qquad (3.6)$$

with $\Gamma = \sum_{t \in \mathbb Z} \operatorname{Cov}(P_0 Y_0,\, P_t Y_t)$, which can be estimated by $\hat\Sigma = M^{-1} \hat\Gamma M^{-1}$, where $\hat\Gamma$ is an estimator for $\Gamma$. Observing (3.5), it is now easy to see that the matrix $M$ cancels out in the resulting statistic, that is

$$\hat E_m(k) = m^{-1/2} \max_{j=0}^{k-1}\, (k-j)\, \big\|\hat\beta_1^{m+j} - \hat\beta_{m+j+1}^{m+k}\big\|_{\hat\Sigma^{-1}} = m^{-1/2} \max_{j=0}^{k-1}\, (k-j)\, \Big\| \frac{1}{m+j} \sum_{t=1}^{m+j} Y_t P_t - \frac{1}{k-j} \sum_{t=m+j+1}^{m+k} Y_t P_t \Big\|_{\hat\Gamma^{-1}}\ , \qquad (3.7)$$

and therefore does not depend on the matrix $M$, which explains why the assumption of a known moment matrix is non-restrictive. We therefore obtain the following result, which describes the asymptotic properties of the monitoring scheme based on the statistic $\hat E_m(k)$ for a change in the parameter $\beta$ in the linear regression model (3.1). The proof is a direct consequence of Theorems 2.6 and 2.11.

###### Corollary 3.1

Assume that the predictor sequence $\{P_t\}_{t \in \mathbb N}$ is strictly stationary with a non-singular second moment matrix $M$. Let $\hat\Gamma$ denote a non-singular, consistent estimator of the non-singular long-run variance matrix $\Gamma$ defined in (4.8). Further suppose that the sequences $\{P_t\}_{t \in \mathbb N}$ and $\{\varepsilon_t\}_{t \in \mathbb N}$ are independent and let the threshold function under consideration fulfill Assumption 2.3.

1. Under the null hypothesis of no change assume additionally that the sequence $\{IF_t\}_{t \in \mathbb N}$ defined in (3.4) admits the approximation in Assumption 2.2. Then monitoring based on the statistic in (3.7) is an asymptotic level $\alpha$ procedure.

2. Under the alternative hypothesis assume that $\{IF_t\}_{t \in \mathbb N}$ fulfills (2.14) of Assumption 2.9. Then monitoring based on the statistic in (3.7) is consistent.
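To illustrate (3.7), the following Python sketch computes the monitoring statistic for a one-dimensional linear model directly from the products $P_t Y_t$, so that the (known) moment matrix never enters. The model, sample sizes and the naive variance estimate are our own illustrative choices; the paper uses a quadratic spectral long-run variance estimator instead:

```python
import numpy as np

rng = np.random.default_rng(3)

# One-dimensional linear model Y_t = P_t * beta_t + eps_t (p = 1) with a
# change in beta at t = 401; statistic (3.7) compares running averages of
# Z_t = P_t Y_t before and after every split point.
m, n = 200, 600
P = rng.normal(size=n) + 1.0
beta = np.where(np.arange(n) < 400, 1.0, 1.6)
Y = P * beta + rng.normal(size=n)
Z = P * Y                                   # the sequence P_t Y_t

gamma_hat = Z[:m].var()   # naive variance estimate from the training sample
                          # (stand-in for a proper long-run variance estimator)

def E_stat_lm(k):
    """Statistic (3.7), built from prefix sums of Z_t = P_t Y_t."""
    S = np.concatenate(([0.0], np.cumsum(Z[: m + k])))
    j = np.arange(k)
    left = S[m + j] / (m + j)                   # averages over t = 1..m+j
    right = (S[m + k] - S[m + j]) / (k - j)     # averages over t = m+j+1..m+k
    return np.max((k - j) * np.abs(left - right)) / np.sqrt(m * gamma_hat)

print(E_stat_lm(100), E_stat_lm(400))  # statistic grows markedly after the change
```

The same prefix-sum structure carries over to $p > 1$, with the weighted norm $\|\cdot\|_{\hat\Gamma^{-1}}$ replacing the absolute value.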

## 4 Finite sample properties

In this section we investigate the finite sample properties of our monitoring procedure and demonstrate its superiority over the available methodology. We choose the following two statistics as our benchmarks:

$$\hat Q_m(k) := \frac{k}{m^{1/2}}\, \big\|\hat\theta_1^{m} - \hat\theta_{m+1}^{m+k}\big\|_{\hat\Sigma^{-1}}\ , \qquad \hat P_m(k) := \max_{j=0}^{k-1}\, \frac{k-j}{m^{1/2}}\, \big\|\hat\theta_1^{m} - \hat\theta_{m+j+1}^{m+k}\big\|_{\hat\Sigma^{-1}}\ . \qquad (4.1)$$

The procedure based on $\hat Q_m$ was originally proposed by Horváth et al. (2004) for detecting changes in the parameters of linear models and has since been reconsidered, for example, by Aue et al. (2012), Wied and Galeano (2013) and Pape et al. (2016) for the detection of changes in the CAPM model, in correlations and in variances, respectively. A statistic of the type $\hat P_m$ was recently proposed by Fremdt (2014b) and has already been reconsidered by Kirch and Weber (2018). In the simulation study we restrict ourselves to the commonly used class of threshold functions defined in (2.13), where the involved technical constant $\varepsilon$ is set to a very small value.
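For orientation, the two benchmarks in (4.1) can be sketched for a univariate mean as follows (Python; `sigma2_hat` plays the role of the long-run variance estimate, and a naive sample variance is used here for brevity). Note that the Page-CUSUM $\hat P_m(k)$ always dominates $\hat Q_m(k)$, since the term $j = 0$ in its maximum reproduces the ordinary CUSUM:

```python
import numpy as np

def Q_stat(X, m, k, sigma2_hat):
    """Ordinary CUSUM of (4.1): compares the training estimate
    with the estimate from X_{m+1}..X_{m+k}."""
    theta0 = X[:m].mean()
    theta1 = X[m : m + k].mean()
    return k * abs(theta0 - theta1) / np.sqrt(m * sigma2_hat)

def P_stat(X, m, k, sigma2_hat):
    """Page-CUSUM of (4.1): compares the training estimate with the
    estimate from X_{m+j+1}..X_{m+k} for all j = 0, ..., k-1."""
    theta0 = X[:m].mean()
    S = np.concatenate(([0.0], np.cumsum(X[: m + k])))
    j = np.arange(k)
    tail_mean = (S[m + k] - S[m + j]) / (k - j)
    return np.max((k - j) * np.abs(theta0 - tail_mean)) / np.sqrt(m * sigma2_hat)

rng = np.random.default_rng(4)
X = rng.normal(size=500)
X[350:] += 1.0                   # a late change in the mean
m, k = 100, 400
s2 = X[:m].var()
print(Q_stat(X, m, k, s2), P_stat(X, m, k, s2))
```

In contrast to $\hat E_m(k)$ of (2.2), both benchmarks keep the pre-change estimate fixed at the training sample.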
Under the assumptions made in Section 2, it can be shown by similar arguments as given in Appendix A that

$$\sup_{k=1}^{\infty} \frac{\hat Q_m(k)}{w_\gamma(k/m)} \;\overset{\mathcal D}{\Longrightarrow}\; \sup_{0 \le t < 1} \frac{\|W(t)\|}{\max\{t^{\gamma}, \varepsilon\}} \;=:\; L_{2,\gamma} \qquad (4.2)$$

and

$$\sup_{k=1}^{\infty} \frac{\hat P_m(k)}{w_\gamma(k/m)} \;\overset{\mathcal D}{\Longrightarrow}\; \sup_{0 \le t < 1}\ \max_{0 \le s \le t}\ \frac{1}{\max\{t^{\gamma}, \varepsilon\}}\, \Big\| W(t) - \frac{1-t}{1-s}\, W(s) \Big\| \;=:\; L_{3,\gamma}\ , \qquad (4.3)$$

where $W$ denotes a $p$-dimensional Brownian motion. For detailed proofs (under slightly different assumptions) of (4.2) and (4.3), the reader is referred to Horváth et al. (2004) and Fremdt (2014b), where procedures of these types are considered in the special case of a linear model.
Recall the notation of $L_{1,\gamma}$ introduced in Corollary 2.8. By (4.2), (4.3) and Corollary 2.7 the necessary critical values for the procedures $\hat E_m$, $\hat Q_m$ and $\hat P_m$ combined with threshold $w_\gamma$ are given as the $(1-\alpha)$-quantiles of the distributions of $L_{1,\gamma}$, $L_{2,\gamma}$ and $L_{3,\gamma}$, respectively, and can easily be obtained by Monte Carlo simulations. The quantiles listed in Table 1 have been calculated by simulating the corresponding distributions, where the underlying Brownian motions have been approximated on a fine grid of points. In Sections 4.1 and 4.2 below, we will examine the finite sample properties of the three statistics for the detection of changes in the mean and in the regression coefficients of a linear model, respectively. All subsequent results presented in these sections are based on 1000 independent simulation runs and a fixed test level of $\alpha = 5\%$.

### 4.1 Changes in the mean

In this section we compare the finite sample properties of the procedures based on the statistics $\hat E_m$, $\hat Q_m$ and $\hat P_m$ for changes in the mean, as outlined in Section 3.1. Here we test the null hypothesis of no change, which is given by

$$H_0:\ \mu_1 = \dots = \mu_m = \mu_{m+1} = \mu_{m+2} = \dots\ , \qquad (4.4)$$

while the alternative, that the parameter changes beyond the initial data set, is defined as

$$H_1:\ \exists\, k^\star \in \mathbb N:\ \mu_1 = \dots = \mu_{m+k^\star-1} \neq \mu_{m+k^\star} = \mu_{m+k^\star+1} = \dots\ . \qquad (4.5)$$

We will consider two different data generating models, a white noise process and an autoregressive process, given by

1. (M1) a white noise model, where the observations are i.i.d.;

2. (M2) an AR(1) model driven by i.i.d. innovations.

For the AR(1) process specified in model (M2), we create a burn-in sample of 100 observations beforehand. To simulate the alternative hypothesis, changes in the mean are added to the data, that is

$$X_t^{\delta} = \begin{cases} X_t & \text{if } t < m + k^\star\ , \\ X_t + \delta & \text{if } t \ge m + k^\star\ , \end{cases}$$

where $\delta$ denotes the desired change amount. For the necessary covariance estimation we employ the well-known quadratic spectral estimator [see Andrews (1991)] with its implementation in the R-package 'sandwich' [see Zeileis (2004)]. To take into account the possible appearance of changes, only the initial stable segment $X_1, \dots, X_m$ is used for this estimate, which is standard in the literature [see for example Horváth et al. (2004), Wied and Galeano (2013), or Dette and Gösmann (2018) among many others].
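As a stand-in for the quadratic spectral estimator used in the study, the following Python sketch shows the structure of a kernel long-run variance estimate on the training segment, here with the simpler Bartlett kernel and a hand-picked bandwidth (both our own assumptions, not the paper's choices):

```python
import numpy as np

def bartlett_lrv(X, bandwidth):
    """Bartlett-kernel long-run variance estimate from the training sample.

    Sums weighted sample autocovariances up to the given lag; a simple
    stand-in for the quadratic spectral estimator of Andrews (1991).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    Xc = X - X.mean()
    lrv = Xc @ Xc / n                       # lag-0 autocovariance
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1)     # Bartlett weights
        lrv += 2.0 * w * (Xc[lag:] @ Xc[:-lag]) / n
    return lrv

# AR(1) sanity check: X_t = a X_{t-1} + e_t has long-run variance 1/(1-a)^2.
rng = np.random.default_rng(5)
a, n = 0.3, 5000
e = rng.normal(size=n)
X = np.empty(n)
X[0] = e[0]
for t in range(1, n):
    X[t] = a * X[t - 1] + e[t]
print(bartlett_lrv(X, bandwidth=20))   # should be close to 1/0.49, about 2.04
```

The bandwidth is fixed by hand here; automatic bandwidth selection as in Andrews (1991) is what the 'sandwich' implementation provides.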
In Table 2 we display the type I error for both time series models and different choices of $\gamma$ in the threshold function. The principal observation is that all three statistical procedures offer a reasonable approximation of the desired nominal level of 5%. The results for the dependent model (M2) are slightly worse than those for the white noise model (M1). This effect may be caused by a less precise estimation of the long-run variance for small sample sizes. Accordingly, this effect is weaker for the larger sample size.
In Figures 1, 2, 3 and 4 we illustrate the power of the procedures under the alternative hypothesis for increasing values of the change amount $\delta$ and different change positions $k^\star$. The basic tendency in all four plots is similar: while the procedures behave similarly for a change close to the initial data set (first row), the method based on $\hat E_m$ is clearly superior to the others the more the distance to the initial set grows. To give an example, consider the left plot of the last row in Figure 1. Here the test based on the statistic $\hat E_m$ already has a power of 32.9%, whereas the tests based on the statistics $\hat Q_m$ and $\hat P_m$ have a power of 24.4% and 22.7%, respectively. The superior performance of $\hat E_m$ can most likely be explained by the more accurate estimate of the pre-change parameter by $\hat\theta_1^{m+j}$, while the other statistics only involve the estimator $\hat\theta_1^{m}$ [see formulas (2.2) and (4.1)].
For an appropriate understanding of our findings, the reader should be aware of the fact that, although we consider open-end procedures here, simulations have to be stopped eventually. Here we chose a fixed stopping point for the simulations, and it is to be expected that the testing power of all procedures increases with a later stopping point. Therefore the observed superiority of $\hat E_m$ refers to the type II error up to the specified stopping point.

### 4.2 Changes in linear models

In this section we present some simulation results for the detection of changes in the linear model (3.1). We aim to detect changes in the unknown parameter vector $\beta_t$ by testing the null hypothesis

$$H_0:\ \beta_1 = \dots = \beta_m = \beta_{m+1} = \beta_{m+2} = \dots\ , \qquad (4.6)$$

against the alternative that the parameter changes beyond the initial data set, that is

$$H_1:\ \exists\, k^\star \in \mathbb N:\ \beta_1 = \dots = \beta_{m+k^\star-1} \neq \beta_{m+k^\star} = \beta_{m+k^\star+1} = \dots\ . \qquad (4.7)$$

To be precise, we consider the model (3.1) with $p = 2$ and the following choices of predictors:

1. (LM1) predictors with independent components;

2. (LM2) predictors with temporally dependent components;

where the innovations driving the predictors form an i.i.d. sequence in both models. The parameter vector is fixed at $\beta_t = (1, 1)^{\top}$ under the null hypothesis, and to examine the alternative hypothesis, changes are added to its second component, that is

$$\beta_t^{\delta} = \begin{cases} (1,\, 1)^{\top} & \text{if } t < m + k^\star\ , \\ (1,\, 1+\delta)^{\top} & \text{if } t \ge m + k^\star\ . \end{cases}$$

For both scenarios we simulated the residuals $\{\varepsilon_t\}_{t \in \mathbb N}$ in model (3.1) as i.i.d. sequences. Note that the models specified above have already been considered by Fremdt (2014b). As pointed out in Section 3.2, the asymptotic variance that needs to be estimated within our procedure is given by

$$\Gamma = \sum_{t \in \mathbb Z} \operatorname{Cov}(P_0 Y_0,\, P_t Y_t)\ . \qquad (4.8)$$

We estimate this quantity based on the stable segment $X_1, \dots, X_m$ using the well-known quadratic spectral estimator [see Andrews (1991)] with its implementation in the R-package 'sandwich' [see Zeileis (2004)].

The problem of detecting changes in the parameter of the linear model has also been addressed using partial sums of residuals in statistics similar to (4.1), where the residuals are computed from an initial estimate of $\beta$ obtained from the stable segment [see for example Chu et al. (1996), Horváth et al. (2004), Fremdt (2014a) among many others]. Our approach directly compares estimators of the vector $\beta$, which are derived using the general methodology introduced in Sections 2 and 3. The resulting statistics are obtained by replacing $\hat\theta$ by $\hat\beta$ in equation (4.1). We also refer to Leisch et al. (2000) for a comparison of residual-based methods with methods using the estimators directly (these authors consider a statistic similar to $\hat Q_m$).
In Table 3 we display the approximation of the nominal level for the three statistics with different values of the parameter $\gamma$ in the threshold function, where monitoring was stopped after a fixed number of observations. We observe a reasonable approximation of the nominal level 5% for small values of $\gamma$, while the rejection probabilities for larger values of $\gamma$ slightly exceed the desired level of 5%. The fact that larger values of $\gamma$ can lead to a worse approximation of the desired type I error has also been observed by other authors [see, for example, Wied and Galeano (2013)] and can be explained by a more sensitive threshold function at the monitoring start if $\gamma$ is chosen close to $1/2$. Overall, the approximation is slightly better for the independent case in model (LM1).
In Figure 5 we compare the power with respect to the change amount $\delta$ for different change positions, where for the sake of brevity we restrict ourselves to a single configuration. The results are very similar to those provided for the mean functional in Section 4.1. Again the monitoring scheme based on $\hat E_m$ outperforms the procedures based on $\hat Q_m$ and $\hat P_m$, and the superiority is larger for a later change. We omit a detailed discussion and summarize that the empirical findings indicate the superiority (with respect to testing power) of the monitoring scheme based on the statistic $\hat E_m$.

## 5 Closed-end scenarios

It is worthwhile to mention that the theory developed so far also covers the case of closed-end scenarios [sometimes also called a finite time horizon in the literature]. In this section, we briefly discuss this situation and present a small batch of simulation results, which also indicate the superiority of the statistic $\hat E_m$ in closed-end scenarios. Note that the null hypothesis in this setup is given by

$$H_0:\ \theta_1 = \dots = \theta_m = \theta_{m+1} = \theta_{m+2} = \dots = \theta_{\lfloor Tm \rfloor}\ , \qquad (5.1)$$

which is tested against the alternative that the parameter changes (once) at some time $m + k^\star \le \lfloor Tm \rfloor$, that is

$$H_1:\ \exists\, k^\star \in \mathbb N:\ \theta_1 = \dots = \theta_{m+k^\star-1} \neq \theta_{m+k^\star} = \theta_{m+k^\star+1} = \dots = \theta_{\lfloor Tm \rfloor}\ . \qquad (5.2)$$

Here the factor $T > 1$ controls the length of the monitoring period compared to the size of the initial data set. Under the assumptions stated in Section 2, we can prove a statement corresponding to Theorem 2.6 and Corollary 2.8.

###### Theorem 5.1

Assume that the null hypothesis (5.1) and Assumptions 2.2 - 2.4 hold. If further $\hat\Sigma$ is a consistent and non-singular estimator of the long-run variance matrix $\Sigma$, it holds that

$$\max_{k=1}^{\lfloor Tm \rfloor} \frac{\hat E_m(k)}{w_\gamma(k/m)} \;\overset{\mathcal D}{\Longrightarrow}\; \sup_{0 \le t \le T/(T+1)}\ \max_{0 \le s \le t}\ \frac{\big\| W(t) - W(s) \big\|}{\max\{t^{\gamma}, \varepsilon\}}\ , \qquad (5.3)$$

where $W$ is a $p$-dimensional Brownian motion with independent components.

The proof of Theorem 5.1 follows by a straightforward adaption of the proofs of Theorem 2.6 and Corollary 2.8 given in Appendix A. The corresponding results for the tests based on the statistics $\hat Q_m$ and $\hat P_m$ defined in (4.1) read as follows:

$$\max_{k=1}^{\lfloor Tm \rfloor} \frac{\hat Q_m(k)}{w_\gamma(k/m)} \;\overset{\mathcal D}{\Longrightarrow}\; \max_{0 \le t \le T/(T+1)} \frac{\|W(t)\|}{\max\{t^{\gamma}, \varepsilon\}}$$

and

$$\max_{k=1}^{\lfloor Tm \rfloor} \frac{\hat P_m(k)}{w_\gamma(k/m)} \;\overset{\mathcal D}{\Longrightarrow}\; \max_{0 \le t \le T/(T+1)}\ \max_{0 \le s \le t}\ \frac{1}{\max\{t^{\gamma}, \varepsilon\}}\, \Big\| W(t) - \frac{1-t}{1-s}\, W(s) \Big\|\ .$$