# Epidemic change-point detection in general causal time series

We consider epidemic change-point detection in a large class of causal time series models, including, among other processes, AR(∞), ARCH(∞), TARCH(∞) and ARMA-GARCH. A test statistic based on the Gaussian quasi-maximum likelihood estimator of the parameter is proposed. It is shown that, under the null hypothesis of no change, the test statistic converges to a distribution obtained from a difference of two Brownian bridges, and that it diverges to infinity under the epidemic alternative. Numerical results from a simulation study and a real data example are provided.


## 1 Introduction

We consider a general class of affine causal time series models in a semiparametric setting. Let $(\xi_t)_{t \in \mathbb{Z}}$ be a sequence of centered independent and identically distributed (iid) random variables satisfying $E(\xi_0^2) = 1$, and $\Theta$ a compact subset of $\mathbb{R}^d$ ($d \in \mathbb{N}$). For $\mathcal{T} \subseteq \mathbb{Z}$ and any $\theta \in \Theta$, define

Class $\mathcal{AC}_{\mathcal{T}}(M_\theta, f_\theta)$: A process $\{X_t,\ t \in \mathcal{T}\}$ belongs to $\mathcal{AC}_{\mathcal{T}}(M_\theta, f_\theta)$ if it satisfies:

$$X_t = M_\theta(X_{t-1}, X_{t-2}, \ldots)\,\xi_t + f_\theta(X_{t-1}, X_{t-2}, \ldots) \quad \forall t \in \mathcal{T}, \tag{1.1}$$

where $M_\theta$ and $f_\theta$ are two measurable functions. The existence of a stationary and ergodic solution as well as the inference for this class have been addressed by Bardet and Wintenberger (2009). Numerous classical time series such as AR(∞), ARCH(∞), TARCH(∞) or ARMA-GARCH models belong to this class (see Bardet and Wintenberger (2009)). This class of models has now been well studied; see for instance Bardet et al. (2012), Kengne (2012), Bardet and Kengne (2014) for change-point detection in this class; Bardet et al. (2017) for inference based on the Laplacian quasi-likelihood; Bardet et al. (2020), Kengne (2020) for model selection in this class.
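To make the recursion (1.1) concrete, here is a minimal simulation sketch. The function name `simulate_affine_causal`, the Gaussian innovations, the burn-in length and the AR(1) and ARCH(1) coefficients are all assumptions chosen for illustration; they are not taken from the paper.

```python
import random

def simulate_affine_causal(n, M, f, seed=0, burn_in=200):
    """Simulate X_t = M(past) * xi_t + f(past), with past = (X_{t-1}, X_{t-2}, ...).

    M and f are callables taking the list of past values, most recent first
    (an empty list at the very first step). Innovations are standard Gaussian,
    which is an assumption of this sketch: the model only asks for centered
    iid innovations.
    """
    rng = random.Random(seed)
    past = []                      # X_{t-1}, X_{t-2}, ... (most recent first)
    path = []
    for _ in range(n + burn_in):
        xi = rng.gauss(0.0, 1.0)
        x = M(past) * xi + f(past)
        past.insert(0, x)
        path.append(x)
    return path[burn_in:]          # drop the burn-in period

# AR(1): f depends on X_{t-1} only and M is constant (hypothetical coefficient).
ar1 = simulate_affine_causal(
    500,
    M=lambda past: 1.0,
    f=lambda past: 0.5 * past[0] if past else 0.0,
)

# ARCH(1): f = 0 and M^2 = 0.2 + 0.4 * X_{t-1}^2 (hypothetical coefficients).
arch1 = simulate_affine_causal(
    500,
    M=lambda past: (0.2 + 0.4 * (past[0] ** 2 if past else 0.0)) ** 0.5,
    f=lambda past: 0.0,
)
```

Passing the whole past to `M` and `f` mirrors the infinite-memory form of (1.1); models with finite memory simply ignore all but the first few entries.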

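We focus here on epidemic change-point detection in this class. Assume that a trajectory $(X_1, \ldots, X_n)$ of the process $\{X_t,\ t \in \mathbb{Z}\}$ is observed and consider the following test hypotheses:

1. $H_0$: $(X_1, \ldots, X_n)$ is a trajectory of a process satisfying (1.1) with a parameter $\theta^*_0 \in \Theta$.

2. $H_1$: there exists $((\theta^*_1, \theta^*_2), (t^*_1, t^*_2))$ (with $\theta^*_1 \neq \theta^*_2$ and $t^*_1 < t^*_2$) such that $X_t$ satisfies (1.1) with parameter $\theta^*_1$ for $t \leq t^*_1$ and $t > t^*_2$, and with parameter $\theta^*_2$ for $t^*_1 < t \leq t^*_2$.

The epidemic alternative $H_1$ refers to the so-called epidemic period, which runs from $t^*_1 + 1$ to $t^*_2$.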

Several works in the literature are devoted to epidemic change-point detection in time series. We refer, among others, to Levin and Kline (1985), Yao (1993), Csörgö and Horváth (1997), Ramanayake and Gupta (2003), Račkauskas and Suquet (2004), Račkauskas and Suquet (2006), Guan (2007), Jarušková and Piterbarg (2011), Aston and Kirch (2012a, 2012b), Bucchia (2014), Graiche et al. (2016). As pointed out by Diop and Kengne (2021), most of these procedures are developed for epidemic change-point detection in the mean of random variables. The latter authors addressed this issue for a general class of integer-valued time series.

In this new contribution, we propose a test based on the Gaussian quasi-likelihood for epidemic change-point detection in the class of affine causal models (1.1). Under the null hypothesis of no change, the proposed statistic converges to a distribution obtained from a difference between two Brownian bridges; this statistic diverges to infinity under the epidemic alternative. These findings lead to a test which has correct size asymptotically and is consistent in power.

The rest of the paper is outlined as follows. Section 2 provides some assumptions and the definition of the Gaussian quasi-likelihood. Section 3 focuses on the construction of the test statistic and the asymptotic studies under the null and the epidemic alternative. Numerical results from a simulation study and a real data example are presented in Section 4. Section 5 is devoted to the proofs of the main results.

## 2 Assumptions and QMLE

Throughout the sequel, we use the following notations:

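• $\|x\| := \sqrt{\sum_{i=1}^{N} |x_i|^2}$, for any $x \in \mathbb{R}^N$;

• $\|x\| := \sqrt{\sum_{i=1}^{p} \sum_{j=1}^{q} |x_{i,j}|^2}$, for any matrix $x = (x_{i,j}) \in M_{p,q}(\mathbb{R})$; where $M_{p,q}(\mathbb{R})$ denotes the set of matrices of dimension $p \times q$ with coefficients in $\mathbb{R}$;

• $\|g\|_{\mathcal{K}} := \sup_{\theta \in \mathcal{K}} \|g(\theta)\|$ for any function $g : \mathcal{K} \to M_{p,q}(\mathbb{R})$;

• $\|Y\|_r := \big(E\|Y\|^r\big)^{1/r}$, where $Y$ is a random vector with finite $r$-order moments;

• $T_{\ell,\ell'} = \{\ell, \ell+1, \ldots, \ell'\}$ for any $(\ell, \ell') \in \mathbb{N}^2$ such that $\ell \leq \ell'$.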

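In the sequel, 0 denotes the null vector of any vector space. For $\Psi_\theta = f_\theta$, $M_\theta$ or $h_\theta := M_\theta^2$ and any compact set $\mathcal{K} \subseteq \Theta$, define

Assumption A$_i(\Psi_\theta, \mathcal{K})$ ($i = 0, 1, 2$): Assume that $\|\partial^i \Psi_\theta(0) / \partial\theta^i\|_{\mathcal{K}} < \infty$ and there exists a sequence of non-negative real numbers $(\alpha^{(i)}_k(\Psi_\theta, \mathcal{K}))_{k \geq 1}$ such that $\sum_{k=1}^{\infty} \alpha^{(i)}_k(\Psi_\theta, \mathcal{K}) < \infty$, satisfying

$$\left\| \frac{\partial^i \Psi_\theta(x)}{\partial\theta^i} - \frac{\partial^i \Psi_\theta(y)}{\partial\theta^i} \right\|_{\mathcal{K}} \leq \sum_{k=1}^{\infty} \alpha^{(i)}_k(\Psi_\theta, \mathcal{K})\, |x_k - y_k| \quad \text{for all } x, y \in \mathbb{R}^{\infty},$$

where $x$, $y$, $x_k$, $y_k$ are respectively replaced by $x^2$, $y^2$, $x_k^2$, $y_k^2$ if $\Psi_\theta = h_\theta$.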

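For any $r \geq 1$, define

$$\Theta(r) = \Big\{\theta \in \mathbb{R}^d \,\big/\, A_0(f_\theta, \{\theta\}) \text{ and } A_0(M_\theta, \{\theta\}) \text{ hold with } \sum_{k=1}^{\infty} \big\{\alpha^{(0)}_k(f_\theta, \{\theta\}) + \|\xi_0\|_r\, \alpha^{(0)}_k(M_\theta, \{\theta\})\big\} < 1\Big\} \bigcup \Big\{\theta \in \mathbb{R}^d \,\big/\, f_\theta = 0 \text{ and } A_0(h_\theta, \{\theta\}) \text{ holds with } \|\xi_0\|_r^2 \sum_{k=1}^{\infty} \alpha^{(0)}_k(h_\theta, \{\theta\}) < 1\Big\}.$$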

These Lipschitz-type conditions are notably useful when studying the existence of solutions of the model (1.1). If $\theta \in \Theta(r)$, then there exists a $\tau$-weakly dependent stationary and ergodic solution $X = (X_t)_{t \in \mathbb{Z}}$ of (1.1) satisfying $\|X_0\|_r < \infty$ (see Doukhan and Wintenberger (2008) and Bardet and Wintenberger (2009)).

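Consider a trajectory $(X_1, \ldots, X_n)$ of a process $\{X_t,\ t \in \mathbb{Z}\}$. If this trajectory is generated by the model (1.1) with parameter $\theta$, then for any segment $T \subset \{1, \ldots, n\}$, the conditional Gaussian quasi-(log)likelihood computed on $T$ is given by

$$L(T, \theta) := -\frac{1}{2} \sum_{t \in T} q_t(\theta) \quad \text{with} \quad q_t(\theta) = \frac{(X_t - f^t_\theta)^2}{h^t_\theta} + \log(h^t_\theta), \tag{2.1}$$

where $f^t_\theta := f_\theta(X_{t-1}, X_{t-2}, \ldots)$, $M^t_\theta := M_\theta(X_{t-1}, X_{t-2}, \ldots)$ and $h^t_\theta := (M^t_\theta)^2$. Since only $X_1, \ldots, X_n$ are observed, we deal in the sequel with an approximated quasi-(log)likelihood contrast given for any segment $T \subset \{1, \ldots, n\}$ by

$$\hat{L}(T, \theta) := -\frac{1}{2} \sum_{t \in T} \hat{q}_t(\theta) \quad \text{where} \quad \hat{q}_t(\theta) := \frac{(X_t - \hat{f}^t_\theta)^2}{\hat{h}^t_\theta} + \log(\hat{h}^t_\theta),$$

with $\hat{f}^t_\theta := f_\theta(X_{t-1}, \ldots, X_1, 0, 0, \ldots)$, $\hat{M}^t_\theta := M_\theta(X_{t-1}, \ldots, X_1, 0, 0, \ldots)$ and $\hat{h}^t_\theta := (\hat{M}^t_\theta)^2$; and consider the estimator

$$\hat{\theta}_n(T) := \underset{\theta \in \Theta}{\operatorname{argmax}}\, \big(\hat{L}(T, \theta)\big). \tag{2.2}$$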
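As an illustration of the approximated contrast and of the estimator (2.2), the sketch below evaluates the quasi-log-likelihood of a GARCH(1,1) model, where $f_\theta = 0$ and the conditional variance obeys a one-step recursion. Several things here are assumptions of the sketch rather than the authors' implementation: the function names, the parameter values, the Gaussian innovations, the convention that unobserved pre-sample values are set to 0, and the crude grid search standing in for a numerical optimizer.

```python
import math
import random

def garch_qlik(x, theta, start=1, end=None):
    """Approximated Gaussian quasi-log-likelihood of a GARCH(1,1) on the
    segment {start, ..., end} (1-indexed, inclusive).

    The conditional variance h_t uses all observations X_1, ..., X_{t-1},
    with unobserved pre-sample values set to 0 (assumed convention).
    """
    alpha0, alpha1, beta1 = theta
    end = len(x) if end is None else end
    h = alpha0 / (1.0 - beta1)            # h_1 when every pre-sample value is 0
    total = 0.0
    for t in range(1, end + 1):
        if t > 1:
            h = alpha0 + alpha1 * x[t - 2] ** 2 + beta1 * h
        if t >= start:
            total += x[t - 1] ** 2 / h + math.log(h)
    return -0.5 * total

def simulate_garch(n, theta, seed=0, burn_in=500):
    """Simulate a GARCH(1,1) path with standard Gaussian innovations."""
    alpha0, alpha1, beta1 = theta
    rng = random.Random(seed)
    h = alpha0 / (1.0 - alpha1 - beta1)   # stationary variance as starting value
    x_prev, out = 0.0, []
    for _ in range(n + burn_in):
        h = alpha0 + alpha1 * x_prev ** 2 + beta1 * h
        x_prev = math.sqrt(h) * rng.gauss(0.0, 1.0)
        out.append(x_prev)
    return out[burn_in:]

theta_true = (0.1, 0.2, 0.5)              # hypothetical parameter
x = simulate_garch(1000, theta_true)

# Crude grid search in place of a numerical optimizer over Theta.
grid = [(a0, a1, b1)
        for a0 in (0.05, 0.1, 0.2)
        for a1 in (0.1, 0.2, 0.3)
        for b1 in (0.3, 0.5, 0.7)]
theta_hat = max(grid, key=lambda th: garch_qlik(x, th))
```

The `start` and `end` arguments restrict the sum to a segment while the variance recursion still runs from the first observation, which is how the contrast is evaluated on sub-segments when building a change-point statistic.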

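The following assumptions are needed to study the asymptotic behavior of the estimator defined in (2.2).

Assumption D: $\exists\, \underline{h} > 0$ such that $\inf_{\theta \in \Theta} h_\theta(x) \geq \underline{h}$ for all $x \in \mathbb{R}^{\infty}$.

Assumption Id($\Theta$): For a process $\{X_t,\ t \in \mathbb{Z}\}$ satisfying (1.1) with parameter $\theta^*$ and for all $\theta \in \Theta$,

$$\big(f_{\theta^*}(X_0, X_{-1}, \cdots) = f_\theta(X_0, X_{-1}, \cdots) \ \text{and}\ h_{\theta^*}(X_0, X_{-1}, \cdots) = h_\theta(X_0, X_{-1}, \cdots) \ \text{a.s.}\big) \Rightarrow \theta^* = \theta.$$

Assumption Var($\Theta$): For a process $\{X_t,\ t \in \mathbb{Z}\}$ satisfying (1.1) with parameter $\theta^*$, one of the families $\big(\frac{\partial f_{\theta^*}}{\partial\theta_i}(X_0, X_{-1}, \cdots)\big)_{1 \leq i \leq d}$ or $\big(\frac{\partial h_{\theta^*}}{\partial\theta_i}(X_0, X_{-1}, \cdots)\big)_{1 \leq i \leq d}$ is linearly independent.

Under $H_0$ and the above assumptions, Bardet and Wintenberger (2009) established the consistency and the asymptotic normality of the estimator $\hat{\theta}_n(T_{1,n})$ for this class.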

## 3 Test statistic and asymptotic results

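Under $H_0$, recall that (see Bardet and Wintenberger (2009)), for this class, it holds that

$$\sqrt{n}\,\big(\hat{\theta}(T_{1,n}) - \theta^*_0\big) \xrightarrow[n \to \infty]{\mathcal{D}} \mathcal{N}\big(0, F^{-1} G F^{-1}\big), \tag{3.1}$$

with

$$G := E\left[\frac{\partial q_0(\theta^*_0)}{\partial\theta} \frac{\partial q_0(\theta^*_0)}{\partial\theta'}\right] \quad \text{and} \quad F := E\left[\frac{\partial^2 q_0(\theta^*_0)}{\partial\theta\, \partial\theta'}\right], \tag{3.2}$$

where $'$ denotes the transpose. For any segment $T \subset \{1, \ldots, n\}$, consider the following matrices,

$$\hat{G}(T) := \frac{1}{\operatorname{Card}(T)} \sum_{t \in T} \Big(\frac{\partial \hat{q}_t(\hat{\theta}(T))}{\partial\theta}\Big) \Big(\frac{\partial \hat{q}_t(\hat{\theta}(T))}{\partial\theta}\Big)' \quad \text{and} \quad \hat{F}(T) := \frac{1}{\operatorname{Card}(T)} \sum_{t \in T} \frac{\partial^2 \hat{q}_t(\hat{\theta}(T))}{\partial\theta\, \partial\theta'}. \tag{3.3}$$

Under $H_0$, $\hat{G}(T_{1,n})$ and $\hat{F}(T_{1,n})$ are consistent estimators of $G$ and $F$, respectively.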

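In the sequel, we follow the idea of Diop and Kengne (2021). Let $(u_n)_{n \geq 1}$, $(v_n)_{n \geq 1}$ be two integer-valued sequences such that: $u_n, v_n = o(n)$ and $u_n, v_n \to +\infty$ as $n \to \infty$. For all $n \geq 1$, define the matrix

$$\hat{\Sigma}(u_n) = \frac{1}{3} \Big[\hat{F}(T_{1,u_n})\, \hat{G}(T_{1,u_n})^{-1} \hat{F}(T_{1,u_n}) + \hat{F}(T_{u_n+1,n-u_n})\, \hat{G}(T_{u_n+1,n-u_n})^{-1} \hat{F}(T_{u_n+1,n-u_n}) + \hat{F}(T_{n-u_n+1,n})\, \hat{G}(T_{n-u_n+1,n})^{-1} \hat{F}(T_{n-u_n+1,n})\Big]$$

where $\hat{G}(T_{1,u_n})^{-1}$, $\hat{G}(T_{u_n+1,n-u_n})^{-1}$, $\hat{G}(T_{n-u_n+1,n})^{-1}$ are replaced by 0 if the corresponding matrices are not invertible. Also, define the set

$$\mathcal{T}_n = \big\{(k_1, k_2) \in ([v_n, n - v_n] \cap \mathbb{N})^2 \ \text{with} \ k_2 - k_1 \geq v_n\big\}.$$

For all $(k_1, k_2) \in \mathcal{T}_n$, set

$$C_{n,k_1,k_2} = \frac{k_2 - k_1}{n^{3/2}} \Big[\big(n - (k_2 - k_1)\big)\, \hat{\theta}(T_{k_1+1,k_2}) - k_1\, \hat{\theta}(T_{1,k_1}) - (n - k_2)\, \hat{\theta}(T_{k_2+1,n})\Big], \tag{3.4}$$

and consider the test statistic

$$\hat{Q}_n = \max_{(k_1,k_2) \in \mathcal{T}_n} \hat{Q}_{n,k_1,k_2} \quad \text{with} \quad \hat{Q}_{n,k_1,k_2} = C'_{n,k_1,k_2}\, \hat{\Sigma}(u_n)\, C_{n,k_1,k_2}. \tag{3.5}$$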

As pointed out by Diop and Kengne (2021), this test statistic coincides with the statistics proposed by Račkauskas and Suquet (2004), Jarušková and Piterbarg (2011), Bucchia (2014) or Aston and Kirch (2012) for the particular case of epidemic change-point detection in the mean. In this sense, the test considered here can be seen as a generalization of these procedures.

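The following theorem provides the asymptotic behavior of the statistic $\hat{Q}_n$ under the null hypothesis. In condition (3.6) of this theorem, we make the convention that if A$_i(M_\theta, \Theta)$ holds, then $\alpha^{(i)}_k(h_\theta, \Theta) = 0$ for all $k \geq 1$, and if A$_i(h_\theta, \Theta)$ holds, then $\alpha^{(i)}_k(M_\theta, \Theta) = 0$ for all $k \geq 1$.

###### Theorem 3.1

Under $H_0$ with $\theta^*_0 \in \overset{\circ}{\Theta} \cap \Theta(4)$, assume that D, Id($\Theta$), Var($\Theta$) (for the model with parameter $\theta^*_0$), A$_i(f_\theta, \Theta)$, A$_i(M_\theta, \Theta)$ (or A$_i(h_\theta, \Theta)$) hold with

$$\alpha^{(i)}_k(f_\theta, \Theta) + \alpha^{(i)}_k(M_\theta, \Theta) + \alpha^{(i)}_k(h_\theta, \Theta) = O(k^{-\gamma}) \ \text{for } i = 0, 1, 2 \text{ and some } \gamma > 3/2. \tag{3.6}$$

Then,

$$\hat{Q}_n \xrightarrow[n \to \infty]{\mathcal{D}} \sup_{0 \leq \tau_1 < \tau_2 \leq 1} \|W_d(\tau_1) - W_d(\tau_2)\|^2, \tag{3.7}$$

where $W_d$ is a $d$-dimensional Brownian bridge.

For any $\alpha \in (0, 1)$, denote by $c_{d,\alpha}$ the $(1-\alpha)$-quantile of the distribution of $\sup_{0 \leq \tau_1 < \tau_2 \leq 1} \|W_d(\tau_1) - W_d(\tau_2)\|^2$. Therefore, at a nominal level $\alpha$, the critical region of the test is $(\hat{Q}_n > c_{d,\alpha})$, which leads to a procedure with correct size asymptotically. Table 1 of Diop and Kengne (2021) provides the values of $c_{d,\alpha}$ for several values of $\alpha$ and $d$.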
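The quantiles of the limiting distribution in (3.7) have no simple closed form for general $d$, but they can be approximated by Monte Carlo. The following is a rough sketch under stated assumptions: the function names, the grid size `m`, the number of replications `reps` and the seed are all hypothetical choices, and discretizing the bridge on a finite grid biases the supremum downward, so the output is only an approximation and not a substitute for the tabulated values.

```python
import math
import random

def brownian_bridge(d, m, rng):
    """Discretized d-dimensional Brownian bridge on the grid {0, 1/m, ..., 1}."""
    step = math.sqrt(1.0 / m)
    walk = [[0.0] * d]
    for _ in range(m):
        walk.append([w + step * rng.gauss(0.0, 1.0) for w in walk[-1]])
    end = walk[-1]
    # W(i/m) = B(i/m) - (i/m) * B(1), componentwise
    return [[w[j] - (i / m) * end[j] for j in range(d)] for i, w in enumerate(walk)]

def sup_sq_increment(path):
    """Maximum over i < j of the squared Euclidean norm of path[i] - path[j]."""
    best = 0.0
    for i in range(len(path)):
        for j in range(i + 1, len(path)):
            s = sum((a - b) ** 2 for a, b in zip(path[i], path[j]))
            if s > best:
                best = s
    return best

def mc_quantile(d, level, reps=200, m=80, seed=0):
    """Empirical `level`-quantile of the discretized supremum in (3.7)."""
    rng = random.Random(seed)
    draws = sorted(sup_sq_increment(brownian_bridge(d, m, rng)) for _ in range(reps))
    index = min(reps - 1, max(0, int(math.ceil(level * reps)) - 1))
    return draws[index]

# Approximate critical value for d = 1 at the 5% level (rough, small reps).
c_hat = mc_quantile(d=1, level=0.95)
```

For $d = 1$ the supremum reduces to the squared range of the bridge, so the double loop could be replaced by `(max - min) ** 2`; the generic version is kept so that the same code covers $d > 1$.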

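For the asymptotic behavior under the epidemic alternative, the following additional condition is needed.

Assumption B: There exists $(\tau^*_1, \tau^*_2) \in (0, 1)^2$ such that $(t^*_1, t^*_2) = ([n\tau^*_1], [n\tau^*_2])$ (where $[\cdot]$ is the integer part).

We have the following result.

###### Theorem 3.2

Under $H_1$ with $\theta^*_1, \theta^*_2 \in \overset{\circ}{\Theta} \cap \Theta(4)$, assume that D, Id($\Theta$), Var($\Theta$) (for the models with parameters $\theta^*_1$ and $\theta^*_2$), A$_i(f_\theta, \Theta)$, A$_i(M_\theta, \Theta)$ (or A$_i(h_\theta, \Theta)$) and (3.6) hold. Then,

$$\hat{Q}_n \xrightarrow[n \to \infty]{\mathcal{P}} +\infty. \tag{3.8}$$

This theorem shows that the proposed procedure is consistent in power. An estimator of the change-points under the epidemic alternative is given by

$$\hat{\underline{t}}_n = \underset{(k_1,k_2) \in \mathcal{T}_n}{\operatorname{argmax}}\ C'_{n,k_1,k_2}\, \hat{\Sigma}(u_n)\, C_{n,k_1,k_2}.$$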
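The statistic (3.5) and the change-point estimator above can be sketched in the simplest special case, a shift in the mean, $X_t = \theta + \xi_t$ (that is, $f_\theta = \theta$ and $M_\theta = 1$). In that case the estimator on a segment is the sample mean, $\partial \hat{q}_t / \partial\theta = -2(X_t - \theta)$ and $\partial^2 \hat{q}_t / \partial\theta^2 = 2$, so each term $\hat{F} \hat{G}^{-1} \hat{F}$ reduces to the inverse of the empirical variance of the segment. The function name, the values of `u` and `v`, and the simulated data are assumptions made for illustration only.

```python
import random

def epidemic_statistic_mean(x, u, v):
    """Statistic (3.5) and its argmax for the mean model X_t = theta + xi_t.

    x is the sample (x[t-1] holds X_t), u and v play the roles of u_n and v_n.
    Returns (Q_n, (k1, k2)).
    """
    n = len(x)
    prefix = [0.0]                         # prefix[k] = X_1 + ... + X_k
    for value in x:
        prefix.append(prefix[-1] + value)

    def mean(a, b):                        # mean of X_a, ..., X_b (1-indexed)
        return (prefix[b] - prefix[a - 1]) / (b - a + 1)

    def var(a, b):                         # empirical variance on the segment
        m = mean(a, b)
        return sum((x[t - 1] - m) ** 2 for t in range(a, b + 1)) / (b - a + 1)

    # Sigma-hat(u_n): average of F G^{-1} F = 1 / variance over three segments,
    # with the inverse replaced by 0 when the variance vanishes.
    segments = [(1, u), (u + 1, n - u), (n - u + 1, n)]
    variances = [var(a, b) for a, b in segments]
    sigma = sum(1.0 / s if s > 0 else 0.0 for s in variances) / 3.0

    best, arg = -1.0, None
    for k1 in range(v, n - v + 1):
        for k2 in range(k1 + v, n - v + 1):
            m = k2 - k1
            c = m / n ** 1.5 * ((n - m) * mean(k1 + 1, k2)
                                - k1 * mean(1, k1)
                                - (n - k2) * mean(k2 + 1, n))
            q = c * sigma * c
            if q > best:
                best, arg = q, (k1, k2)
    return best, arg

# Simulated example: epidemic shift of size 2 on the period 81, ..., 140.
rng = random.Random(0)
n = 200
x = [rng.gauss(0.0, 1.0) + (2.0 if 80 < t <= 140 else 0.0) for t in range(1, n + 1)]
q_hat, (k1_hat, k2_hat) = epidemic_statistic_mean(x, u=14, v=14)
```

The double loop visits every admissible pair $(k_1, k_2)$, which costs $O(n^2)$ evaluations; prefix sums keep each evaluation constant-time in this mean case, whereas a general model would require refitting the estimator on each segment.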

## 4 Some numerical results

This section presents some results of a simulation study and a real data example. For a sample size , the statistic is computed with and (see also Remark 1 in Kengne (2012)). The empirical levels and powers are obtained after 200 replications at the nominal level .

### 4.1 Simulation study

We consider the following models:

(i) ARMA(1,1) processes:

$$X_t = \alpha^*_0 + \alpha^*_1 X_{t-1} + \xi_t + \beta^*_1 \xi_{t-1} \quad \text{for all } t \in \mathbb{Z}. \tag{4.1}$$

The parameter of the model is , where is a compact subset of such as: for all , . Since we can write for all ,

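$$X_t = \frac{\alpha^*_0}{1 + \beta^*_1} + (\alpha^*_1 + \beta^*_1) \Big(X_{t-1} + \sum_{k \geq 2} (-\beta^*_1)^{k-1} X_{t-k}\Big) + \xi_t,$$

the model (4.1) belongs to the class (1.1) with $f_\theta(x_1, x_2, \ldots) = \frac{\alpha_0}{1 + \beta_1} + (\alpha_1 + \beta_1) \sum_{k \geq 1} (-\beta_1)^{k-1} x_k$ and $M_\theta \equiv 1$ for all $\theta = (\alpha_0, \alpha_1, \beta_1) \in \Theta$. For this model, the Lipschitz-type conditions A$_i(f_\theta, \Theta)$ ($i = 0, 1, 2$) as well as D are automatically satisfied. Moreover, if $\xi_0$ is a non-degenerate random variable, then the assumptions Id($\Theta$) and Var($\Theta$) hold; and for any such that , . In the sequel, we deal with an ARMA(1,1) with a nonzero mean ($\alpha^*_0 \neq 0$), an ARMA(1,1) with mean zero ($\alpha^*_0 = 0$) and an AR(1) with a nonzero mean ($\beta^*_1 = 0$).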
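The AR(∞) representation of the ARMA(1,1) model can be checked numerically: the residual $X_t - f_\theta(X_{t-1}, X_{t-2}, \ldots)$, computed from a truncated past, should recover the innovation $\xi_t$ up to an error that decays like $|\beta_1|^t$. The parameter values, the zero starting values and the function names below are assumptions made for this check.

```python
import random

def simulate_arma11(n, alpha0, alpha1, beta1, seed=0):
    """Return (x, xi) with starting values x[0] = X_0 = 0 and xi[0] = 0."""
    rng = random.Random(seed)
    x, xi = [0.0], [0.0]
    for t in range(1, n + 1):
        xi.append(rng.gauss(0.0, 1.0))
        x.append(alpha0 + alpha1 * x[t - 1] + xi[t] + beta1 * xi[t - 1])
    return x, xi

def f_theta(x, t, alpha0, alpha1, beta1):
    """AR(infinity) predictor of X_t, truncated to the observed X_{t-1}, ..., X_0."""
    s = sum((-beta1) ** (k - 1) * x[t - k] for k in range(1, t + 1))
    return alpha0 / (1.0 + beta1) + (alpha1 + beta1) * s

alpha0, alpha1, beta1 = 1.0, 0.5, 0.3      # hypothetical parameter values
x, xi = simulate_arma11(300, alpha0, alpha1, beta1)

# Largest discrepancy between the residual and the true innovation for t >= 100.
max_gap = max(abs(x[t] - f_theta(x, t, alpha0, alpha1, beta1) - xi[t])
              for t in range(100, 301))
```

After a hundred steps the truncation error is far below floating-point precision, so `max_gap` is essentially rounding noise; this is the same mechanism that makes the approximated contrast close to the exact one.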

We consider the change-point test with an epidemic alternative where the parameter of the model is under $H_0$, and , under $H_1$. Firstly, two trajectories of an ARMA(1,1) with mean zero are generated: a trajectory under $H_0$ with and a trajectory under $H_1$ with breaks at , , . Figure 1 displays the statistic $\hat{Q}_{n,k_1,k_2}$. One can see that, for the scenario without change, the values of this statistic are below the horizontal line which represents the limit of the critical region (see Figure 1(a)). Under the epidemic alternative, $\hat{Q}_n$ is greater than the critical value of the test and is reached around the points where the changes occur (see the dotted lines in Figure 1(b)).

(ii) GARCH(1,1) processes:

$$X_t = \sigma_t \xi_t \quad \text{with} \quad \sigma^2_t = \alpha^*_0 + \alpha^*_1 X^2_{t-1} + \beta^*_1 \sigma^2_{t-1}, \tag{4.2}$$

the parameter , a compact subset of such as: for all , . For all , we get

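$$X_t = \xi_t \sqrt{\frac{\alpha^*_0}{1 - \beta^*_1} + \alpha^*_1 \sum_{k \geq 1} (\beta^*_1)^{k-1} X^2_{t-k}}.$$

Therefore, the model (4.2) belongs to the class (1.1) with $f_\theta \equiv 0$ and $M_\theta(x_1, x_2, \ldots) = \sqrt{\frac{\alpha_0}{1 - \beta_1} + \alpha_1 \sum_{k \geq 1} \beta_1^{k-1} x_k^2}$ for all $\theta = (\alpha_0, \alpha_1, \beta_1) \in \Theta$. The Lipschitz-type conditions A$_i(h_\theta, \Theta)$ ($i = 0, 1, 2$) hold automatically and D is satisfied. In addition, if $\xi_0$ is a non-degenerate random variable, then the assumptions Id($\Theta$) and Var($\Theta$) hold; and for any such that , . In the sequel, we consider a GARCH(1,1) ($\beta^*_1 \neq 0$) and an ARCH(1) ($\beta^*_1 = 0$).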
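The ARCH(∞) form used above follows from iterating the variance recursion in (4.2). The short derivation below is a sketch that assumes $0 \leq \beta^*_1 < 1$ and that the remainder term vanishes as the number of iterations $m$ grows (for instance when the process is stationary with finite second moment); the index $m$ is introduced here only for the derivation.

```latex
\begin{aligned}
\sigma_t^2
  &= \alpha^*_0 + \alpha^*_1 X_{t-1}^2 + \beta^*_1 \sigma_{t-1}^2 \\
  &= \alpha^*_0 (1 + \beta^*_1)
     + \alpha^*_1 \big( X_{t-1}^2 + \beta^*_1 X_{t-2}^2 \big)
     + (\beta^*_1)^2 \sigma_{t-2}^2 \\
  &= \alpha^*_0 \sum_{j=0}^{m-1} (\beta^*_1)^{j}
     + \alpha^*_1 \sum_{k=1}^{m} (\beta^*_1)^{k-1} X_{t-k}^2
     + (\beta^*_1)^{m} \sigma_{t-m}^2
  \;\xrightarrow[m \to \infty]{}\;
     \frac{\alpha^*_0}{1 - \beta^*_1}
     + \alpha^*_1 \sum_{k \geq 1} (\beta^*_1)^{k-1} X_{t-k}^2 .
\end{aligned}
```

Taking the square root and multiplying by $\xi_t$ gives the displayed representation of $X_t$, with the geometric weights $(\beta^*_1)^{k-1}$ playing the role of the Lipschitz coefficients of $h_\theta$.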

For both the ARMA and GARCH models, we carry out the change-point test with an epidemic alternative where the parameter of the model is under $H_0$, and , under $H_1$ with change-points at for sample size . The empirical levels and powers are displayed in Table 1. The AR example is related to the real data application, see subsection 4.2. The results in this table show that the empirical level approaches the nominal one as $n$ increases, and that the empirical power increases with $n$ and is overall close to one when . These findings are consistent with the asymptotic results of Theorems 3.1 and 3.2.

### 4.2 Real data example

We consider the daily concentrations of carbon monoxide in the Vitória metropolitan area. These daily levels are obtained from the State Environment and Water Resources Institute, where the data were collected at eight monitoring stations. There are 455 available observations that represent the average concentrations from September 11, 2009 through December 09, 2010 (see Figure 2(a)). The data are part of a large dataset (available at https://rss.onlinelibrary.wiley.com/pb-assets/hub-assets/rss/Datasets/RSSC%2067.2/C1239deSouza-1531120585220.zip) which was analyzed by Souza et al. (2018) to quantify the association between respiratory disease and air pollution concentrations.

To test for the presence of an epidemic change in this series, we apply our detection procedure with the ARMA($p$, $q$) model. We have applied the test with several values of $p$ and $q$; and the results after change-point detection show a preference (in the sense of AIC and BIC) for an AR(1). Figure 2(b) displays the values of $\hat{Q}_{n,k_1,k_2}$ for all $(k_1, k_2) \in \mathcal{T}_n$. The resulting test statistic exceeds the critical value of the test, which implies that the null hypothesis $H_0$ is rejected (i.e., an epidemic change-point is detected). The vector of estimated break-points is $\hat{\underline{t}}_n = (143, 330)$; i.e., the point where the peak in the graph is reached (see Figure 2(b)). The locations of the changes correspond to the dates January 31 and August 06, 2010. This corresponds to the period where the winds are weaker and to the austral winter; these meteorological factors are known to increase the concentration of carbon monoxide. The estimated model on each regime is given by:

$$X_t = \begin{cases} \underset{(19.72)}{813.39} + \underset{(0.08)}{0.309}\, X_{t-1} + \xi_t & \text{for } t \leq 143, \\ \underset{(22.43)}{933.27} + \underset{(0.07)}{0.240}\, X_{t-1} + \xi_t & \text{for } 144 \leq t \leq 330, \\ \underset{(22.82)}{822.83} + \underset{(0.09)}{0.293}\, X_{t-1} + \xi_t & \text{for } t \geq 331, \end{cases} \tag{4.3}$$

where the standard errors of the estimators are given in parentheses. From (4.3), one can see that the parameter of the first regime is close to that of the third regime, which strengthens the hypothesis of the existence of an epidemic change-point.
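One way to read (4.3) is through the long-run mean implied by each fitted AR(1), namely the intercept divided by one minus the autoregressive coefficient. The short computation below uses only the point estimates reported in (4.3); the dictionary layout and the function name are illustrative choices, and standard errors are ignored.

```python
# (intercept, AR coefficient) for each regime, copied from (4.3)
regimes = {
    "t <= 143": (813.39, 0.309),
    "144 <= t <= 330": (933.27, 0.240),
    "t >= 331": (822.83, 0.293),
}

def stationary_mean(intercept, phi):
    """Mean of a stationary AR(1) X_t = intercept + phi * X_{t-1} + xi_t, |phi| < 1."""
    return intercept / (1.0 - phi)

means = {name: stationary_mean(c, phi) for name, (c, phi) in regimes.items()}
for name, m in means.items():
    print(f"{name}: {m:.1f}")
```

The implied means are roughly 1177 in the first regime, 1228 during the epidemic period and 1164 in the third regime, which matches the reading that the series returns close to its initial level after the second break.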

## 5 Proofs of the main results

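To simplify the expressions, in this section, we will use the conditional Gaussian quasi-log-likelihood up to multiplication by 1/2, given by $L(T, \theta) = -\sum_{t \in T} q_t(\theta)$ and $\hat{L}(T, \theta) = -\sum_{t \in T} \hat{q}_t(\theta)$.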

### 5.1 Proof of Theorem 3.1

Let $\Sigma := F G^{-1} F$, where $F$ and $G$ are the matrices defined in (3.2). Define the statistic

$$Q_n = \max_{(k_1,k_2) \in \mathcal{T}_n} Q_{n,k_1,k_2} \quad \text{with} \quad Q_{n,k_1,k_2} = C'_{n,k_1,k_2}\, \Sigma\, C_{n,k_1,k_2}.$$

Consider the following lemma. Part (i) can be shown along similar lines as in the proof of Lemma 6.3 in Diop and Kengne (2021); part (ii) is established in Bardet and Wintenberger (2009).

###### Lemma 5.1

Suppose that the assumptions of Theorem 3.1 hold. Then,

1. ;

2. $\big(\frac{\partial q_t(\theta^*_0)}{\partial\theta}\big)_{t \in \mathbb{Z}}$ is a stationary ergodic, square integrable martingale difference sequence with covariance matrix $G$.

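Let $k, k' \in [1, n]$ be two integers with $k \leq k'$, $\bar{\theta} \in \Theta$ and $i \in \{1, 2, \ldots, d\}$. Applying the mean value theorem to $\frac{\partial}{\partial\theta_i} L(T_{k,k'}, \cdot)$, there exists $\theta_{n,i}$ between $\bar{\theta}$ and $\theta^*_0$ such that

$$\frac{\partial}{\partial\theta_i} L(T_{k,k'}, \bar{\theta}) = \frac{\partial}{\partial\theta_i} L(T_{k,k'}, \theta^*_0) + \frac{\partial^2}{\partial\theta\, \partial\theta_i} L(T_{k,k'}, \theta_{n,i})\, (\bar{\theta} - \theta^*_0);$$

i.e.,

$$(k' - k + 1)\, F_n(T_{k,k'}, \bar{\theta})\, (\bar{\theta} - \theta^*_0) = \frac{\partial}{\partial\theta} L(T_{k,k'}, \theta^*_0) - \frac{\partial}{\partial\theta} L(T_{k,k'}, \bar{\theta}) \tag{5.1}$$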

with