# Convergence of U-Processes in Hölder Spaces with Application to Robust Detection of a Changed Segment

To detect a changed segment (a so-called epidemic change) in a time series, variants of the CUSUM statistic are frequently used. However, they are sensitive to outliers in the data and do not perform well for heavy-tailed data, especially when short segments get a high weight in the test statistic. We present a robust test statistic for epidemic changes based on the Wilcoxon statistic. To study its asymptotic behavior, we prove functional limit theorems for U-processes in Hölder spaces. We also study the finite sample behavior via simulations and apply the statistics to a real data example.


## 1 Introduction

In change point detection, the hypothesis is typically stationarity, but there are different types of alternatives, such as at most one change point or multiple change points. In this article, we are interested in testing stationarity against the so-called epidemic change or changed segment alternative: We have a random sample $X_1,\dots,X_n$ (with values in a sample space $S$ and distributions $P_{X_1},\dots,P_{X_n}$) and we wish to test the null hypothesis

$$H_0: P_{X_1}=P_{X_2}=\dots=P_{X_n},$$

versus the alternative

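$$H_1:\ \text{there is a segment } I^*:=\{k^*+1,\dots,m^*\}\subset I_n:=\{1,2,\dots,n\}\ \text{such that}\ P_{X_i}=\begin{cases}P & \text{for } i\in I_n\setminus I^*,\\ Q & \text{for } i\in I^*,\end{cases}\ \text{and}\ P\neq Q.$$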

Under $H_1$, the sample $\{X_i, i\in I^*\}$ constitutes a changed segment starting at $k^*+1$ and having length $\ell^*=m^*-k^*$, and $Q$ is then the corresponding distribution of the changed segment. This type of alternative is of special relevance in epidemiology and was first studied by Levin and Kline [15] in the case of a change in mean. Their test statistic is a generalization of the CUSUM (cumulated sum) statistic. Simultaneously, epidemic-type models were introduced by Commenges, Seal and Pinatel [2] in connection with experimental neurophysiology.

If the changed segment is rather short compared to the sample size, tests that give higher weight to short segments have more power. Asymptotic critical values for such tests were derived by Siegmund [23] in the Gaussian case (see also [22]). The logarithmic case was treated by Kabluchko and Wang [14], and the regularly varying case by Mikosch and Račkauskas [16]. Yao [26] and Hušková [12] compared tests with different weightings. Račkauskas and Suquet [20], [21] suggested using a compromise weighting that allows one to express the limit distribution of the test statistic as a function of a Brownian motion. However, in order to apply the continuous mapping theorem to this statistic, it is necessary to establish the weak convergence of the partial sum process to a Brownian motion with respect to the Hölder norm.

It is well known that the CUSUM statistic is sensitive to outliers in the data, see e.g. Prášková and Chochola [19]. The problem becomes worse if higher weights are given to shorter segments. A common strategy to obtain a robust change point test is to adapt a robust two-sample test such as the Wilcoxon test. This was first done by Darkhovsky [4] and by Pettitt [18] in the context of detecting at most one change in a sequence of independent observations. For a comparison of different change point tests, see Wolfe and Schechtman [25]. The results on the Wilcoxon type change point statistic were generalized to long range dependent time series by Dehling, Rooch, Taqqu [6]. The Wilcoxon statistic can be expressed either as a rank statistic or as a (two-sample) $U$-statistic. This motivated Csörgő and Horváth [3] to study more general $U$-statistics for change point detection, followed by Ferger [9] and Gombay [11]. Orasch [17] and Döring [8] studied $U$-statistics for detecting multiple change points in a sequence of independent observations. Results for change point tests based on general two-sample $U$-statistics were given for short range dependent time series by Dehling, Fried, Garcia, Wendler [5] and for long range dependent time series by Dehling, Rooch, Wendler [7].

Gombay [10] suggested using a Wilcoxon type test also for the epidemic change problem. The aim of this paper is to generalize these results in three respects: to study more general $U$-statistics, to allow the random variables to exhibit some form of short range dependence, and to introduce weightings to the statistic. This way, we obtain a robust test which still has good power for detecting short changed segments. To obtain asymptotic critical values, we prove a functional central limit theorem for $U$-processes in Hölder spaces.

The article is organized as follows. Section 2 introduces $U$-statistic type test statistics for the epidemic change point problem. In Section 3, some experimental results are presented and discussed, whereas Section 4 deals with a concrete data set. Sections 5 and 6 constitute the theoretical part of the paper, where asymptotic results are established under the null hypothesis. Consistency under the changed segment alternative is discussed in Section 7. Finally, in Section 8, we present the table of asymptotic critical values for the tests under consideration.

## 2 Tests for changed segment based on U-statistics

A general approach for constructing procedures to detect a changed segment is to use a measure of heterogeneity $\Delta_n(k,m)$ between two segments

$$\{X_i, i\in I(k,m)\}\quad\text{and}\quad\{X_i, i\in I^c(k,m)\},\quad 0\le k<m\le n,$$

where $I(k,m)=\{k+1,\dots,m\}$ and $I^c(k,m)=I_n\setminus I(k,m)$. As neither the beginning nor the end of the changed segment is known, the statistics

$$T_n:=\max_{0\le k<m\le n}\frac{1}{\rho_n(m-k)}\,\Delta_n(k,m)$$

may be used to test for the presence of a changed segment in the sample $(X_i)$, where $\rho_n(m-k)$ is a factor smoothing over the influence of either too short or too large data windows. In this paper we consider a class of $U$-statistic type measures of heterogeneity defined via a measurable function $h:S\times S\to\mathbb{R}$ by

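$$\Delta_n(k,m)=\Delta_{h,n}(k,m):=\sum_{i\in I(k,m)}\ \sum_{j\in I_n\setminus I(k,m)}h(X_i,X_j),$$

and the corresponding test statistics

$$T_n(\gamma,h)=\max_{0\le k<m\le n}\frac{|\Delta_{h,n}(k,m)|}{\rho_\gamma((m-k)/n)},$$

where $0\le\gamma<1/2$ and

$$\rho_\gamma(t)=[t(1-t)]^{\gamma},\quad 0<t<1.$$

Although other weighting functions are possible, our choice is limited by the application of the functional central limit theorem in Hölder spaces.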
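The definition above can be sketched in a few lines of Python. This is only an illustration, assuming the statistic has the form $T_n(\gamma,h)=\max_{0\le k<m\le n}|\Delta_{h,n}(k,m)|/\rho_\gamma((m-k)/n)$; the function names `rho`, `delta` and `segment_statistic` are hypothetical and not taken from the paper. The fast version uses the fact that, for an antisymmetric kernel, all pairs inside a segment cancel, so $\Delta_{h,n}(k,m)$ is a difference of prefix sums of the row sums $\sum_j h(X_i,X_j)$.

```python
def rho(t, gamma):
    """Weight function rho_gamma(t) = [t(1 - t)]^gamma for 0 < t < 1."""
    return (t * (1.0 - t)) ** gamma


def wilcoxon_kernel(x, y):
    """Antisymmetric Wilcoxon kernel 1{x < y} - 1{y < x}."""
    return float(x < y) - float(y < x)


def cusum_kernel(x, y):
    """Antisymmetric CUSUM kernel x - y."""
    return x - y


def delta(xs, k, m, h):
    """Delta_{h,n}(k, m): sum of h(X_i, X_j), i in {k+1..m}, j outside."""
    n = len(xs)
    inside = range(k, m)  # 0-based positions of the indices k+1, ..., m
    outside = [j for j in range(n) if j < k or j >= m]
    return sum(h(xs[i], xs[j]) for i in inside for j in outside)


def segment_statistic(xs, gamma, h):
    """T_n(gamma, h) for an ANTISYMMETRIC kernel h, using O(n^2) kernel calls."""
    n = len(xs)
    # For antisymmetric h the pairs inside a segment cancel, so Delta(k, m)
    # equals the sum over i in the segment of the row sums R_i.
    row_sums = [sum(h(x, y) for y in xs) for x in xs]
    prefix = [0.0]
    for r in row_sums:
        prefix.append(prefix[-1] + r)
    best = 0.0
    for k in range(n):
        for m in range(k + 1, n + 1):
            if m - k == n:
                continue  # full sample: Delta = 0 and the weight vanishes
            value = abs(prefix[m] - prefix[k]) / rho((m - k) / n, gamma)
            best = max(best, value)
    return best
```

For example, `segment_statistic(xs, 0.25, wilcoxon_kernel)` evaluates the Wilcoxon type statistic with weight parameter $\gamma=0.25$ on a list `xs`; to compare with the limit distribution, the value still has to be scaled by $n^{-3/2}$ and a variance estimate.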

Recall that a kernel $h$ is symmetric if $h(x,y)=h(y,x)$ and antisymmetric if $h(x,y)=-h(y,x)$ for all $x,y\in S$. Any non-symmetric kernel $h$ can be antisymmetrized by considering

$$\tilde h(x,y)=h(x,y)-h(y,x),\quad x,y\in S.$$

Note that the kernel $h$ is antisymmetric if and only if $E\,h(X,Y)=0$ for any independent random variables $X,Y$ with the same distribution such that the expectation exists. If $h$ is antisymmetric, the expectation vanishes by Fubini's theorem and antisymmetry. For the converse, first consider the one-point distribution $X=Y=x$ almost surely to conclude that $h(x,x)=0$ for all $x\in S$. Next, consider the two-point distribution $P(X=x)=P(X=y)=1/2$ and conclude that $h(x,y)+h(y,x)=0$ and thus $h(x,y)=-h(y,x)$. So a $U$-statistic with an antisymmetric kernel has expectation $0$ if the observations are independent and identically distributed, which makes such statistics good candidates for change point tests. We only consider antisymmetric kernels in this paper.

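In the case of a real-valued sample, examples of antisymmetric kernels include the CUSUM kernel $h_C(x,y)=x-y$ or, more generally, $h(x,y)=\psi(x-y)$ for an odd function $\psi$, and the Wilcoxon kernel $h_W(x,y)=\mathbf{1}_{\{x<y\}}-\mathbf{1}_{\{y<x\}}$. The kernel $h_W$ leads to the Wilcoxon type statistic

$$T_n(\gamma,h_W):=\max_{0\le k<m\le n}\frac{1}{\rho_\gamma((m-k)/n)}\Bigl|\sum_{i\in I(k,m)}\ \sum_{j\in I_n\setminus I(k,m)}\bigl(\mathbf{1}_{\{X_i<X_j\}}-\mathbf{1}_{\{X_j<X_i\}}\bigr)\Bigr|,$$

whereas with the kernel $h_C$ we get the CUSUM type statistic

$$n^{-1}T_n(\gamma,h_C)=\max_{0\le k<m\le n}\frac{1}{\rho_\gamma((m-k)/n)}\Bigl|\sum_{i=k+1}^{m}\bigl(X_i-\overline{X}_n\bigr)\Bigr|,$$

where $\overline{X}_n=n^{-1}\sum_{i=1}^{n}X_i$. As more general classes of kernels and corresponding statistics, we can consider the CUSUM test of transformed data ($h(x,y)=\psi(x)-\psi(y)$) or a test based on two-sample M-estimators ($h(x,y)=\psi(x-y)$ for some monotone function $\psi$, see Dehling et al. [7]).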
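The CUSUM form of the statistic follows from a short calculation, sketched here under the assumption that the CUSUM kernel is $h_C(x,y)=x-y$ and with the shorthand $\ell:=m-k$ (a symbol introduced only for this derivation):

```latex
\begin{align*}
\Delta_{h_C,n}(k,m)
  &= \sum_{i\in I(k,m)}\ \sum_{j\in I_n\setminus I(k,m)} (X_i - X_j) \\
  &= (n-\ell)\sum_{i\in I(k,m)} X_i \;-\; \ell\sum_{j\in I_n\setminus I(k,m)} X_j \\
  &= n\sum_{i\in I(k,m)} X_i \;-\; \ell\sum_{j=1}^{n} X_j
   \;=\; n\sum_{i=k+1}^{m}\bigl(X_i-\overline{X}_n\bigr).
\end{align*}
```

Dividing by $n$ and by the weight $\rho_\gamma((m-k)/n)$ and taking the maximum of the absolute values over $0\le k<m\le n$ gives the weighted CUSUM statistic of centered partial sums.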

Based on invariance principles in Hölder spaces discussed in Section 5, we derive the limit distribution of the test statistics $T_n(\gamma,h)$. Theorem 1 and Theorem 2 provide examples of our results. Let $W=(W(t),t\ge 0)$ be a standard Wiener process and $B=(B(t),0\le t\le 1)$ be a corresponding Brownian bridge. Define for $0\le\gamma<1/2$,

$$T_\gamma:=\sup_{0\le s<t\le 1}\frac{|B(t)-B(s)|}{\rho_\gamma(t-s)}.$$
###### Theorem 1.

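If $(X_i)$ are independent and identically distributed random elements and $h$ is an antisymmetric kernel with $E|h(X_1,X_2)|^p<\infty$ for some $p>2$, then for any $0\le\gamma<1/2-1/p$, we have

$$\lim_{n\to\infty}P\bigl(n^{-3/2}\sigma_h^{-1}T_n(\gamma,h)\le x\bigr)=P(T_\gamma\le x)\quad\text{for all } x\in\mathbb{R},$$

where the variance parameter $\sigma_h^2$ is defined by $\sigma_h^2=\operatorname{var}(h_1(X_1))$ and $h_1(x)=E\,h(x,X_1)$.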

Note that in practice, the random variables $X_i$ might not have high moments, but if we use a bounded kernel like $h_W$, the moment condition of the theorem holds for any $p$, so we have the convergence for any $\gamma<1/2$. Also, in practical applications, the variance parameter $\sigma_h^2$ has to be estimated. This can be done by

$$\hat\sigma_{n,h}^2:=\frac{1}{n}\sum_{i=1}^{n}\hat h_1^2(X_i)\tag{2}$$

with $\hat h_1(x)=\frac{1}{n}\sum_{j=1}^{n}h(x,X_j)$.
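A minimal sketch of the estimator (2), assuming the plug-in form $\hat h_1(x)=n^{-1}\sum_{j}h(x,X_j)$; the function names `h1_hat` and `sigma2_hat` are hypothetical helpers for illustration only.

```python
def wilcoxon_kernel(x, y):
    """Antisymmetric Wilcoxon kernel 1{x < y} - 1{y < x}."""
    return float(x < y) - float(y < x)


def h1_hat(x, xs, h):
    """Empirical version of h_1(x): average of h(x, X_j) over the sample."""
    return sum(h(x, y) for y in xs) / len(xs)


def sigma2_hat(xs, h):
    """Variance estimator (2): mean of the squared values h1_hat(X_i)."""
    n = len(xs)
    return sum(h1_hat(x, xs, h) ** 2 for x in xs) / n
```

For the Wilcoxon kernel and a sample without ties, $\hat h_1(X_i)$ depends only on the rank of $X_i$, so the estimate equals $(n^2-1)/(3n^2)$ regardless of the data, which is close to $\operatorname{var}(h_1(X_1))=1/3$ for continuous distributions.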

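For the case of a dependent sample, we consider absolutely regular sequences of random elements (also called $\beta$-mixing). Recall that the coefficients of absolute regularity $(\beta_m)_{m\in\mathbb{N}}$ are defined by

$$\beta_m=E\sup_{A\in\mathcal{F}_m^\infty}\bigl(P(A\mid\mathcal{F}_{-\infty}^0)-P(A)\bigr),$$

where $\mathcal{F}_a^b$ is the sigma-field generated by $X_a,X_{a+1},\dots,X_b$.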

###### Theorem 2.

Let $(X_i)$ be a stationary, absolutely regular sequence and let $h$ be an antisymmetric kernel. Assume that the following conditions are satisfied:

• for some ;

• and for some .

Then for any , we have

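$$\lim_{n\to\infty}P\bigl(n^{-3/2}\sigma_\infty^{-1}T_n(\gamma,h)\le x\bigr)=P(T_\gamma\le x)\quad\text{for all } x\in\mathbb{R},$$

where the long run variance parameter $\sigma_\infty^2$ is given by

$$\sigma_\infty^2=\operatorname{var}(h_1(X_1))+2\sum_{k=2}^{\infty}\operatorname{cov}\bigl(h_1(X_1),h_1(X_k)\bigr).$$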

For a bounded kernel, condition (ii) on the decay of the coefficients of absolute regularity reduces to

• for some .

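Following Vogel and Wendler [24], $\sigma_\infty^2$ can be estimated using a kernel variance estimator. For this, define autocovariance estimators $\hat\rho(k)$ by

$$\hat\rho(k)=\frac{1}{n}\sum_{i=1}^{n-k}\hat h_1(X_i)\,\hat h_1(X_{i+k})$$

with $\hat h_1(x)=\frac{1}{n}\sum_{j=1}^{n}h(x,X_j)$. Then, for some Lipschitz continuous function $K$ with $K(0)=1$ and finite integral, we set

$$\hat\sigma_\infty^2=\hat\sigma_h^2+2\sum_{k=1}^{n-1}K(k/b_n)\,\hat\rho(k),$$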

where $b_n$ is a bandwidth such that $b_n\to\infty$ and $b_n/\sqrt{n}\to 0$ as $n\to\infty$.
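The kernel variance estimator can be sketched as follows. This is an illustration under stated assumptions: $\hat\sigma_h^2$ is taken to be $\hat\rho(0)$, the autocovariance estimators use the plug-in $\hat h_1(x)=n^{-1}\sum_j h(x,X_j)$, and the weight function `example_weight` (a quartic function supported on $[-1,1]$) is a hypothetical choice that is Lipschitz with value 1 at 0; it is not claimed to be the kernel used in the simulations of this paper.

```python
def wilcoxon_kernel(x, y):
    """Antisymmetric Wilcoxon kernel 1{x < y} - 1{y < x}."""
    return float(x < y) - float(y < x)


def example_weight(u):
    """Hypothetical weight function K with K(0) = 1, zero outside [-1, 1]."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def long_run_variance(xs, h, bandwidth, weight=example_weight):
    """Kernel estimator: rho_hat(0) + 2 * sum_k K(k / b_n) * rho_hat(k)."""
    n = len(xs)
    # Plug-in values h1_hat(X_i) = (1/n) * sum_j h(X_i, X_j).
    h1 = [sum(h(x, y) for y in xs) / n for x in xs]

    def rho_hat(k):
        # Autocovariance estimator at lag k (no centering, as in the text).
        return sum(h1[i] * h1[i + k] for i in range(n - k)) / n

    estimate = rho_hat(0)
    for k in range(1, n):
        w = weight(k / bandwidth)
        if w != 0.0:
            estimate += 2.0 * w * rho_hat(k)
    return estimate
```

With a compactly supported weight function only lags below the bandwidth contribute, so the cost is of order $n^2$ kernel evaluations for $\hat h_1$ plus $n\,b_n$ multiplications for the autocovariances.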

With the help of the limit distribution and the variance estimators, we obtain critical values for our test statistic. Simulated quantiles for the limit distribution can be found in Section 8.
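Quantiles of this kind can be approximated by Monte Carlo. The following sketch assumes that the limit variable has the form $T_\gamma=\sup_{0\le s<t\le 1}|B(t)-B(s)|/[(t-s)(1-(t-s))]^{\gamma}$ for a Brownian bridge $B$, and replaces the supremum by a maximum over a finite grid, which tends to underestimate the true quantiles. The function names, grid size and number of repetitions are hypothetical; the values tabulated in Section 8 remain the reference.

```python
import math
import random


def simulate_t_gamma(gamma, grid, rng):
    """One draw of a grid approximation to T_gamma."""
    # Brownian motion on the grid points i / grid, i = 0, ..., grid.
    w = [0.0]
    for _ in range(grid):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(1.0 / grid)))
    # Brownian bridge B(t) = W(t) - t * W(1).
    b = [w[i] - (i / grid) * w[grid] for i in range(grid + 1)]
    best = 0.0
    for i in range(grid):
        for j in range(i + 1, grid + 1):
            if j - i == grid:
                continue  # t - s = 1: B(1) - B(0) = 0 and the weight vanishes
            t = (j - i) / grid
            best = max(best, abs(b[j] - b[i]) / (t * (1.0 - t)) ** gamma)
    return best


def empirical_quantile(gamma, level, reps=200, grid=50, seed=0):
    """Empirical quantile of the simulated draws at the given level."""
    rng = random.Random(seed)
    draws = sorted(simulate_t_gamma(gamma, grid, rng) for _ in range(reps))
    index = max(0, min(reps - 1, int(math.ceil(level * reps)) - 1))
    return draws[index]
```

A finer grid and more repetitions reduce the discretization and Monte Carlo errors at quadratic cost in the grid size.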

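To discuss the behavior of the test statistics under the alternative, we assume that for each $n$ we have two probability measures $P_n$ and $Q_n$ on $S$ and a random sample $X_{n1},\dots,X_{nn}$ such that for $i\in I_n$,

$$P_{X_{ni}}=\begin{cases}Q_n, & \text{for } i\in I^*:=\{k_n^*+1,\dots,k_n^*+\ell_n^*\},\\ P_n, & \text{for } i\in I_n\setminus I^*.\end{cases}$$

Set

$$\delta_n=\int_S\int_S h(x,y)\,Q_n(dx)\,P_n(dy),\qquad \nu_n=\int_S\int_S\bigl(h(x,y)-\delta_n\bigr)^2\,Q_n(dx)\,P_n(dy).$$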
###### Theorem 3.

Let . Assume that for all $n$, the random variables $X_{n1},\dots,X_{nn}$ are independent and let $h$ be an antisymmetric kernel. If

 (3)

then it holds

$$n^{-3/2}T_n(\gamma,h)\xrightarrow[n\to\infty]{P}\infty.\tag{4}$$

For dependent random variables, we get a similar theorem:

###### Theorem 4.

Assume that for all , the random variables are absolutely regular with mixing coefficients not depending on , such that for some . Let be an antisymmetric kernel, such that there exist such that for all , . Furthermore, let and assume that

$$\lim_{n\to\infty}\sqrt{n}\,|\delta_n|\Bigl[\frac{\ell_n^*}{n}\Bigl(1-\frac{\ell_n^*}{n}\Bigr)\Bigr]^{1-\gamma}=\infty.\tag{5}$$

Then (4) holds.

This implies that a test based on the statistic $T_n(\gamma,h)$ is consistent. For more on consistency, see Section 7. The proofs of Theorems 1 and 2 are given in Section 6.

## 3 Simulation results

We compare the CUSUM type and the Wilcoxon type test statistic in a Monte Carlo simulation study. The model is an autoregressive process of order 1 with , where

are either normally distributed, exponentially distributed or

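distributed. We assume that the first $L$ observations are shifted, so that we observe

$$X_i:=\begin{cases}Y_i/\sqrt{\operatorname{var}(Y_i)}+\delta_n & \text{for } i=1,\dots,L,\\ Y_i/\sqrt{\operatorname{var}(Y_i)} & \text{for } i=L+1,\dots,n.\end{cases}$$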

Under independence, the distribution of the change-point statistics does not depend on the beginning of the changed segment, only on its length, so we restrict the simulation study to segments of the form $\{1,\dots,L\}$. In Figure 1, the results for independent observations () are shown. In this case, we use the known variance of our observations and do not estimate the variance. The relative rejection frequency in 3,000 simulation runs under the alternative is plotted against the relative rejection frequency under the hypothesis for theoretical significance levels of 1%, 2.5%, 5% and 10%.
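The data generating mechanism can be sketched as follows for the case of standard normal innovations. The function name `simulate_sample` and its parameters (`seg_len` for the length of the shifted initial segment, `shift` for the size of the shift, `ar_coef` for the autoregressive coefficient) are hypothetical, and the concrete parameter values of the study are not reproduced here.

```python
import math
import random


def simulate_sample(n, seg_len, shift, ar_coef=0.0, seed=None):
    """Standardized AR(1) sample whose first seg_len values are shifted."""
    rng = random.Random(seed)
    # Stationary AR(1): Y_i = ar_coef * Y_{i-1} + eps_i with N(0, 1)
    # innovations, so that var(Y_i) = 1 / (1 - ar_coef^2).
    sd = math.sqrt(1.0 / (1.0 - ar_coef ** 2))
    y = rng.gauss(0.0, sd)  # start in the stationary distribution
    xs = []
    for i in range(n):
        if i > 0:
            y = ar_coef * y + rng.gauss(0.0, 1.0)
        # Standardize to unit variance, then shift the first seg_len values.
        xs.append(y / sd + (shift if i < seg_len else 0.0))
    return xs
```

Setting `ar_coef=0.0` gives independent observations, and `shift=0.0` gives a sample under the hypothesis.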

As expected, the CUSUM test performs better than the Wilcoxon test for normally distributed data. For the exponential and the distribution, the Wilcoxon type test has higher power. For the long changed segment (), the weighted tests with outperform the tests with . For the short changed segment (), the Wilcoxon type test has more power with weight . The same holds for the CUSUM type test under normality. For the other two distributions, however, the empirical size is also higher for so that the size corrected power is not improved.

In Figure 2, we show the results for dependent observations (AR(1) with ). In this case, we estimated the long run variance with a kernel estimator, using the quartic spectral kernel and the fixed bandwidth . Both tests now become too liberal, with typical rejection rates of 13% to 15% for a theoretical level of 10%. For the long changed segment () it is better to use the weight , for the short segment () the weight . Under normality, the CUSUM type test performs better, though the difference in power is not very large. For the other two distributions, the Wilcoxon type test has better power. Although we ran some simulations with different locations of the changed segment, we only report the results for a changed segment positioned directly at the beginning, as in the case of independent observations. Let us just mention that the start and end points played only a minor role in the results.

## 4 Data example

We investigate the frequency of searches for the term “Harry Potter” from January 2004 until February 2019, obtained from Google Trends. The time series is plotted in Figure 3. We apply the CUSUM type and the Wilcoxon type change point test with weight parameters . The lag one autocovariance is estimated as 0.457, so that we have to allow for dependence in our testing procedure. We estimate the long run variance with a kernel estimator, using the quartic spectral kernel and the fixed bandwidth .

The CUSUM type test does not reject the hypothesis of stationarity at a significance level of 5%, regardless of the choice of . In contrast, the Wilcoxon type test detects a changed segment for any , even at a significance level of 1%. The beginning and end of the changed segment are estimated differently for different values of : The unweighted Wilcoxon type test with leads to a segment from January 2008 to June 2016. For , we obtain January 2012 to June 2016 as an estimate. leads to an estimated changed segment from January 2012 to May 2016.

By visual inspection of the time series, we come to the conclusion that the estimated changed segment for values fits the data better, because this segment coincides with a period with only low search frequencies. Furthermore, the spikes of this time series can be explained by the release of movies, and the estimated changed segment lies between the release of the last Harry Potter movie in July 2011 and the release of “Fantastic Beasts and Where to Find Them” in November 2016.

## 5 Double partial sum process

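Throughout this section we assume that the sequence $(X_i)$ is stationary and $P_X$ is the distribution of each $X_i$. Consider for a kernel $h$ the double partial sums

$$U_{h,0}=U_{h,n}=0,\quad U_{h,k}=\sum_{i=1}^{k}\sum_{j=k+1}^{n}h(X_i,X_j),\quad 1\le k<n,$$

and the corresponding polygonal line process $U_{h,n}=(U_{h,n}(t),t\in[0,1])$ defined by

$$U_{h,n}(t):=U_{h,\lfloor nt\rfloor}+(nt-\lfloor nt\rfloor)\bigl(U_{h,\lfloor nt\rfloor+1}-U_{h,\lfloor nt\rfloor}\bigr),\quad t\in[0,1],\tag{6}$$

where $\lfloor a\rfloor$ denotes the value of the floor function at a real number $a$. So $U_{h,n}$ is a random polygonal line with vertices $(k/n,U_{h,k})$, $k=0,1,\dots,n$. As a functional framework for the process $U_{h,n}$ we consider Banach spaces of Hölder functions. Recall that the space $C[0,1]$ of continuous functions on $[0,1]$ is endowed with the norm

$$\|x\|=\max_{0\le t\le 1}|x(t)|.$$

The Hölder space $H_\gamma^o[0,1]$, $0\le\gamma<1$, of functions $x\in C[0,1]$ such that

$$\omega_\gamma(x,\delta):=\sup_{0<|s-t|\le\delta}\frac{|x(t)-x(s)|}{|t-s|^{\gamma}}\to 0\quad\text{as } \delta\to 0,$$

is endowed with the norm

$$\|x\|_\gamma:=|x(0)|+\omega_\gamma(x,1).$$

Both $C[0,1]$ and $H_\gamma^o[0,1]$ are separable Banach spaces. The space $H_0^o[0,1]$ is isomorphic to $C[0,1]$.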

###### Definition 5.

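For a kernel $h$ and a number $0\le\gamma<1/2$ we say that $(X_i)$ satisfies $(h,\gamma)$-FCLT if there is a Gaussian process $U_h=(U_h(t),t\in[0,1])$ such that

$$n^{-3/2}U_{h,n}\xrightarrow[n\to\infty]{\mathcal{D}}U_h\quad\text{in the space } H_\gamma^o[0,1].$$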

In order to make use of results for partial sum processes, we decompose the $U$-statistics into a linear part and a so-called degenerate part. Hoeffding's decomposition of the kernel $h$ reads

$$h(x,y)=h_1(x)-h_1(y)+g(x,y),\quad x,y\in S,$$

where

$$h_1(x)=\int_S h(x,y)\,P_X(dy)\quad\text{and}\quad g(x,y)=h(x,y)-h_1(x)+h_1(y),\quad x,y\in S.$$

Summing over the indices, this leads to the decomposition

$$U_{h,n}(t)=n\bigl[W_{h_1,n}(t)-tW_{h_1,n}(1)\bigr]+U_{g,n}(t),\quad t\in[0,1],\tag{7}$$

where

$$W_{h_1,n}(t)=\sum_{i=1}^{\lfloor nt\rfloor}h_1(X_i)+(nt-\lfloor nt\rfloor)\,h_1(X_{\lfloor nt\rfloor+1}),\quad t\in[0,1],$$

is the polygonal line process defined by partial sums of the random variables $h_1(X_1),\dots,h_1(X_n)$. Decomposition (7) reduces the $(h,\gamma)$-FCLT to a Hölderian invariance principle for the random variables $(h_1(X_i))$ via the following lemma.
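Decomposition (7) is an algebraic identity at the vertices $t=k/n$, which can be checked numerically. The sketch below assumes the Wilcoxon kernel $h(x,y)=\mathbf{1}_{\{x<y\}}-\mathbf{1}_{\{y<x\}}$ and observations uniform on $[0,1]$, for which $h_1(x)=1-2x$; the function names are hypothetical.

```python
import random


def h(x, y):
    """Wilcoxon kernel 1{x < y} - 1{y < x}."""
    return float(x < y) - float(y < x)


def h1(x):
    """h_1(x) = E h(x, Y) = 1 - 2x when Y is uniform on [0, 1]."""
    return 1.0 - 2.0 * x


def g(x, y):
    """Degenerate part g(x, y) = h(x, y) - h_1(x) + h_1(y)."""
    return h(x, y) - h1(x) + h1(y)


def double_partial_sum(xs, k, kernel):
    """U_{kernel,k}: sum of kernel(X_i, X_j) over i <= k < j."""
    n = len(xs)
    return sum(kernel(xs[i], xs[j]) for i in range(k) for j in range(k, n))


def decomposition_gap(xs):
    """Largest deviation between both sides of (7) over t = k/n."""
    n = len(xs)
    total = sum(h1(x) for x in xs)  # W_{h1,n}(1)
    partial = 0.0                   # W_{h1,n}(k/n)
    gap = 0.0
    for k in range(1, n):
        partial += h1(xs[k - 1])
        lhs = double_partial_sum(xs, k, h)
        rhs = n * (partial - (k / n) * total) + double_partial_sum(xs, k, g)
        gap = max(gap, abs(lhs - rhs))
    return gap
```

Since both sides of (7) are linear between consecutive vertices, agreement at the points $k/n$ carries over to all $t\in[0,1]$.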

###### Lemma 6.

If there exists a constant $C>0$ such that for any integers $0\le k<m\le n$,

$$E(U_{g,m}-U_{g,k})^2\le C(m-k)\bigl(n-(m-k)\bigr),\tag{8}$$

then

$$\|n^{-3/2}U_{g,n}\|_\gamma=o_P(1)$$

for any $0\le\gamma<1/2$.

###### Remark 7.

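For an antisymmetric kernel $h$, condition (8) follows from the following one: there exists a constant $C>0$ such that for any $0\le m_1<n_1\le m_2<n_2\le n$,

$$E\Bigl(\sum_{i=m_1+1}^{n_1}\ \sum_{j=m_2+1}^{n_2}g(X_i,X_j)\Bigr)^2\le C(n_1-m_1)(n_2-m_2).\tag{9}$$

Indeed, since $g$ inherits the antisymmetry of $h$,

$$U_{g,m}-U_{g,k}=\sum_{i=k+1}^{m}\ \sum_{j=m+1}^{n}g(X_i,X_j)+\sum_{i=k+1}^{m}\ \sum_{j=1}^{k}g(X_i,X_j),$$

so that (9), applied to each of the two sums (using antisymmetry once more for the second one), yields

$$E(U_{g,m}-U_{g,k})^2\le 2C\bigl[(m-k)(n-m)+(m-k)k\bigr]=2C(m-k)\bigl(n-(m-k)\bigr).$$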

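Before we proceed with the proof of Lemma 6, we need some preparation. Let $D_j$ be the set of dyadic numbers of level $j$ in $[0,1]$, that is, $D_0=\{0,1\}$ and, for $j\ge 1$, $D_j=\{(2l-1)2^{-j}:\ 1\le l\le 2^{j-1}\}$. For $r\in D_j$ set $r^-:=r-2^{-j}$, $r^+:=r+2^{-j}$, $j\ge 0$. For $f:[0,1]\to\mathbb{R}$ and $r\in D_j$ define

$$\lambda_r(f):=\begin{cases}f(r^+)+f(r^-)-2f(r) & \text{if } j\ge 1,\\ f(r) & \text{if } j=0.\end{cases}$$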

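The sequential norm on $H_\gamma^o[0,1]$ defined by

$$2^{-1}\|f\|_\gamma^{\mathrm{seq}}:=\sup_{j\ge 0}2^{\gamma j}\max_{r\in D_j}|\lambda_r(f)|$$

is equivalent to the norm $\|f\|_\gamma$, see [1]: there is a positive constant $c_\gamma$ such that

$$\|f\|_\gamma^{\mathrm{seq}}\le\|f\|_\gamma\le c_\gamma\|f\|_\gamma^{\mathrm{seq}},\quad f\in H_\gamma^o[0,1].\tag{10}$$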

Set $D=\bigcup_{j\ge 0}D_j$. In what follows, we denote by $\log$ the logarithm with base $2$ ($\log 2=1$).

###### Lemma 8.

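For any $0\le\gamma<1$ there is a constant $c_\gamma>0$ such that, if $V_n$ is a polygonal line function with vertices $(0,0)$, $(k/n,V_n(k/n))$, $k=1,\dots,n$, then

$$\|V_n\|_\gamma\le c_\gamma\max_{0\le j\le\log n}2^{\gamma j}\max_{r\in D_j}\bigl|V_n(\lfloor nr+n2^{-j}\rfloor/n)-V_n(\lfloor nr\rfloor/n)\bigr|.$$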
###### Proof.

First we remark that for any $j\ge 1$,

$$\max_{r\in D_j}|\lambda_r(V_n)|\le\max_{r\in D_j}|V_n(r^+)-V_n(r)|+\max_{r\in D_j}|V_n(r)-V_n(r^-)|.$$

As and belong to , this gives,

$$\sup_{j\ge 1}2^{\gamma j}\max_{r\in D_j}|\lambda_r(V_n)|\le 2\sup_{j\ge 1}2^{\gamma j}\max_{r\in D_j}|V_n(r+2^{-j})-V_n(r)|$$

and it follows by (10),

If $s<t$ belong to the same interval, say, $[(k-1)/n,k/n]$, then, observing that the slope of $V_n$ in this interval is precisely $n\bigl(V_n(k/n)-V_n((k-1)/n)\bigr)$, we have

$$|V_n(t)-V_n(s)|=n(t-s)\,|V_n(k/n)-V_n((k-1)/n)|\le n(t-s)\Delta_n,$$

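where $\Delta_n:=\max_{1\le k\le n}|V_n(k/n)-V_n((k-1)/n)|$. If $(k-1)/n\le s<k/n\le t<(k+1)/n$, then

$$|V_n(t)-V_n(s)|\le|V_n(t)-V_n(k/n)|+|V_n(k/n)-V_n(s)|\le n(t-s)\Delta_n.$$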

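If $(k-1)/n\le s<k/n$, $(j-1)/n\le t<j/n$ and $k<j-1$, then

$$\begin{aligned}|V_n(t)-V_n(s)|&\le|V_n(t)-V_n((j-1)/n)|+|V_n(k/n)-V_n((j-1)/n)|+|V_n(k/n)-V_n(s)|\\&\le|V_n(k/n)-V_n((j-1)/n)|+n\bigl[(k/n-s)+(t-(j-1)/n)\bigr]\Delta_n.\end{aligned}$$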

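We apply these three configurations to $s=r$ and $t=r+2^{-j}$. If $j\ge\log n$, then only the first two configurations are possible and we deduce

$$\max_{j\ge\log n}2^{\gamma j}\max_{r\in D_j}|V_n(r+2^{-j})-V_n(r)|\le\max_{j\ge\log n}2^{\gamma j}\,n2^{-j}\Delta_n\le 2n^{\gamma}\Delta_n.$$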

If $j<\log n$, then we apply the third configuration to obtain

 maxj

To complete the proof just observe that if and so . ∎

###### Proof of Lemma 6.

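By Lemma 8 we have, with some constant $C>0$,

$$E\|U_{g,n}\|_\gamma^2\le C\sum_{j=0}^{\log n}2^{2\gamma j}2^{j}\max_{r\in D_j}E\bigl(U_{g,n}(\lfloor nr+n2^{-j}\rfloor/n)-U_{g,n}(\lfloor nr\rfloor/n)\bigr)^2.$$

Since $U_{g,n}(k/n)=U_{g,k}$, condition (8) gives

$$E\bigl(U_{g,n}(m/n)-U_{g,n}(k/n)\bigr)^2\le C(m-k)\bigl(n-(m-k)\bigr).$$

Taking into account that $\lfloor nr+n2^{-j}\rfloor-\lfloor nr\rfloor\le 2n2^{-j}$ for $j\le\log n$, this yields

$$E\|n^{-3/2}U_{g,n}\|_\gamma^2\le C_\gamma n^{-3}\sum_{j=1}^{\log n}2^{2\gamma j}2^{j}\bigl[n2^{-j}(n-n2^{-j})\bigr]\le C_\gamma n^{-1+2\gamma}.$$

This completes the proof due to the restriction $\gamma<1/2$. ∎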

The following lemma gives general conditions for the tightness of the sequence $(n^{-1/2}W_{h_1,n})$ in Hölder spaces.

###### Lemma 9.

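Assume that the sequence $(X_i)$ is stationary and that, for some $q>2$, there is a constant $c_q>0$ such that for any $0\le k<m\le n$,

$$E\Bigl|\sum_{i=k+1}^{m}h_1(X_i)\Bigr|^{q}\le c_q(m-k)^{q/2}.\tag{11}$$

Then for any $0\le\gamma<1/2-1/q$ the sequence $(n^{-1/2}W_{h_1,n})$ is tight in the space $H_\gamma^o[0,1]$.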

###### Proof.

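Fix $\beta$ such that $\gamma<\beta<1/2-1/q$. By the Arzelà-Ascoli theorem the embedding $H_\beta^o[0,1]\hookrightarrow H_\gamma^o[0,1]$ is compact; hence, it is enough to prove

$$\lim_{a\to\infty}\sup_{n\ge 1}P\bigl(\|n^{-1/2}W_{h_1,n}\|_\beta>a\bigr)=0.\tag{12}$$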

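By Lemma 8,

$$P\bigl(\|n^{-1/2}W_{h_1,n}\|_\beta>a\bigr)\le I_n(a),$$

where

$$I_n(a)=P\Bigl(\max_{0\le j\le\log n}2^{\beta j}\max_{r\in D_j}\bigl|W_{h_1,n}(\lfloor nr+n2^{-j}\rfloor/n)-W_{h_1,n}(\lfloor nr\rfloor/n)\bigr|\ge c_\beta n^{1/2}a\Bigr),$$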

with some constant