# Estimation and Inference of Time-Varying Auto-Covariance under Complex Trend: A Difference-based Approach

We propose a difference-based nonparametric methodology for the estimation and inference of the time-varying auto-covariance functions of a locally stationary time series when it is contaminated by a complex trend with both abrupt and smooth changes. Simultaneous confidence bands (SCB) with asymptotically correct coverage probabilities are constructed for the auto-covariance functions under complex trend. A simulation-assisted bootstrapping method is proposed for the practical construction of the SCB. Detailed simulation and a real data example round out our presentation.

## 1 Introduction

Our discussion begins with a heteroscedastic nonparametric regression model

$$Y_i = \mu(t_i) + \sqrt{V(t_i)}\,\varepsilon_i, \quad i = 1, \dots, n, \tag{1}$$

where $Y_i$ are the observations, $\mu$ is an unknown mean function, $t_i$ are the design points, $\varepsilon_i$ are the errors with mean zero and variance one, and $V$ is the variance function. Historically, it has been assumed that the errors are independent. Variance estimation in regression models with an unknown mean is a long-standing problem. Accurate variance estimation is required, for example, to construct confidence bands for the mean function, to test the goodness of fit of a model, and to choose the amount of smoothing needed to estimate the mean function; see e.g. [29], [13], [15], and [18]. An extensive survey of the difference sequence approach to variance estimation in nonparametric regression with constant variance can be found in [7].
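As a small illustration of the difference sequence idea in the constant-variance case, the sketch below applies the classical first-order difference estimator to simulated data from model (1). The sine mean function, the noise level, the sample size, and the function name `rice_variance` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rice_variance(y):
    """First-order difference estimator of a constant error variance.

    Computes sum((y[i+1] - y[i])**2) / (2 * (n - 1)). For a smooth mean,
    differencing removes most of the trend, so no mean fit is needed.
    """
    y = np.asarray(y, dtype=float)
    d = np.diff(y)
    return float(np.sum(d ** 2) / (2.0 * (len(y) - 1)))

rng = np.random.default_rng(0)
n = 5000
t = np.arange(1, n + 1) / n
mu = np.sin(2 * np.pi * t)          # illustrative smooth mean (assumption)
sigma = 0.5                         # true constant standard deviation
y = mu + sigma * rng.standard_normal(n)

est = rice_variance(y)              # should be close to sigma**2 = 0.25
print(est)
```

The estimator never fits the mean, which is the feature that the difference-based methods discussed in the rest of this section build on.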

The situation when the variance is not constant is more complicated. One of the first attempts to estimate the variance function in a regression model was made in [20], which proposed the basic idea of kernel smoothing of squared differences of observations. This idea was further developed in [22]. [2] introduced a class of difference-based local polynomial estimators of the variance function and obtained optimal convergence rates for this class that are uniform over broad functional classes. [31] obtained the minimax rate of convergence for estimators of the variance function in model (1) and characterized in detail the effect of not knowing the mean function on the estimation of the variance function. A similar approach was used to construct a class of difference-based estimators in [3] when the covariate for

All of the above-mentioned papers consider only the case where the data are independent. However, difference-based methods have also been used to estimate the variance and/or autocovariance in nonparametric regression where the errors are generated by a stationary process. The pioneering approach here was probably that of [21], who proposed estimators based on first-order differences to estimate (invertible) linear transformations of the variance-covariance matrix of stationary $m$-dependent errors. Here, by $m$-dependent errors we mean errors generated by a stationary process whose autocovariance is equal to zero for any lag greater than some $m$. [19] suggested second-order differences to estimate the zero frequency of the spectral density of stationary processes with short-range dependence. In the case of autoregressive errors, [16] proposed $\sqrt{n}$-consistent and, under the assumption of normality of errors, efficient estimators of the autocovariance that are also based on differences of observations. Under certain mixing conditions, [26] proposed estimating the autocovariance function by applying first-order difference-based estimators to the residuals of a kernel-based fit of the signal. [34] provided an optimal difference-based estimator of the variance for smooth nonparametric regression when the errors are correlated. Finally, the closest to our work in spirit is probably [30], which proposed a class of difference-based estimators for the auto-covariance in nonparametric regression when the signal is discontinuous and the errors form a stationary $m$-dependent sequence. To the best of our knowledge, the problem of auto-covariance estimation in nonparametric regression where the errors form a non-stationary sequence while the signal is discontinuous has not been considered before.

The purpose of this article is to estimate, and conduct inference on, the time-varying covariance structure of a locally stationary time series when it is contaminated by a complex trend function with both smooth and abrupt changes. Here local stationarity refers to the slowly or smoothly evolving data generating mechanism of a temporal system ([6], [24], [37]). In time series analysis, the estimation and modelling of the auto-covariance structure is of fundamental importance in, for example, the optimal forecasting of the series ([1]), the efficient estimation of time series regression models ([17]) and the inference of time series regression parameters ([1]). When the trend function is discontinuous, removing the trend from the time series and then estimating the auto-covariances from the residuals is not advisable, since it is very difficult to estimate the trend function accurately near the points of discontinuity. In this case, the aforementioned difference-based methods offer a good alternative. In this paper, we adopt a difference-based local linear regression method for the time-varying auto-covariance estimation problem. The method can be viewed as a nonparametric and non-stationary extension of [30]. It is shown that the uniform convergence rate of auto-covariance function estimation for the difference-based method under complex trend is the same as that of auto-covariance function estimation for a zero-mean time series when the number of points of discontinuity as well as the jump sizes diverge to infinity at a sufficiently slow rate. Therefore, asymptotically, the accuracy of auto-covariance function estimation is not affected by the complex trend when the difference-based nonparametric method is used.

Inference on the auto-covariance functions is an important task in practice, as practitioners and researchers frequently test whether certain parametric or semi-parametric models are adequate to characterize the covariance structure of a time series. For instance, one may be interested in testing whether the auto-covariance functions are constant over time, so that a weakly stationary time series model is sufficient to forecast future observations. There is a rich statistical literature on the inference of the auto-covariance structure of locally stationary time series, particularly on testing weak stationarity of such series. See for instance [25], [12], [8], [23], [9] and [11]. To our knowledge, only constant or smoothly time-varying trends were considered in the aforementioned literature on covariance inference. In this paper, simultaneous confidence bands (SCB) with asymptotically correct coverage probabilities are constructed for the time-varying auto-covariance functions when estimated by the difference-based local linear method. The SCB serves as an asymptotically correct tool for various hypothesis testing problems on the auto-covariance structure under discontinuous mean functions. A general way to perform such hypothesis tests is to estimate the auto-covariance functions under the parametric or semi-parametric null hypothesis and then check whether the fitted functions can be fully embedded into the SCB. As the auto-covariance functions can be estimated with faster convergence rates under the parametric or semi-parametric null hypothesis, this testing procedure achieves the correct Type-I error rate asymptotically. The tests have asymptotic power 1 for local alternatives whose uniform distances from the null are of an order greater than that of the width of the SCB; see Theorem 2 in [35] for instance. We also propose a simulation-assisted bootstrapping method for the practical construction of the SCB.

The paper is organized as follows. In Section 2, we introduce the model formulation and some assumptions on the trend and the error process. Section 3 presents the asymptotic theory for the local linear estimates. Practical implementation, including the selection of a suitable difference lag and tuning parameters, the estimation of covariance matrices, and a simulation-assisted bootstrapping method, is discussed in Section 4. In Section 5, we conduct simulation experiments on the performance of our SCBs. A real data application is provided in Section 6. The proofs of the main results are deferred to the Appendix.

## 2 Model formulation

Consider the model

$$y_{i,n} = \mu_{i,n} + x_{i,n}, \tag{2}$$

where $\mu_{i,n}$ is a mean function or signal with unknown change points, and $x_{i,n}$ is a zero-mean locally stationary process with . Eq. 2 covers a wide range of nonstationary linear and nonlinear processes; see [37] for more discussion. We shall omit the subscript $n$ in the sequel if no confusion arises. Let $\zeta_i$, $i \in \mathbb{Z}$, be independent identically distributed (i.i.d.) random variables, and define $\mathcal{F}_i = (\dots, \zeta_{i-1}, \zeta_i)$. Then, the process $x_i$ can be written as

$$x_i = G(t_i, \mathcal{F}_i),$$

where $G$ is a measurable function such that $G(t, \mathcal{F}_i)$ is well defined for all $t \in [0,1]$. In this paper, we focus on the case that there exist $0 = a_0 < a_1 < \dots < a_d < a_{d+1} = 1$ such that

$$\mu(t) = \sum_{j=0}^{d} \mu_j(t)\,\mathbf{1}\{a_j \le t < a_{j+1}\},$$

where $\mu_j(t)$ is a Lipschitz continuous function over $[a_j, a_{j+1}]$ and $d$ is the total number of change points. Till the end of this paper, we will always assume and the maximal jump size with .

To estimate the second-order structure of the process in Eq. 2, we introduce an approach based on a difference sequence of finite order applied to the observations $y_i$. Assuming that the number of observations is $n$, this difference-based covariance estimation approach uses simple squared differences of the observations, i.e., $\rho_{ki} = (y_i - y_{i-k})^2$. Notice that for any fixed $t$, $\{G(t, \mathcal{F}_i)\}_i$ is a stationary process. For convenience, let us denote by $\gamma_k(t)$ the $k$th order autocovariance function of this process at the fixed time $t$; in other words, $\gamma_k(t) = \mathrm{Cov}(G(t, \mathcal{F}_0), G(t, \mathcal{F}_k))$. If $k = 0$, then $\gamma_0(t)$ is the variance of $G(t, \mathcal{F}_0)$.

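We first introduce some notation that will be used throughout this paper. For any vector $v = (v_1, \dots, v_p)$, we let $|v| = (\sum_{i=1}^{p} v_i^2)^{1/2}$. For any random vector $V$, write $V \in \mathcal{L}^q$ if $\|V\|_q = (E|V|^q)^{1/q} < \infty$. Denote by $\mathcal{C}^p[0,1]$ the function space on [0,1] of functions that have continuous first $p$ derivatives with integer $p \ge 0$. Now, we need the following definition and assumptions: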

###### Definition 1 (Physical dependence measure).

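Let $\{\zeta_i'\}$ be an i.i.d. copy of $\{\zeta_i\}$. Then, for any $j \ge 0$, we denote $\mathcal{F}_j' = (\mathcal{F}_{-1}, \zeta_0', \zeta_1, \dots, \zeta_j)$. The physical dependence measure for a stochastic system $L(t, \mathcal{F}_j)$ is defined as

$$\delta_q(L, j) = \sup_{t \in [0,1]} \|L(t, \mathcal{F}_j) - L(t, \mathcal{F}_j')\|_q. \tag{3}$$

If $j < 0$, let $\delta_q(L, j) = 0$. Thus, $\delta_q(L, j)$ measures the dependence of the output $L(t, \mathcal{F}_j)$ on the single input $\zeta_0$; see [32] for more details.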

###### Assumption 1.

for .

###### Assumption 2 (Stochastic Lipschitz continuity).

There exists a constant , such that holds for all and .

Assumption 1 shows that the dependence measure of the time series decays at a polynomial rate, thus indicating short-range dependence. Assumption 2 means that $G(t, \mathcal{F}_i)$ changes smoothly over time and ensures local stationarity. Here, we give some examples of locally stationary linear and nonlinear time series that satisfy these assumptions.

###### Example 1 (Nonstationary linear processes).

Let $\zeta_i$ be i.i.d. random variables with ; let $a_j(t)$, $j = 0, 1, \dots$, be functions such that

$$G(t, \mathcal{F}_i) = \sum_{j=0}^{\infty} a_j(t)\,\zeta_{i-j} \tag{4}$$

is well defined for all $t \in [0,1]$. Clearly, by [37, Proposition 2], Assumption 1 will be satisfied if . Furthermore, if , the stochastic Lipschitz continuity condition in Assumption 2 also holds true.
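To make Eq. 4 concrete, here is a minimal simulation sketch of a finite-order special case: a time-varying MA(1) process with $a_0(t) \equiv 1$, $a_1(t) = 0.2 + 0.6t$ and $a_j(t) \equiv 0$ for $j \ge 2$. The coefficient function, the standard normal innovations, the grid $t_i = i/n$, and the helper name `simulate_tv_ma1` are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

def simulate_tv_ma1(n, a1, rng):
    """Simulate x_i = zeta_i + a1(t_i) * zeta_{i-1} on the grid t_i = i / n."""
    t = np.arange(1, n + 1) / n
    zeta = rng.standard_normal(n + 1)      # zeta_0, ..., zeta_n
    x = zeta[1:] + a1(t) * zeta[:-1]
    return t, x

rng = np.random.default_rng(1)
t, x = simulate_tv_ma1(20000, lambda s: 0.2 + 0.6 * s, rng)

# The local variance is 1 + a1(t)^2, so it grows slowly over the sample.
# Compare the first and last 10% of the observations.
var_start = x[: len(x) // 10].var()
var_end = x[-(len(x) // 10):].var()
print(var_start, var_end)
```

Because the coefficient changes smoothly in $t$, short stretches of the series look approximately stationary even though the whole series is not.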

###### Example 2 (Nonstationary nonlinear processes).

Let $\zeta_i$ be i.i.d. random variables and consider the nonlinear time series framework

$$\xi_i(t) = R(t, \xi_{i-1}(t), \zeta_i), \tag{5}$$

where $R$ is a measurable function and . This form has been introduced by [37] and [36]. Suppose that for some , we have for . Denote

$$\chi := \sup_{t \in [0,1]} L(t), \quad \text{where } L(t) = \sup_{x \neq y} \frac{\|R(t, x, \zeta_0) - R(t, y, \zeta_0)\|_q}{|x - y|}.$$

It is known from [37, Theorem 6] that if $\chi < 1$, then Eq. 5 admits a unique locally stationary solution with and the physical dependence measure satisfies that , which shows geometric moment contraction. Hence, the exponentially decaying temporal dependence indicates that Assumption 1 holds with . Further, by [37, Proposition 4], we conclude that Assumption 2 holds for if

 supt∈[0,1]∥M(G(t,F0))∥q<∞, where M(x)=sup0≤t

Due to the local stationarity of the process $x_i$, we have the following lemma, which shows that, under mild assumptions, the auto-covariance of $x_i$ also exhibits polynomial decay.

###### Lemma 1.

Suppose Assumptions 1 and 2 hold, then we have for .

With the above result, we can choose $h$ large enough such that for . Next we focus on the difference series $\rho_{ki}$ for and we always assume . By Eq. 2, we know that

$$\rho_{ki} = (x_i - x_{i-k})^2 + (\mu_i - \mu_{i-k})^2 + 2(x_i - x_{i-k})(\mu_i - \mu_{i-k}) := \alpha_{ki} + \lambda_{ki} + \theta_{ki}. \tag{6}$$
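The decomposition in Eq. 6 can be checked numerically. The sketch below uses a piecewise-linear trend with one jump and i.i.d. standard normal noise as a stand-in for the locally stationary errors; both choices, the lag $k = 3$, and the grid $t_i = i/n$ are illustrative assumptions. It also counts how many squared trend differences are affected by the jump.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 1000, 3
t = np.arange(1, n + 1) / n
mu = np.where(t < 0.5, t, t + 2.0)        # smooth pieces with one jump at t = 0.5
x = rng.standard_normal(n)                # placeholder zero-mean noise
y = mu + x

rho = (y[k:] - y[:-k]) ** 2               # squared lag-k differences of y
dx = x[k:] - x[:-k]
dmu = mu[k:] - mu[:-k]
alpha, lam, theta = dx ** 2, dmu ** 2, 2 * dx * dmu

# The trend term is large only for the k indices whose window straddles the jump.
n_large = int(np.sum(lam > 1.0))
print(np.allclose(rho, alpha + lam + theta), n_large)
```

Only $k$ of the $n - k$ squared differences are contaminated by the jump, which gives some intuition for why the trend terms can be asymptotically negligible when change points are not too numerous or too large.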

Recall for and notice that $\alpha_{ki}$ is the squared difference of two locally stationary processes. Therefore, it is also a locally stationary process. As a result, we can define

$$\alpha_{ki} = (x_i - x_{i-k})^2 = \beta_k(s_i) + \varepsilon_{ki}, \quad k = 1, \dots, h, \tag{7}$$

where $\beta_k(\cdot)$ is the unknown trend function and $\varepsilon_{ki}$ is a zero-mean process. Then $\varepsilon_{ki}$ can be written as

$$\varepsilon_{ki} = H_k(s_i, \mathcal{F}_i), \tag{8}$$

where $H_k$ is a measurable function similar to $G$. With Eq. 7, if the trend function $\beta_k(\cdot)$ is smooth, one can easily obtain an estimator of $\beta_k(\cdot)$. Now, we introduce the following conditions.

###### Assumption 3.

For each , we assume that the th order autocovariance function .

###### Assumption 4.

The smallest eigenvalue of

is bounded away from 0 on for , where

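$$\sigma_k(t) = \left\{\sum_{j=-\infty}^{\infty} \mathrm{Cov}\big(H_k(t, \mathcal{F}_0), H_k(t, \mathcal{F}_j)\big)\right\}^{1/2}, \tag{9}$$

and $\sigma_k^2(t)$ represents the long-run variance of $\{H_k(t, \mathcal{F}_i)\}$ for each fixed $t$.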

###### Assumption 5.

The kernel $K(\cdot)$ is a symmetric proper density function with compact support $[-1, 1]$.

Assumption 3 guarantees that the trend function $\beta_k(\cdot)$ changes smoothly for each $k$ and is three times continuously differentiable over $[0,1]$. Assumption 4 prevents asymptotic multicollinearity of the regressors. Assumption 5 allows popular kernel functions such as the Epanechnikov kernel. Now, substituting Eq. 7 into Eq. 6, we have

$$\rho_{ki} = \beta_k(s_i) + \varepsilon_{ki} + \lambda_{ki} + \theta_{ki}. \tag{10}$$

Since the length of the series is , we reset the subscript with respect to as and therefore the time point turns out to be for . Similar notations are used for series and . By Assumption 3 and a Taylor expansion of $\beta_k(\cdot)$, it is natural to estimate $\beta_k(t)$ using the local linear estimator as follows:

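$$\big(\hat\beta_{k,b}(t), \hat\beta'_{k,b}(t)\big) = \operatorname*{arg\,min}_{c_0, c_1 \in \mathbb{R}} \left[\sum_{i=1}^{n} \big[\rho_{ki} - c_0 - c_1(t_i - t)\big]^2 K_b(t_i - t)\right], \tag{11}$$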

where $K_b(\cdot) = K(\cdot/b)$, $K(\cdot)$ is a kernel function, and $b$ is the bandwidth satisfying $b \to 0$ and $nb \to \infty$. Since Eq. 11 is essentially a weighted least squares estimate, we can write the solution of Eq. 11 as

$$\hat\beta_{k,b}(t) = \sum_{i=1}^{n} \omega_{b,n}(t, i)\,\rho_{ki}, \tag{12}$$

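where $\omega_{b,n}(t, i) = K_b(t_i - t)\,[S_2(t) - (t_i - t)S_1(t)] / [S_2(t)S_0(t) - S_1^2(t)]$ with $S_l(t) = \sum_{i=1}^{n} (t_i - t)^l K_b(t_i - t)$, $l = 0, 1, 2$. The time domain of $t$ is fixed over $[0,1]$ and $\omega_{b,n}(t, i)$ is the weight given to each observation.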
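The local linear estimator in Eqs. 11 and 12 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the Epanechnikov kernel, the grid $t_i = i/n$, independent errors with a time-varying standard deviation (so that the lag-one target is $\beta_1(t) = 2\,\mathrm{sd}(t)^2$ up to a small discretization error), a single jump in the trend, and a hand-picked bandwidth. The function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a symmetric density supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear_weights(t0, t, b):
    """Weights w_i such that the local linear fit at t0 equals sum_i w_i * rho_i."""
    d = t - t0
    kern = epanechnikov(d / b)
    s0, s1, s2 = kern.sum(), (kern * d).sum(), (kern * d ** 2).sum()
    return kern * (s2 - d * s1) / (s0 * s2 - s1 ** 2)

def local_linear(t0, t, rho, b):
    """Local linear estimate of the trend of rho at time t0 with bandwidth b."""
    return float(local_linear_weights(t0, t, b) @ rho)

rng = np.random.default_rng(3)
n, k, b = 4000, 1, 0.1
t_full = np.arange(1, n + 1) / n
sd = 1.0 + t_full                          # time-varying standard deviation
x = sd * rng.standard_normal(n)            # independent errors (assumption)
mu = np.where(t_full < 0.5, 0.0, 3.0)      # one abrupt change in the trend
y = mu + x

rho = (y[k:] - y[:-k]) ** 2                # squared lag-k differences
t = t_full[k:]
beta_hat = local_linear(0.25, t, rho, b)   # target value: 2 * 1.25**2 = 3.125
print(beta_hat)
```

The weights sum to one and reproduce linear functions exactly, which is the property that removes the first-order bias term of a local constant fit. The overall scaling of the kernel cancels in the weights, so any positive multiple of $K_b$ yields the same estimate.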

Next, we will establish the following two lemmas that are useful in establishing asymptotic properties of proposed estimators. Their proofs are given in the Appendix.

###### Lemma 2.

Suppose Assumptions 1-2 hold, then we have for and for , where .

###### Lemma 3.

Suppose Assumptions 1-3 hold, then we have and .

## 3 Main Results

### 3.1 Asymptotic theory

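By Assumption 3 and for $l = 0, 1, 2$, define

$$Q^{k}_{n,l}(t) = \frac{1}{nb}\sum_{i=1}^{n}\left(\frac{t_i - t}{b}\right)^{l} K\left(\frac{t_i - t}{b}\right), \tag{13}$$

$$R^{k}_{n,l}(t) = \frac{1}{nb}\sum_{i=1}^{n}\rho_{ki}\left(\frac{t_i - t}{b}\right)^{l} K\left(\frac{t_i - t}{b}\right). \tag{14}$$

Then Eq. 11 can be expressed as

$$\begin{pmatrix}\hat\beta_{k,b}(t)\\ b\,\hat\beta'_{k,b}(t)\end{pmatrix} = \begin{pmatrix}Q^{k}_{n,0}(t) & Q^{k}_{n,1}(t)\\ Q^{k}_{n,1}(t) & Q^{k}_{n,2}(t)\end{pmatrix}^{-1}\begin{pmatrix}R^{k}_{n,0}(t)\\ R^{k}_{n,1}(t)\end{pmatrix} := [Q^{k}_{n}(t)]^{-1}R^{k}_{n}(t). \tag{15}$$

Let

$$\mu_l = \int_{\mathbb{R}} x^{l} K(x)\,dx \quad \text{and} \quad \phi_l = \int_{\mathbb{R}} x^{l} K^{2}(x)\,dx, \quad l = 0, 1, \dots.$$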
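For a concrete kernel, the constants $\mu_l$ and $\phi_l$ can be computed numerically. The sketch below does this for the Epanechnikov kernel with a simple trapezoidal rule; the kernel choice, the grid size, and the helper names are illustrative assumptions. For this kernel the closed forms are $\mu_0 = 1$, $\mu_2 = 1/5$ and $\phi_0 = 3/5$.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a symmetric density supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def trapezoid(f, x):
    """Trapezoidal rule for samples f on the grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def kernel_constants(kernel, l, grid_size=200001):
    """Return (mu_l, phi_l) for a kernel supported on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, grid_size)
    k = kernel(x)
    return trapezoid(x ** l * k, x), trapezoid(x ** l * k ** 2, x)

mu0, phi0 = kernel_constants(epanechnikov, 0)
mu2, phi2 = kernel_constants(epanechnikov, 2)
print(mu0, mu2, phi0)
```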

Now, we will construct SCBs for $\beta_k(t)$.

###### Theorem 1.

Suppose that Assumptions 1-5 hold and further assume that
(1) is Lipschitz continuous on [0,1].
(2) .
(3) .
Then, for each , we have

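$$P\left[\sqrt{\frac{nb}{\phi_0}}\,\sup_{t \in \mathcal{T}}\left|\sigma_k^{-1}(t)\{\hat\beta_{k,b}(t) - \beta_k(t)\}\right| - B_K(m^*) \le \frac{u}{\sqrt{2\log(m^*)}}\right] = \exp\{-2\exp(-u)\},$$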

as $n \to \infty$, where $m^* = 1/b$ and

$$B_K(m^*) = \sqrt{2\log(m^*)} + \frac{1}{\sqrt{2\log(m^*)}}\log\left(\frac{1}{\pi}\sqrt{\frac{1}{4\phi_0}\int_{-1}^{1}|K'(u)|^2\,du}\right).$$

Let us comment on the conditions listed in Theorem 1. Condition (1) shows the smoothness of . Condition (2) indicates that the change-point number and size can both go to infinity but at a slow rate. The assumption in Condition (3) is an undersmoothing requirement that reduces the bias of the estimators to the second order.
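To see how the limiting Gumbel-type law in Theorem 1 translates into a critical value, the sketch below solves $\exp\{-2\exp(-u)\} = 1 - \alpha$ for $u$ and evaluates $B_K(m^*)$ for the Epanechnikov kernel, for which $\phi_0 = 3/5$ and $\int_{-1}^{1}|K'(u)|^2\,du = 3/2$. The function names, the bandwidth value, and the choice $m^* = 1/b$ are assumptions made for illustration.

```python
import math

def gumbel_quantile(level):
    """Return u solving exp(-2 * exp(-u)) = level, for 0 < level < 1."""
    return -math.log(-0.5 * math.log(level))

def b_k(m_star, phi0, int_kprime_sq):
    """Centering term B_K(m*) from the display in Theorem 1."""
    root = math.sqrt(2.0 * math.log(m_star))
    inner = math.sqrt(int_kprime_sq / (4.0 * phi0)) / math.pi
    return root + math.log(inner) / root

def critical_value(level, m_star, phi0, int_kprime_sq):
    """B_K(m*) + u / sqrt(2 log m*), the bound on the normalized sup deviation."""
    root = math.sqrt(2.0 * math.log(m_star))
    return b_k(m_star, phi0, int_kprime_sq) + gumbel_quantile(level) / root

b = 0.05                      # illustrative bandwidth
m_star = 1.0 / b              # assumption: m* = 1 / b
u = gumbel_quantile(0.95)
crit = critical_value(0.95, m_star, 0.6, 1.5)
print(u, crit)
```

Multiplying `crit` by $\hat\sigma_k(t)\sqrt{\phi_0/(nb)}$ would give the half-width of an asymptotic band for $\beta_k(t)$ implied by the display in Theorem 1, provided a consistent estimate of $\sigma_k(t)$ is available.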

Notice that , where when there is no change point between observations and , when there exists at least a change point on . However, the estimate of can be viewed as a negligible term (see Eq. 21 in the proof of Theorem 2). With the previous discussion in mind, we can define

$$\hat\gamma_0(t) = \frac{1}{2}\hat\beta_{h,b_h}(t), \qquad \hat\gamma_k(t) = \frac{1}{2}\big[\hat\beta_{h,b_k}(t) - \hat\beta_{k,b_k}(t)\big], \quad k = 1, \dots, h-1,$$

where $b_h$ and $b_k$ are the bandwidths for the estimators $\hat\gamma_0(t)$ and $\hat\gamma_k(t)$, respectively. To keep them easy to distinguish, we use different notation for the bandwidths, which will be selected by some criterion (see Section 4.4). Notice that we require the same bandwidth $b_k$ for both components of the estimator $\hat\gamma_k(t)$. With the above results, the SCB for $\gamma_0(t)$ is straightforward.
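The mapping from the $\beta$ estimates to the $\gamma$ estimates can be checked on an exactly stationary toy case, where $E(x_i - x_{i-k})^2 = 2\gamma_0 - 2\gamma_k$ holds exactly. The sketch below uses an MA(1) process $x_i = \zeta_i + \theta\zeta_{i-1}$ with unit-variance innovations, for which $\gamma_0 = 1 + \theta^2$, $\gamma_1 = \theta$ and $\gamma_k = 0$ for $k \ge 2$, so $h = 2$ suffices. Plugging in exact values instead of estimates, ignoring the trend terms, and the function name `gamma_from_beta` are all simplifying assumptions for illustration.

```python
def gamma_from_beta(beta_h, beta_k):
    """Map (beta_h, beta_k) to (gamma_0, gamma_k) as in the display above."""
    gamma_0 = 0.5 * beta_h
    gamma_k = 0.5 * (beta_h - beta_k)
    return gamma_0, gamma_k

theta = 0.5
gamma0_true, gamma1_true = 1.0 + theta ** 2, theta

# Exact squared-difference moments of the MA(1) process with h = 2.
beta_1 = 2.0 * gamma0_true - 2.0 * gamma1_true   # lag 1
beta_2 = 2.0 * gamma0_true                       # lag h = 2, where gamma_2 = 0

gamma0, gamma1 = gamma_from_beta(beta_2, beta_1)
print(gamma0, gamma1)
```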

###### Corollary 1.

With the conditions in Theorem 1, we have

 =exp{−2exp(−u)}.

Furthermore, to facilitate the SCB for , we will consider a linear combination of . First, define and a by matrix

$$\Sigma_k^2(t) = \sum_{j=-\infty}^{\infty} \mathrm{Cov}\big(\tilde H_k(t, \mathcal{F}_0), \tilde H_k(t, \mathcal{F}_j)\big). \tag{16}$$

We also denote as a two-dimensional vector, and The natural estimators for and are and , respectively. Furthermore, let similar to Theorem 3 in [38]. At this point, we can obtain the following result.

###### Corollary 2.

Suppose that the smallest eigenvalue of is bounded away from 0 on for Moreover, we assume that all of the conditions of Theorem 1 are valid. Then, we have (i)

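$$P\left[\sqrt{\frac{nb_k}{\phi_0}}\,\sup_{t \in \mathcal{T}}\left|\sigma_{C,k}^{-1}(t)\{\hat\beta_{C,k}(t) - \beta_{C,k}(t)\}\right| - B_K(m^*) \le \frac{u}{\sqrt{2\log(m^*)}}\right] = \exp\{-2\exp(-u)\},$$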

as . (ii) Furthermore, one can easily deduce the SCB for , ,

 =exp{−2exp(−u)}.

###### Remark 1.

It is worth noting that for estimating $\gamma_k(t)$, we use the same bandwidth $b_k$; therefore, the entire estimator depends on only a single tuning parameter (the bandwidth $b_k$). This enables us to obtain the conclusion of Corollary 2(i) from the result of Theorem 1. Corollary 2(ii) then follows as well.

After constructing SCBs for the second-order structure $\gamma_k(t)$, the following theorem states that $\hat\beta_{k,b}(t)$ are consistent estimators for $\beta_k(t)$ uniformly in $t \in \mathcal{T}$ for all $k = 1, \dots, h$.

###### Theorem 2.

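Suppose that Assumptions 1-5 hold and that the conditions

$$\alpha + 2\beta \le \frac{2}{5}, \qquad \frac{\log(n)}{n^{3/5-\beta}\,b} + nb^5\log(n) \to 0$$

hold true. Then, we have

$$\sup_{t \in \mathcal{T}}\left|\hat\beta_{k,b}(t) - \beta_k(t)\right| = O_P(\chi_n), \quad k = 1, \dots, h,$$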

where .

This theorem implies the uniform consistency of $\hat\beta_{k,b}(t)$. Additionally, due to the relationship between $\gamma_k(t)$ and $\beta_k(t)$, we can also easily obtain the following consistency result for $\hat\gamma_k(t)$.

###### Corollary 3.

With the conditions in Theorem 2, we have

$$\sup_{t \in \mathcal{T}}|\hat\gamma_k(t) - \gamma_k(t)| = O_P(\chi_n), \quad k = 0, \dots, h-1.$$

## 4 Practical implementation

### 4.1 Selection of the difference lag

Note that for any fixed time , and recall that when , where is a large value that has been chosen in advance. Hence, we know that if , is practically invariant with respect to as increases. This fact suggests the following bandwidth selection procedure.

First, for any fixed , we choose a large enough value and select Next, we calculate . Then, by successively decreasing the value of and considering we calculate the corresponding quantities until shows an abrupt change. At this point, the optimal difference lag for time can be selected as the current plus Intuitively, we can interpret this through the scatterplot of . When the slope of the function shows an obvious change, then we can choose . Following the above procedure for each time point , we finally choose the optimal lag as .

### 4.2 Covariance matrix estimation

To apply Corollaries 1 and 2 (ii), we need to estimate the long-run variance in Eq. 16 first. This problem is complicated but has been extensively studied by many researchers. Here we adopt the technique considered by [38].

Let , where for . Notice that and denote . In the locally stationary case, we can make use of the fact that a block of is approximately stationary when its length is small compared with . Hence, as and . Let be the bandwidth and define the covariance matrix estimator as

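$$\tilde\Sigma_k^2(t) = \sum_{i=1}^{n} \tilde\omega_{\tau,n}(t, i)\,N_{ki}, \qquad \tilde\omega_{\tau,n}(t, i) = \frac{K_\tau(t_i - t)}{\sum_{j=1}^{n} K_\tau(t_j - t)},$$

with $\tau$ being the bandwidth. Therefore, the estimate $\tilde\Sigma_k^2(t)$ is guaranteed to be positive semidefinite. The following theorems establish the consistency of our covariance matrix estimate.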

###### Theorem 3.

Assume that and . Then, for each and any fixed ,

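$$\left\|\tilde\Sigma_k^2(t) - \Sigma_k^2(t)\right\| = O\left(\sqrt{\frac{m}{n\tau}} + \frac{1}{m} + \tau^2\right),$$

for ,

$$\left\|\sup_{t \in \mathcal{I}}\left|\tilde\Sigma_k^2(t) - \Sigma_k^2(t)\right|\right\| = O\left(\sqrt{\frac{m}{n\tau^2}} + \frac{1}{m} + \tau^2\right).$$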

In practice, the errors cannot be observed, thus we use , where is defined as with therein replaced by its estimator .

###### Theorem 4.

Assume that conditions of Theorem 2 and conditions of Theorem 3 hold. Denote , where is defined in Theorem 2 and further assume . Then

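$$\sup_{t \in \mathcal{I}}\left|\hat\Sigma_k^2(t) - \Sigma_k^2(t)\right| = O_P\left(\nu_n + \sqrt{\frac{m}{n\tau^2}} + \frac{1}{m} + \tau^2\right).$$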

Note that is the first diagonal element of and . Thus, the covariance estimates in Corollaries 1 and 2 (ii) can be easily calculated via plugging in the long-run covariance matrix estimate .

### 4.3 Simulation assisted bootstrapping method

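Now we aim to apply Corollary 1 and Corollary 2 (ii) to construct the SCBs. Let $\hat\sigma_h(t)$ and $\hat\sigma_{C,k}(t)$ be uniformly consistent estimators of $\sigma_h(t)$ and $\sigma_{C,k}(t)$ for , respectively. Then the corresponding $100(1-\alpha)$th SCBs, with $\alpha \in (0,1)$, for $\gamma_0(t)$ and $\gamma_k(t)$ are

$$\left[\hat\gamma_0(t) \pm \hat\sigma_h(t)\sqrt{\frac{\phi_0}{4nb}}\left(B_K(m^*) - \frac{\log[\log(1-\alpha)^{-1/2}]}{\sqrt{2\log(m^*)}}\right)\right],$$

$$\left[\hat\gamma_k(t) \pm \hat\sigma_{C,k}(t)\sqrt{\frac{\phi_0}{4nb}}\left(B_K(m^*) - \frac{\log[\log(1-\alpha)^{-1/2}]}{\sqrt{2\log(m^*)}}\right)\right], \quad k = 1, \dots, h-1.$$

Due to the slow rate of convergence to the Gumbel distribution, in practice the SCBs from Corollaries 1 and 2 (ii) may not have good finite-sample performance. To circumvent this problem, we adopt a simulation-assisted bootstrapping approach.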

###### Proposition 1.

Suppose conditions in Theorem 1 hold and also assume that is Lipschitz continuous for

. Then, on a richer probability space, there are i.i.d. standard normal distributed random variables

such that

 supt∈T|ˆγ0(t)−γ0(t)−Z0(t)|=OP(ψn), supt∈T|ˆγk(t)