1 Introduction
Let be a stationary time series of random variables with finite mean and variance. Given consecutive observations of this time series , we consider testing the plausibility that the data were generated from a parametric model. We consider causal models of the form
(1.1) 
where the ’s are independent and identically distributed (iid) with mean zero and finite variance, denotes the sequence , and is the parameter vector. Assume further that the model (1.1) has the invertible representation
(1.2) 
The objective of this paper is to provide a validity check of the model (1.1) by testing the estimated residuals for independence.
Given observations and , an estimator for , the innovations can be approximated by the residuals based on the infinite sequence , defined as
(1.3) 
Since we do not observe for , we instead use the estimated residuals
(1.4) 
where is the infinite sequence with , and for . If the time series is stationary and ergodic, the influence of in (1.3) becomes negligible for large , and and become close.
It is general practice to inspect for goodness-of-fit of the time series model. If (1.1) correctly describes the generating mechanism of , one would expect to behave similarly to . However, the sequence is not iid, since the residuals are functions of ; hence certain properties of can differ from those of , which in turn may affect sample statistics such as the sample autocorrelation of the residuals. This has been noted for specific time series models in the literature. For example, for the ARMA model, corrections have been made for statistics based on the residuals; see Section 9.4 of Brockwell and Davis (1991). For heteroscedastic GARCH models, the moment sum process of the residuals is notably different from that of the iid innovations; see Kulperger and Yu (2005). Though should be nearly independent under the true model assumption, the discrepancy between and should be taken into account when designing a goodness-of-fit test.
In this paper, we characterize the serial dependence of the residuals using distance covariance. Distance covariance is a useful dependence measure with the ability to detect both linear and nonlinear dependence; it is zero if and only if the variables are independent. We study the auto-distance covariance function (ADCV) of the residuals and derive its limit when the model is correctly specified. We show that the limiting distribution of the ADCV of differs from that of its iid counterpart, and we quantify the difference. This extends Section 4 of Davis et al. (2018), which considered this problem for AR processes.
The remainder of the paper is structured as follows. An introduction to distance correlation and ADCV, along with some historical remarks, is given in Section 2. In Section 3, we provide the limit result for the ADCV of the residuals for a general class of time series models. To implement the limiting results, we apply the parametric bootstrap, the methodology and theoretical justification of which are given in Section 4. We then apply the result to ARMA and GARCH models in Sections 5 and 6 and illustrate it with simulation studies. A simulated example in which the data do not conform to the model is presented in Section 7.
2 Distance covariance
Let and be two random vectors, potentially of different dimensions. Let denote the joint and marginal characteristic functions of . We know that the joint characteristic function factorizes into the product of the marginals if and only if and are independent. The distance covariance between and is defined as
where is a suitable measure on . In order to ensure that is well-defined, one of the following conditions is assumed to be satisfied (Davis et al., 2018):

is a finite measure;

is an infinite measure such that
If has a positive Lebesgue density on , then and are independent if and only if .
For a stationary series , the auto-distance covariance function (ADCV) is given by
Given observations , the ADCV can be estimated by its sample version
where
If we assume that and is symmetric about the origin, then under the conditions where exists, admits an alternative computable expression similar to a statistic; see Section 2.2 of Davis et al. (2018) for details. It can be shown that if the ’s are iid, the process converges weakly,
(2.1) 
for any compact set , and
where is a zero-mean Gaussian process with covariance structure
The concept of distance covariance was first proposed by Feuerverger (1993) in the bivariate case and later popularized by Székely et al. (2007). The idea of ADCV was first introduced by Zhou (2012). For distance covariance in the time series context, we refer to Davis et al. (2018) for theory in a general framework.
Most of the literature on distance covariance focuses on the specific weight measure with density proportional to . This distance covariance has the advantage of being scale and rotation invariant, but it imposes moment constraints on the variables under consideration. In our case, as will be shown in Section 3, this measure may not work when applied to the residuals (see also Section 4 of Davis et al. (2018) for a counterexample). To avoid this difficulty, we assume a finite measure for . In this case has the computable form
where is the Fourier transform with respect to . It should be noted that the concept of distance covariance is closely related to the Hilbert–Schmidt Independence Criterion (HSIC); see Gretton et al. (2005). For example, distance covariance with a Gaussian weight measure coincides with HSIC with a Gaussian kernel. In recent work, Wang et al. (2018) use HSIC to test for cross dependence between two time series.
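Since distance covariance with a Gaussian weight measure coincides with HSIC with a Gaussian kernel, a sample ADCV at lag h can be computed by forming the pairs of observations h apart and evaluating the empirical (biased) HSIC. The sketch below is a minimal illustration under this equivalence, not the paper's implementation; the function names and the kernel bandwidth `sigma` are our own choices.

```python
import numpy as np

def gaussian_gram(v, sigma=1.0):
    """Gram matrix of the Gaussian kernel exp(-(v_i - v_j)^2 / (2 sigma^2))."""
    d = v[:, None] - v[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def sample_adcv(x, h, sigma=1.0):
    """Sample ADCV at lag h of a univariate series under a Gaussian weight
    measure, computed as the biased empirical HSIC of (X_t, X_{t+h})."""
    x = np.asarray(x, dtype=float)
    if h > 0:
        a, b = x[:-h], x[h:]
    else:
        a, b = x, x
    n = len(a)
    K = gaussian_gram(a, sigma)
    L = gaussian_gram(b, sigma)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return np.trace(K @ H @ L @ H) / n**2      # biased HSIC estimate, >= 0
```

For iid noise the statistic is close to zero (of order 1/n), while serial dependence at the tested lag inflates it.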
3 General result
Let be observations from a stationary time series generated from (1.1) with . Let be the estimated residuals calculated through (1.4). In this section, we examine the ADCV of the residuals
where
To provide the limiting result for , we require the following assumptions.

Let be the σ-algebra generated by . We assume that the parameter estimate is of the form
(3.1) where is a vectorvalued function of the infinite sequence such that
(3.2) This representation can be readily found for most likelihood-based estimators, for example, the Yule–Walker estimator for AR processes, the quasi-MLE for GARCH processes, etc. In these cases
can be taken as the likelihood score function. By the martingale central limit theorem, (3.1) and (3.2) imply that
for a random Gaussian vector .

Assume that the function in the invertible representation (1.2) is continuously differentiable, and writing
(3.3) we assume

Assume that , the estimated residuals based on the finite sequence of observations, are close to , the fitted residuals based on the infinite sequence, such that
Theorem 3.1.
Let be a sequence of observations generated from the causal and invertible time series model (1.1) and (1.2) with . Let be an estimator of and let be the estimated residuals calculated through (1.4) satisfying conditions (M1)–(M3). Furthermore assume that the weight measure satisfies
(3.4) 
Then
where is the limiting distribution for , the ADCV based on the iid innovations , and the correction term is given by
(3.5) 
with being the limit distribution of and as defined in (3.3).
The proof of the theorem is provided in Appendix A.
Remark 3.2.
Distance correlation, analogous to linear correlation, is the normalized version of distance covariance, defined as
The auto-distance correlation function (ADCF) of a stationary series at lag is given by
and its sample version can be defined similarly. It can be shown that the ADCF of the residuals from an AR() model has the limiting distribution (Davis et al., 2018):
(3.6) 
and the result can be easily generalized to other models. In the examples in Sections 5 and 6, we shall use ADCF in place of ADCV.
4 Parametric bootstrap
The limit in (3.6) is not distribution-free and is generally intractable. To use the result in practice, we propose approximating the limit through the parametric bootstrap described below.
Given observations , let be the parameter estimate and be the estimated residuals. A set of bootstrapped residuals can be obtained as follows:

Let be the mean-corrected empirical distribution of ;

Generate from the time series model with parameter value and innovation sequence generated from ;

Refit the time series model to obtain the parameter estimate and the estimated residuals .
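The three bootstrap steps above can be sketched for a simple AR(1) model. The least-squares fit below is a hypothetical stand-in for the likelihood-based estimators discussed in Section 3, and all function names are our own.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares estimate of phi in X_t = phi X_{t-1} + Z_t
    (a simple stand-in for a likelihood-based estimator)."""
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def ar1_residuals(x, phi):
    return x[1:] - phi * x[:-1]

def bootstrap_residuals(x, rng):
    """One parametric-bootstrap replicate of the estimated residuals."""
    phi_hat = fit_ar1(x)
    e = ar1_residuals(x, phi_hat)
    e = e - e.mean()                      # mean-corrected empirical distribution
    z_star = rng.choice(e, size=len(x))   # iid draws from it
    x_star = np.empty(len(x))
    x_star[0] = x[0]
    for t in range(1, len(x)):            # regenerate from the fitted model
        x_star[t] = phi_hat * x_star[t - 1] + z_star[t]
    phi_star = fit_ar1(x_star)            # refit
    return ar1_residuals(x_star, phi_star)
```

Repeating this B times and computing the ADCV of each replicate yields the empirical quantiles used in the tests below.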
Let be the ADCV calculated from the bootstrapped residuals . In Theorem 4.2 below, we show that when the sample size is large, the empirical distribution of forms a good approximation to the limiting distribution of , the ADCV of the actual fitted residuals. Before stating the theorem, we first state the relevant conditions. We denote by and the probability and expectation conditional on the observations .
Assume that the function in the invertible representation (1.2) is continuously differentiable and
where
(4.1) 
Assume that the estimated residuals based on the finite sequence of observations, , are close to the fitted residuals based on the infinite sequence, , such that for any ,
Remark 4.1.
5 Example: ARMA(,)
Consider the causal, invertible ARMA() process that follows the recursion,
(5.1) 
where is the vector of parameters and is iid with mean 0 and variance . Denote the AR and MA polynomials by and , and let be the backward shift operator such that
Then the recursion (5.1) can be represented by
It follows from invertibility that has the power series expansion
where , and
Given an estimate of the parameters , the residuals based on the infinite sequence are given by
Based on the observed data , the estimated residuals are
(5.2) 
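As a concrete illustration of the truncated recursion (5.2), the following sketch computes ARMA residuals with the pre-sample values of both the series and the residuals set to zero. The function name and the coefficient sign convention (AR part added, residuals recursed with the MA coefficients) are our own assumptions.

```python
import numpy as np

def arma_residuals(x, phi, theta):
    """Estimated ARMA residuals via a truncated recursion in the spirit
    of (5.2): e_t = x_t - sum_i phi_i x_{t-i} - sum_j theta_j e_{t-j},
    with x_t and e_t taken as 0 before the sample starts."""
    n, p, q = len(x), len(phi), len(theta)
    e = np.zeros(n)
    for t in range(n):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        e[t] = x[t] - ar - ma
    return e
```

When the data are themselves started from zero initial conditions, this recursion recovers the innovations exactly; in general the truncation effect fades as t grows.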
One choice for is the pseudo-MLE based on the Gaussian likelihood
where and the covariance is independent of . The pseudo-MLE and are taken to be the values that maximize . It can be shown that is consistent and asymptotically normal even for non-Gaussian (Brockwell and Davis, 1991).
We have the following result for the ADCV of ARMA residuals.
Corollary 5.1.
Remark 5.2.
In the case where the distribution of is in the domain of attraction of an α-stable law with , and the parameter estimator has a convergence rate faster than , i.e.,
(Davis, 1996), the ADCV of the residuals has limit
where the correction term disappears. For a proof in the AR() case, see Theorem 4.2 of Davis et al. (2018).
5.1 Simulation
We generate time series of length from an ARMA(2,2) model with standard normal innovations and parameter values
For each simulation, an ARMA(2,2) model is fitted to the data. In Figure 1, we compare the empirical and quantiles for the ADCF of

iid innovations from 1000 independent simulations;

estimated residuals from 1000 independent simulations of ;

estimated residuals through 1000 independent parametric bootstrap samples from one realization of .
In order to satisfy condition (3.4), the ADCFs are evaluated using the Gaussian weight measure . Confirming the results in Theorem 3.1 and Corollary 5.1, the simulated quantiles of differ significantly from those of , especially when is small. Given one realization of the time series, the quantiles estimated by the parametric bootstrap correctly capture this effect.
6 Example: GARCH(,)
In this section, we consider the GARCH(,) model,
where the ’s are iid innovations with mean 0 and variance 1 and
(6.1) 
Let denote the parameter vector. We write the conditional variance to make its dependence on explicit.
Iterating the recursion in (6.1) gives
for suitably defined functions ’s; see Berkes et al. (2003). Given an estimator , an estimator for based on the infinite sequence can be written as
and the unobserved residuals are given by
In practice, can be approximated by the truncated version
and the estimated residual is given by
(6.2) 
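For the GARCH(1,1) case, the truncated residuals in the spirit of (6.2) can be sketched as below. Initializing the variance recursion at the unconditional variance is our own (common but hypothetical) choice; the function name is also ours.

```python
import numpy as np

def garch11_residuals(x, omega, alpha, beta):
    """Estimated GARCH(1,1) residuals via the truncated variance recursion
    sigma2_t = omega + alpha * x_{t-1}^2 + beta * sigma2_{t-1},
    started (hypothetically) from the unconditional variance."""
    n = len(x)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # unconditional variance
    for t in range(1, n):
        sigma2[t] = omega + alpha * x[t - 1]**2 + beta * sigma2[t - 1]
    return x / np.sqrt(sigma2)                 # residuals x_t / sigma_t
```

If the simulated path is itself started from the unconditional variance, the recursion reproduces the true conditional variances and the residuals equal the innovations.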
Define the parameter space by
for some , and , and assume the following conditions:

The true value lies in the interior of .

For some ,

For some ,

The GARCH() representation is minimal, i.e., the polynomials and do not have common roots.
Given observations , Berkes et al. (2003) proposed a quasi-maximum likelihood estimator for given by
where
Provided that (Q1)–(Q4) are satisfied, the quasi-MLE is consistent and asymptotically normal.
Consider the estimated residuals for the GARCH(,) model based on . We have the following result.
Corollary 6.1.
6.1 Simulation
We generate time series of length from a GARCH(1,1) model with parameter values
For each simulation, a GARCH(1,1) model is fitted to the data. In Figure 2, we compare the empirical and quantiles for the ADCF of

iid innovations from 1000 independent simulations;

estimated residuals from 1000 independent simulations of ;

estimated residuals through 1000 independent parametric bootstrap samples from one realization of .
Again the ADCFs are based on the Gaussian weight measure . The difference between the quantiles of and can be observed. For this GARCH model, the correction has the opposite effect to that in the previous ARMA example: the ADCF for the residuals is larger than that for iid variables, especially at small lags.
7 Example: Noncausal AR(1)
In this section, we consider an example where the model is misspecified. We generate time series of length from the noncausal AR(1) model with and ’s drawn from a distribution with 2.5 degrees of freedom. We then fit a causal AR(1) model, where , to the data and obtain the corresponding residuals. Again we use the Gaussian weight measure when evaluating the ADCF of the residuals. In Figure 3, the and ADCF quantiles are plotted for:
estimated residuals from 1000 independent simulations of ;

estimated residuals through 1000 independent parametric bootstrap samples from one realization of .
The ADCFs of the bootstrapped residuals approximate the limiting distribution of the ADCF of the residuals under the assumption that the model is correctly specified. In this case, the ADCFs of the estimated residuals differ significantly from the quantiles of the bootstrapped residuals. This indicates that the time series does not come from the assumed causal AR model.
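A stationary noncausal AR(1) path like the one studied here can be simulated by running the recursion backward in time, since the stationary solution depends on future innovations. In the sketch below, the coefficient value phi = 2.0 and the burn-in length are placeholders of our own (the model only requires |phi| > 1; the innovations follow the t distribution with 2.5 degrees of freedom used in this section).

```python
import numpy as np

def simulate_noncausal_ar1(n, phi, df, rng, burn=200):
    """Simulate a stationary noncausal AR(1), X_t = phi X_{t-1} + Z_t with
    |phi| > 1, whose stationary solution X_t = -sum_{j>=1} phi^{-j} Z_{t+j}
    depends on future noise.  Generated via the backward recursion
    X_t = (X_{t+1} - Z_{t+1}) / phi, discarding a burn-in of future values."""
    assert abs(phi) > 1
    z = rng.standard_t(df, size=n + burn)
    x = np.zeros(n + burn)
    for t in range(n + burn - 2, -1, -1):   # run backward in time
        x[t] = (x[t + 1] - z[t + 1]) / phi
    return x[:n]
```

Because the terminal condition is damped geometrically by phi^{-j}, the burn-in makes the truncation error negligible for the retained segment.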
8 Conclusion
In this paper, we propose a goodness-of-fit procedure for time series models based on examining the serial dependence of the estimated residuals. The dependence is measured using the auto-distance covariance function (ADCV), whose limiting behavior is derived for a general class of time series models. We show that the limiting law often differs from that of the ADCV based on iid innovations by a correction term, indicating that adjustments should be made when testing the goodness-of-fit of the model. We illustrate the result on simulated examples of ARMA and GARCH processes and find that the adjustment can go in either direction: the quantiles of the ADCV for the residuals can be larger or smaller than those for iid innovations. We also study an example in which a noncausal AR process is incorrectly fitted with a causal model and show that the ADCV, applied to the residuals, correctly detects the model misspecification.
References
 Berkes et al. (2003) I. Berkes, L. Horváth, and P. Kokoszka. GARCH processes: structure and estimation. Bernoulli, 9(2):201–227, 2003.
 Billingsley (1999) P. Billingsley. Convergence of Probability Measures. Wiley, New York., 2nd edition, 1999.
 Brockwell and Davis (1991) P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer, New York., 1991.
 Davis (1996) R.A. Davis. Gauss–Newton and M-estimation for ARMA processes with infinite variance. Stoch. Process. Appl., 63:75–95, 1996.
 Davis et al. (2018) R.A. Davis, M. Matsui, T. Mikosch, and P. Wan. Applications of distance covariance to time series. Bernoulli, 24(4A):3087–3116, 2018.
 Durrett (2010) R.T. Durrett. Probability: Theory and Examples. Cambridge University Press, 4th edition, 2010.
 Feuerverger (1993) A. Feuerverger. A consistent test for bivariate dependence. Internat. Statis. Rev., 61(3):419–433, 1993.
 Gretton et al. (2005) A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert–Schmidt norms. In S. Jain, H.U. Simon, and E. Tomita, editors, Algorithmic Learning Theory, pages 63–77, Berlin, Heidelberg, 2005. Springer.
 Kulperger and Yu (2005) R. Kulperger and H. Yu. High moment partial sum processes of residuals in GARCH models and their applications. Ann. Statist., 33(5):2395–2422, 2005.
 Leucht and Neumann (2009) A. Leucht and M.H. Neumann. Consistency of general bootstrap methods for degenerate U-type and V-type statistics. J. Multiv. Anal., 100:1622–1633, 2009.
 Scott (1973) D.J. Scott. Central limit theorems for martingales and for processes with stationary increments using a Skorokhod representation approach. Adv. Appl. Probab., 5(1):119–137, 1973.
 Székely et al. (2007) G.J. Székely, M.L. Rizzo, and N.K. Bakirov. Measuring and testing dependence by correlation of distances. Ann. Statist., 35:2769–2794, 2007.
 Wang et al. (2018) G. Wang, W.K. Li, and K. Zhu. New HSIC-based tests for independence between two stationary multivariate time series. arXiv:1804.09866, 2018.
 Zhou (2012) Z. Zhou. Measuring nonlinear dependence in timeseries, a distance correlation approach. J. Time Ser. Anal., 33:438–457, 2012.
Appendix A Proof of Theorem 3.1
Proof.
The proof proceeds in the following steps with the aid of Propositions A.1, A.2 and A.3. Write
where
and
We first show in Proposition A.1 that
where is any compact set in . This implies
For , define the compact set
It follows from the continuous mapping theorem that
To complete the proof, it remains to justify that we can take . For this it suffices to show that for any ,
and
∎
Proof.
We first consider the marginal convergence of . Denote
then
(A.1)  