Goodness-of-Fit Testing for Time Series Models via Distance Covariance

In many statistical modeling frameworks, goodness-of-fit tests are typically administered to the estimated residuals. In the time series setting, whiteness of the residuals is assessed using the sample autocorrelation function. For many time series models, especially those used for financial time series, the key assumption on the residuals is that they are in fact independent and not just uncorrelated. In this paper, we apply the auto-distance covariance function (ADCV) to evaluate the serial dependence of the estimated residuals. Distance covariance can discriminate between dependence and independence of two random vectors. The limit behavior of the test statistic based on the ADCV is derived for a general class of time series models. One of the key aspects in this theory is adjusting for the dependence that arises due to parameter estimation. This adjustment has essentially the same form regardless of the model specification. We illustrate the results in simulated examples.




1 Introduction


Let {X_t} be a stationary time series of random variables with finite mean and variance. Given consecutive observations X_1, …, X_n of this time series, we consider testing the plausibility that the data were generated from a parametric model. We consider causal models of the form


where the ε_t's are independent and identically distributed (iid) with mean zero and finite variance and θ0 is the parameter vector; the model expresses X_t as a function of the present and past innovations. Assume further that the model (1.1) has the invertible representation


The objective of this paper is to provide a validity check of the model (1.1) by testing the estimated residuals for independence.

Given observations X_1, …, X_n and an estimator of θ0, the innovations can be approximated by the residuals based on the infinite sequence of observations, defined as


Since we do not observe X_t for t ≤ 0, we instead use the estimated residuals


where the unobserved values X_t for t ≤ 0 are set to zero. If the time series is stationary and ergodic, the influence of this truncation becomes negligible for large t, and the residuals in (1.3) and (1.4) become close.

It is general practice to inspect the residuals for goodness-of-fit of the time series model. If (1.1) correctly describes the generating mechanism of the data, one would expect the residuals to behave similarly to the iid innovations. However, the residuals are not iid, since they are functions of the parameter estimate; hence certain properties of the residuals can differ from those of the innovations, which in turn may impact sample statistics such as the sample autocorrelation of the residuals. This has been noted for specific time series models in the literature. For example, for the ARMA model, corrections have been made for statistics based on the residuals; see Section 9.4 of Brockwell and Davis (1991). For heteroscedastic GARCH models, the moment sum process of the residuals is notably different from that of the iid innovations; see Kulperger and Yu (2005). Though the residuals should be nearly independent under the true model assumption, the discrepancy between the residuals and the innovations should be taken into account when designing a goodness-of-fit test.

In this paper, we characterize the serial dependence of the residuals using distance covariance. Distance covariance is a useful dependence measure with the ability to detect both linear and nonlinear dependence: it is zero if and only if the two random vectors are independent. We study the auto-distance covariance function (ADCV) of the residuals and derive its limit when the model is correctly specified. We show that the limiting distribution of the ADCV of the residuals differs from that of its iid counterpart and quantify the difference. This extends Section 4 of Davis et al. (2018), which considered this problem for AR processes.

The remainder of the paper is structured as follows. An introduction to distance correlation and the ADCV, along with some historical remarks, is given in Section 2. In Section 3, we provide the limit result for the ADCV of the residuals for a general class of time series models. To implement the limiting results, we apply the parametric bootstrap, the methodology and theoretical justification of which are given in Section 4. We then apply the result to ARMA and GARCH models in Sections 5 and 6 and illustrate with simulation studies. A simulated example where the data do not conform to the model is presented in Section 7.

2 Distance covariance

Let X ∈ R^p and Y ∈ R^q be two random vectors, potentially of different dimensions. Let φ_{X,Y}, φ_X and φ_Y denote the joint and marginal characteristic functions of (X, Y). We know that X and Y are independent if and only if φ_{X,Y}(s, t) = φ_X(s) φ_Y(t) for all s and t. The distance covariance between X and Y is defined as

T(X, Y; μ) = ∫ |φ_{X,Y}(s, t) − φ_X(s) φ_Y(t)|² μ(ds, dt),

where μ is a suitable measure on R^{p+q}. In order to ensure that T is well-defined, one of the following conditions is assumed to be satisfied (Davis et al., 2018):

  1. μ is a finite measure;

  2. μ is an infinite measure such that ∫ (1 ∧ |s|²)(1 ∧ |t|²) μ(ds, dt) < ∞.

If μ has a positive Lebesgue density on R^{p+q}, then X and Y are independent if and only if T(X, Y; μ) = 0.

For a stationary series {X_t}, the auto-distance covariance (ADCV) at lag h is given by

T_X(h; μ) = T(X_0, X_h; μ).

Given observations X_1, …, X_n, the ADCV can be estimated by its sample version, in which the characteristic functions are replaced by their empirical counterparts


If we assume that the weight measure is finite and symmetric about the origin, then under the conditions where the sample ADCV exists, it is computable in an alternative expression similar to a V-statistic; see Section 2.2 of Davis et al. (2018) for details. It can be shown that if the observations are iid, the sample ADCV process converges weakly,


for any compact set , and

where is a zero-mean Gaussian process with covariance structure

The concept of distance covariance was first proposed by Feuerverger (1993) in the bivariate case and later popularized by Székely et al. (2007). The idea of ADCV was first introduced by Zhou (2012). For distance covariance in the time series context, we refer to Davis et al. (2018) for theory in a general framework.

Most of the literature on distance covariance focuses on the specific weight measure with density proportional to (|s|^{1+p} |t|^{1+q})^{-1}. This distance covariance has the advantage of being scale- and rotation-invariant, but imposes moment constraints on the variables under consideration. In our case, as will be shown in Section 3, this measure may not work when applied to the residuals (see also Section 4 of Davis et al. (2018) for a counterexample). To avoid this difficulty, we assume a finite measure for the weight. In this case the sample ADCV has the computable form


is the Fourier transform with respect to


It should be noted that the concept of distance covariance is closely related to the Hilbert–Schmidt Independence Criterion (HSIC); see Gretton et al. (2005). For example, the distance covariance with a Gaussian weight measure coincides with the HSIC with a Gaussian kernel. In recent work, Wang et al. (2018) use HSIC to test for cross-dependence between two time series.
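The equivalence just mentioned suggests a simple way to compute the Gaussian-weight sample ADCV: as a Gaussian-kernel HSIC V-statistic. The sketch below assumes this route; the function names and the bandwidth `sigma` are illustrative choices, not notation from the paper.

```python
import numpy as np

def gaussian_dcov(x, y, sigma=1.0):
    """V-statistic estimate of the distance covariance between samples of X
    and Y under a Gaussian weight measure; by the equivalence noted in the
    text this is the HSIC with a Gaussian kernel. sigma is an assumed
    bandwidth parameter."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)

    def gram(z):
        # pairwise squared distances -> Gaussian kernel Gram matrix
        d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # double-centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

def sample_adcv(x, h, sigma=1.0):
    """Sample ADCV at lag h: distance covariance between X_t and X_{t+h}."""
    x = np.asarray(x, float)
    return gaussian_dcov(x[:-h], x[h:], sigma)
```

Because both centered Gram matrices are positive semidefinite, the statistic is nonnegative, mirroring the fact that the population quantity is zero only under independence.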

3 General result

Let X_1, …, X_n be observations from a stationary time series generated from (1.1) with the true parameter value. Let the estimated residuals be calculated through (1.4). In this section, we examine the ADCV of the residuals


To provide the limiting result for , we require the following assumptions.

  1. Let F_t denote the σ-algebra generated by the innovations up to time t. We assume that the parameter estimate is of the form


    where is a vector-valued function of the infinite sequence such that


    This representation can be readily found for most likelihood-based estimators, for example, the Yule–Walker estimator for AR processes, the quasi-MLE for GARCH processes, etc. In these cases it can be taken as the likelihood score function. By the martingale central limit theorem, (3.1) and (3.2) imply that

    for a Gaussian random vector.

  2. Assume that the function in the invertible representation (1.2) is continuously differentiable, and writing


    we assume

  3. Assume that the estimated residuals based on the finite sequence of observations are close to the fitted residuals based on the infinite sequence, in the sense that

Theorem 3.1.

Let X_1, …, X_n be a sequence of observations generated from the causal and invertible time series model (1.1) and (1.2) with the true parameter value. Let an estimator of the parameter be given and let the estimated residuals be calculated through (1.4), satisfying conditions (M1)–(M3). Furthermore, assume that the weight measure satisfies



where is the limiting distribution for , the ADCV based on the iid innovations , and the correction term is given by


with being the limit distribution of and as defined in (3.3).

The proof of the theorem is provided in Appendix A.

Remark 3.2.

Distance correlation, analogous to linear correlation, is the normalized version of distance covariance, defined as

The auto-distance correlation function (ADCF) of a stationary series at lag h is given by

and its sample version can be defined similarly. It can be shown that the ADCF for the residuals from an AR(p) model has the limiting distribution (Davis et al., 2018):


and the result can be easily generalized to other models. In the examples in Sections 5 and 6, we shall use ADCF in place of ADCV.
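A sample ADCF of the kind used in the later examples can be sketched by normalizing the Gaussian-weight sample ADCV by its marginal terms. This is a sketch under assumptions: `gaussian_dcov` re-implements the Gaussian-kernel HSIC form of the Section 2 estimator, and the bandwidth `sigma` is an illustrative tuning parameter.

```python
import numpy as np

def gaussian_dcov(x, y, sigma=1.0):
    """Gaussian-weight sample distance covariance (Gaussian-kernel HSIC
    V-statistic), as discussed in Section 2."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)

    def gram(z):
        d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

def sample_adcf(x, h, sigma=1.0):
    """Sample ADCF at lag h: the lag-h ADCV normalized by the marginal
    terms; the ratio lies in [0, 1] by the Cauchy-Schwarz inequality."""
    x = np.asarray(x, float)
    a, b = x[:-h], x[h:]
    num = gaussian_dcov(a, b, sigma)
    den = np.sqrt(gaussian_dcov(a, a, sigma) * gaussian_dcov(b, b, sigma))
    return num / den if den > 0 else 0.0
```

For a strongly autocorrelated series, the lag-1 sample ADCF is markedly larger than for an iid sequence of the same length.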

4 Parametric bootstrap

The limit in (3.6) is not distribution-free and is generally intractable. In order to use the result, we propose to approximate the limit through the parametric bootstrap described below.

Given observations X_1, …, X_n, let the parameter estimate and the estimated residuals be computed as above. A set of bootstrapped residuals can be obtained as follows:

  1. Let be the mean-corrected empirical distribution of ;

  2. Generate from the time series model with parameter value and innovation sequence generated from ;

  3. Re-fit the time series model. Obtain the bootstrap parameter estimate and the corresponding estimated residuals.
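The three steps above can be sketched for an AR(1) model. This is a minimal sketch under assumptions: AR(1) stands in for the general model of the paper, and plain least squares replaces the likelihood-based estimator.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares estimate of phi in X_t = phi * X_{t-1} + eps_t."""
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

def residuals_ar1(x, phi):
    """Residuals implied by a fitted AR(1) coefficient."""
    return x[1:] - phi * x[:-1]

def bootstrap_residuals(x, n_boot=200, rng=None):
    """Parametric bootstrap of AR(1) residuals, following steps 1-3:
    resample mean-corrected residuals, regenerate the series with the
    fitted parameter, refit, and collect the bootstrapped residuals."""
    rng = np.random.default_rng() if rng is None else rng
    phi_hat = fit_ar1(x)
    res = residuals_ar1(x, phi_hat)
    res = res - res.mean()                   # step 1: mean-corrected F_n
    out = []
    for _ in range(n_boot):
        eps = rng.choice(res, size=len(x))   # step 2: iid draws from F_n
        xb = np.empty(len(x))
        xb[0] = x[0]
        for t in range(1, len(x)):
            xb[t] = phi_hat * xb[t - 1] + eps[t]
        phi_b = fit_ar1(xb)                  # step 3: refit the model
        out.append(residuals_ar1(xb, phi_b))
    return out
```

Each bootstrap replicate would then be passed to the sample ADCV to build the empirical approximation of the limit law.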

Let T̂* denote the ADCV calculated from the bootstrapped residuals. In Theorem 4.2 below, we show that when the sample size is large, the empirical distribution of T̂* provides a good approximation to the limiting distribution of the ADCV of the actual fitted residuals. Before stating the theorem, we first state the relevant conditions. We denote by P* and E* the probability and expectation conditional on the observations X_1, …, X_n.

  1. Let the σ-algebras generated by the original and the bootstrapped innovation sequences be given. We assume that condition (M1) holds, i.e., (3.1) and (3.2) hold. In addition, as n → ∞, for any

    for some , and

  2. Assume that the function in the invertible representation (1.2) is continuously differentiable and


  3. Assume that the estimated residuals based on the finite sequence of observations are close to the fitted residuals based on the infinite sequence, such that for any

Remark 4.1.

Condition (M1’) ensures that the bootstrap estimator provides a good approximation to the limit of the original estimator. These conditions are standard for the martingale central limit theorem; see, for example, Scott (1973). Conditions (M2’) and (M3’) parallel conditions (M2) and (M3).

Theorem 4.2.

Assuming conditions (M1’), (M2’) and (M3’) hold, the ADCV of the bootstrapped residuals satisfies

5 Example: ARMA(p, q)

Consider the causal, invertible ARMA(p, q) process that follows the recursion

X_t = φ_1 X_{t−1} + ⋯ + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q},

where β = (φ_1, …, φ_p, θ_1, …, θ_q)′ is the vector of parameters and {ε_t} is iid with mean 0 and variance σ². Denote the AR and MA polynomials by φ(z) and θ(z), and let B be the backward shift operator such that B X_t = X_{t−1}. Then the recursion (5.1) can be represented by

φ(B) X_t = θ(B) ε_t.

It follows from invertibility that φ(z)/θ(z) has the power series expansion

where , and

Given an estimate of the parameters, the residuals based on the infinite sequence are given by

Based on the observed data X_1, …, X_n, the estimated residuals are


One choice for the estimator is the pseudo-MLE based on the Gaussian likelihood

where the covariance matrix is determined by the ARMA parameters. The pseudo-MLE estimates are taken to be the values that maximize this likelihood. It can be shown that the pseudo-MLE is consistent and asymptotically normal even for non-Gaussian innovations (Brockwell and Davis, 1991).
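The truncated residual computation in (5.2) can be sketched for the ARMA(1,1) special case. A minimal sketch under assumptions: the parameter values passed in are taken as given (e.g., from a pseudo-MLE fit), and pre-sample values are set to zero as in the truncation.

```python
import numpy as np

def arma11_residuals(x, phi, theta):
    """Truncated residuals for an ARMA(1,1), inverting
    X_t = phi * X_{t-1} + eps_t + theta * eps_{t-1}
    with pre-sample values of X and eps set to zero, as in (5.2)."""
    eps = np.zeros(len(x))
    eps[0] = x[0]  # X_0 and eps_0 before the sample are treated as zero
    for t in range(1, len(x)):
        eps[t] = x[t] - phi * x[t - 1] - theta * eps[t - 1]
    return eps
```

When the series is generated with the same zero initialization, the recursion inverts the model exactly; in general the initialization effect decays geometrically, which is the sense in which the truncated residuals approximate the infinite-sequence residuals.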

We have the following result for the ADCV of ARMA residuals.

Corollary 5.1.

Let X_1, …, X_n be observations from a causal and invertible ARMA(p, q) time series and let the estimated residuals be defined in (5.2) using the pseudo-MLE. Assume that the weight measure satisfies (3.4). Then

where is a joint Gaussian process defined on with as specified in (2.1) and in (3.5).

The proof of Corollary 5.1 is given in Appendix C.

Remark 5.2.

In the case where the distribution of the innovations is in the domain of attraction of an α-stable law with α < 2, the parameter estimator has a convergence rate faster than √n (Davis, 1996), and the ADCV of the residuals has the limit

where the correction term disappears. For a proof in the AR(p) case, see Theorem 4.2 of Davis et al. (2018).

5.1 Simulation

We generate time series of length n from an ARMA(2,2) model with standard normal innovations and parameter values

For each simulation, an ARMA(2,2) model is fitted to the data. In Figure 1, we compare the empirical upper and lower quantiles of the ADCF for

  1. iid innovations from 1000 independent simulations;

  2. estimated residuals from 1000 independent simulations of the model;

  3. estimated residuals from 1000 independent parametric bootstrap samples based on one realization of the time series.

In order to satisfy condition (3.4), the ADCFs are evaluated using a Gaussian weight measure. Confirming the results in Theorem 3.1 and Corollary 5.1, the simulated quantiles for the residuals differ significantly from those for the iid innovations, especially at small lags. Given one realization of the time series, the quantiles estimated by the parametric bootstrap correctly capture this effect.

Figure 1: Empirical quantiles of the ADCF for a) iid innovations; b) estimated residuals; c) bootstrapped residuals; from an ARMA(2,2) model.

6 Example: GARCH(p, q)

In this section, we consider the GARCH(p, q) model,

X_t = σ_t ε_t,

where the ε_t's are iid innovations with mean 0 and variance 1 and

σ_t² = α_0 + α_1 X_{t−1}² + ⋯ + α_p X_{t−p}² + β_1 σ_{t−1}² + ⋯ + β_q σ_{t−q}².

Let θ = (α_0, α_1, …, α_p, β_1, …, β_q)′ denote the parameter vector. We write σ_t² = σ_t²(θ) to denote the conditional variance as a function of θ.

Iterating the recursion in (6.1) gives

for suitably defined functions; see Berkes et al. (2003). Given a parameter estimator, an estimator for the conditional variance based on the infinite sequence can be written as

and the unobserved residuals are given by

In practice, this quantity can be approximated by the truncated version

and the estimated residual is given by


Define the parameter space by

for some positive constants, and assume the following conditions:

  1. The true parameter value lies in the interior of the parameter space.

  2. For some ,

  3. For some ,

  4. The GARCH(p, q) representation is minimal, i.e., the polynomials formed from the ARCH coefficients and the GARCH coefficients do not have common roots.

Given observations X_1, …, X_n, Berkes et al. (2003) proposed a quasi-maximum likelihood estimator given by


Provided that (Q1)–(Q4) are satisfied, the quasi-MLE is consistent and asymptotically normal.
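The truncated variance recursion and the corresponding residuals can be sketched for the GARCH(1,1) special case. This is a sketch under assumptions: initializing at the unconditional variance is a common convention adopted here for convenience, whereas the truncation in the text sets pre-sample terms to zero.

```python
import numpy as np

def garch11_residuals(x, alpha0, alpha1, beta1):
    """Recover residuals eps_t = X_t / sigma_t for a GARCH(1,1) given
    parameter values, using the variance recursion
    sigma2_t = alpha0 + alpha1 * X_{t-1}^2 + beta1 * sigma2_{t-1}.
    Initializing at the unconditional variance is an assumption of this
    sketch, not the zero-truncation used in the text."""
    n = len(x)
    sigma2 = np.empty(n)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)  # unconditional variance
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return x / np.sqrt(sigma2)
```

If the series is simulated with the same initialization, the recursion recovers the innovations exactly; in general the initialization effect decays geometrically in beta1, which is why the truncation is asymptotically negligible.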

Consider the estimated residuals for the GARCH(p, q) model based on the quasi-MLE. We have the following result.

Corollary 6.1.

Let X_1, …, X_n be observations from a GARCH(p, q) time series and let the estimated residuals be defined in (6.2) based on the quasi-MLE. Assume that (Q1)–(Q4) hold and that the weight measure satisfies (3.4). Then we have

where is a joint Gaussian process defined on with as specified in (2.1) and in (3.5).

The proof of Corollary 6.1 is given in Appendix D.

6.1 Simulation

We generate time series of length n from a GARCH(1,1) model with parameter values

For each simulation, a GARCH(1,1) model is fitted to the data. In Figure 2, we compare the empirical upper and lower quantiles of the ADCF for

  1. iid innovations from 1000 independent simulations;

  2. estimated residuals from 1000 independent simulations of the model;

  3. estimated residuals from 1000 independent parametric bootstrap samples based on one realization of the time series.

Again the ADCFs are based on a Gaussian weight measure. The difference between the quantiles for the residuals and those for the iid innovations can be observed. For this GARCH model, the correction has the opposite effect from the previous ARMA example – the ADCF quantiles for the residuals are larger than those for iid variables, especially at small lags.

Figure 2: Empirical quantiles of the ADCF for a) iid innovations; b) estimated residuals; c) bootstrapped residuals; from a GARCH(1,1) model.

7 Example: Non-causal AR(1)

In this section, we consider an example where the model is misspecified. We generate time series of length n from a non-causal AR(1) model

X_t = φ X_{t−1} + ε_t,

with |φ| > 1 and the ε_t's drawn from a t-distribution with 2.5 degrees of freedom. Then we fit a causal AR(1) model, with |φ| < 1, to the data and obtain the corresponding residuals. Again we use a Gaussian weight measure when evaluating the ADCF of the residuals. In Figure 3, the empirical ADCF quantiles are plotted for:

  1. estimated residuals from 1000 independent simulations of the model;

  2. estimated residuals from 1000 independent parametric bootstrap samples based on one realization of the time series.

The ADCFs of the bootstrapped residuals approximate the limiting distribution that the ADCF of the residuals would have if the model were correctly specified. In this case, the ADCFs of the estimated residuals differ significantly from the quantiles of the bootstrapped residuals. This indicates that the time series does not come from the assumed causal AR model.
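Non-causal AR(1) data of this kind can be simulated from the stationary representation in terms of future innovations. A minimal sketch under assumptions: the coefficient value phi = 2.0 is an illustrative choice (the text does not specify it), and the backward recursion starts from zero, with a burn-in on the future side absorbing the truncation.

```python
import numpy as np

def simulate_noncausal_ar1(n, phi=2.0, df=2.5, burn=500, rng=None):
    """Simulate the non-causal AR(1) X_t = phi * X_{t-1} + eps_t with
    |phi| > 1, whose stationary solution depends on FUTURE innovations:
    X_t = -sum_{j>=1} phi^(-j) eps_{t+j}.
    Innovations are t-distributed with df degrees of freedom, matching
    the example; phi = 2.0 is an assumed illustrative value."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_t(df, size=n + burn)
    x = np.zeros(n + burn)
    # backward recursion X_t = (X_{t+1} - eps_{t+1}) / phi, started at zero;
    # the truncation error shrinks by a factor 1/phi per step going backward
    for t in range(n + burn - 2, -1, -1):
        x[t] = (x[t + 1] - eps[t + 1]) / phi
    return x[:n]
```

The simulated series still has the AR(1) autocorrelation phi^(-|h|) at lag h, which is why a causal fit looks superficially plausible and only the residual dependence, as measured by the ADCF, reveals the misspecification.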

Figure 3: Empirical quantiles of the ADCF for a) estimated residuals; b) bootstrapped residuals; from non-causal AR(1) data fitted with a causal AR(1) model.

8 Conclusion

In this paper, we propose a goodness-of-fit procedure for time series models based on examining the serial dependence of the estimated residuals. The dependence is measured using the auto-distance covariance function (ADCV), and its limiting behavior is derived for a general class of time series models. We show that the limiting law often differs from that of the ADCV based on iid innovations by a correction term, which indicates that adjustments should be made when testing the goodness-of-fit of the model. We illustrate the result on simulated examples of ARMA and GARCH processes and find that the adjustment can go in either direction – the quantiles of the ADCV for residuals can be larger or smaller than those for iid innovations. We also study an example in which a non-causal AR process is incorrectly fitted with a causal model and show that the ADCV, applied to the residuals, correctly detects the model misspecification.


  • Berkes et al. (2003) I. Berkes, L. Horváth, and P. Kokoszka. GARCH processes: structure and estimation. Bernoulli, 9(2):201–227, 2003.
  • Billingsley (1999) P. Billingsley. Convergence of Probability Measures. Wiley, New York., 2nd edition, 1999.
  • Brockwell and Davis (1991) P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer, New York., 1991.
  • Davis (1996) R.A. Davis. Gauss–Newton and M-estimation for ARMA processes with infinite variance. Stoch. Process. Appl., 63:75–95, 1996.
  • Davis et al. (2018) R.A. Davis, M. Matsui, T. Mikosch, and P. Wan. Applications of distance covariance to time series. Bernoulli, 24(4A):3087–3116, 2018.
  • Durrett (2010) R.T. Durrett. Probability: Theory and Examples. Cambridge University Press, 4th edition, 2010.
  • Feuerverger (1993) A. Feuerverger. A consistent test for bivariate dependence. Internat. Statis. Rev., 61(3):419–433, 1993.
  • Gretton et al. (2005) A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert–Schmidt norms. In S. Jain, H.U. Simon, and E. Tomita, editors, Algorithmic Learning Theory, pages 63–77. Springer, Berlin, Heidelberg, 2005.
  • Kulperger and Yu (2005) R. Kulperger and H. Yu. High moment partial sum processes of residuals in GARCH models and their applications. Ann. Statist., 33(5):2395–2422, 2005.
  • Leucht and Neumann (2009) A. Leucht and M.H. Neumann. Consistency of general bootstrap methods for degenerate U-type and V-type statistics. J. Multiv. Anal., 100:1622–1633, 2009.
  • Scott (1973) D.J. Scott. Central limit theorems for martingales and for processes with stationary increments using a Skorokhod representation approach. Adv. Appl. Probab., 5(1):119–137, 1973.
  • Székely et al. (2007) G.J. Székely, M.L. Rizzo, and N.K. Bakirov. Measuring and testing dependence by correlation of distances. Ann. Statist., 35:2769–2794, 2007.
  • Wang et al. (2018) G. Wang, W.K. Li, and K. Zhu. New HSIC-based tests for independence between two stationary multivariate time series. arXiv:1804.09866, 2018.
  • Zhou (2012) Z. Zhou. Measuring nonlinear dependence in time-series, a distance correlation approach. J. Time Ser. Anal., 33:438–457, 2012.

Appendix A Proof of Theorem 3.1


The proof proceeds in the following steps with the aid of Propositions A.1, A.2 and A.3. Write



We first show in Proposition A.1 that

where is any compact set in . This implies

For , define the compact set

It follows from the continuous mapping theorem that

To complete the proof, it remains to justify that we can take . For this it suffices to show that for any ,


These are shown in Propositions A.2 and A.3, respectively.

Proposition A.1.

Given the conditions (M1)–(M3),

for any compact set.


We first consider the marginal convergence of . Denote