Model-Free Tests for Series Correlation in Multivariate Linear Regression

January 17, 2019 · Yanqing Yin et al.

Testing for series correlation among error terms is a basic problem in linear regression model diagnostics. The famous Durbin-Watson test and Durbin's h-test rely on certain model assumptions about the response and regressor variables. The present paper proposes simple tests for series correlation that are applicable in both fixed and random design linear regression models. The test statistics are based on the regression residuals and design matrix. The test procedures are robust under different distributions of random errors. The asymptotic distributions of the proposed statistics are derived via a newly established joint central limit theorem for several general quadratic forms and the delta method. Good performance of the proposed tests is demonstrated by simulation results.


1. Introduction

Linear regression is an important topic in statistics and has proven useful in almost all aspects of data science, especially in business and economic statistics and in biostatistics. Consider the following multivariate linear regression model

$$y = \boldsymbol{x}^{\top}\boldsymbol{\beta} + \varepsilon, \qquad (1.1)$$

where $y$ is the response variable, $\boldsymbol{x}$ is a $p$-dimensional vector of regressors, $\boldsymbol{\beta}$ is a $p$-dimensional regression coefficient vector and $\varepsilon$ is a random error with zero mean. Suppose we obtain $n$ samples $(y_1, \boldsymbol{x}_1), \dots, (y_n, \boldsymbol{x}_n)$ from this model, with design matrix $X = (\boldsymbol{x}_1, \dots, \boldsymbol{x}_n)^{\top}$, so that $y_i = \boldsymbol{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i$ for $i = 1, \dots, n$. The first task in a regression problem is to make statistical inferences about the regression coefficient vector. Applying the ordinary least squares (OLS) method yields the estimate

$$\hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\boldsymbol{y}$$

for the coefficient vector $\boldsymbol{\beta}$. In most applications of linear regression models, we need the assumption that the random errors are uncorrelated and homoscedastic, that is,

$$\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 I_n,$$

where $\sigma^2 > 0$ is unknown. Under this assumption, the Gauss-Markov theorem states that the ordinary least squares estimator (OLSE) $\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimator (BLUE). When this assumption does not hold, OLS suffers a loss of efficiency and, even worse, may yield wrong inferences. For example, positive serial correlation in the regression error terms typically leads to artificially small standard errors for the regression coefficients under the classic linear regression method; the estimated t-statistics are then inflated, indicating significance even when there is in fact none. Therefore, tests for heteroscedasticity and series correlation are important when applying linear regression.
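The inflation effect can be illustrated with a minimal Monte Carlo sketch (all parameter choices here are hypothetical, not from the paper): when both a regressor and the errors follow a positively correlated AR(1) process, the true sampling spread of the OLS slope exceeds the naive standard error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, reps = 200, 0.7, 500

def ar1(length, coef, rng):
    """Stationary AR(1) series with unit-variance innovations."""
    out = np.empty(length)
    out[0] = rng.standard_normal() / np.sqrt(1 - coef**2)
    for t in range(1, length):
        out[t] = coef * out[t - 1] + rng.standard_normal()
    return out

x = ar1(n, rho, rng)              # a serially correlated regressor
est, naive_se = [], []
for _ in range(reps):
    eps = ar1(n, rho, rng)        # positively autocorrelated errors
    y = eps                       # true coefficient on x is zero
    b = (x @ y) / (x @ x)         # OLS slope for a single regressor
    resid = y - b * x
    s2 = resid @ resid / (n - 1)  # naive variance estimate (assumes i.i.d. errors)
    naive_se.append(np.sqrt(s2 / (x @ x)))
    est.append(b)

# The empirical spread of the OLS estimates exceeds the average naive
# standard error, so t-statistics are inflated under the null.
print(np.std(est) > np.mean(naive_se))
```

The ratio of the two quantities grows with the product of the two autocorrelation coefficients; with independent (non-lagged) regressors the cross terms average out and the effect largely disappears, which is why lagged-dependent designs are the delicate case.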

For detecting heteroscedasticity, in one of the most cited papers in econometrics, White [White(1980)] proposed a test based on comparing the Huber-White covariance estimator to the usual covariance estimator under homoscedasticity. Many other researchers have considered this problem, for example, Breusch and Pagan [Breusch and Pagan(1979)], Dette and Munk [Dette and Munk(1998)], Glejser [Glejser(1969)], Harrison and McCabe [Harrison and McCabe(1979)], Cook and Weisberg [Cook and Weisberg(1983)], and Azzalini and Bowman [Azzalini and Bowman(1993)]. Recently, Li and Yao [Li and Yao(2015)] and Bai, Pan and Yin [Bai et al.(2018)Bai, Pan, and Yin] proposed tests for heteroscedasticity that are valid in both low- and high-dimensional regressions; their tests were shown by simulations to outperform some classic tests.

The most famous test for series correlation, the Durbin-Watson test, was proposed in [Durbin and Watson(1950), Durbin and Watson(1951), Durbin and Watson(1971)]. The Durbin-Watson test statistic is based on the residuals from the linear regression. The researchers considered the statistic

$$DW = \frac{\sum_{t=2}^{n}(\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{n}\hat{\varepsilon}_t^2},$$

whose small-sample distribution was derived by John von Neumann. In the original papers, Durbin and Watson investigated the distribution of this statistic under the classic independence framework, described the test procedures and provided tables of the bounds of significance. However, the asymptotic results were derived under the normality assumption on the error term, and as noted by Nerlove and Wallis [Nerlove and Wallis(1966)], although the Durbin-Watson test works well in a framework of independent observations, it may be asymptotically biased and lead to inadequate conclusions for linear regression models containing lagged dependent random variables. New alternative test procedures, for instance, Durbin's h-test and t-test [Durbin(1970)], were proposed to address this problem; see also Inder [Inder(1986)], King and Wu [King and Wu(1991)], Stocker [Stocker(2007)], Bercu and Proïa [Bercu and Proia(2013)], Gençay and Signori [Gençay and Signori(2015)], Li and Gençay [Li and Gençay(2017)] and the references therein. However, all of these tests were proposed under model assumptions on the regressors and/or the response variable; moreover, Durbin's h-test requires a Gaussian distribution of the error term. Thus, some common models are excluded. In fact, since it is difficult to assess whether the regressors and/or the response are lag dependent, tests that are model-free with respect to the regressors and response variable appear to be appropriate.
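The Durbin-Watson statistic above is computed directly from the residuals; a minimal sketch (the decision rule via Durbin and Watson's tabulated significance bounds is omitted):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: the sum of squared successive differences
    of the residuals divided by their sum of squares. Values near 2 suggest
    no first-order serial correlation; values well below 2 suggest positive
    correlation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
e = rng.standard_normal(5000)          # uncorrelated residuals
corr = np.empty(5000)                  # strongly positively correlated series
corr[0] = e[0]
for t in range(1, 5000):
    corr[t] = 0.9 * corr[t - 1] + rng.standard_normal()

print(round(durbin_watson(e), 1))      # near 2 for uncorrelated residuals
print(round(durbin_watson(corr), 1))   # well below 2 under positive correlation
```

For large n, DW is approximately 2(1 - r1), where r1 is the lag-1 sample autocorrelation of the residuals.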

The present paper proposes a simple test procedure, free of assumptions on the response variable and regressors, that is valid in both low- and high-dimensional multivariate linear regression. The main idea, which is simple but proves useful, is to express the mean and variance of the test statistic through the residual maker matrix. In addition to a general joint central limit theorem for several quadratic forms, which is proved in this paper and may be of independent interest, we consider a Box-Pierce-type test for series correlation. Monte Carlo simulations show that our test procedures perform well in situations where some classic test procedures are inapplicable.

2. Test for series correlation in the linear regression model

2.1. Notation

Let $X$ be the $n \times p$ design matrix, and let $M = I_n - H$ be the residual maker matrix, where $H = X(X^{\top}X)^{-1}X^{\top}$ is the hat matrix (also known as the projection matrix). We assume that the noise vector is $\boldsymbol{\varepsilon} = \Sigma^{1/2}\boldsymbol{z}$, where $\boldsymbol{z}$ is an $n$-dimensional random vector whose entries $z_1, \dots, z_n$ are independent with zero means, unit variances and the same finite fourth-order moment, and $\Sigma$ is an $n$-dimensional nonnegative definite nonrandom matrix with bounded spectral norm. The OLS residuals are then $\hat{\boldsymbol{\varepsilon}} = M\boldsymbol{y} = M\boldsymbol{\varepsilon}$. Throughout the rest of this paper, $\circ$ denotes the Hadamard (entrywise) product of two matrices.
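The algebraic facts about the residual maker matrix that are used repeatedly below can be checked numerically; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 4
X = rng.standard_normal((n, p))           # design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat (projection) matrix
M = np.eye(n) - H                         # residual maker matrix

beta = rng.standard_normal(p)
eps = rng.standard_normal(n)
y = X @ beta + eps
resid = M @ y                             # OLS residuals

assert np.allclose(M, M.T)                # M is symmetric ...
assert np.allclose(M @ M, M)              # ... and idempotent
assert np.allclose(M @ X, 0)              # M annihilates the column space of X
assert np.allclose(resid, M @ eps)        # residuals depend on y only through eps
print(round(np.trace(M)))                 # trace(M) = n - p, prints 46
```

The identity resid = M eps is what lets the mean and variance of residual-based statistics be expressed through M alone, without assumptions on the regressors.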

2.2. Test for a given order series correlation

To test for a given order series correlation, for any number , denote

where with

First, for we have

(2.1)

Denote and , and set we then have, for

(2.2)

We want to test the null hypothesis that there is no series correlation at the given order against the alternative that such correlation is present.

Under the null hypothesis, due to (

2.1) and (2.2), we obtain

(2.3)

and

(2.4)

Specifically, we have and

(2.5)

The validity of our test procedure requires the following mild assumptions.

(1): Assumption on and :

The number of regressors and the sample size satisfy that as .

(2): Assumption on errors:

The fourth-order cumulant of the error distribution .

Assumption (2) excludes the rare case where the random errors are drawn from a two-point distribution with equal masses at $1$ and $-1$. Even if this situation occurs, however, our test remains valid provided the design matrix satisfies a mild additional condition.

These assumptions ensure that the variance of the statistic has the same order as its leading term as $n \to \infty$, thus satisfying the condition assumed in Theorem 4.1.

Define

By applying Theorem 4.1 presented in Section 4, we obtain that for

Then, by the delta method, we obtain, as $n \to \infty$,

where and

(2.6)

We reject the null hypothesis in favor of the alternative if a large value of the statistic is observed.

2.3. A portmanteau test for series correlation

In time series analysis, the Box-Pierce test proposed in [Box and Pierce(1970)] and the Ljung-Box statistic proposed in [Ljung and Box(1978)] are two portmanteau tests of whether any of a group of autocorrelations of a time series are different from zero. For a linear regression model, consider the following hypothesis

against

Applying Theorem 4.1 and the delta method, we now consider the following asymptotically standard normally distributed statistic

as $n \to \infty$, where and with

and

Then, we reject the null hypothesis in favor of the alternative if the statistic is large.
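For orientation, the classical Box-Pierce statistic (not the residual-adjusted statistic above, whose centering and scaling come from Theorem 4.1) can be sketched as follows; the AR coefficient and lag count are hypothetical illustration choices:

```python
import numpy as np

def box_pierce(resid, m):
    """Classical Box-Pierce statistic Q = n * sum_{k=1..m} r_k^2, where
    r_k is the lag-k sample autocorrelation; Q is approximately
    chi-squared with m degrees of freedom for i.i.d. residuals."""
    resid = np.asarray(resid, dtype=float) - np.mean(resid)
    n, denom = len(resid), resid @ resid
    r = np.array([resid[:-k] @ resid[k:] / denom for k in range(1, m + 1)])
    return n * np.sum(r ** 2)

rng = np.random.default_rng(3)
white = rng.standard_normal(2000)        # no serial correlation
ar = np.empty(2000)                      # AR(1) series, coefficient 0.5
ar[0] = white[0]
for t in range(1, 2000):
    ar[t] = 0.5 * ar[t - 1] + rng.standard_normal()

print(box_pierce(white, 5) < box_pierce(ar, 5))  # correlation inflates Q
```

The paper's point is that applying such a portmanteau statistic directly to OLS residuals requires correcting the mean and variance via the residual maker matrix, since residuals are correlated through M even under the null.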

2.4. Discussion of the statistics

In the present subsection, we discuss the asymptotic parameters of the two proposed statistics.

If the entries of the design matrix are assumed to be i.i.d. standard normal, then we know that as $n \to \infty$, the diagonal entries of the symmetric and idempotent matrices and are of constant order, while the off-diagonal entries are of order . Then, the order of for a given is at most , since it is exactly the sum of the off-diagonal entries of . Thus, elementary analysis shows that .

For a fixed design or a more general random design, it becomes almost impossible to study the matrices and beyond some elementary properties. Thus, to obtain accurate statistical inference, we suggest using the original parameters, since we have little information on the distribution of the regressors in a fixed design and the calculation of those parameters is not excessively complex.

3. Simulation studies

In this section, Monte Carlo simulations are conducted to investigate the performance of our proposed tests.

3.1. Performance of test for first-order series correlation

First, we consider the test for first-order series correlation of the error terms in multivariate linear regression model (1.1). Note that although our theoretical results were derived by treating the design matrix as a constant matrix, the simulations require generating a design matrix from a certain random model. We thus consider the situation where the regressors are lagged dependent. Formally, for a given , we set

where and are independently drawn from N(0,1), while are independently chosen from Student's t-distribution with 5 degrees of freedom. The random errors obey (1) the normal distribution N(0,1) and (2) the uniform distribution U(-1,1). The significance level is set to 0.05.

Table 1 and Table 2 show the empirical size of our test (denoted FDWT) for different parameter settings under the two error distributions. To investigate the power of our test, we randomly choose a and consider the following AR(1) model:

where are independently drawn from (1) N(0,1) and (2) U(-1,1). Tables 3 and 4 show the empirical power of our proposed test for different parameter settings under the two error distributions.

These simulation results show that our test has good size and power whenever the sample size is large, and it is thus applicable under the asymptotic framework described above.
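The size/power Monte Carlo loop can be sketched roughly as follows. Both the design (i.i.d. normal rather than the lagged-dependent design above) and the test statistic (a plain normalized lag-1 residual autocorrelation rather than the paper's FDWT statistic) are simplified stand-ins for illustration:

```python
import numpy as np

def ar1_errors(n, a, rng):
    """Error process e_t = a * e_{t-1} + v_t with v_t i.i.d. N(0,1);
    a = 0 recovers the null hypothesis of no series correlation."""
    e = np.empty(n)
    e[0] = rng.standard_normal()
    for t in range(1, n):
        e[t] = a * e[t - 1] + rng.standard_normal()
    return e

def rejection_rate(n, p, a, reps, rng):
    """Empirical rejection rate of a simple two-sided test based on the
    normalized lag-1 autocorrelation of the OLS residuals."""
    hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))                 # stand-in design
        M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
        r = M @ ar1_errors(n, a, rng)                   # OLS residuals
        stat = np.sqrt(n) * (r[:-1] @ r[1:]) / (r @ r)
        hits += abs(stat) > 1.96
    return hits / reps

rng = np.random.default_rng(4)
size = rejection_rate(n=200, p=5, a=0.0, reps=400, rng=rng)   # empirical size
power = rejection_rate(n=200, p=5, a=0.5, reps=400, rng=rng)  # empirical power
print(size, power)
```

Even this crude statistic shows the qualitative pattern in the tables: size near the nominal level for small p/n, and power approaching 1 as correlation strengthens. The projection-induced correlation among residuals, which this naive statistic ignores, is exactly what the paper's mean/variance corrections account for.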

(p, n) f FDWT (p, n) f FDWT
2,32 1 0.0486 8,32 2 0.0428
8,32 4 0.0410 8,32 8 0.0434
16,64 4 0.0446 16,64 12 0.0463
32,64 12 0.0420 32,64 24 0.0414
32,128 12 0.0470 32,128 24 0.0478
64,128 12 0.0479 64,128 36 0.0430
128,256 12 0.0509 128,256 24 0.0486
128,256 64 0.0504 128,256 128 0.0422
128,512 24 0.0519 128,512 64 0.0496
128,512 96 0.0487 128,512 128 0.0497
256,512 64 0.0469 256,512 96 0.0492
256,512 144 0.0472 256,512 256 0.0486
256,1028 64 0.0457 256,1028 96 0.0498
256,1028 144 0.0473 256,1028 256 0.0487
512,1028 12 0.0463 512,1028 96 0.0506
512,1028 144 0.0520 512,1028 256 0.0478
512,1028 288 0.0460 512,1028 314 0.0442
512,1028 440 0.0438 512,1028 512 0.0443
Table 1. Empirical size under Gaussian error assumption
(p, n) f FDWT (p, n) f FDWT
2,32 1 0.0410 2,32 2 0.0421
8,32 4 0.0414 8,32 8 0.0468
16,64 4 0.0467 16,64 12 0.0450
32,64 12 0.0450 32,64 24 0.0419
32,128 12 0.0456 32,128 24 0.0458
64,128 12 0.0479 64,128 36 0.0460
128,256 12 0.0509 128,256 24 0.0476
128,256 64 0.0461 128,256 128 0.0412
128,512 24 0.0497 128,512 64 0.0505
128,512 96 0.0508 128,512 128 0.0501
256,512 64 0.0525 256,512 96 0.0455
256,512 144 0.0443 256,512 256 0.0461
256,1028 64 0.0509 256,1028 96 0.0455
256,1028 144 0.0482 256,1028 256 0.0465
512,1028 12 0.0491 512,1028 96 0.0461
512,1028 144 0.0483 512,1028 256 0.0480
512,1028 288 0.0447 512,1028 314 0.0468
512,1028 440 0.0453 512,1028 512 0.0459
Table 2. Empirical size under uniform distribution U(-1,1) error assumption
f f
2,32 1 0.1363 0.2550 0.5409 8,32 2 0.1056 0.1705 0.3224
8,32 4 0.0906 0.1724 0.3841 8,32 8 0.1093 0.1831 0.3597
16,64 4 0.1888 0.3672 0.6783 16,64 12 0.1987 0.3764 0.7199
32,64 12 0.1055 0.1584 0.3542 32,64 24 0.1030 0.1739 0.3791
32,128 12 0.3673 0.6637 0.9552 32,128 24 0.3706 0.6655 0.9556
64,128 12 0.1754 0.3335 0.6345 64,128 36 0.1897 0.3519 0.6639
128,256 12 0.3255 0.6104 0.9160 128,256 24 0.3324 0.6037 0.9225
128,256 64 0.3362 0.6200 0.9345 128,256 128 0.3362 0.6515 0.9438
128,512 24 0.9064 0.9981 1.0000 128,512 64 0.9151 0.9976 1.0000
128,512 96 0.9167 0.9981 1.0000 128,512 128 0.9196 0.9981 1.0000
256,512 64 0.5880 0.8951 0.9975 256,512 96 0.6041 0.9029 0.9980
256,512 144 0.6019 0.8963 0.9990 256,512 256 0.6117 0.9103 0.9987
256,1028 64 0.9970 1.0000 1.0000 256,1028 96 0.9973 1.0000 1.0000
256,1028 144 0.9971 1.0000 1.0000 256,1028 256 0.9976 1.0000 1.0000
512,1028 12 0.8766 0.9957 1.0000 512,1028 96 0.8829 0.9958 1.0000
512,1028 144 0.9201 0.9979 1.0000 512,1028 256 0.8967 0.9954 1.0000
512,1028 288 0.9125 0.9986 1.0000 512,1028 314 0.8946 0.9969 1.0000
512,1028 440 0.8942 0.9975 1.0000 512,1028 512 0.8937 0.9979 1.0000
Table 3. Empirical power under Gaussian error assumption
f f
2,32 1 0.1457 0.2521 0.5548 8,32 2 0.1245 0.1721 0.3478
8,32 4 0.1245 0.1754 0.3548 8,32 8 0.1254 0.1845 0.3547
16,64 4 0.1987 0.3789 0.6567 16,64 12 0.1879 0.3478 0.7456
32,64 12 0.1145 0.1544 0.3582 32,64 24 0.1125 0.1555 0.3548
32,128 12 0.3825 0.6647 0.9845 32,128 24 0.3845 0.6789 0.9677
64,128 12 0.1863 0.3765 0.6748 64,128 36 0.1758 0.3877 0.6478
128,256 12 0.3358 0.5978 0.9185 128,256 24 0.3495 0.6657 0.9244
128,256 64 0.3378 0.5899 0.9578 128,256 128 0.3392 0.6788 0.9584
128,512 24 0.9114 0.9945 1.0000 128,512 64 0.9121 0.9944 1.0000
128,512 96 0.9102 0.9977 1.0000 128,512 128 0.9157 0.9945 0.9999
256,512 64 0.6053 0.8979 0.9969 256,512 96 0.6020 0.9456 0.9978
256,512 144 0.6151 0.8966 1.0000 256,512 256 0.6135 0.9678 1.0000
256,1028 64 0.9975 1.0000 1.0000 256,1028 96 0.9972 1.0000 1.0000
256,1028 144 0.9921 1.0000 1.0000 256,1028 256 0.9982 1.0000 1.0000
512,1028 12 0.8787 0.9944 1.0000 512,1028 96 0.8800 0.9976 1.0000
512,1028 144 0.9201 0.9913 1.0000 512,1028 256 0.8881 0.9964 1.0000
512,1028 288 0.9165 0.9959 1.0000 512,1028 314 0.8957 0.9967 1.0000
512,1028 440 0.8978 0.9944 1.0000 512,1028 512 0.8959 0.9947 1.0000
Table 4. Empirical power under uniform distribution U(-1,1) error assumption

3.2. Performance of the Box-Pierce type test

This subsection investigates the performance of the Box-Pierce-type test statistic proposed in subsection 2.3. The design matrix is obtained in the same way as in the last subsection, and the random error terms are assumed to obey (1) the normal distribution N(0,1) and (2) the gamma distribution with parameters 4 and 1/2. Table 5 and Table 6 show the empirical size of our test for different parameter settings under the two error distributions. To assess the power, we consider the following AR(2) model:

where are independently drawn from (1) N(0,1) and (2) Gamma(4,1/2). The design matrix is obtained in the same way as before. Tables 7 and 8 show the empirical power of our proposed test for different parameter settings under the two error distributions.

As shown by these simulation results, the empirical size and empirical power of the portmanteau test improve as the sample size tends to infinity.

2,32 30 0.0389 0.0402 8,32 24 0.0351 0.0350
16,32 16 0.0299 0.0349 24,32 8 0.0208 0.0132
2,64 62 0.0443 0.0505 32,64 32 0.0391 0.0420
32,128 96 0.0436 0.0501 64,128 64 0.0402 0.0427
32,256 224 0.0489 0.0470 64,256 192 0.0475 0.0485
128,256 128 0.0452 0.0477 16,512 496 0.0499 0.0494
64,512 448 0.0490 0.0486 128,512 384 0.0502 0.0513
256,512 256 0.0473 0.0438 64,1028 964 0.0461 0.0494
128,1028 900 0.0480 0.0485 256,1028 772 0.0492 0.0501
Table 5. Empirical size under Gaussian error assumption
2,32 30 0.0359 0.0374 8,32 24 0.0390 0.0383
16,32 16 0.0265 0.0281 24,32 8 0.0129 0.0087
2,64 62 0.0444 0.0426 32,64 32 0.0385 0.0365
32,128 96 0.0430 0.0448 64,128 64 0.0439 0.0417
32,256 224 0.0497 0.0437 64,256 192 0.0509 0.0514
128,256 128 0.0487 0.0465 16,512 496 0.0504 0.0498
64,512 448 0.0479 0.0511 128,512 384 0.0498 0.0458
256,512 256 0.0518 0.0523 64,1028 964 0.0500 0.0489
128,1028 900 0.0490 0.0513 256,1028 772 0.0439 0.0503
Table 6. Empirical size under Gamma(4,1/2) error assumption
2,32 30 0.2630 0.1960 8,32 24 0.1699 0.1265
16,32 16 0.0890 0.0694 24,32 8 0.0760 0.0205
2,64 62 0.5698 0.4064 32,64 32 0.1708 0.1210
32,128 96 0.6660 0.4775 64,128 64 0.2764 0.2232
32,256 224 0.9849 0.9278 64,256 192 0.9369 0.8167
128,256 128 0.6147 0.4335 16,512 496 1.0000 1.0000
64,512 448 1.0000 1.0000 128,512 384 0.9991 0.9897
256,512 256 0.9155 0.7551 64,1028 964 1.0000 1.0000
128,1028 900 1.0000 1.0000 256,1028 772 1.0000 1.0000
Table 7. Empirical power under Gaussian error assumption
2,32 30 0.2657 0.1892 8,32 24 0.1202 0.1822
16,32 16 0.0519 0.0281 24,32 8 0.0202 0.0198
2,64 62 0.5721 0.3981 32,64 32 0.1190 0.1998
32,128 96 0.6738 0.5285 64,128 64 0.2853 0.1757
32,256 224 0.9291 0.8898 64,256 192 0.9034 0.7370
128,256 128 0.6320 0.4225 16,512 496 1.0000 0.9998
64,512 448 1.0000 0.9989 128,512 384 0.9989 0.9893
256,512 256 0.9137 0.7530 64,1028 964 1.0000 1.0000
128,1028 900 1.0000 1.0000 256,1028 772 1.0000 1.0000
Table 8. Empirical power under Gamma(4,1/2) error assumption

3.3. Parameter estimation under the null hypothesis

In practice, if the error terms are not Gaussian, we need to estimate the fourth-order cumulant to perform the test. We now give a suggested estimate under the additional assumption that the error terms are independent under the null hypothesis. Note that an unbiased estimator of the variance $\sigma^2$ under the null hypothesis is

$$\hat{\sigma}^2 = \frac{\hat{\boldsymbol{\varepsilon}}^{\top}\hat{\boldsymbol{\varepsilon}}}{n-p},$$

and the fourth-order cumulant can then be estimated by a consistent moment-based estimator.
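A rough numerical sketch of this estimation step under the null with independent errors; the variance estimator is the standard one above, while the fourth-moment estimate here is a naive stand-in rather than the paper's corrected cumulant estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 1000, 5, 1.5
X = rng.standard_normal((n, p))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
eps = sigma * rng.standard_normal(n)   # i.i.d. errors under the null
resid = M @ eps                        # OLS residuals

sigma2_hat = resid @ resid / (n - p)   # unbiased: E[resid' resid] = (n - p) sigma^2
mu4_hat = np.mean(resid ** 4)          # crude fourth-moment estimate
kurt_hat = mu4_hat / sigma2_hat ** 2   # near 3 for Gaussian errors when p << n
print(sigma2_hat, kurt_hat)
```

When p/n is not small, the projection shrinks the residuals (Var(resid_i) = sigma^2 * M_ii), so moment estimates built directly from residuals need the kind of correction factors the paper derives.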

4. A general joint CLT for several general quadratic forms

In this section, we establish a general joint CLT for several general quadratic forms, which helps us find the asymptotic distributions of the statistics for testing series correlation. We believe that the result presented below may be of independent interest.

4.1. A brief review of random quadratic forms

Quadratic forms play an important role not only in mathematical statistics but also in many other branches of mathematics, such as number theory, differential geometry, linear algebra and differential topology. Suppose $\boldsymbol{z} = (z_1, \dots, z_n)^{\top}$, where $z_1, \dots, z_n$ is a sample of size $n$ drawn from a certain standardized population. Let $A$ be an $n \times n$ matrix. Then, $\boldsymbol{z}^{\top}A\boldsymbol{z}$ is called a random quadratic form in $\boldsymbol{z}$. Random quadratic forms in normal variables, especially when $A$ is symmetric, have been considered by many authors, who have achieved fruitful results. We refer the reader to [Bartlett et al.(1960)Bartlett, Gower, and Leslie, Darroch(1961), Gart(1970), Hsu et al.(1999)Hsu, Prentice, Zhao, and Fan, Forchini(2002), Dik and De Gunst(2010), Al-Naffouri et al.(2016)Al-Naffouri, Moinuddin, Ajeeb, Hassibi, and Moustakas]. Furthermore, many authors have considered the more general situation where the $z_i$ follow a non-Gaussian distribution. For the properties of these types of random quadratic forms, we refer the reader to [Fox and Taqqu(1985), Cambanis et al.(1985)Cambanis, Rosinski, and Woyczynski, de Jong(1987), Gregory and Hughes(1995), Gotze and Tikhomirov(1999), Liu et al.(2009)Liu, Tang, and Zhang, Deya and Nourdin(2014), Oliveira(2016)] and the references therein.
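A basic distribution-free property underlying such results is that the mean of a quadratic form depends on the entry distribution only through its first two moments: for i.i.d. standardized entries, E[z' A z] = trace(A). A minimal numerical check (matrix size and sample counts are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
A = rng.standard_normal((n, n))   # an arbitrary nonrandom matrix

# For z with i.i.d. standardized entries of ANY distribution,
# E[z' A z] = trace(A); here we check with (scaled) uniform entries.
vals = []
for _ in range(4000):
    z = rng.uniform(-1.0, 1.0, n) * np.sqrt(3.0)   # mean 0, variance 1
    vals.append(z @ A @ z)

print(abs(np.mean(vals) - np.trace(A)) < 25.0)     # Monte Carlo mean near trace(A)
```

The variance of z' A z, by contrast, involves the fourth moment of the entries through the diagonal of A, which is why the fourth-order moment condition appears in the assumptions below.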

However, few studies have considered the joint distribution of several quadratic forms. Thus, in this paper, we want to establish a general joint CLT for several random quadratic forms with general distributions.

4.2. Assumptions and results

To this end, suppose

is a random matrix. Let be nonrandom -dimensional matrices. Define for We are interested in the asymptotic distribution, as $n \to \infty$, of the random vector , which consists of random quadratic forms. We make the following assumptions.

  • (a) are standard random variables (mean zero and variance one) with uniformly bounded fourth-order moments .

  • (b) The columns of are independent.

  • (c) The spectral norms of the square matrices are uniformly bounded in .

Clearly, for , we have , and for , we obtain

(4.1)

Let ; then, we have

Thus, according to assumptions (a)-(c), for any , at most has the same order as . This result also holds for any by applying the Cauchy-Schwarz inequality. We then have the following theorem.

Theorem 4.1.

In addition to assumptions (a)-(c), suppose that there exists an such that has the same order as when . Then, the distribution of the random vector is asymptotically -dimensional normal.

4.3. Proof of Theorem 4.1

We are now in a position to present the proof of the joint CLT via the method of moments. The procedure is similar to that in [Bai et al.(2018)Bai, Pan, and Yin] but is more complex, since we need to establish the CLT for a -dimensional, rather than 2-dimensional, random vector. Moreover, we do not assume the underlying distribution to be symmetric or identically distributed. The proof is separated into three steps.

4.3.1. Step 1: Truncation

Noting that , , for any , we have Thus, we may select a sequence such that . The convergence rate of to 0 can be made arbitrarily slow. Define to be the analogue of with replaced by , where . Then,

Therefore, we need only investigate the limiting distribution of the vector .

4.3.2. Step 2: Centralization and Rescaling

Define to be the analogue of with replaced by . Denote by the distance between two random variables and . Additionally, denote , and We obtain that for any

(4.2)

Noting that the 's are independent random variables with zero means and unit variances, it follows that

Since and

we know that

(4.3)

Then, we have

It follows that and By combining the above estimates, we obtain that for

Noting that the entries in the covariance matrix of the random vector have at most the same order as , we conclude that