1 Introduction
Modern information technology tremendously accelerates computing speed and greatly enlarges the amount of data storage, which enables us to collect, store and analyze data of large dimensions. Classical limit theorems in multivariate analysis, which normally assume fixed dimensions, become no longer applicable for dealing with high dimensional problems. Random matrix theory investigates the spectral properties of random matrices when their dimensions tend to infinity and hence provides a powerful framework for solving high dimensional problems. This theory has made systematic corrections to many classical multivariate statistical procedures in the past decades, see the monographs of
Bai and Silverstein (2010), Yao et al. (2015) and the review papers Johnstone (2007) and Paul and Aue (2014). It has found diverse applications in various research areas, including signal processing, network security, image processing, statistical genetics and other financial econometrics problems.The sample covariance matrix is of central importance in multivariate analysis. Many fundamental statistics in multivariate analysis can be written as functionals of eigenvalues of a sample covariance matrix such as linear spectral statistics (LSSs) of the form where the ’s are eigenvalues of and
is a smooth function. The wide range of creditable applications in high dimensional statistics triggered an uptick in the demand for CLTs of such LSSs. Actually one of the most widely used results in this area is
Bai and Silverstein (2004), which considers a sample covariance matrix of the form , where is matrix consisting of i.i.d. complex standardized entries and is a nonnegative Hermitian matrix. A CLT for LSSs of is established under the socalled MarenkoPastur regime, i.e. . Further refinement and extensions can be found in Zheng et al. (2015), Chen and Pan (2015), Zheng et al. (2017a), and Zheng et al. (2017b). Among them, Zheng et al. (2015)studied the unbiased sample covariance matrix when the population mean vector is unknown.
Chen and Pan (2015) looked into the ultrahigh dimensional case when the dimension is much larger than the sample size , that is as . Zheng et al. (2017a) derived the CLT for LSSs of large dimensional general Fisher matrices. Zheng et al. (2017b) attempted to relax the fourth order moment condition in Bai and Silverstein (2004) and incorporated it into the limiting parameters.However, this rich literature all deals with a single sample covariance matrix . This paper, on the contrary, aims at the joint limiting behaviour of functionals of several groups of eigenvalues coming from different yet closely related sample covariance matrices. Specifically, we consider data samples of the form where

is a sequence of dimensional independent and complexvalued random vectors with independent standardized components , i.e. and , and the dimension ;

are nonrandom real matrices with common dimensions . The population covariance matrices are assumed productcommutative.
We thus consider sample covariance matrices given by
(1.1) 
where is of size , denotes the conjugate transpose of matrices or vectors, and stands for the transpose of real ones. Let be the eigenvalues of , and consider realvalued functions . This leads to the family of LSSs
This paper establishes a joint CLT for these statistics under appropriate conditions.
The importance of such joint CLT for LSSs is best explained and illustrated with the following problem of testing a high dimensional white noise. Indeed, our motivation for the joint CLT originates from this application to highdimensional time series analysis. Testing for white noise is a classical yet important problem in statistics, especially for diagnostic checks in time series modelling. For high dimensional time series, current literatures focus on estimation and dimensionreduction aspects of the modelling, including high dimensional VAR models and various factor models. Yet model diagnostics have largely been untouched. Classical omnibus tests such as the multivariate Hosking and LiMcLeod tests are no longer suitable for handling high dimensional time series. They become extremely conservative, losing size and power dramatically. In a very recent work,
Li et al. (2016)looked into this high dimensional portmanteau test problem and proposed several new test statistics based on singlelagged and multilagged sample autocovariance matrices. More precisely, let’s consider a
dimensional time series modelled as a linear process(1.2) 
where is a sequence of independent dimensional random vectors with independent components satisfying . Hence has , and its lag autocovariance matrix depends on only. In particular, denotes the population covariance matrix of the series. The goal is to test whether is a white noise, i.e.
(1.3) 
where is a prescribed constant integer. Let be a sample generated from the model (1.2). The lag sample autocovariance matrix is
(1.4) 
where when . Li et al. (2016) proposed a test statistic based on . For any given constant integer , their test statistic was designed to test the specific lag autocorrelation of the sequence, i.e.
where are the eigenvalues of
which is the symmetrized lag sample autocovariance matrix.
Notice that in matrix form , where
where denotes the
th order unit matrix. They have proved that, under the null hypothesis, in the simplest setting when
, the limiting distribution of the test statistic isHere, and and . The null hypothesis should be rejected for large values of . Simulation results also show that this test statistic is consistently more powerful than the Hosking and LiMcLeod tests even when the latter two have been size adjusted.
It can be seen that the test statistic is an LSS of , which can be studied with the CLT in Bai and Silverstein (2004). Indeed, the nonnull eigenvalues of the sample covariance matrix considered there are the same as the matrix which resembles to the matrix . However, the test statistic can only detect serial dependence in a single lag each time. In order to capture a multilag dependence structure, a naturally more effective way would be accumulating the lags and consider the statistic
(1.5) 
Note that the CLT in Bai and Silverstein (2004) (or in its recent extensions) can only be used to study the correlations between different LSSs of a given , while to derive the null distribution of , we need to study the correlations between LSSs of several covariance matrices . Consequently, we need to resort to the joint CLT studied in this paper to characterize the correlations among . It is worth noticing that Li et al. (2016) proposed another multilagged test statistic by stacking a number of consecutive observation vectors. It will be shown in this paper that this test statistic is essentially much less powerful than considered here due to the data loss caused by observation stacking. This superiority of over demonstrates the necessity and significance of studying a joint CLT for LSSs of several dependent sample covariance matrices as proposed in this paper.
The rest of the paper is organized as follows. The main results of the joint CLT of LSSs of different sample covariance matrices are presented in Section 2. The application on high dimensional white noise test is provided in Section 3 to demonstrate the utility of this joint CLT. Numerical studies have also lent full support to the theoretical derivations. Technical lemmas and proofs are left to Section 4. Finally, Matlab codes for reproducing simulations in the paper are available at: http://web.hku.hk/~jeffyao/papersInfo.html.
2 Joint CLT for linear spectral statistics of
2.1 Preliminary knowledge on LSDs of
Recall that the empirical spectral distribution (ESD) of a square matrix
is the probability measure
, where the ’s are eigenvalues of and denotes the Dirac mass at point . For any probability measure on the real line, its Stieltjes transform is defined bywhere denotes the upper complex plane.
The assumptions needed for the existence of limiting spectral distributions (LSDs) of are as follows. The setup as well as the following Lemma 2.1 are established in Zheng et al. (2017b).
 Assumption (a)

Both dimensions and tend to infinity such that as .
 Assumption (b)

Samples are , where is , is , and the dimension () is arbitrary. Moreover, is a
array of independent random variables, not necessarily identically distributed, with common moments
and satisfying the following Lindebergtype condition: for each ,
where is the Euclidean norm of the th column vector of .
 Assumption (c)

The ESD of the population covariance matrix
converges weakly to a probability distribution
, . Also the sequence of the spectral norm of is bounded in and .
Lemma 2.1.
[Theorem 2.1 of Zheng et al. (2017b)] Under Assumptions (a)(c), almost surely, the ESD of weakly converges to a nonrandom LSD . Moreover, the Stieltjes transform of is the unique solution to the following MarčenkoPastur equation
(2.1) 
on the set .
Define the companion LSD of as
It is readily checked that is the LSD of the companion sample covariance matrix (which is ), and its Stieltjes transform satisfies the socalled Silverstein equation
(2.2) 
2.2 Main Results
Let and be two real symmetric matrices satisfying . The two matrices can then be diagonalized simultaneously. We define the joint spectral distribution of as the twodimensional spectral distribution of the complex matrix , i.e.,
where are the eigenvalues of and denotes the cardinality of a set .
Recall the random vector of LSSs of ’s
(2.3) 
where are the corresponding empirical spectral distributions of and are
measurable functions on the real line. Our aim in this section is to establish the joint distribution of (
2.3) under suitable conditions. The main results are presented as follows. Assumption (d)

The variables are independent, with common moments
and satisfying the following Lindebergtype condition: for each
(2.4)  Assumption (e)

Either , or the mixing matrices are such that the matrices are diagonal (therefore with arbitrary ).
 Assumption (f)

The joint spectral distribution of and converges weakly to a probability distribution , .
The framework with Assumptions (d)(e)(f) is inspired by the one advocated in Zheng et al. (2017b). However, an extension is necessary here since we are dealing with several random matrices simultaneously while only one matrix is considered in the reference.
Theorem 2.1.
Under Assumptions (a)(f), let be functions analytic on a complex domain containing
(2.5) 
with , and and denoting the smallest and the largest eigenvalue of all the matrices in , respectively. Then, the random vector
(2.6) 
converges to an dimensional Gaussian random vector . The mean function is
where
The covariance function is
(2.7) 
where with
The contours and are nonoverlapping, closed, positively orientated in the complex plane, and enclosing both the supports of and of .
Remark 1.
As an illustrative example of Theorem 2.1, we consider a simplified case where only two sample covariance matrices are involved, i.e. and , where is a matrix of i.i.d. real standard Gaussian variables. The corresponding population covariance matrices are and
, respectively. It’s clear that the ESD and its limit of the identity matrix
are both the Dirac measure . Those of are general and denoted by and , respectively. Moreover, the joint spectral distribution function of and is equal to for and zero otherwise. Denote the ESDs of the two sample covariance matrices by and , respectively, and letThen for any analytic function , we have
(2.8) 
The parameters of the marginal distributions in (2.8) have been derived by many authors, see Bai and Silverstein (2004) and Zheng et al. (2017b) for example. While the covariance parameter is new and, from Theorem 2.1, it can be formulated as
where and are the companion Stieltjes transforms of the LSDs and , respectively, and denotes the derivative of with respect to . For the simplest function , one may figure out by the residual theorem.
3 Application to high dimensional white noise test
As discussed in the introduction, a notable application of the joint CLT presented in this paper is to the high dimensional white noise test. In particular, it is expected that testing power could be gained by accumulating information across different lags, that is, by using the test statistic defined in (1.5).
Define the scaled statistic
(3.1) 
The null hypothesis will be rejected for large values of . We consider highdimensional situations where the dimension is large compared to the sample size . By applying the CLT in Theorem 2.1, the asymptotic null distribution of is derived as follows.
Theorem 3.1.
Let be a fixed integer, and assume that

is a set of i.i.d. realvalued variables satisfying ;

Relaxed MarčenkoPastur regime: both the sample size and the dimension grow to infinity such that
Then in the simplest setting where , we have
(3.2) 
where
The proof of this theorem is given in Section 4.
Let be the upperquantile of the standard normal distribution at level . Based on Theorem 3.1, we obtain a procedure for testing the null hypothesis in (1.3) as follows.
(3.3) 
3.1 Simulation Experiments
Most of the experiments of this section are designed to compare our test procedure in (3.3) and the procedure based on the test statistic from Li et al. (2016) using Simes’ method (Simes, 1986). In Li et al. (2016), several testing procedures are discussed and the test performs quite satisfactorily in terms of both size and power across different scenarios.
More precisely, let be a fixed integer, define dimensional vectors , . Since and , we have
The null hypothesis becomes , a test for a block diagonal covariance structure of the stacked sequence .
When , the white noise test of reduces to a sphericity test of . The well known John’s test statistic can be adopted for this purpose. In our case, the corresponding John’s test statistic is defined as
where are the eigenvalues of the sample covariance matrix and is their average.
Notice however that the use of blocks above reduces the sample size to the number of blocks . This may result in a certain loss of power for the test. To limit such loss of power, we adopt Simes’ method for multiple hypothesis testing in Simes (1986). To implement Simes’ method, we denote
as the previously defined stacked sample. Then we rotate the sample and define a series of new stacked samples for , that is,
Then John’s test statistic can be calculated based on the samples, which results in different statistics . Moreover, let , denote the (asymptotic) Pvalue for the John’s test with the th set of ’s, i.e.
where
is the cumulative distribution function of the standard normal distribution. Let
be a permutation of . Then by the Simes method, we reject if at least for one for the nominal level .To compare our test statistic with multilag John’s test statistic , we set the significance level and the critical regions of the two tests are

Our test : ;

Multilag John’s test (using Simes’ method): .
Data are generated following four different scenarios for comparison:

Test size under Gaussian white noise: , ;

Test size under NonGaussian white noise: , , , , ;

Test power under a Gaussian spherical AR(1) process: , , , ;

Test power under a NonGaussian spherical AR(1) process: , , , , , Var, .
Various combinations are tested to show the suitability of our test statistic for both low and high dimensional settings. Empirical statistics are obtained using 2000 independent replications. Table 1 compares the empirical sizes of the two tests and . It can be seen that both of them have reasonable sizes compared to the 5% nominal level across all the tested
combinations. Still, the two tests become slightly conservative under NonGaussian distributions in Scenario (II) compared to the Gaussian case in Scenario (I). A sample display of these sizes is given in Figure
1 (left panel).In Table 2, we compare the power of the two tests. Our test displays a generally much higher power than the multilag John’s test , especially when the dimensions become larger. On the other hand, both tests have slightly lower power under the NonGaussian distribution than under the Gaussian distribution, which is consistent with the previous observation that the two tests become more conservative with NonGaussian populations. A sample display of these powers is given in Figure 1 (right panel).
To further explore the powers of the two tests, we varied the AR coefficient in Scenario (III) and (IV) from 0.1 to 0.1 ( corresponds to testing size). Smaller values of the AR(1) coefficient are used here leading to a more difficult testing problem and a generally decreased power for both tests. Three dimensional settings are considered with while the sample size is fixed as . The number of independent replications is still 2000 in each case. Results for Scenario (III) and (IV) are plotted in Figure 1. This Figure further consolidates that our test dominates under all tested scenarios. A nonnegligible increase in the testing power of both test statistics as the dimension becomes larger sheds more light on the blessings of high dimensionality. Still both tests are more conservative with NonGaussian population distribution than with Gaussian distribution.
3.2 Comparison to a permutation test
As many complex analytic tools are employed to derive the asymptotic null distributions of the test statistic , it is natural to wonder about the performance of a “simpleminded” test procedure, namely the permutation test. Under the null hypothesis of white noise, since the sample vectors have an i.i.d. structure, one can permute these sample vectors say times to obtain an empirical upper 5% quantiles of the test statistic . The null hypothesis will be rejected if the observed statistic from the original (non permuted) sample vectors is larger than this empirical quantile.
Data are generated following the spherical AR(1) process in Scenario (III) and (IV) to compare this straightforward test with our test statistic . In order to compare the power performance of two tests, the AR coefficient takes different values, ( corresponds to testing size). The sample size is fixed as yet data dimension varies. As for the permutation test, the permutation times is set as . The nominal level is . Testing size and power of two tests are shown in Tables 3 and 4 based on 500 replicates for all configurations.
It can be seen that the sizes of both tests are well controlled. As for their power, our test offers an acceptable level while the permutation test consistently performs better in the tested cases. However, the permutation test is extremely time consuming compared to our test. For instance, to run one set of combination for 500 replicates, it takes only 25 seconds with our test, while almost 3 hours for the permutation test with permutation times . Particularly the computation time increases greatly when the sample size grows. Therefore, our test statistic provides a very competitive choice for testing high dimensional white noise while the classical permutation test is simpler, more powerful though much slower.
4 Proofs of the main theorems
4.1 Proof of Theorem 2.1
The general strategy for our main Theorem 2.1 follows the methods advocated in Bai and Silverstein (2004), with its most recent update in Zheng et al. (2017b). However, as we are dealing with several random matrices simultaneously, all the technical steps for the implementation of this strategy have to be carefully rewritten. They are presented in this section.
4.1.1 Sketch of the proof of Theorem 2.1
Let be arbitrary, be any number greater than the right end point of interval (2.5), and be any negative number if the left end point of (2.5) is zero, otherwise choose . Define a contour as
(4.1) 
and let with for some . By definition, the contour encloses a rectangular region in the complex plane, which contains the union of the support sets of all the LSDs , . As a regularized version of , excludes a small segment near the real line.
Let , , , be the Stieltjes transforms of , , , and , respectively, where is the ESD of , is the LSD defined in Remark 1, and are linked by the equation . A major task of proving Theorem 2.1 is to study the convergence of the empirical process
To this end, we need to truncate as
which agrees to on . This truncation is essential when proving the tightness on . Write
we will establish its convergence as stated in the following lemma.
Lemma 4.1.
Under Assumptions (a)(f), converges weakly to a Gaussian process on . The mean function is
where
Comments
There are no comments yet.