# Joint CLT for eigenvalue statistics from several dependent large dimensional sample covariance matrices with application

Let X_n=(x_ij) be a k × n data matrix with complex-valued, independent and standardized entries satisfying a Lindeberg-type moment condition. We consider simultaneously R sample covariance matrices B_nr=1/nQ_r X_n X_n^*Q_r^, 1< r< R, where the Q_r's are nonrandom real matrices with common dimensions p× k (k≥ p). Assuming that both the dimension p and the sample size n grow to infinity, the limiting distributions of the eigenvalues of the matrices {B_nr} are identified, and as the main result of the paper, we establish a joint central limit theorem for linear spectral statistics of the R matrices {B_nr}. Next, this new CLT is applied to the problem of testing a high dimensional white noise in time series modelling. In experiments the derived test has a controlled size and is significantly faster than the classical permutation test, though it does have lower power. This application highlights the necessity of such joint CLT in the presence of several dependent sample covariance matrices. In contrast, all the existing works on CLT for linear spectral statistics of large sample covariance matrices deal with a single sample covariance matrix (R=1).

## Authors

• 6 publications
• 9 publications
• 14 publications
09/23/2020

### Asymptotic independence of spiked eigenvalues and linear spectral statistics for large sample covariance matrices

We consider general high-dimensional spiked sample covariance models and...
01/23/2019

### Central limit theorem for linear spectral statistics of general separable sample covariance matrices with applications

In this paper, we consider the separable covariance model, which plays a...
07/21/2021

### Linear spectral statistics of sequential sample covariance matrices

Independent p-dimensional vectors with independent complex or real value...
12/13/2019

### Central Limit Theorem for Linear Spectral Statistics of Large Dimensional Kendall's Rank Correlation Matrices and its Applications

This paper is concerned with the limiting spectral behaviors of large di...
09/24/2018

### Moment bounds for autocovariance matrices under dependence

The goal of this paper is to obtain expectation bounds for the deviation...
03/12/2020

### Existence and Uniqueness of the Kronecker Covariance MLE

In matrix-valued datasets the sampled matrices often exhibit correlation...
01/10/2022

### Spiked eigenvalues of high-dimensional sample autocovariance matrices: CLT and applications

High-dimensional autocovariance matrices play an important role in dimen...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Modern information technology tremendously accelerates computing speed and greatly enlarges the amount of data storage, which enables us to collect, store and analyze data of large dimensions. Classical limit theorems in multivariate analysis, which normally assume fixed dimensions, become no longer applicable for dealing with high dimensional problems. Random matrix theory investigates the spectral properties of random matrices when their dimensions tend to infinity and hence provides a powerful framework for solving high dimensional problems. This theory has made systematic corrections to many classical multivariate statistical procedures in the past decades, see the monographs of

Bai and Silverstein (2010), Yao et al. (2015) and the review papers Johnstone (2007) and Paul and Aue (2014). It has found diverse applications in various research areas, including signal processing, network security, image processing, statistical genetics and other financial econometrics problems.

The sample covariance matrix is of central importance in multivariate analysis. Many fundamental statistics in multivariate analysis can be written as functionals of eigenvalues of a sample covariance matrix such as linear spectral statistics (LSSs) of the form where the ’s are eigenvalues of and

is a smooth function. The wide range of creditable applications in high dimensional statistics triggered an uptick in the demand for CLTs of such LSSs. Actually one of the most widely used results in this area is

Bai and Silverstein (2004), which considers a sample covariance matrix of the form , where is matrix consisting of i.i.d. complex standardized entries and is a nonnegative Hermitian matrix. A CLT for LSSs of is established under the so-called Marenko-Pastur regime, i.e. . Further refinement and extensions can be found in Zheng et al. (2015), Chen and Pan (2015), Zheng et al. (2017a), and Zheng et al. (2017b). Among them, Zheng et al. (2015)

studied the unbiased sample covariance matrix when the population mean vector is unknown.

Chen and Pan (2015) looked into the ultra-high dimensional case when the dimension is much larger than the sample size , that is as . Zheng et al. (2017a) derived the CLT for LSSs of large dimensional general Fisher matrices. Zheng et al. (2017b) attempted to relax the fourth order moment condition in Bai and Silverstein (2004) and incorporated it into the limiting parameters.

However, this rich literature all deals with a single sample covariance matrix . This paper, on the contrary, aims at the joint limiting behaviour of functionals of several groups of eigenvalues coming from different yet closely related sample covariance matrices. Specifically, we consider data samples of the form where

1. is a sequence of -dimensional independent and complex-valued random vectors with independent standardized components , i.e. and , and the dimension ;

2. are nonrandom real matrices with common dimensions . The population covariance matrices are assumed product-commutative.

We thus consider sample covariance matrices given by

 Bnr=1nn∑j=1yjry∗jr=1nQrXnX∗nQ⊤r , 1≤r≤R, (1.1)

where is of size , denotes the conjugate transpose of matrices or vectors, and stands for the transpose of real ones. Let be the eigenvalues of , and consider real-valued functions . This leads to the family of LSSs

 φlr=flr(λ1r)+⋯+flr(λpr), 1≤l≤L, 1≤r≤R .

This paper establishes a joint CLT for these statistics under appropriate conditions.

The importance of such joint CLT for LSSs is best explained and illustrated with the following problem of testing a high dimensional white noise. Indeed, our motivation for the joint CLT originates from this application to high-dimensional time series analysis. Testing for white noise is a classical yet important problem in statistics, especially for diagnostic checks in time series modelling. For high dimensional time series, current literatures focus on estimation and dimension-reduction aspects of the modelling, including high dimensional VAR models and various factor models. Yet model diagnostics have largely been untouched. Classical omnibus tests such as the multivariate Hosking and Li-McLeod tests are no longer suitable for handling high dimensional time series. They become extremely conservative, losing size and power dramatically. In a very recent work,

Li et al. (2016)

looked into this high dimensional portmanteau test problem and proposed several new test statistics based on single-lagged and multi-lagged sample auto-covariance matrices. More precisely, let’s consider a

-dimensional time series modelled as a linear process

 xt=∑l≥0Alzt−l, (1.2)

where is a sequence of independent -dimensional random vectors with independent components satisfying . Hence has , and its lag- auto-covariance matrix depends on only. In particular, denotes the population covariance matrix of the series. The goal is to test whether is a white noise, i.e.

 H0:  Cov(xt+τ,xt)=0, τ=1,⋯,q, (1.3)

where is a prescribed constant integer. Let be a sample generated from the model (1.2). The lag- sample auto-covariance matrix is

 ˆΣτ=1nn∑t=1xtx∗t−τ, (1.4)

where when . Li et al. (2016) proposed a test statistic based on . For any given constant integer , their test statistic was designed to test the specific lag- autocorrelation of the sequence, i.e.

 ˜Lτ=p∑j=1λ2j,τ=Tr(˜M∗τ˜Mτ),

where are the eigenvalues of

 ˜Mτ=12(ˆΣτ+ˆΣ∗τ)=12nn∑t=1(xtx∗t−τ+xt−τx∗t),

which is the symmetrized lag- sample auto-covariance matrix.

Notice that in matrix form , where

 Dτ=(0In−τIτ0)

where denotes the

th order unit matrix. They have proved that, under the null hypothesis, in the simplest setting when

, the limiting distribution of the test statistic is

 np˜Lτ−p2d→N(12, 1+3c(ν4−1)2).

Here, and and . The null hypothesis should be rejected for large values of . Simulation results also show that this test statistic is consistently more powerful than the Hosking and Li-McLeod tests even when the latter two have been size adjusted.

It can be seen that the test statistic is an LSS of , which can be studied with the CLT in Bai and Silverstein (2004). Indeed, the non-null eigenvalues of the sample covariance matrix considered there are the same as the matrix which resembles to the matrix . However, the test statistic can only detect serial dependence in a single lag each time. In order to capture a multi-lag dependence structure, a naturally more effective way would be accumulating the lags and consider the statistic

 (1.5)

Note that the CLT in Bai and Silverstein (2004) (or in its recent extensions) can only be used to study the correlations between different LSSs of a given , while to derive the null distribution of , we need to study the correlations between LSSs of several covariance matrices . Consequently, we need to resort to the joint CLT studied in this paper to characterize the correlations among . It is worth noticing that Li et al. (2016) proposed another multi-lagged test statistic by stacking a number of consecutive observation vectors. It will be shown in this paper that this test statistic is essentially much less powerful than considered here due to the data loss caused by observation stacking. This superiority of over demonstrates the necessity and significance of studying a joint CLT for LSSs of several dependent sample covariance matrices as proposed in this paper.

The rest of the paper is organized as follows. The main results of the joint CLT of LSSs of different sample covariance matrices are presented in Section 2. The application on high dimensional white noise test is provided in Section 3 to demonstrate the utility of this joint CLT. Numerical studies have also lent full support to the theoretical derivations. Technical lemmas and proofs are left to Section 4. Finally, Matlab codes for reproducing simulations in the paper are available at:   http://web.hku.hk/~jeffyao/papersInfo.html.

## 2 Joint CLT for linear spectral statistics of {Bnr}1≤r≤R

### 2.1 Preliminary knowledge on LSDs of {Bnr}1≤r≤R

Recall that the empirical spectral distribution (ESD) of a square matrix

is the probability measure

, where the ’s are eigenvalues of and denotes the Dirac mass at point . For any probability measure on the real line, its Stieltjes transform is defined by

 m(z)=∫1x−zdF(x),z∈C+,

where denotes the upper complex plane.

The assumptions needed for the existence of limiting spectral distributions (LSDs) of are as follows. The setup as well as the following Lemma 2.1 are established in Zheng et al. (2017b).

Assumption (a)

Both dimensions and tend to infinity such that as .

Assumption (b)

Samples are , where is , is , and the dimension () is arbitrary. Moreover, is a

array of independent random variables, not necessarily identically distributed, with common moments

 Exij=0,  E|x2ij|=1,

and satisfying the following Lindeberg-type condition: for each ,

 1pnη2k∑i=1n∑j=1R∑r=1∥qir∥2E|x2ij|I(|xij|>η√n/∥qir∥)→0,

where is the Euclidean norm of the -th column vector of .

Assumption (c)

The ESD of the population covariance matrix

converges weakly to a probability distribution

, . Also the sequence of the spectral norm of is bounded in and .

###### Lemma 2.1.

[Theorem 2.1 of Zheng et al. (2017b)] Under Assumptions (a)-(c), almost surely, the ESD of weakly converges to a nonrandom LSD . Moreover, the Stieltjes transform of is the unique solution to the following Marčenko-Pastur equation

 mr(z)=∫1t[1−c−czmr(z)]−zdHr(t) , (2.1)

on the set .

Define the companion LSD of as

 F––c,Hr=(1−c)δ0+cFc,Hr.

It is readily checked that is the LSD of the companion sample covariance matrix (which is ), and its Stieltjes transform satisfies the so-called Silverstein equation

 z=−1m––r(z)+c∫t1+tm––r(z)dHr(t). (2.2)

### 2.2 Main Results

Let and be two real symmetric matrices satisfying . The two matrices can then be diagonalized simultaneously. We define the joint spectral distribution of as the two-dimensional spectral distribution of the complex matrix , i.e.,

 G(x,y)=1p#{i≤p,R(si)≤x, I(si)≤y},

where are the eigenvalues of and denotes the cardinality of a set .

Recall the random vector of LSSs of ’s

 (∫fℓr(x)dFnr(x))1≤ℓ≤L,1≤r≤R, (2.3)

where are the corresponding empirical spectral distributions of and are

measurable functions on the real line. Our aim in this section is to establish the joint distribution of (

2.3) under suitable conditions. The main results are presented as follows.

Assumption (d)

The variables are independent, with common moments

 Exij=0,  E|x2ij|=1,  βx=E|x4ij|−|Ex2ij|2−2,  and  αx=|Ex2ij|2,

and satisfying the following Lindeberg-type condition: for each

 1pnη6k∑i=1n∑j=1R∑r=1∥qir∥2E|x4ij|I(|xij|>η√n/∥qir∥)→0. (2.4)
Assumption (e)

Either , or the mixing matrices are such that the matrices are diagonal (therefore with arbitrary ).

Assumption (f)

The joint spectral distribution of and converges weakly to a probability distribution , .

The framework with Assumptions (d)-(e)-(f) is inspired by the one advocated in Zheng et al. (2017b). However, an extension is necessary here since we are dealing with several random matrices simultaneously while only one matrix is considered in the reference.

###### Theorem 2.1.

Under Assumptions (a)-(f), let be functions analytic on a complex domain containing

 [I(0

with , and and denoting the smallest and the largest eigenvalue of all the matrices in , respectively. Then, the random vector

 p(∫fℓr(x)dFnr(x)−∫fℓr(x)dFcn,Hnr(x))1≤ℓ≤L,1≤r≤R. (2.6)

converges to an -dimensional Gaussian random vector . The mean function is

 EXfℓr = −12πi∮C1fℓr(z)g1(z)[αx(1−g2(z))(1−αxg2(z))+βx1−g2(z)]dz,

where

 g1(z)=∫cm––3r(z)t2(1+tm––r(z))3dHr(t)andg2(z)=∫cm––2r(z)t2(1+tm––r(z))2dHr(t).

The covariance function is

 Cov(Xfℓ′r,Xfℓs) = 14π2∮C1∮C2fℓ′(z1)fℓ(z2)∂2g(z1,z2)∂z1∂z2dz1dz2, (2.7)

where with

 a(z1,z2)=∬cm––r(z1)m––s(z2)t1t2(1+t1m––r(z1))(1+t2m––s(z2))dHrs(t1,t2).

The contours and are non-overlapping, closed, positively orientated in the complex plane, and enclosing both the supports of and of .

###### Remark 1.

The centralization term in (2.6) is the expectation of with respect to the distribution . This distribution is a finite dimensional version of the LSD , which is defined by (2.1) with the parameters replaced with . The use of instead of aims to eliminate the effect of the convergence rate of to .

As an illustrative example of Theorem 2.1, we consider a simplified case where only two sample covariance matrices are involved, i.e. and , where is a matrix of i.i.d. real standard Gaussian variables. The corresponding population covariance matrices are and

, respectively. It’s clear that the ESD and its limit of the identity matrix

are both the Dirac measure . Those of are general and denoted by and , respectively. Moreover, the joint spectral distribution function of and is equal to for and zero otherwise. Denote the ESDs of the two sample covariance matrices by and , respectively, and let

 Gn1(x)=Fn1(x)−Fcn,δ1(x)andGn2(x)=Fn2(x)−Fcn,Hn2(x).

Then for any analytic function , we have

 p(∫f(x)dGn1(x),∫f(x)dGn2(x))⊤d→N((v1v2),(ψ11ψ12ψ12ψ22)). (2.8)

The parameters of the marginal distributions in (2.8) have been derived by many authors, see Bai and Silverstein (2004) and Zheng et al. (2017b) for example. While the covariance parameter is new and, from Theorem 2.1, it can be formulated as

 v12 =

where and are the companion Stieltjes transforms of the LSDs and , respectively, and denotes the derivative of with respect to . For the simplest function , one may figure out by the residual theorem.

## 3 Application to high dimensional white noise test

As discussed in the introduction, a notable application of the joint CLT presented in this paper is to the high dimensional white noise test. In particular, it is expected that testing power could be gained by accumulating information across different lags, that is, by using the test statistic defined in (1.5).

Define the scaled statistic

 ϕq=npLq−qp2. (3.1)

The null hypothesis will be rejected for large values of . We consider high-dimensional situations where the dimension is large compared to the sample size . By applying the CLT in Theorem 2.1, the asymptotic null distribution of is derived as follows.

###### Theorem 3.1.

Let be a fixed integer, and assume that

1. is a set of i.i.d. real-valued variables satisfying ;

2. Relaxed Marčenko-Pastur regime: both the sample size and the dimension grow to infinity such that

 0

Then in the simplest setting where , we have

 s(cn)−1/2{ϕq−q2}d→N(0,1), (3.2)

where

The proof of this theorem is given in Section 4.

Let be the upper-quantile of the standard normal distribution at level . Based on Theorem 3.1, we obtain a procedure for testing the null hypothesis in (1.3) as follows.

 \em Multi-Lag-q test:\quad Reject H0 ~{}~{}if ~{}~{}ϕq−q2>Zα√s(cn). (3.3)

### 3.1 Simulation Experiments

Most of the experiments of this section are designed to compare our test procedure in (3.3) and the procedure based on the test statistic from Li et al. (2016) using Simes’ method (Simes, 1986). In Li et al. (2016), several testing procedures are discussed and the test performs quite satisfactorily in terms of both size and power across different scenarios.

More precisely, let be a fixed integer, define -dimensional vectors , . Since and , we have

 Cov(yj)=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝Σ0Σ1⋯ΣqΣ1Σ0⋱⋮⋮⋱⋱Σ1Σq⋯Σ1Σ0⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠(q+1)p×(q+1)p.

The null hypothesis becomes , a test for a block diagonal covariance structure of the stacked sequence .

When , the white noise test of reduces to a sphericity test of . The well known John’s test statistic can be adopted for this purpose. In our case, the corresponding John’s test statistic is defined as

 Uq=1p(q+1)∑p(q+1)i=1(li,q−¯lq)2¯lq2,

where are the eigenvalues of the sample covariance matrix and is their average.

Notice however that the use of blocks above reduces the sample size to the number of blocks . This may result in a certain loss of power for the test. To limit such loss of power, we adopt Simes’ method for multiple hypothesis testing in Simes (1986). To implement Simes’ method, we denote

 y(0)j=yj(x1,…,xn)

as the previously defined stacked sample. Then we rotate the sample and define a series of new stacked samples for , that is,

 y(k)j=yj(xk+1,…,xn,x1,…,xk)

Then John’s test statistic can be calculated based on the samples, which results in different statistics . Moreover, let , denote the (asymptotic) P-value for the John’s test with the -th set of ’s, i.e.

 Pk=1−Φ((NU(k)q−p(q+1)−ν4+2)/2),

where

is the cumulative distribution function of the standard normal distribution. Let

be a permutation of . Then by the Simes method, we reject if at least for one for the nominal level .

To compare our test statistic with multi-lag- John’s test statistic , we set the significance level and the critical regions of the two tests are

• Our test : ;

• Multi-lag- John’s test (using Simes’ method): .

Data are generated following four different scenarios for comparison:

• Test size under Gaussian white noise: , ;

• Test size under Non-Gaussian white noise: , , , , ;

• Test power under a Gaussian spherical AR(1) process: , , , ;

• Test power under a Non-Gaussian spherical AR(1) process: , , , , , Var, .

Various -combinations are tested to show the suitability of our test statistic for both low and high dimensional settings. Empirical statistics are obtained using 2000 independent replications. Table 1 compares the empirical sizes of the two tests and . It can be seen that both of them have reasonable sizes compared to the 5% nominal level across all the tested

-combinations. Still, the two tests become slightly conservative under Non-Gaussian distributions in Scenario (II) compared to the Gaussian case in Scenario (I). A sample display of these sizes is given in Figure

1 (left panel).

In Table 2, we compare the power of the two tests. Our test displays a generally much higher power than the multi-lag- John’s test , especially when the dimensions become larger. On the other hand, both tests have slightly lower power under the Non-Gaussian distribution than under the Gaussian distribution, which is consistent with the previous observation that the two tests become more conservative with Non-Gaussian populations. A sample display of these powers is given in Figure 1 (right panel).

To further explore the powers of the two tests, we varied the AR coefficient in Scenario (III) and (IV) from -0.1 to 0.1 ( corresponds to testing size). Smaller values of the AR(1) coefficient are used here leading to a more difficult testing problem and a generally decreased power for both tests. Three dimensional settings are considered with while the sample size is fixed as . The number of independent replications is still 2000 in each case. Results for Scenario (III) and (IV) are plotted in Figure 1. This Figure further consolidates that our test dominates under all tested scenarios. A nonnegligible increase in the testing power of both test statistics as the dimension becomes larger sheds more light on the blessings of high dimensionality. Still both tests are more conservative with Non-Gaussian population distribution than with Gaussian distribution.

### 3.2 Comparison to a permutation test

As many complex analytic tools are employed to derive the asymptotic null distributions of the test statistic , it is natural to wonder about the performance of a “simple-minded” test procedure, namely the permutation test. Under the null hypothesis of white noise, since the sample vectors have an i.i.d. structure, one can permute these sample vectors say times to obtain an empirical upper 5% quantiles of the test statistic . The null hypothesis will be rejected if the observed statistic from the original (non permuted) sample vectors is larger than this empirical quantile.

Data are generated following the spherical AR(1) process in Scenario (III) and (IV) to compare this straightforward test with our test statistic . In order to compare the power performance of two tests, the AR coefficient takes different values, ( corresponds to testing size). The sample size is fixed as yet data dimension varies. As for the permutation test, the permutation times is set as . The nominal level is . Testing size and power of two tests are shown in Tables 3 and 4 based on 500 replicates for all configurations.

It can be seen that the sizes of both tests are well controlled. As for their power, our test offers an acceptable level while the permutation test consistently performs better in the tested cases. However, the permutation test is extremely time consuming compared to our test. For instance, to run one set of combination for 500 replicates, it takes only 25 seconds with our test, while almost 3 hours for the permutation test with permutation times . Particularly the computation time increases greatly when the sample size grows. Therefore, our test statistic provides a very competitive choice for testing high dimensional white noise while the classical permutation test is simpler, more powerful though much slower.

## 4 Proofs of the main theorems

### 4.1 Proof of Theorem 2.1

The general strategy for our main Theorem 2.1 follows the methods advocated in Bai and Silverstein (2004), with its most recent update in Zheng et al. (2017b). However, as we are dealing with several random matrices simultaneously, all the technical steps for the implementation of this strategy have to be carefully rewritten. They are presented in this section.

#### 4.1.1 Sketch of the proof of Theorem 2.1

Let be arbitrary, be any number greater than the right end point of interval (2.5), and be any negative number if the left end point of (2.5) is zero, otherwise choose . Define a contour as

 C={x+iv:x∈{xr,xl},v∈[−v0,v0]}∪Cu with Cu={x±iv0:x∈[xl,xr]}, (4.1)

and let with for some . By definition, the contour encloses a rectangular region in the complex plane, which contains the union of the support sets of all the LSDs , . As a regularized version of , excludes a small segment near the real line.

Let , , , be the Stieltjes transforms of , , , and , respectively, where is the ESD of , is the LSD defined in Remark 1, and are linked by the equation . A major task of proving Theorem 2.1 is to study the convergence of the empirical process

 Mnr(z):=p[mnr(z)−m0nr(z)]=n[m––nr(z)−m––0nr(z)].

To this end, we need to truncate as

 ˆMnr(z)=⎧⎪⎨⎪⎩Mnr(z)z∈Cn,Mnr(x+in−1εn)x∈{xl,xr}andv∈[0,n−1εn],Mnr(x−in−1εn)x∈{xl,xr}andv∈[−n−1εn,0],

which agrees to on . This truncation is essential when proving the tightness on . Write

 ˆMn(z)=(ˆMn1(z),…,ˆMnR(z)),

we will establish its convergence as stated in the following lemma.

###### Lemma 4.1.

Under Assumptions (a)-(f), converges weakly to a Gaussian process on . The mean function is

 EMr(z)=αxg1(z)(1−g2(z))(1−αxg2(z))+βxg1(z)1−g2(z),

where

 g1(z)=∫cm––3r(z)t2(1+tm––