High-dimensional covariance matrices in elliptical distributions with application to spherical test

This paper discusses fluctuations of linear spectral statistics of high-dimensional sample covariance matrices when the underlying population follows an elliptical distribution. Such populations often possess high-order correlations among their coordinates, which have a great impact on the asymptotic behavior of linear spectral statistics. Taking this kind of dependency into consideration, we establish a new central limit theorem for the linear spectral statistics for a class of elliptical populations. This general theoretical result has wide applications and, as an example, it is applied to test the sphericity of elliptical populations.




1 Introduction

Large-scale statistical inference has developed rapidly over the last two decades. This type of inference often relies on spectral statistics of certain random matrices in high-dimensional frameworks, where both the dimension of the observations and the sample size tend to infinity. Recall that a linear spectral statistic (LSS) (Bai and Silverstein, 2010) of a $p\times p$ Hermitian random matrix $M$ is of the form

$$\int f(x)\,dF^{M}(x)=\frac{1}{p}\sum_{j=1}^{p}f(\lambda_j),$$

where $\lambda_1,\dots,\lambda_p$ are the eigenvalues of $M$, $f$ is a function defined on $\mathbb{R}$, and $F^{M}=\frac{1}{p}\sum_{j=1}^{p}\delta_{\lambda_j}$ is called the spectral distribution (SD) of $M$. Here $\delta_a$ denotes the Dirac measure at the point $a$. In Ledoit and Wolf (2002) and Schott (2007), most test statistics are actually LSSs of sample covariance matrices.

Bai et al. (2009) made systematic corrections to several classical likelihood ratio tests to overcome the effect of high dimension using LSSs of sample covariance matrices and F-matrices. Later, Bai et al. (2015) derived the CLT for LSSs of a high-dimensional Beta matrix, which can be broadly used in multivariate statistical analysis, such as testing the equality of several covariance matrices, multivariate analysis of variance, and canonical correlation analysis; see Anderson (2003). Most recently, based on an LSS of regularized canonical correlation matrices, Yang and Pan (2015) proposed a test for the independence between two large random vectors. Gao et al. (2017) applied LSSs of sample correlation matrices to the complete independence test for random variables and the equivalence test for factor loadings in a factor model. Clearly, it is of great interest to investigate the behavior of LSSs under various circumstances.
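As a concrete illustration of the LSS definition recalled above, the following minimal sketch evaluates an LSS of a sample covariance matrix numerically. The function names `sample_covariance` and `linear_spectral_statistic`, as well as the Gaussian toy data, are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np


def sample_covariance(X):
    """Return (1/n) X'X for an (n, p) array X whose rows are zero-mean observations."""
    n = X.shape[0]
    return X.T @ X / n


def linear_spectral_statistic(B, f):
    """Average of f over the eigenvalues of the symmetric matrix B."""
    eigenvalues = np.linalg.eigvalsh(B)  # real eigenvalues of a symmetric matrix
    return float(np.mean(f(eigenvalues)))


# Toy data (assumption): n = 400 standard normal observations in dimension p = 200.
rng = np.random.default_rng(0)
n, p = 400, 200
X = rng.standard_normal((n, p))
B = sample_covariance(X)

# Two simple LSSs: the first and second moments of the sample eigenvalues.
lss_first_moment = linear_spectral_statistic(B, lambda x: x)
lss_second_moment = linear_spectral_statistic(B, lambda x: x**2)
```

The two moments computed here coincide with tr(B)/p and tr(B^2)/p, which gives a quick sanity check on the eigenvalue-based computation.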

Specifically, let $\mathbf{x}_1,\dots,\mathbf{x}_n$ be $n$ observations of $\mathbf{x}\in\mathbb{R}^p$, whose mean is zero and covariance matrix is $\Sigma$. The sample covariance matrix is

$$B_n=\frac{1}{n}\sum_{j=1}^{n}\mathbf{x}_j\mathbf{x}_j'.$$

Our attention in this paper is focused on the asymptotic properties of LSSs of $B_n$. The earliest study on this problem dates back to Jonsson (1982), who obtained the central limit theorem (CLT) for LSSs of $B_n$ by assuming the population to be standard multivariate normal. A remarkable breakthrough was made in Bai and Silverstein (2004), where the population is allowed to be a linear transform of a vector of independent and identically distributed (i.i.d.) random variables, i.e.,

$$\mathbf{x}=A\mathbf{z}. \tag{1.2}$$

Here $A$ is a non-random transformation matrix with $AA'=\Sigma$, and $\mathbf{z}=(z_1,\dots,z_p)'$ with i.i.d. $z_j$'s satisfying

$$E(z_1)=0,\quad E(z_1^2)=1,\quad E(z_1^4)=3.$$

The fourth moment condition was later extended by Pan and Zhou (2008) to $E(z_1^4)<\infty$. Though these assumptions are fairly weak, their requirement of a linearly dependent structure in (1.2) still excludes many important distributions. In particular, it excludes almost all distributions from the elliptical family (Fang and Zhang, 1990).

Elliptical distributions were originally introduced by Kelker (1970) to generalize the multivariate normal distributions. A random vector $\mathbf{x}$ with zero mean follows an elliptical distribution if and only if it has a stochastic representation (Fang and Zhang, 1990):

$$\mathbf{x}=\xi A\mathbf{u}, \tag{1.4}$$

where the $p\times p$ matrix $A$ is non-random with $\operatorname{rank}(A)=p$, $\xi\geq 0$ is a scalar random variable representing the radius of $\mathbf{x}$, and $\mathbf{u}\in\mathbb{R}^p$ is the random direction, which is independent of $\xi$ and uniformly distributed on the unit sphere $S^{p-1}$ in $\mathbb{R}^p$, denoted by $\mathbf{u}\sim U(S^{p-1})$ in the sequel. This family of distributions, which can describe fat (or light) tails and tail dependence among the components of a population, has been widely applied in many areas such as statistics, economics and finance; see Fang and Zhang (1990) and Gupta et al. (2013). Evidently, such distributions with high-order correlations cannot be modeled by the linear transform model in (1.2).
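The stochastic representation in (1.4) translates directly into a sampling recipe: draw a uniform direction on the unit sphere, draw an independent radius, and apply the matrix. The sketch below is a minimal illustration under stated assumptions; the names `sample_elliptical` and `chi_radius` are hypothetical, and the chi-distributed radius is chosen only because it reproduces the normal case.

```python
import numpy as np


def sample_elliptical(n, A, radius_sampler, rng):
    """Draw n observations x = xi * A u, with u uniform on the unit sphere.

    A              : (p, p) non-random matrix
    radius_sampler : callable (n, p, rng) -> array of n non-negative radii,
                     drawn independently of the directions
    """
    p = A.shape[1]
    g = rng.standard_normal((n, p))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    xi = radius_sampler(n, p, rng)
    return (xi[:, None] * u) @ A.T  # row j equals xi_j * (A u_j)'


def chi_radius(n, p, rng):
    """Radius with xi^2 ~ chi-square(p); this choice recovers the normal population."""
    return np.sqrt(rng.chisquare(p, size=n))


rng = np.random.default_rng(1)
X = sample_elliptical(2000, np.eye(5), chi_radius, rng)
```

Swapping `chi_radius` for another radius law changes the tail behavior of the population while leaving the directional part untouched, which is exactly the separation that the representation makes explicit.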

A natural question is how this nonlinear dependency affects the asymptotic behavior of LSSs in high-dimensional frameworks. Indeed, Bai and Zhou (2008) proved that the SD of the sample covariance matrix converges almost surely to a common generalized Marčenko-Pastur law if, for any sequence of symmetric matrices $\{C_p\}$ with bounded spectral norm, the population vector $\mathbf{x}$ satisfies

$$\operatorname{Var}(\mathbf{x}'C_p\mathbf{x})=o(p^2) \tag{1.5}$$

as $p\to\infty$. This condition is also sharp for the convergence; see Li and Yao (2017) for an example. Moreover, this condition holds for a list of well-known elliptical distributions, such as multivariate normal distributions, multivariate Pearson type II distributions, power exponential distributions, and a more general family of multivariate Kotz-type distributions (Kotz, 1975); see Section 2 for more details. However, the limit of the SD is not enough for many procedures of statistical inference, such as confidence intervals and hypothesis testing. Therefore, in this paper, we explore the fluctuations of LSSs of the sample covariance matrix when the population belongs to the class of elliptical distributions satisfying condition (1.5). Compared with the pioneering work of Bai and Silverstein (2004), the main difficulty of the current study lies in the fact that both the radius and the direction introduce nonlinear dependence among the coordinates of the population, which cannot be handled in the same way as the linearly dependent structure. Technically, we face the following three challenges. First, as the price of dropping the linearly dependent structure, we have to impose more moment conditions, because a finite fourth moment is no longer sufficient in the nonlinear dependence case (see (2.5)). This is totally different from the linearly dependent case. Second, we need to figure out how the dependence among the entries influences the fluctuations of LSSs (see Remark 2.3). Third, we have to extend many fundamental conclusions from the independent case (Bai and Silverstein, 2004) to accommodate our nonlinearly dependent structure; see Lemmas A.1-A.4 for example.

The structure of this paper is as follows. First, in Section 2, we establish a new CLT for LSSs of the sample covariance matrix under elliptical distributions satisfying (1.5). Then, in Section 3, based on the derived results, we theoretically investigate the sphericity test for covariance matrices. This is done by discussing a John-type test from Tian et al. (2015) for general alternative models and a likelihood ratio test from Onatski et al. (2013) for spiked covariances under arbitrary elliptical distributions. For illustration, the John-type test is applied to analyze a dataset of weekly stock returns in Section 4. Technical proofs of the main results are gathered in Section 5. Some supporting lemmas are postponed to the Appendix. The paper also has an online supplementary file which includes the following materials: (i) the CLT for general moment LSSs; (ii) simulations regarding the John-type test; (iii) proofs of some lemmas.

2 High-dimensional theory for eigenvalues of the sample covariance matrix

This section investigates the asymptotic behavior of the eigenvalues of the sample covariance matrix, referred to as sample eigenvalues. We begin by proposing a condition equivalent to (1.5) under the elliptical model in (1.4).

Lemma 2.1.

Suppose that a -dimensional random vector has a stochastic form as defined in (1.4) with the radius normalized as . If the spectral norm of is uniformly bounded in , then the following two conditions are equivalent:

as , where is any sequence of symmetric matrices with bounded spectral norm.

Remark 2.1.

The fourth moment condition b) together with the normalization characterize the class of elliptical distributions discussed in this paper. For the normal case, the squared radius and thus and . In general, the typical order of is with a constant. Hence a specific elliptical distribution can be recognized by evaluating the ratio


as We note that the parameter has a non-negligible effect on the limiting distributions of LSSs of , see Theorem 2.2. The proof of Lemma 2.1 is given in the supplementary material (Hu et al., 2017).

In the following, we provide three examples from the elliptical family that satisfy the condition (2.1). Some commonly seen elliptical distributions are also checked, and the results are summarized in Table 1.

Example 2.1.

A -dimensional centered multivariate Pearson type II distribution has a density function


where and . The stochastic representation of such a distribution is , where follows the distribution , see Fang and Zhang (1990). Therefore, we have

which verifies the condition in (2.1) with .

Example 2.2.

The family of Kotz-type distributions introduced by Kotz (1975) is an important class of elliptical distributions, which includes normal distributions, exponential power distributions, and double exponential distribution as special cases. The density function of a centered Kotz-type random variable is


where with and . Write . The power of the radius is

which has the characteristic function


where the second integral is derived by a polar coordinate transformation. This characteristic function implies that follows the Gamma distribution . Simple calculations reveal that

which verifies the condition in (2.1) with . For the three special cases mentioned above, details are presented in the second to fourth rows of Table 1.

Example 2.3.

Let with independent of , where is a sequence of i.i.d. random variables with

Then it is simple to check that and which verifies the condition in (2.1) with .

We should note that the condition (2.1) excludes some elliptical distributions, such as multivariate Student-t distributions and normal scale mixtures, as shown in the fifth and sixth rows of Table 1. Indeed, sample eigenvalues from these distributions do not obey the generalized Marčenko-Pastur law (El Karoui, 2009; Li and Yao, 2017), and they are therefore beyond the scope of this paper.

Distribution of Condition (2.1)
Normal Holds ().
Double exponential Holds ().
Exponential power Holds ().
Student-t Does not hold.
Normal scale mixture Does not hold.
Table 1: Some elliptical distributions and their verification of the condition (2.1). The notation “” in the last row denotes independence.

Now we are ready to investigate the asymptotic properties of sample eigenvalues in high-dimensional frameworks, under the following assumptions.

Assumption (a).  Both the sample size $n$ and the dimension $p$ tend to infinity in such a way that $c_n=p/n\to c\in(0,\infty)$.

Assumption (b).  There are two independent arrays of i.i.d. random variables , , and satisfying for some and ,


such that for each and the observation vectors can be represented as where is a matrix.

Assumption (c).  The spectral distribution $H_p$ of the matrix $\Sigma=AA'$ converges weakly to a probability distribution $H$ as $p\to\infty$, referred to as the Population Spectral Distribution (PSD). Moreover, the spectral norm of the sequence $(\Sigma)$ is uniformly bounded in $p$.

In the sequel, for any function $G$ of bounded variation on the real line, its Stieltjes transform is defined by

$$m_G(z)=\int\frac{1}{x-z}\,dG(x),\quad z\in\mathbb{C}\setminus S_G,$$

where $S_G$ stands for the support of $G$. Then we have the following theorems.

Theorem 2.1.

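Suppose that Assumptions (a)-(c) hold. Then, almost surely, the empirical spectral distribution of the sample covariance matrix converges weakly to a probability distribution $F^{c,H}$, whose Stieltjes transform $m=m(z)$ is the only solution to the equation

$$m=\int\frac{1}{t(1-c-czm)-z}\,dH(t) \tag{2.7}$$

in the set $\{m\in\mathbb{C}: -(1-c)/z+cm\in\mathbb{C}^+\}$, where $z\in\mathbb{C}^+$, $c$ is the limit of the ratio $p/n$, and $H$ is the PSD.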

Remark 2.2.

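Theorem 2.1 follows from Lemma 2.1 and Theorem 1.1 in Bai and Zhou (2008), and thus we omit its proof here. Let $\underline{m}=\underline{m}(z)$ be the Stieltjes transform of $\underline{F}^{c,H}=(1-c)\delta_0+cF^{c,H}$, where $F^{c,H}$ is the limiting distribution in Theorem 2.1. Then Equation (2.7) can be re-expressed as

$$z=-\frac{1}{\underline{m}}+c\int\frac{t}{1+t\underline{m}}\,dH(t),$$

which is the so-called Silverstein equation (Silverstein, 1995).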
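For a discrete PSD, the Silverstein equation can be solved numerically by a simple fixed-point iteration. The sketch below assumes the standard form of that equation, z = -1/m + c * sum_k w_k t_k / (1 + t_k m), for a PSD with atoms t_k and weights w_k; the function names and the stopping rule are illustrative choices, not part of the paper.

```python
import numpy as np


def companion_stieltjes(z, c, t, w, n_iter=2000, tol=1e-13):
    """Solve z = -1/m + c * sum_k w_k t_k / (1 + t_k m) for m by fixed-point iteration.

    z    : complex number with positive imaginary part
    c    : limiting ratio p/n
    t, w : atoms and weights of a discrete PSD (weights sum to one)
    """
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float)
    m = -1.0 / z  # starting value, which has positive imaginary part
    for _ in range(n_iter):
        m_new = -1.0 / (z - c * np.sum(w * t / (1.0 + t * m)))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def silverstein_residual(m, z, c, t, w):
    """Absolute error of m in the Silverstein equation."""
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float)
    return abs(z - (-1.0 / m + c * np.sum(w * t / (1.0 + t * m))))


# Example (assumption): identity covariance, i.e. the PSD is a point mass at 1, with c = 0.5.
z, c = 2.0 + 1.0j, 0.5
m_bar = companion_stieltjes(z, c, [1.0], [1.0])
```

For a point mass at 1 the equation reduces to the quadratic z m^2 + (z + 1 - c) m + 1 = 0, so the iterate can be checked against the root with positive imaginary part.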

Let be the distribution defined by (2.7) with the parameters replaced by and denote . We next study the fluctuation of centralized LSSs with form

where is a function on the real line.

Theorem 2.2.

Suppose that Assumptions (a)-(c) hold. Let be functions analytic on an open interval containing


Then the random vector

converges weakly to a Gaussian vector , with mean function

and covariance function

, where the contours and are non-overlapping, closed, counter-clockwise oriented in the complex plane, and each encloses the support of the limiting spectral distribution .

Remark 2.3.

When the population is normal, or rather , this theorem coincides with the main result in Bai and Silverstein (2004). It implies that the high-order correlation among the components of the population affects both the limiting mean vectors and the covariance matrices of LSSs by additive quantities proportional to . This factor can be further decomposed into two parts, and , which correspond respectively to the effect from the radius and that from the direction (considering the case ). It is interesting to see that these two kinds of dependency have opposite effects and that they may cancel each other for normal populations.

As an application of Theorem 2.2, we consider , the first two moments of sample eigenvalues. Theorem 2.2 implies


where the parameters possess explicit expressions as

where and for . For LSSs of higher order moments, explicit formulas of their limiting means and covariances are discussed in the supplementary material (Hu et al., 2017).

Figure 1: Normal QQ-plots for normalized and from 10,000 independent replications. Upper panels: with . Lower panels: with . The dimensional settings are .

We conduct a small simulation experiment to examine the fluctuations of and . In the experiment, the PSD is fixed at . The distribution of is selected as (1) with and (2) with , which correspond to the CLT with and , respectively. The factors and are selected to satisfy . The dimensional settings are and the number of independent replications is 10,000. Normal QQ-plots for the normalized statistics, i.e. and , are displayed in Figure 1. Their asymptotic standard normality is well confirmed in all studied cases.

3 Testing for high-dimensional spherical distributions

3.1 John’s test and its extension

In this section, we revisit the sphericity test for covariance matrices in high-dimensional frameworks. For this particular test problem, the underlying population can follow an arbitrary elliptical distribution, which may violate the condition in (1.5).

The sphericity test on the covariance matrix $\Sigma$ is

$$H_0:\ \Sigma=\sigma^2 I_p \quad\text{vs.}\quad H_1:\ \Sigma\neq\sigma^2 I_p, \tag{3.1}$$

where $\sigma^2$ is an unknown scalar parameter. When the dimension is fixed, for normal populations, John (1972) proposed a locally most powerful invariant test statistic to deal with the sphericity hypothesis based on the spectrum of sample covariance matrices. Due to its concise form and broad applicability, this kind of test is well suited to high-dimensional situations and has been extensively studied in recent years. See, for example, Ledoit and Wolf (2002), Wang and Yao (2013), and Tian et al. (2015) for the linear transform model in (1.2), and Zou et al. (2014) and Paindaveine and Verdebout (2016) for the elliptical model in (1.4). In particular, the test statistic in Tian et al. (2015) synthesizes the first four moments of sample eigenvalues, by which it gains extra power for spike-like alternative covariance matrices. However, this statistic is not valid for general elliptical populations (Li and Yao, 2017). Hence, we next develop an analogous test procedure with the help of the theoretical results in Section 2, and then compare it numerically with that from Paindaveine and Verdebout (2016).

Since the hypotheses in (3.1) are only concerned with the shape component of , by convention, we transform the original samples into the so-called spatial-sign samples, that is,

Therefore, testing the sphericity of can now be converted to testing the identity of . This inference can be realized by constructing spectral statistics of . Specifically, let

By verifying the condition in (1.5) for , one may conclude that Theorem 2.1 also holds for with all conditions on removed. Then, similar to Tian et al. (2015), from the fact that

, one may obtain estimators of

and as

respectively, and two simple statistics for the sphericity test as

Moreover, their joint null distribution follows directly from (2.10) with .
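The spatial-sign construction described above can be sketched numerically as follows. This is only an illustration under stated assumptions: rescaling each sign vector to norm sqrt(p), and the names `spatial_sign` and `sign_covariance_moments`, are choices made here and are not taken from the paper. The sketch computes the first two moments of the eigenvalues of the sign-based sample covariance matrix, not the paper's test statistics themselves.

```python
import numpy as np


def spatial_sign(X):
    """Project each row of X onto the sphere of radius sqrt(p) (scaling is an assumption)."""
    p = X.shape[1]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return np.sqrt(p) * X / norms


def sign_covariance_moments(X):
    """First and second moments of the eigenvalues of the sign-based sample covariance."""
    S = spatial_sign(X)
    n = S.shape[0]
    B = S.T @ S / n
    eigenvalues = np.linalg.eigvalsh(B)
    return float(np.mean(eigenvalues)), float(np.mean(eigenvalues**2))


# Toy check (assumption): a spherical Gaussian sample with n = 200 and p = 100.
rng = np.random.default_rng(2)
n, p = 200, 100
X = rng.standard_normal((n, p))
m1, m2 = sign_covariance_moments(X)
```

With this scaling the first moment equals 1 exactly, and under sphericity the expected second moment is 1 + p/n - 1/n, so a markedly larger value hints at a departure from sphericity. Because the sign transform discards the radius, the computation is unaffected by the tail behavior of the population.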

Theorem 3.1.

Suppose that Assumptions (a)–(c) [removing the moment conditions in (2.5)] hold. Under the null hypothesis,

where and the covariance matrix with and

The two statistics and , together with their null distributions, provide two test procedures for the identity of (thus the sphericity of ). The test statistic agrees with that in Paindaveine and Verdebout (2016), where its null asymptotic distribution is proved to be universal whenever . For the case where the population mean is unknown, see Zou et al. (2014). The test statistic is new. Compared with , it is more sensitive to extreme eigenvalues of and thus can serve as a complement of . Parallel to Tian et al. (2015), a joint statistic of and can be constructed as

where the two original statistics are both standardized according to their asymptotic null distributions.

Theorem 3.2.

Suppose that Assumptions (a)–(c) [removing the moment conditions in (2.5)] hold and let .

  • Under the null hypothesis, for any ,

    where .

  • Under the alternative hypothesis, if then the power of the test goes to 1 as .

The asymptotic null distribution of is an immediate consequence of Theorem 3.1. The consistency of can be proved by showing either the consistency of or . As the consistency of has been given in Zou et al. (2014), we omit its proof.

We have run a simulation experiment for the tests , , and to check their finite-sample properties under model settings similar to those in Tian et al. (2015). The results show that all three tests have satisfactory empirical sizes and powers. In addition, compared with and , the test is robust against different types of alternative models; see the supplementary material (Hu et al., 2017).

3.2 Sphericity test under spiked alternative model

The sphericity test applies to general alternative models. However, its consistency requires which excludes the well-known spiked covariance model (Johnstone, 2001). For the simplest spiked model, the covariance matrix can be expressed as where and are as before, is a constant, and is a unit vector in . Both and are unknown parameters. Thus the sphericity hypotheses in (3.1) reduce to


It is obvious that will asymptotically fail to reject such alternatives since . Moreover, this testing problem becomes more difficult, but also more attractive, when the signal falls below the threshold ; see Berthet and Rigollet (2013); Onatski et al. (2013, 2014); Donoho and Jin (2015), and references therein. Hence, applying the CLT for LSSs under elliptical distributions, we discuss a test procedure for (3.2) proposed by Onatski et al. (2013), which was built under normal populations.

In Onatski et al. (2013), the authors discussed a likelihood ratio test based on the joint distribution of sample eigenvalues from normal populations. This test was especially designed for the local alternative

and the employed statistic was approximated by a special LSS. In our settings, this LSS can be formulated as


where is a testing parameter and . The upper bound of is chosen as for and for such that is larger than the limit of , the largest sample eigenvalue. Applying Theorem 2.2, one may get the asymptotic distribution of under general elliptical distributions.

Theorem 3.3.

Suppose that Assumptions (a)–(c) [removing the moment conditions in (2.5)] hold. Under the null hypothesis, for any fixed ,


where the respective mean and variance functions are

The proof of Theorem 3.3 is given in the supplementary material (Hu et al., 2017). Given a value of and a significance level , the test rejects if , where denotes the standard normal distribution function. Unlike in Onatski et al. (2013), the theoretical power of this test is not available at present since is not a likelihood ratio statistic in elliptical distributions. Another reason is that Theorem 2.2 is inapplicable to this situation since the spatial-sign sample is no longer elliptically distributed under .

Let us take a step back and only consider the testing problem in elliptical distributions satisfying (2.1). For simplicity, we assume is known and set , so that the test is still valid by simply substituting the sample covariance into , i.e.,

whose asymptotic distribution under both the null and alternative hypotheses is described in the following theorem.

Theorem 3.4.

Suppose that Assumptions (a)–(c) hold. Let be the true value of and , then for any fixed ,


where the respective mean and variance functions are

This theorem is a direct consequence of Theorem 2.2. Its proof is similar to that of Theorem 3.3 and we thus omit it here. From Theorem 3.4, the power function of is

For normal populations (), this power function reaches its maximum at , which agrees with (5.1) in Proposition 9 of Onatski et al. (2013). In general, the maximizer may not be located at . An interesting case is , for which the power function tends to 1 as . This follows from the fact that

In this case, will successfully detect any positive as long as is close to zero.

4 An empirical study

For illustration, we apply the test procedure based on to analyze weekly returns of the stocks from S&P 500. The tests and are not included in this analysis since there is a lack of evidence to fit the data using the simplest spiked model. According to the North American Industry Classification System (NAICS), which is used by business and government to classify business establishments, the 500 stocks can be divided into 20 sectors. Nine of them are removed from our analysis since each contains fewer than 10 stocks. The remaining 11 sectors and their numbers of stocks are listed in Table 2.

Sector             1   2    3   4   5   6   7   8   9  10  11
Number of stocks  30  32  189  17  36  14  37  65  14  17  11
Table 2: Number of stocks in each NAICS sector.

Usually the stocks in the same sector are correlated, and the stocks in different sectors are uncorrelated. So it is expected that the weekly returns of stocks in the same sector are not spherically distributed, and it is interesting to see if the weekly returns of stocks in different sectors are spherically distributed. In the following, we apply to stocks in the same sector and stocks in different sectors respectively.

The original data are the closing prices or the bid/ask averages of these stocks for the trading days in the first half of 2013, i.e., from 1 January 2013 to 30 June 2013, a total of 124 trading days. This dataset is downloadable from the Center for Research in Security Prices Daily Stock in Wharton Research Data Services. The testing model is established as follows. Denote as the number of stocks in the th sector, as the price of the th stock in the th sector on Wednesday of the th week. We choose Wednesday's price to avoid the weekend effect in the stock market. Thus we get 22 returns for each stock. In order to meet the conditions of the proposed test, the original data are transformed by logarithmic differences, which is a very common procedure in finance. There are a number of theoretical and practical advantages of using logarithmic returns. One of them is that logarithmic returns are independent of each other for large time scales (e.g., 1 day; see Rama (2001)). Denote , and , where is the sample size.
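The logarithmic-difference transform described above is straightforward to implement. The following minimal sketch uses synthetic prices, since the CRSP data are not reproduced here; the function name `log_returns` and the toy price array are assumptions for illustration only.

```python
import numpy as np


def log_returns(prices):
    """Logarithmic differences along time: r_t = log(P_t) - log(P_{t-1}).

    prices : (T, k) array of strictly positive prices, one column per stock
    returns: (T - 1, k) array of logarithmic returns
    """
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)


# Synthetic example (assumption): 23 weekly prices for 3 stocks yield 22 returns each.
rng = np.random.default_rng(3)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, size=(23, 3)), axis=0))
returns = log_returns(prices)
```

Each row of `returns` then plays the role of one observation vector, with one coordinate per stock.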

Now applying to the dataset , , respectively, we obtain 11 p-values, which are all below . Therefore, we have strong evidence that stocks in the same sector are not spherically distributed. This is consistent with our intuition. Next, we consider stocks in different sectors. Specifically, we choose one stock from each sector to make up a group of 11 cross-sectoral stocks and then test whether these stocks are spherically distributed. Because there are about $9.8\times 10^{15}$ different groups, we randomly draw 1,000,000 groups from them to analyze. It turns out that the largest p-value is 0.3889, 231 p-values are bigger than 0.05, and 69 p-values are bigger than 0.1. These results again demonstrate that, when the number of stocks is not very small, it is hard to say that weekly logarithmic returns of the stocks are spherically distributed. It would also be interesting to analyze the few groups of stocks from different sectors that do appear spherically distributed, since they have almost the same variances.
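The number of possible cross-sectoral groups follows from the sector sizes in Table 2, since one stock is picked from each of the 11 sectors. The short computation below makes the count explicit; the variable names are illustrative.

```python
import math

# Number of stocks in each of the 11 retained NAICS sectors (Table 2).
stocks_per_sector = [30, 32, 189, 17, 36, 14, 37, 65, 14, 17, 11]

# One stock from each sector: the number of groups is the product of the sector sizes.
n_groups = math.prod(stocks_per_sector)  # roughly 9.8e15

# Fraction of all groups covered by 1,000,000 random draws.
sampling_fraction = 1_000_000 / n_groups
```

The count is far too large for exhaustive testing, which is why a random subsample of groups is analyzed instead.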

5 Proof of Theorem 2.2

The proof of Theorem 2.2 relies on analyzing the resolvent of the sample covariance matrix and the general strategy follows the approach in Bai and Silverstein (2004). Also see Bai et al. (2015) and Gao et al. (2017) for recent developments. However, as we are dealing with the new model equipped with nonlinear dependency, all technical steps of implementing this strategy have to be updated, or at least revalidated. They are presented in this section.

5.1 Sketch of the proof of Theorem 2.2

Let be arbitrary, any number greater than the right end point of interval (2.9), and any negative number if the left end point of (2.9) is zero, otherwise choose . Let and define a contour


By definition, this contour encloses a rectangular region in the complex plane containing the support of the LSD . Denote by and the Stieltjes transforms of the ESD and the LSD , respectively. Their companion Stieltjes transforms are given by

With this notation, we define an empirical process on as

Since in Theorem 2.2 are analytic on an open region containing the interval (2.9) (thus analytic on the region enclosed by ), by Cauchy’s integral formula, we have for any complex numbers ,

when all sample eigenvalues fall in the interval , which is correct with overwhelming probability. In order to remove the small probability event that some sample eigenvalues fall outside the interval, we need a truncated version of , denoted by . Specifically, let be a sequence decreasing to zero satisfying for some . The truncated process for is given by



on which agrees with , is a regularized set of excluding a small segment near the real line. Then we have

Lemma 5.1.

Under the same assumptions as in Theorem 2.2, we have for any ,