1 Introduction
When a multivariate data set is potentially contaminated by outliers, the sample covariance matrix (SCM) becomes less reliable. A wide range of robust alternatives has been proposed in the literature, starting from the early M-estimators (Maronna, 1976; Huber, 1977), the minimum volume ellipsoid and minimum covariance determinant estimators (Rousseeuw, 1985), the Stahel-Donoho estimators (Hampel et al., 1986; Donoho and Gasko, 1992) and Tyler's scatter matrix (Tyler, 1987). These estimators enjoy a high breakdown point, and most of them are desirably affine equivariant. For book-length discussions of these classical estimators, we refer to Maronna et al. (2006) and Oja (2010); see also Magyar and Tyler (2014) for an insightful review. However, many of these estimators are only implicitly defined, and this lack of an analytically tractable form complicates both their computation and their theoretical analysis. Such difficulty is even more pronounced when the number of variables is large. This has motivated a growing body of recent research on more tractable robust scatter estimators that might not be affine equivariant.
A particularly studied estimator is the spatial sign matrix, first introduced in Locantore et al. (1999) and Visuri et al. (2000). The former paper introduces an influential robust principal component analysis based on the spatial sign matrix. Two striking examples in that paper, shown in their Figures 14-16 and 15-17, respectively, demonstrate how the SCM leads to severely distorted principal components (PCs) in the presence of a single outlier, and how, at the same time, the spatial sign matrix is able to mitigate the impact of such highly influential outliers. A number of papers have followed since then, especially within the groups around H. Oja and D. E. Tyler; see Gervini (2008), Sirkia et al. (2009), Taskinen et al. (2010, 2012), Dürre et al. (2014, 2015, 2017) and Dürre and Vogel (2016).

Let us define the sample spatial sign covariance matrix (SSCM). The spatial sign of a nonnull $p$-dimensional vector $x$ is $s(x) = x / \|x\|$, i.e. its projection on the $(p-1)$-dimensional unit sphere. For completeness, we set $s(0) = 0$. This is called a sign because in the univariate case with $p = 1$, the values $s(x) = x/|x|$ are ordinary signs. Given a sample $x_1, \ldots, x_n$ from a $p$-variate population $x$, the sample SSCM is
$$\hat S_n = \frac{1}{n} \sum_{i=1}^{n} s(x_i - \hat\mu)\, s(x_i - \hat\mu)^\top. \tag{1.1}$$
Here $\hat\mu$ is an estimate of the spatial median $\mu$ of the population, which is determined by the equation $\mathrm{E}\, s(x - \mu) = 0$ (zero of the mean spatial sign function). The population SSCM is $S = \mathrm{E}\big[ s(x - \mu)\, s(x - \mu)^\top \big]$. If the data are already centered, one may assume $\mu = 0$ and consider the sample SSCM
$$\hat S_n = \frac{1}{n} \sum_{i=1}^{n} s(x_i)\, s(x_i)^\top. \tag{1.2}$$
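As a quick numerical illustration of the definitions (1.1)-(1.2), the sketch below (plain NumPy; the function names are our own) forms the sign vectors and averages their outer products. It is a minimal sketch, not the paper's implementation.

```python
import numpy as np

def spatial_sign(x):
    """Spatial sign s(x) = x / ||x||, with the convention s(0) = 0."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else np.zeros_like(x)

def sample_sscm(X, center=None):
    """Sample SSCM of the rows of X (an n x p data matrix).

    With center=None the data are taken as already centered, as in (1.2);
    otherwise `center` plays the role of the spatial median estimate in (1.1).
    """
    if center is not None:
        X = X - center
    signs = np.array([spatial_sign(row) for row in X])
    return signs.T @ signs / len(X)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
S_hat = sample_sscm(X)
# Each summand s(x) s(x)^T has unit trace, hence tr(S_hat) = 1 exactly.
```

Note that $\mathrm{tr}(\hat S_n) = 1$ always holds, since every nonzero sign vector has unit norm.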
Despite its simplicity, $\hat S_n$ is a rich scatter statistic for a multivariate population. It is indeed the exact counterpart of the usual SCM when one shifts from the Euclidean distance (or $L^2$ distance) to the Manhattan block distance (or $L^1$ distance) in $\mathbb{R}^p$.
When the number of variables $p$ is large compared to the sample size $n$, the sample SSCM $\hat S_n$ will likely deviate from the population SSCM $S$ due to the high-dimensional effect. Indeed, for the usual covariance matrix $\Sigma$, such high-dimensional distortion between $\Sigma$ and the sample covariance matrix is now well understood with the aid of random matrix theory; see Johnstone (2007) and Paul and Aue (2014). Typically, the sample eigenvalues have a much wider spread than the population eigenvalues of $\Sigma$, and this deformation is precisely described by the famous Marčenko-Pastur law. A main result of the current paper shows that for the spatial sign matrix, such high-dimensional distortion again happens when $p$ is large compared to $n$.
Such high-dimensional distortion is particularly critical to the robust PCA and robust covariance matrix estimation proposed in Locantore et al. (1999) and Visuri et al. (2000). For example, the procedure (C1)-(C3) on page 566 of Visuri et al. (2000) for a robust estimator of the population covariance matrix works as follows when applied with spatial signs. Given a centered data sample $x_1, \ldots, x_n$ from a $p$-dimensional population:

(C1) Compute the eigenvectors $\hat u_1, \ldots, \hat u_p$ of the sample SSCM $\hat S_n$.

(C2) Estimate the marginal variances (eigenvalues, principal values) of the projections $\hat u_j^\top x_i$, $j = 1, \ldots, p$, using any univariate robust scale estimate (MAD, etc.). Write $\hat\lambda_1, \ldots, \hat\lambda_p$ for the estimates.

(C3) The estimate for $\Sigma$ is $\hat\Sigma = \sum_{j=1}^{p} \hat\lambda_j\, \hat u_j \hat u_j^\top$.
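For concreteness, a minimal sketch of the three steps could look as follows. This is our own toy implementation (using the squared MAD as the univariate robust scale, with all function names assumed), not the code of Visuri et al. (2000).

```python
import numpy as np

def mad_scale2(v):
    """Squared MAD, scaled for consistency with the variance at the normal."""
    return (np.median(np.abs(v - np.median(v))) / 0.6745) ** 2

def visuri_covariance(X):
    """Sketch of the procedure (C1)-(C3) applied with spatial signs.

    X is an n x p matrix of centered observations (rows must be nonzero).
    """
    signs = X / np.linalg.norm(X, axis=1, keepdims=True)
    sscm = signs.T @ signs / len(X)
    _, U = np.linalg.eigh(sscm)                              # (C1) eigenvectors of the SSCM
    lam = np.array([mad_scale2(col) for col in (X @ U).T])   # (C2) robust marginal scales
    return U @ np.diag(lam) @ U.T                            # (C3) recombine the estimate

rng = np.random.default_rng(1)
# Toy data with population covariance diag(1, 1, 4).
X = rng.standard_normal((2000, 3)) * np.array([1.0, 1.0, 2.0])
Sigma_hat = visuri_covariance(X)
```

With $n$ large and $p$ small, the trace of the estimate is close to the true trace; the point of this section is precisely that this breaks down when $p$ grows with $n$.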
In the high-dimensional context, the estimates found in Steps (C1) and (C2) become seriously biased. For (C1), the eigenvectors of $\hat S_n$ can be far away from their population counterparts (the eigenvectors of the population SSCM $S$), and hence a fortiori far away from the eigenvectors of the population covariance matrix $\Sigma$. For (C2), the marginal variances of the projections can be very different from the eigenvalues of $\Sigma$. As for the robust PCA proposed in Locantore et al. (1999), it suffers from the same high-dimensional distortion, because the procedure also uses the eigenvectors of the sample spatial sign matrix to estimate the eigenvectors of the population covariance matrix, exactly as in Step (C1) above.
It is therefore necessary to correct the high-dimensional distortion that appears in the sample SSCM, in order to preserve its long-established attractive properties such as robustness. In this paper, using tools of random matrix theory, we investigate the asymptotic spectral behavior of the sample SSCM in high-dimensional frameworks where both the dimension $p$ and the sample size $n$ tend to infinity. We restrict ourselves to the family of elliptical distributions for the population, for two reasons. Firstly, if $x$ is elliptically distributed, the population SSCM $S$ and the population covariance matrix $\Sigma$ share the same eigenvectors, while their respective eigenvalues are in a one-to-one correspondence through a well-known map (Boente and Fraiman, 1999). Secondly, a high-dimensional study such as the one developed in this paper, but for a more general population, seems out of reach at the moment. The first main result of the paper (Theorem 3.1) is an analogue of the Marčenko-Pastur law for the limiting distribution of the eigenvalues of $\hat S_n$. Such a law has so far been known only for sample covariance matrices and sample correlation matrices. The second main result of the paper provides a central limit theorem (CLT) for linear spectral statistics of $\hat S_n$ (Theorem 3.2). This CLT is the cornerstone of all subsequent applications developed in the paper. These applications are designed to demonstrate the effectiveness of the theory we develop here for the SSCM.
There have been a few very recent works in the literature that deal with the high-dimensional SSCM (or its variants), namely Zou et al. (2014), Feng and Sun (2016), Li et al. (2016), and Chakraborty and Chaudhuri (2017). A common feature of these papers is that, given a specific null hypothesis on the population location or scatter, in a one-sample or two-sample design, the authors have at their disposal a specific test statistic which is an explicit function of $\hat S_n$ (or its variants). They thus study the statistic directly, using traditional asymptotic methods such as projections (as in a U-statistic) or a martingale decomposition. None of these papers studied the distribution of the eigenvalues of $\hat S_n$, as done in this paper using random matrix theory. Meanwhile, some of these test statistics are indeed linear spectral statistics of $\hat S_n$. In these cases, the CLT developed in this paper therefore leads to an independent and new proof of the existing results. This comparison, however, will not be pursued here but in separate later work.

The remainder of the paper is organized as follows. Section 2 summarizes some preliminary results on elliptical distributions and related random matrix theory. Section 3 establishes the two main theoretical results of the paper (Theorems 3.1 and 3.2). The application to spectral moment statistics is fully addressed, with explicit limiting mean and covariance functions in the corresponding central limit theorem. Then in Section 4, relying on these results, we develop two statistical applications concerning the spectrum of the population SSCM $S$, under a setting where the spectrum forms a discrete distribution with finite support. In one application, the spectrum of $S$ is estimated using the method of moments; in the other, we test the hypothesis that there are no more than a given number of distinct eigenvalues in the spectrum of $S$. In Section 5, we develop two applications of the general theory of Section 3 to robust statistics in the high-dimensional context. Technical proofs of the main theorems are gathered in Section 6. Some useful lemmas and their proofs are postponed to the last section.
2 Preliminaries
The family of elliptical distributions is an important extension of the multivariate normal distribution and has been broadly used in data analysis in various fields (Gupta et al., 2013). A random vector $x$ with zero mean is elliptically distributed if it has a stochastic representation
$$x = r\, \Gamma u, \tag{2.1}$$
where $\Gamma$ is a $p \times p$ deterministic and invertible matrix, $r$ is a scalar random variable representing the scale of $x$, and $u$ is the random direction, independent of $r$ and uniformly distributed on the unit sphere in $\mathbb{R}^p$. Besides the normal distribution, this family includes many other celebrated distributions, such as the multivariate $t$-distribution, Kotz-type distributions, and Gaussian scale mixtures. Clearly the population covariance matrix is $\Sigma = p^{-1} \mathrm{E}(r^2)\, \Omega$, where $\Omega = \Gamma \Gamma^\top$ is in fact the shape matrix of the population. In order to resolve the indeterminacy between the scales of $r$ and $\Gamma$, we will use throughout the paper the normalization $\mathrm{tr}(\Omega) = p$.

Let $x_1, \ldots, x_n$ be a sequence of independent and identically distributed (i.i.d.) random vectors from the elliptical population (2.1). We consider the sample SSCM $\hat S_n$ defined in (1.2) and scale it as
$$B_n = p\, \hat S_n = \frac{p}{n} \sum_{i=1}^{n} s(x_i)\, s(x_i)^\top. \tag{2.2}$$
The reason for this scaling is that now $\mathrm{tr}(B_n) = p$, so that the eigenvalues of $B_n$ are of order 1 on average.
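The representation (2.1) and the scaling (2.2) are easy to simulate. The sketch below is our own illustration (the chi-distributed radius is an arbitrary choice, and the diagonal shape matrix is an assumption for simplicity); it checks that the radii cancel out in the spatial signs and that $\mathrm{tr}(B_n) = p$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 50, 200

# Shape matrix Omega = Gamma Gamma^T, normalized so that tr(Omega) = p.
omega_eigs = np.linspace(0.5, 1.5, p)
omega_eigs *= p / omega_eigs.sum()
Gamma = np.diag(np.sqrt(omega_eigs))

# Elliptical sample x_i = r_i Gamma u_i: u_i uniform on the unit sphere,
# r_i an arbitrary positive radius independent of u_i (chi-distributed here).
g = rng.standard_normal((n, p))
U = g / np.linalg.norm(g, axis=1, keepdims=True)
r = np.sqrt(rng.chisquare(df=p, size=n))
X = (r[:, None] * U) @ Gamma.T

# The spatial signs discard the radii: s(x_i) = Gamma u_i / ||Gamma u_i||.
signs = X / np.linalg.norm(X, axis=1, keepdims=True)
B = (p / n) * signs.T @ signs     # scaled SSCM (2.2), with tr(B) = p
```

The fact that `signs` does not depend on `r` at all is the reason the SSCM is insensitive to the radial distribution, in contrast with the SCM.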
In this paper, using tools of random matrix theory, we investigate limiting properties of the eigenvalues of $B_n$ in a high-dimensional setting. Precisely, both the dimension $p$ and the sample size $n$ tend to infinity with their ratio $y_n = p/n \to y$, a positive constant in $(0, \infty)$.
Let $A$ be a $p \times p$ symmetric matrix with eigenvalues $\lambda_1, \ldots, \lambda_p$. Its empirical spectral distribution (ESD) is by definition the probability measure
$$F^{A} = \frac{1}{p} \sum_{j=1}^{p} \delta_{\lambda_j},$$
where $\delta_b$ denotes the Dirac mass at $b$. If this sequence of distributions has a limit when $p \to \infty$, the limit is referred to as the limiting spectral distribution, or LSD, of the sequence. Our aim is to study the limiting properties of $F^{B_n}$ and a CLT for linear spectral statistics (LSS) of the form $\sum_{j=1}^{p} f(\lambda_j)$ for a class of smooth test functions $f$. These properties become powerful tools to recover spectral features of the scaled population SSCM, i.e. $pS$, and then those of the shape matrix $\Omega$, since the matrices $pS$ and $\Omega$ share the same eigenvectors and their eigenvalues are in a one-to-one correspondence (Boente and Fraiman, 1999). Moreover, as $p \to \infty$, the two matrices coincide in the sense that the spectral norm $\| pS - \Omega \| \to 0$, as long as $\| \Omega \|$ is uniformly bounded; see Lemma 7.1.
Spectral properties of a standard high-dimensional SCM have been extensively studied in random matrix theory since the pioneering work of Marčenko and Pastur (1967). The standard model in this literature has the form
$$x = \Gamma w, \tag{2.3}$$
where $\Gamma$ is as before and $w = (w_1, \ldots, w_p)^\top$ has i.i.d. components satisfying $\mathrm{E}(w_1) = 0$, $\mathrm{E}(w_1^2) = 1$, and $\mathrm{E}(w_1^4) < \infty$. Let $x_1, \ldots, x_n$ be i.i.d. copies of $x$ and $S_n = n^{-1} \sum_{i=1}^{n} x_i x_i^\top$ be the corresponding SCM. It is well known that the ESD of $S_n$ converges to the celebrated Marčenko-Pastur (MP) law when $\Gamma$ is the identity, and to a generalized MP law for a general matrix $\Gamma$, as $p, n \to \infty$ with $p/n \to y$ (Marčenko and Pastur, 1967; Silverstein, 1995). The CLT for LSS of $S_n$ was first studied in Jonsson (1982) under the assumption of a standard multivariate normal population. A breakthrough on the CLT was obtained by Bai and Silverstein (2004), where the population is allowed to be general with $\mathrm{E}(w_1^4) = 3$. This fourth-moment condition was then weakened to an arbitrary finite fourth moment in Pan and Zhou (2008). For more references, one can refer to Bai and Silverstein (2010), Bai et al. (2015), Gao et al. (2016), and the references therein. However, these results do not apply to general elliptical populations, since the two models in (2.1) and (2.3) have little in common, except for normal populations. In fact, for general elliptical populations, it has been reported that the ESD of the SCM converges to a deterministic distribution that is not a generalized MP law, but has to be characterized by both the distribution of the radius $r$ and the limiting spectrum of $\Omega$ through a system of implicit equations (El Karoui, 2009; Li and Yao, 2018). The involvement of $r$ seriously interferes with our understanding of the spectrum of $\Omega$ from the ESD of the SCM. This again motivates us to shift our attention to the SSCM, which discards the random radii and focuses only on the directions $u_i$.
3 High-dimensional theory for eigenvalues of $B_n$
3.1 Limiting spectral distribution of $B_n$
In this section we derive the LSD of the sequence $\{B_n\}$ under the assumptions below.
Assumption (a). Both the sample size $n$ and the population dimension $p$ tend to infinity, in such a way that $y_n = p/n \to y$ with $y \in (0, \infty)$.
Assumption (b). Sample observations are $x_i = r_i \Gamma u_i$, $i = 1, \ldots, n$, where $\Gamma$ is a $p \times p$ deterministic and invertible matrix with $\mathrm{tr}(\Gamma \Gamma^\top) = p$, and the $u_i$'s are i.i.d. random vectors, uniformly distributed on the unit sphere in $\mathbb{R}^p$.
Assumption (c). The spectral norm of $\Omega = \Gamma \Gamma^\top$ is bounded, and its spectral distribution $H_p = F^{\Omega}$ converges weakly to a probability distribution $H$, called the population spectral distribution (PSD). Moreover, the spectral moments of $H_p$ are denoted by $\gamma_{k,p} = \int t^k\, \mathrm{d}H_p(t)$ and their limits by $\gamma_k = \int t^k\, \mathrm{d}H(t)$.
From Lemma 7.1, it is clear that the spectral distributions of $\Omega$ and $pS$ are asymptotically identical. One can therefore replace $\Omega$ with $pS$ in Assumption (c) without affecting the LSD of $B_n$. However, we keep $\Omega$ because it is easier to describe the CLT for LSS using the spectral distribution of $\Omega$.
For the characterization of the LSD of $B_n$, we need to introduce the Stieltjes transform of a measure $\mu$ on the real line, which is defined as
$$m_{\mu}(z) = \int \frac{1}{\lambda - z}\, \mathrm{d}\mu(\lambda), \qquad z \in \mathbb{C}^{+},$$
where $\mathbb{C}^{+} = \{ z \in \mathbb{C} : \Im(z) > 0 \}$.
Theorem 3.1.
Suppose that Assumptions (a)-(c) hold. Then, almost surely, the empirical spectral distribution $F^{B_n}$ converges weakly to a probability distribution $F_{y,H}$, whose Stieltjes transform $m(z)$ is the unique solution to the equation
$$m(z) = \int \frac{1}{t\,\big(1 - y - y z\, m(z)\big) - z}\, \mathrm{d}H(t) \tag{3.1}$$
in the set $\big\{ m \in \mathbb{C} : -(1-y)/z + y\, m \in \mathbb{C}^{+} \big\}$.
The LSD $F_{y,H}$ defined by (3.1) is a generalized MP law that already appeared in the seminal paper of Marčenko and Pastur (1967). Let $\underline m(z) = -(1-y)/z + y\, m(z)$ denote the Stieltjes transform of the companion distribution $(1-y)\delta_0 + y F_{y,H}$. Then (3.1) can also be represented as (Silverstein, 1995)
$$z = -\frac{1}{\underline m(z)} + y \int \frac{t}{1 + t\, \underline m(z)}\, \mathrm{d}H(t). \tag{3.2}$$
For procedures to find the density function and the support set of $F_{y,H}$ from (3.1) and (3.2), one is referred to Bai and Silverstein (2010).
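Numerically, equation (3.1) can be solved by a damped fixed-point iteration on $m(z)$ for $z$ in the upper half plane. The sketch below is our own implementation for a discrete PSD $H$ (function name and damping scheme are assumptions); for $H = \delta_1$ the result can be checked against the root of the quadratic equation to which (3.1) then reduces.

```python
import numpy as np

def mp_stieltjes(z, y, atoms, weights, n_iter=2000, damp=0.5):
    """Solve m = int dH(t) / (t(1 - y - y z m) - z) by damped fixed-point iteration.

    `atoms`/`weights` encode a discrete PSD H; z must satisfy Im(z) > 0.
    """
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = 1j                      # start in the upper half plane
    for _ in range(n_iter):
        m_new = np.sum(weights / (atoms * (1 - y - y * z * m) - z))
        m = damp * m + (1 - damp) * m_new
    return m

# Example: H = delta_1 (spherical shape matrix), ratio y = 0.5.
z = 1.0 + 0.05j
m = mp_stieltjes(z, 0.5, [1.0], [1.0])
# Im(m)/pi approximates the density of the standard MP law near x = 1.
```

For $H = \delta_1$, (3.1) becomes $y z\, m^2 - (1 - y - z)\, m + 1 = 0$, whose root in $\mathbb{C}^{+}$ is the solution; this gives an exact check on the iteration.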
3.2 CLT for linear spectral statistics of $B_n$
Let $F_{y_n, H_p}$ be the LSD defined by (3.1) with the parameters $(y, H)$ replaced by $(y_n, H_p)$, and let $\lambda_1, \ldots, \lambda_p$ denote the eigenvalues of $B_n$. We now study the fluctuation of so-called LSS of the form
$$\sum_{j=1}^{p} f(\lambda_j) - p \int f(x)\, \mathrm{d}F_{y_n, H_p}(x),$$
where $f$ is some given measurable function. Define also the interval
$$\Big[\, \liminf_{p}\, \lambda_{\min}^{\Omega}\, I_{(0,1)}(y)\, (1 - \sqrt{y})^{2},\ \ \limsup_{p}\, \lambda_{\max}^{\Omega}\, (1 + \sqrt{y})^{2} \,\Big], \tag{3.3}$$
where $\lambda_{\min}^{\Omega}$ and $\lambda_{\max}^{\Omega}$ denote the extreme eigenvalues of $\Omega$.
Theorem 3.2.
Suppose that Assumptions (a)-(c) hold. Let $f_1, \ldots, f_k$ be functions analytic on an open set of the complex plane that includes the interval (3.3). Then the random vector $\big( G_n(f_1), \ldots, G_n(f_k) \big)$, where $G_n(f) = \sum_{j=1}^{p} f(\lambda_j) - p \int f\, \mathrm{d}F_{y_n, H_p}$, converges weakly to a Gaussian vector whose mean function and covariance function are given by explicit contour-integral formulas. Here the contours $\mathcal{C}_1$ and $\mathcal{C}_2$ are non-overlapping, closed, counterclockwise oriented in the complex plane, and both enclose the support of the LSD $F_{y,H}$.
A special case of interest is a multivariate normal population, which satisfies both the elliptical model (2.1) and the linear transformation model (2.3). In this case, it is interesting to compare the limiting distribution in Theorem 3.2, based on the sample SSCM, with the classical CLT in Bai and Silverstein (2004), based on the SCM $S_n$. One finds that additional and new terms appear in Theorem 3.2; namely, the second contour integral in the mean function and the second to fourth terms in the covariance function do not exist in the classical CLT of Bai and Silverstein (2004).

Another closely related work is Hu et al. (2019), where the authors study elliptical populations of the form (2.1) with the radius assumed independent of the direction. Though sharing the same form, our model violates their independence assumption: the SSCM amounts to taking a radius proportional to $\| \Gamma u_i \|^{-1}$, which is correlated with $u_i$. It will be shown that such correlation is not (asymptotically) negligible for the distribution of LSS.
3.3 Asymptotic distributions of spectral moments
Among all LSS, the following spectral moments of $B_n$ are particularly important:
$$\hat\alpha_{k} = \frac{1}{p}\, \mathrm{tr}\big( B_n^{k} \big) = \frac{1}{p} \sum_{j=1}^{p} \lambda_j^{k}, \qquad k = 1, 2, \ldots$$
The first moment is $\hat\alpha_1 = 1$ since $\mathrm{tr}(B_n) = p$. All other moments $\hat\alpha_k$, $k \ge 2$, are random. Define the moments of the related MP laws
$$\alpha_k = \int x^{k}\, \mathrm{d}F_{y,H}(x) \qquad \text{and} \qquad \alpha_{k,n} = \int x^{k}\, \mathrm{d}F_{y_n,H_p}(x).$$
From Nica and Speicher (2006), the quantities $\alpha_k$ and $\gamma_k$ (the moments of $H$) are related through the recursive formulae (3.4), in which the sums run over certain partitions of $\{1, \ldots, k\}$; the same relations hold between $\alpha_{k,n}$ and the moments $\gamma_{k,p}$ of $H_p$. The joint limiting distribution of the moments can be derived from Theorem 3.2 by considering the moment functions $f_k(x) = x^{k}$. In this particular case, the mean and covariance functions of the limiting distribution can be explicitly calculated.
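In the simplest case $H = \delta_1$ (spherical shape), the moments of the standard MP law have the classical closed form $\alpha_k = \sum_{r=0}^{k-1} \frac{y^r}{r+1} \binom{k}{r} \binom{k-1}{r}$, a known fact that is a special case of, not a substitute for, the general recursion (3.4). The sketch below (our own illustration) uses it to check numerically the convergence of the empirical moments $\frac{1}{p}\mathrm{tr}(B_n^k)$.

```python
import numpy as np
from math import comb

def mp_moment(k, y):
    """k-th moment of the standard Marcenko-Pastur law with ratio y (H = delta_1)."""
    return sum(y**r * comb(k, r) * comb(k - 1, r) / (r + 1) for r in range(k))

rng = np.random.default_rng(3)
p, n = 200, 400                                        # ratio y = 0.5
g = rng.standard_normal((n, p))
signs = g / np.linalg.norm(g, axis=1, keepdims=True)   # spherical population
B = (p / n) * signs.T @ signs
alpha_hat = [np.trace(np.linalg.matrix_power(B, k)) / p for k in (1, 2, 3)]
# alpha_hat should be close to [1, 1 + y, 1 + 3y + y^2] = [1, 1.5, 2.75].
```

The first empirical moment equals 1 exactly by the trace normalization; the higher ones fluctuate around their MP limits at the scale $1/p$, which is precisely the fluctuation quantified by Corollary 3.1.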
Corollary 3.1.
Suppose that Assumptions (a)-(c) hold. Then the random vector of normalized spectral moments $\big( p\, (\hat\alpha_k - \alpha_{k,n}) \big)_{1 \le k \le K}$ converges weakly to a Gaussian vector. The mean vector and the covariance matrix admit explicit, though lengthy, expressions in terms of the ratio $y$ and the moments $\gamma_1, \gamma_2, \ldots$ of the PSD $H$.
4 Applications to spectral inference
A natural question concerning spatial signs is how to infer the population SSCM $S$ from the sample SSCM $\hat S_n$ when the dimension $p$ is large. If the question were posed for the pair $(\Sigma, S_n)$ of population and sample covariance matrices, it would fall within the widely studied problem of estimating a large covariance matrix. Owing to the fundamental difference between an SSCM and a standard covariance matrix, however, to the best of our knowledge nothing can be found in the literature on the properties of a high-dimensional SSCM.
In this section, we consider a scenario where the PSD $H$ can be modeled as a finite mixture of point masses. Using the theory of Section 3, we propose two new inference tools for the PSD $H$. First, an asymptotically normal estimator is found for such a finite-mixture PSD. This estimator is particularly interesting for an elliptical population because the eigenvalues of $S$ and $\Sigma$ are then in a well-known one-to-one correspondence. This will finally lead to a robust estimator for $\Sigma$ much better than some existing proposals, for example, the estimator from the procedure (C1)-(C3) of Visuri et al. (2000). The second inference tool we develop treats the question of determining the order of the finite mixture in $H$.

Precisely, the family of PSDs under study is a class of parameterized discrete distributions with finite support on $(0, \infty)$, that is,
$$H = \sum_{j=1}^{k} w_j\, \delta_{t_j}, \tag{4.1}$$
where the atoms $t_1 < \cdots < t_k$ are distinct positive numbers and the weights $w_j$ are positive with $\sum_j w_j = 1$. Here the restriction $\sum_{j=1}^{k} w_j t_j = 1$ is due to the normalization condition $\mathrm{tr}(\Omega) = p$. Note that the model (4.1) depends on an integer parameter $k$, referred to as the order of $H$. Such finite mixtures have already been employed for the standard large covariance matrix $\Sigma$; see El Karoui (2008), Rao et al. (2008), Bai et al. (2010) and Li and Yao (2014). Similarly to El Karoui (2008), we adopt the setting of fixed PSDs in this section, i.e. $H_p = H$ for all large $p$.
4.1 Estimation of a PSD
For the model in (4.1), we follow the moment method of Bai et al. (2010) for the PSD estimation. Given a known order $k$, the method first estimates the moments of $H$ through the recursive formulae in (3.4), and then solves a system of moment equations to obtain a consistent estimator of the parameters $(w_j, t_j)_{1 \le j \le k}$.
In our situation, the maps sending the parameter vector to the vector of moments of $H$, and the latter to the vector of moments of the LSD, are one-to-one and the determinants of their Jacobian matrices are nonzero; see Bai et al. (2010). Therefore, Theorem 3.1 implies the strong consistency of the resulting moment estimator as $n \to \infty$. However, as shown by the CLT in Corollary 3.1, the estimator has a bias of order $1/p$. It is thus natural to modify the estimator by subtracting its limiting mean in the CLT, in order to obtain a better estimator of the parameters. Beyond this correction, the CLT can also provide confidence regions for the parameters.
Denote the modified estimators of the moments, weights, and atoms by
$$\hat\gamma_k, \qquad \hat w_j, \qquad \hat t_j, \tag{4.2}$$
respectively, where the bias correction uses the limiting mean given in Corollary 3.1. From Theorem 3.1, Corollary 3.1, and a standard application of the Delta method, one easily obtains the asymptotic properties of these estimators.
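To illustrate the principle of the moment method in the simplest case $k = 2$, the sketch below is our own toy implementation (it ignores the bias correction and the conversion between LSD moments and PSD moments): the atoms of a two-point distribution satisfy a linear two-term recurrence among the moments, and the weights then follow from a linear system.

```python
import numpy as np

def solve_two_point(m1, m2, m3):
    """Recover (w1, t1), (w2, t2) of H = w1 d_{t1} + w2 d_{t2} from moments m1, m2, m3.

    Each atom satisfies t^2 = a t + b, hence m_{k+2} = a m_{k+1} + b m_k (m0 = 1).
    """
    a, b = np.linalg.solve(np.array([[m1, 1.0], [m2, m1]]),
                           np.array([m2, m3]))
    t1, t2 = sorted(np.roots([1.0, -a, -b]).real)   # atoms: roots of x^2 - a x - b
    w1, w2 = np.linalg.solve(np.array([[1.0, 1.0], [t1, t2]]),
                             np.array([1.0, m1]))   # weights: match m0 and m1
    return (w1, t1), (w2, t2)

# Moments of H = 0.5 d_{0.5} + 0.5 d_{1.5}: m1 = 1, m2 = 1.25, m3 = 1.75.
(w1, t1), (w2, t2) = solve_two_point(1.0, 1.25, 1.75)
```

In the actual procedure of this section, the input moments are the bias-corrected estimates $\hat\gamma_k$ rather than the true moments used in this toy check.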
Theorem 4.1.
Suppose that Assumptions (a)-(c) hold and that the true parameter value is an inner point of the parameter space. Then the estimators $\hat\gamma_k$, $\hat w_j$, and $\hat t_j$ are strongly consistent, and moreover they are jointly asymptotically normal:
$$\tag{4.3}$$
where the asymptotic covariance involves the Jacobian matrices of the two moment maps described above, and the limiting covariance matrix defined in Corollary 3.1 with the moment functions $f_k(x) = x^k$.
4.2 Test for the order of a PSD
The aforementioned estimation procedure requires that the order $k$ of the PSD $H$ be prespecified. In general, this prior knowledge should be verified in advance. To deal with this problem, we consider the hypotheses
$$H_0: \ \text{the order of } H \text{ is at most } k \qquad \text{vs.} \qquad H_1: \ \text{the order of } H \text{ is larger than } k, \tag{4.4}$$
where $k$ is a given positive integer. These hypotheses can also be regarded as a generalization of the well-known sphericity hypothesis on covariance matrices, which corresponds to the case $k = 1$.
In Qin and Li (2016), a test procedure was outlined based on a moment matrix and its estimator, which can be formulated as
$$M_k(H) = \big( \gamma_{i+j} \big)_{0 \le i, j \le k} \qquad \text{and} \qquad \hat M_k = \big( \hat\gamma_{i+j} \big)_{0 \le i, j \le k}.$$
Here we set $\hat\gamma_0 = 1$ and $\hat\gamma_1 = 1$, and use the estimators $\hat\gamma_j$, as defined in (4.2), for $j \ge 2$. It has been proved that the determinant of $M_k(H)$ is zero if the null hypothesis in (4.4) holds, and otherwise is strictly positive (Li and Yao, 2014). Therefore, the determinant can serve as a test statistic for (4.4), and the null hypothesis shall be rejected if the statistic $\det(\hat M_k)$ is large. Applying Theorem 4.1 and the main theorem in Qin and Li (2016), the asymptotic distribution of $\det(\hat M_k)$ is obtained immediately.
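The rank argument behind the test is elementary: the $(k+1) \times (k+1)$ moment matrix $(\gamma_{i+j})$ is the Gram matrix of the functions $1, t, \ldots, t^k$ under $H$, so its determinant vanishes exactly when $H$ has at most $k$ support points. The short check below is our own illustration, using true moments rather than their estimates.

```python
import numpy as np

def moments(weights, atoms, K):
    """Moments gamma_0, ..., gamma_K of the discrete PSD sum_j w_j d_{t_j}."""
    w, t = np.asarray(weights), np.asarray(atoms)
    return [float(np.sum(w * t**k)) for k in range(K + 1)]

def moment_matrix(gam, k):
    """(k+1) x (k+1) Hankel matrix (gamma_{i+j})."""
    return np.array([[gam[i + j] for j in range(k + 1)] for i in range(k + 1)])

gam = moments([0.5, 0.5], [0.5, 1.5], 4)    # an H with exactly 2 atoms
d2 = np.linalg.det(moment_matrix(gam, 2))   # 3x3 determinant: 0, order <= 2 holds
d1 = np.linalg.det(moment_matrix(gam, 1))   # 2x2 determinant: > 0, order > 1
```

In the test itself, the determinant of the plug-in matrix $\hat M_k$ fluctuates around zero under the null, and the CLT quantifies this fluctuation.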
Theorem 4.2.
Suppose that Assumptions (a)-(c) hold. Then the statistic $\det(\hat M_k)$, the determinant of the estimated moment matrix, is asymptotically normal, i.e.
$$\tag{4.5}$$
where the asymptotic parameters are expressed through the vectorization of the adjugate matrix of the population moment matrix, a covariance matrix whose first two rows and columns consist of zeros and whose remaining submatrix is defined in (4.3), and a 0-1 selection matrix; here $\lfloor \cdot \rfloor$ denotes the greatest integer not exceeding its argument.
From Theorem 4.1, the limiting variance in (4.5) is a continuous function of the model parameters. Under the null hypothesis, this variance is a function of the true parameters, so plugging in the estimators from (4.2) yields a strongly consistent estimator of the variance.
Corollary 4.1.
Suppose that Assumptions (a)-(c) hold. Then, under the null hypothesis, the studentized statistic is asymptotically standard normal as $n \to \infty$. In addition, the asymptotic power of the test tends to 1.
Corollary 4.1 follows directly from Theorem 4.2, and its proof is thus omitted. This corollary includes the sphericity test as a particular case ($k = 1$). In this case, the test statistic reduces to a function of the second spectral moment, and its null distribution is consistent with that in Paindaveine and Verdebout (2016), which was obtained by a direct and completely different method.
4.3 Simulation experiments
Simulations are carried out to evaluate the performance of the proposed estimation and test procedures for discrete PSDs of the form (4.1). Samples are drawn from the elliptical model (2.1), and all empirical statistics are calculated from 10,000 independent replications.
The estimation procedure is tested on the following two PSDs:

Model 1: $H = 0.5\, \delta_{0.5} + 0.5\, \delta_{1.5}$, of order $k = 2$ with parameters $(t_1, w_1, t_2, w_2) = (0.5, 0.5, 1.5, 0.5)$;

Model 2: $H = 0.3\, \delta_{0.2} + 0.4\, \delta_{1.0} + 0.3\, \delta_{1.8}$, of order $k = 3$ with parameters $(t_1, w_1, t_2, w_2, t_3, w_3) = (0.2, 0.3, 1.0, 0.4, 1.8, 0.3)$.

Three $(p, n)$ settings of increasing size are used for each model. In addition to empirical means and standard deviations of all estimators, we also calculate 95% confidence intervals for all parameters and report their coverage probabilities. Results are collected in Tables 1 and 2; the consistency of all estimators is clearly demonstrated.

Table 1: Empirical mean, standard deviation (St.D.) and coverage probability (C.P.) of the estimators under Model 1, for the three $(p, n)$ settings (from smallest to largest).

      |       Setting 1        |       Setting 2        |       Setting 3
      | Mean    St.D.   C.P.   | Mean    St.D.   C.P.   | Mean    St.D.   C.P.
  t1  | 0.4839  0.1145  0.9375 | 0.4960  0.0550  0.9491 | 0.5000  0.0269  0.9486
  w1  | 0.4915  0.1135  0.9137 | 0.4968  0.0588  0.9423 | 0.4997  0.0292  0.9488
  t2  | 1.5030  0.1330  0.9288 | 1.4990  0.0668  0.9426 | 1.4998  0.0329  0.9487
  w2  | 0.5085  0.1135  0.9137 | 0.5032  0.0588  0.9423 | 0.5003  0.0292  0.9488
Table 2: Same layout as Table 1, under Model 2.

      |       Setting 1        |       Setting 2        |       Setting 3
      | Mean    St.D.   C.P.   | Mean    St.D.   C.P.   | Mean    St.D.   C.P.
  t1  | 0.1887  0.0429  0.9227 | 0.1988  0.0147  0.9358 | 0.2003  0.0071  0.9367
  w1  | 0.2824  0.0447  0.9403 | 0.2956  0.0184  0.9525 | 0.2990  0.0090  0.9483
  t2  | 0.9960  0.1347  0.9345 | 0.9924  0.0661  0.9486 | 0.9991  0.0337  0.9433
  w2  | 0.4064  0.0373  0.9453 | 0.4012  0.0209  0.9239 | 0.4002  0.0110  0.9351
  t3  | 1.7824  0.0856  0.9236 | 1.7919  0.0440  0.9413 | 1.7960  0.0227  0.9392
  w3  | 0.3113  0.0696  0.9221 | 0.3031  0.0365  0.9429 | 0.3008  0.0189  0.9420
Next we examine the test procedure for the order of a PSD. The following two models are employed in this experiment:

Model 3: a one-parameter family of PSDs indexed by $\delta \ge 0$, reducing at $\delta = 0$ to the sphericity null hypothesis ($k = 1$);

Model 4: a one-parameter family of PSDs indexed by $\delta \ge 0$, reducing at $\delta = 0$ to a null hypothesis of higher order.

Here the parameter $\delta$ represents the distance between the null and alternative hypotheses. In particular, Model 3 is used for testing sphericity, with $\delta$ ranging from 0 to 0.18 in steps of 0.02, and Model 4 is used for testing a larger order, with $\delta$ ranging from 0 to 0.45 in steps of 0.05. The sample size is fixed, three dimension-to-sample-size ratios are considered, and the significance level is fixed at 5%. Results summarized in Table 3 show that the proposed test has accurate empirical size and that its power tends to 1 as the parameter $\delta$ increases, under both models.

Table 3: Empirical size ($\delta = 0$) and power (in %) of the order test; each row corresponds to one of the three dimension-to-sample-size ratios.

Under Model 3
 $\delta$ | 0     0.02  0.04  0.06   0.08   0.10   0.12   0.14   0.16   0.18
 ratio 1  | 5.24  5.81  9.13  17.91  34.86  62.30  87.31  98.01  99.90  100
 ratio 2  | 5.33  5.92  8.43  18.09  35.62  63.12  88.14  98.69  99.96  100
 ratio 3  | 4.76  6.39  9.69  17.39  35.23  63.57  88.15  98.67  99.97  100

Under Model 4
 $\delta$ | 0     0.05  0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45
 ratio 1  | 4.75  7.19  17.49  43.96  79.28  97.06  99.87  100    100    100
 ratio 2  | 5.05  6.31  12.22  26.78  53.74  80.74  95.07  99.52  99.97  100
 ratio 3  | 4.88  5.65  8.56   16.33  30.09  49.17  71.60  86.54  95.20  98.61
5 Application to robust statistics
In this section we develop a few applications of the general theory of Section 3 to robust statistics using the sample SSCM $\hat S_n$.
5.1 Robustness
We examine the robustness of several estimators of the shape matrix $\Omega$ when the sample data include outliers. Four estimators derived from the SCM $S_n$ and the sample SSCM $\hat S_n$ are considered in this comparison. Let the spectral decompositions of $S_n$ and $\hat S_n$ be
$$S_n = U_1 D_1 U_1^\top \qquad \text{and} \qquad \hat S_n = U_2 D_2 U_2^\top,$$
where the $D_i$'s are diagonal matrices of eigenvalues, sorted in ascending order, and the $U_i$'s are the matrices of corresponding eigenvectors. In addition, we define a regularization function as
$$\rho(A) = \frac{p}{\mathrm{tr}(A)}\, A$$
for any matrix $A$ with nonzero trace. Obviously, this function normalizes $A$ so that $\mathrm{tr}\big( \rho(A) \big) = p$. With the above notation, the four estimators of $\Omega$ we examine are as follows:

Regularized SCM: $\rho(S_n)$;

Spectrum-corrected SCM: $\rho\big( U_1 \tilde D_1 U_1^\top \big)$, where $\tilde D_1$ is the diagonal matrix collecting ascendingly sorted estimators of the population eigenvalues, obtained using the moment method developed in Li and Yao (2014);

Robust sample SSCM: the estimator constructed from the procedure (C1)-(C3) of Visuri et al. (2000), in which the $j$th marginal scale is estimated by the square of the MAD of the $j$th row of the projected data, $j = 1, \ldots, p$;

Spectrum-corrected sample SSCM: the analogous spectrum correction applied to $\hat S_n$, where the correction is obtained in three steps paralleling those for the SCM.
The performance of the four estimators is tested under the two models below.

Model 1: contaminated normal distribution of elliptical form, where the population shape matrix $\Omega$ is a diagonal matrix. This model implies that about $100\varepsilon\%$ of the observations are outlying with large amplitude. The mixing parameter $\varepsilon$ takes two values, $\varepsilon = 0$ (uncontaminated) and $\varepsilon = 0.01$ (contaminated by 1% of outliers).

Model 2: contaminated normal distribution of non-elliptical form, where the population shape matrix is the same as in Model 1 and the mixing parameter $\varepsilon$ takes the same two values 0 and 0.01; the outliers, however, have a different shape matrix.

For a fixed population dimension $p$ and sample size $n$, all statistics are averaged from 1000 independent replications, and the number of outliers is fixed at $n\varepsilon$ for a given $\varepsilon$.
For each estimator $\hat\Omega$, we calculate three distances from $\hat\Omega$ to its target matrix $\Omega$: the Frobenius distance, the Kullback-Leibler (KL) distance, and the spectral distance. The KL distance is not applicable to the two SCM-based estimators in cases with $p > n$, because their determinants are then zero. Figures 1, 2 and 3 summarize the results. They show that, when there is no outlier, the two uncorrected estimators are comparable and both suffer from large biases caused by the high-dimensional effect. Such bias can be much alleviated by means of the spectrum correction, as demonstrated by the two corrected estimators, which have almost the same accuracy. Note that a remaining bias is still clearly present, and unfortunately there is at present no effective way to remove it entirely. Therefore, the performance of the two corrected estimators with $\varepsilon = 0$ can serve as a benchmark when comparing the robustness of the four estimators against outliers. As shown in the figures, in the presence of outliers, the SSCM-based estimator is more robust than the SCM-based one, but both of them are still heavily biased for large $p$. This is explained by the fact that neither of them is adapted to high dimensions. For the two other estimators with high-dimensional correction, the estimator
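The three discrepancy measures used in this comparison can be computed as follows. This is a sketch with our own function names; the KL distance is taken here as the usual Gaussian Kullback-Leibler divergence between $N(0, \hat\Omega)$ and $N(0, \Omega)$, which indeed requires the estimator to be nonsingular, consistent with the remark above.

```python
import numpy as np

def frobenius_dist(A, B):
    return np.linalg.norm(A - B, ord='fro')

def spectral_dist(A, B):
    return np.linalg.norm(A - B, ord=2)

def kl_dist(A, B):
    """KL divergence of N(0, A) from N(0, B); requires A positive definite."""
    p = A.shape[0]
    M = np.linalg.solve(B, A)                 # B^{-1} A
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * (np.trace(M) - logdet - p)

I3 = np.eye(3)
# All three distances vanish when the estimator equals its target exactly.
d = (frobenius_dist(I3, I3), spectral_dist(I3, I3), kl_dist(I3, I3))
```

Note that `kl_dist` diverges as the smallest eigenvalue of the estimator tends to zero, which is why it cannot be used for singular SCM-based estimators when $p > n$.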