On eigenvalues of a high-dimensional spatial-sign covariance matrix

by   Weiming Li, et al.

Sample spatial-sign covariance matrix is a much-valued alternative to sample covariance matrix in robust statistics to mitigate influence of outliers. Although this matrix is widely studied in the literature, almost nothing is known on its properties when the number of variables becomes large as compared to the sample size. This paper for the first time investigates the large-dimensional limits of the eigenvalues of a sample spatial sign matrix when both the dimension and the sample size tend to infinity. A first result of the paper establishes that the distribution of the eigenvalues converges to a deterministic limit that belongs to the family of celebrated generalized Marčenko-Pastur distributions. Using tools from random matrix theory, we further establish a new central limit theorem for a general class of linear statistics of these sample eigenvalues. In particular, asymptotic normality is established for sample spectral moments under mild conditions. This theory is established when the population is elliptically distributed. As applications, we first develop two new tools for estimating the population eigenvalue distribution of a large spatial sign covariance matrix, and then for testing the order of such population eigenvalue distribution when these distributions are finite mixtures. Using these inference tools and considering the problem of blind source separation, we are able to show by simulation experiments that in high-dimensional situations, the sample spatial-sign covariance matrix is still a valid and much better alternative to sample covariance matrix when samples contain outliers.



There are no comments yet.


page 1

page 2

page 3

page 4


A CLT for the LSS of large dimensional sample covariance matrices with unbounded dispersions

In this paper, we establish the central limit theorem (CLT) for linear s...

Asymptotic normality for eigenvalue statistics of a general sample covariance matrix when p/n →∞ and applications

The asymptotic normality for a large family of eigenvalue statistics of ...

Spectrum Estimation from Samples

We consider the problem of approximating the set of eigenvalues of the c...

A generalized spatial sign covariance matrix

The well-known spatial sign covariance matrix (SSCM) carries out a radia...

Testing for subsphericity when n and p are of different asymptotic order

In this short note, we extend a classical test of subsphericity, based o...

One-shot Distributed Algorithm for Generalized Eigenvalue Problem

Nowadays, more and more datasets are stored in a distributed way for the...

Estimating the Intrinsic Dimension of Hyperspectral Images Using an Eigen-Gap Approach

Linear mixture models are commonly used to represent hyperspectral datac...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

When a multivariate data set is potentially contaminated by outliers, sample covariance matrix (SCM) becomes less reliable. A wide range of robust alternatives has been proposed in the literature starting from the early M-estimators (Maronna, 1976; Huber, 1977), the minimum volume ellipsoid and minimum determinant estimators (Rousseeuw, 1985), the Stahel-Donoho estimators (Humpel et al., 1986; Donoho and Gasko, 1992) and Tyler’s scatter matrix (Tyler, 1987). These estimators enjoy a high breakdown point and most of them are desirably affine equivariant. For book-length discussions on these classical estimators, we refer to Maronna et al. (2006) and Oja (2010), see also Magyar and Tyler (2014) for a sensible review. However many of these estimators are implicitly defined only, and this lack of an analytically tractable form leads to certain difficulty for their computation and theoretical analysis. Such difficulty is even more pronounced when the number of variables is large. This motivates a growing recent research for more tractable robust scatter estimators that might not be affine equivariant.

A particularly studied estimator is the spatial sign matrix, first introduced in Locantore et al. (1999) and Visuri et al. (2000). The former paper introduces an influential

robust principal component analysis

based on the spatial sign matrix. Two striking examples in the paper, shown on Figures 14-16 and Figures 15-17, respectively, demonstrate how the SCM leads to a much distorted principal components (PCs) in the presence of a single outlier, and how, at the same time, the spatial sign matrix is able to mitigate the impact of such highly influential outliers. A number of papers have followed since then, especially within the groups around H. Oja and D.E. Tyler, respectively, see Gervini (2008), Sirkia et al. (2009), Taskinen et al. (2010, 2012), Dürre et al. (2014, 2015, 2017) and Dürre and Vogel (2016).

Let us define the sample spatial sign covariance matrix (SSCM). The spatial sign of a nonnull

-dimensional vector

is , i.e. its projection on the -dimensional unit sphere. For completeness, we set . This is called a sign because for the univariate case with , are ordinary signs. Given a sample from a -variate population , the sample SSCM is


Here is an estimate for the spatial median of the population which is determined by the equation (zero of the mean spatial sign function). The population SSCM is . If the data is already centered, one may assume and consider the sample SSCM


Despite its simplicity, is a rich scatter statistic for a multivariate population. It is indeed the exact counterpart of the usual SCM when one shifts from the Euclidean distance (or distance) to the Manhattan block distance (or distance) in .

When the number of variables is large compared to the sample size , the sample SSCM will likely deviate from the population SSCM due to the high-dimensional effect. Indeed for the usual covariance matrix , such high-dimensional distortion between and the sample covariance matrix is now well understood with the aid of random matrix theory, see Johnstone (2007) and Paul and Aue (2014). Typically, sample eigenvalues from have a much wider spread than the population eigenvalues of , and this deformation is precisely described by the famous Marčenko-Pastur law. A main result from the current paper shows that for the spatial sign matrix, such high-dimensional distortion again happens when is large compared to .

Such high-dimensional distortion is particularly critical to the robust PCA or robust covariance matrix estimation proposed in Locantore et al. (1999) and Visuri et al. (2000). For example, the procedure (C1)-(C2)-(C3) on page 566 of Visuri et al. (2000) for a robust estimator of the population covariance matrix works as follows when applied with spatial signs. Given a centered data sample from a -dimensional population:

  • Construct eigenvector estimates using the eigenvectors of the sample SSCM

    , say, matrix .

  • Estimate the marginal variances (eigenvalues, principal values) of

    , , using any univariate robust scale estimate (MAD, etc.). Write for the estimates.

  • The estimate for is  .

In the high-dimensional context, estimates found in the steps (C1) and (C2) become seriously biaised. For (C1), the eigenvectors in can be far away from the their population counterpart (of the population SSCM ), so a fortiori far away from the eigenvectors of the population covariance matrix . For (C2), the marginal variances of the projections could be very different of the eigenvalues of . As for the robust PCA proposed in Locantore et al. (1999), it will suffer from the same high-dimensional distortion because the procedure also uses the eigenvectors of the sample spatial sign matrix to estimate the eigenvectors of the population covariance matrix, exactly as in Step (C1) above.

Therefore it is necessary to correct such high-dimensional distortion appeared in the sample SSCM in order to preserve its long established attractiveness such as robustness. In this paper, using tools of random matrix theory, we investigate asymptotic spectral behaviors of the sample SSCM in high-dimensional frameworks where both the dimension and the sample size tend to infinity. We restrict ourselves to the family of elliptical distributions for the population for two reasons. Firstly, if is elliptically distributed, the population SSCM and the population covariance matrix share same eigenvectors while their respective eigenvalues are in an one-to-one correspondence through a well-known map (Boente and Fraiman, 1999). Secondly, high-dimensional study as the one developed in this paper but for a more general population seems out of reach at the moment. The first main result of the paper (Theorem 3.1) is an analogue of Marčenko-Pastur law for the limiting distribution of eigenvalues of . This law has been so far known for sample covariance matrices and sample correlation matrices only. The second main result of the paper provides a central limit theorem (CLT) for linear spectral statistics of (Theorem 3.2). This CLT is the corner-stone for all subsequent applications we developed in the paper. These applications are designed to demonstrate the effectiveness of the theory we develop here for the SSCM .

There have been a few very recent works in the literature that deal with the high-dimensional SSCM (or its variants), namely Zou et al. (2014), Feng and Sun (2016), Li et al. (2016), and Chakraborty and Chaudhuri (2017)

. A common feature in these papers is that given a specific null hypothesis on the population location or scatter, in a one-sample or two-sample design, the authors have in their disposal a specific test statistic which is an explicit function of

(or its variants). They thus directly study the statistic using traditional asymptotic methods such as projections (as in a U-statistic) or a martingale decomposition. None of these papers studied the distribution of the eigenvalues of as done in this paper using random matrix theory. Meanwhile, some of these test statistics are indeed linear spectral statistics of . Therefore in these cases, the CLT developed in this paper leads to an independent and new proof for these existing results. However this comparison will not be pursued here but in a later separated work.

The remaining of the paper is as follows. Section 2 summarizes some preliminary results from elliptical distributions and related random matrix theory. Section 3 establishes the two main theoretical results of the paper (Theorems 3.1 and 3.2). Application to spectral moments statistics is fully addressed with explicit limiting mean and covariance functions in the corresponding central limit theorem. Then in Section 4, relying on these results, we develop two statistical applications on the spectrum of , the population SSCM, under a setting where the spectrum forms a discrete distribution with finite support. In one application, the spectrum of is estimated using the method of moments, and in the other application, we test the hypothesis that there are no more than distinct eigenvalues in the spectrum of . In Section5, we develop two applications of the general theory of Section 3 to robust statistics in the high-dimensional context. Technical proofs of the main theorems are gathered in Section 6. Some useful lemmas and their proofs are postponed to the last section.

2 Preliminaries

The family of elliptical distributions is an important extension to the multivariate normal distribution and has been broadly used in data analysis in various fields

(Gupta et al., 2013). A random vector with zero mean is elliptically distributed if it has a stochastic representation:


where is a

deterministic and invertible matrix,

a scalar random variable representing the scale of

, and is the random direction, independent of

and uniformly distributed on the unit sphere in

. Besides the normal distribution, this family includes many other celebrated distributions, such as multivariate -distribution, Kotz-type distributions, and Gaussian scale mixtures. Clearly the population covariance matrix is where is in fact the shape matrix of the population. In order to resolve the indeterminacy between the scales of and , we will use throughout the paper the normalization .

Let be a sequence of independent and identically distributed (i.i.d.) random vectors from the elliptical population (2.1). We consider the sample SSCM defined in (1.2) and scale it as


The reason for this scaling is that now and the eigenvalues of are of order in average.

In this paper, using tools of random matrix theory, we investigate limiting properties of the eigenvalues of in a high-dimensional setting. Precisely, both the dimension and the sample size tend to infinity with their ratio , a positive constant in .

Let be a matrix with eigenvalues . Its empirical spectral distribution

(ESD) is by definition the probability measure

where denotes the Dirac mass at . If this sequence has a limit when , the limit is referred as a limiting spectral distribution, or LSD, of the sequence. Our aim is to study the limiting properties of and CLT for linear spectral statistics (LSS) of the form for a class of smooth test functions . These properties may become powerful tools to recover spectral features of the scaled population SSCM, i.e. , and then those of the shape matrix since the matrices and share the same eigenvectors and their eigenvalues have a one-to-one correspondence (Boente and Fraiman, 1999). Moreover, as , the two matrices coincide in the sense that the spectral norm , as long as (or ) is uniformly bounded, see Lemma 7.1.

Spectral properties of a standard high-dimensional SCM have been extensively studied in random matrix theory since the pioneer work of Marčenko and Pastur (1967). The standard model in this literature has the form


where is as before, is a constant, and is a set of i.i.d. random variables satisfying E, E, and E. Let be i.i.d. copies of and be the corresponding SCM. It is well known that the ESD of converges to the celebrated Marčenko-Pastur (MP) law when , and to a generalized MP law for general matrix , as with (Marčenko and Pastur, 1967; Silverstein, 1995). The CLT for LSS of was first studied in Jonsson (1982) by assuming the population is a standard multivariate normal. One breakthrough on the CLT was obtained by Bai and Silverstein (2004), where the population is allowed to be general with E. This fourth moment condition was then weakened to E in Pan and Zhou (2008). For more references, one can refer to Bai and Silverstein (2010), Bai et al. (2015), Gao et al. (2016), and references therein. However, these results do not apply to general elliptical populations since the two models in (2.1) and (2.3) have little in common, except for normal populations. In fact, for general elliptical populations, it has been reported that the ESD of the SCM converges to a deterministic distribution that is not a generalized MP law, but has to be characterized by both the distribution of and the limiting spectrum of through a system of implicit equations (El Karoui, 2009; Li and Yao, 2018). The involvement of seriously interferes with our understanding of the spectrum of from the ESD of . This again motivates us to shift our attention to the SSCM which discards the random radius and focus only on the directions .

3 High-dimensional theory for eigenvalues of

3.1 Limiting spectral distribution of

In this section we derive a LSD for the sequence under the assumptions below.

Assumption (a).  Both the sample size and population dimension tend to infinity in such a way that and .

Assumption (b).  Sample observations are , where is a deterministic and invertible matrix with and are i.i.d. random vectors, uniformly distributed on the unit sphere in .

Assumption (c).  The spectral norm of is bounded and its spectral distribution

converges weakly to a probability distribution

, called population spectral distribution (PSD). Moreover, the spectral moments of are denoted by and their limits by .

From Lemma 7.1, it is clear that the spectral distributions of and are asymptotically identical. So one can certainly replace with in Assumption (c), which does not affect the LSD of . However we keep because it is easy to describe the CLT for LSS using the spectral distribution of .

For the characterization of the LSD of , we need to introduce the Stieltjes transform of a measure on the real line, which is defined as

where .

Theorem 3.1.

Suppose that Assumptions (a)-(c) hold. Then, almost surely, the empirical spectral distribution converges weakly to a probability distribution , whose Stieltjes transform is the unique solution to the equation


in the set .

The LSD defined in (3.1) is a generalized MP law already appeared in the seminal paper Marčenko and Pastur (1967). Let denote the Stieltjes transform of . Then (3.1) can also be represented as (Silverstein, 1995)


For procedures on finding the density function and the support set of from (3.1) and (3.2), one is referred to Bai and Silverstein (2010).

3.2 CLT for linear spectral statistics of

Let be the LSD as defined in (3.1) with the parameters replaced by . Let . We now study the fluctuation of so-called LSS of the form

where is some given measurable function. Define also the interval

Theorem 3.2.

Suppose that Assumptions (a)-(c) hold. Let be functions analytic on an open set that includes the interval (3.3). Then the random vector converges weakly to a Gaussian vector with mean function

and covariance function

Here and the contours and are non-overlapping, closed, counter-clockwise orientated in the complex plane and enclosing the support of the LSD .

A special case of interest is a multivariate normal population that satisfies both the elliptical model (2.1

) and the linear transformation model (

2.3). In this case, it is interesting to compare the limiting distribution in Theorem 3.2 based on the sample SSCM with the classical CLT in Bai and Silverstein (2004) based on the SCM . One finds that some additional and new terms appear in Theorem 3.2, namely the second contour integral in the mean function and the second to fourth terms in the covariance function above do not exist in the classical CLT in Bai and Silverstein (2004).

Another closely related work is Hu et al. (2019), where the authors study elliptical population by assuming being independent of . Though sharing the same form, our model violates their independent assumption. Specifically, we takes which is correlated with . It will be shown that such correlation is not (asymptotically) negligible for the distribution of LSS.

3.3 Asymptotic distributions of spectral moments

Among all LSS, the following spectral moments of are particularly important:

The first moment is 1 since . All other moments , , are random. Define the moments of the related MP laws

From Nica and Speicher (2006), the quantities and (moments of ) are related through the recursive formulae:


and , where the sum runs over the following partitions of :

and The joint limiting distribution of moments can be derived from Theorem 3.2 by considering the moment functions . For this particular case, the mean and covariance functions in the limiting distribution can be explicitly calculated.

Corollary 3.1.

Suppose that Assumptions (a)-(c) hold. Then the random vector

The mean vector is given by

where , , and denotes the th derivative of with respect to . The covariance matrix has entries

where .

4 Applications to spectral inference

A natural question on spatial signs is how to infer the population SSCM from the sample SSCM when the dimension is large. If the question was for the pair of population and sample covariance matrices , this falls in the widely studied problem of estimating a large covariance matrix. Noting the fundamental difference between an SSCM and a standard covariance matrix, we indeed found nothing in the literature for properties of a high-dimensional SSCM. (to our best knowledge).

In this section, we consider a scenario where the PSD of can be modeled as a finite mixture of point masses. Using the theory of Section 3, we propose two new inference tools for the PSD . First an asymptotic normal estimator is found for such a finite-mixture PSD . This estimator is particularly interesting for an elliptical population because the eigenvalues of and are then in a well-known one-to-one correspondence. This will finally lead to a robust estimator for much better than some existing proposals, for example, the estimator from the procedure (C1)-(C2)-(C3) of Visuri et al. (2000). The second inference tool we develop treats the question of determination of the order of the finite mixture in .

Precisely, the family of PSDs under study is a class of parameterized discrete distributions with finite support on , that is,


where Here the restriction is due to the normalization condition . Note that the model (4.1) depends on an integer parameter , referred as the order of . Such finite mixtures have already been employed for the standard large covariance matrix , see El Karoui (2008), Rao et al. (2008), Bai et al. (2010) and Li and Yao (2014). Similar to El Karoui (2008), we adopt the setting of fixed PSDs in this section, i.e. for all large.

4.1 Estimation of a PSD

For the model in (4.1), we follow the moment method in Bai et al. (2010) for the PSD estimation. Given a known order , the method first estimates the moments of through the recursive formulae in (3.4), and then solve a system of moment equations, to get a consistent estimator of .

In our situation, with notation and for , we denote

as the mappings between the corresponding vectors. These mappings are all one-to-one and the determinants of their Jacobian matrices are all nonzero, see Bai et al. (2010). Therefore, applying Theorem 3.1, which implies that , as . However, as shown by the CLT in Corollary 3.1, the estimator has a bias of the order . So it’s natural to modify by subtracting its limiting mean in the CLT to obtain a better estimator of . Beyond this correction, the CLT can also provide confidence regions for the parameter .

Denote the modified estimators of , , and by


respectively, where with defined in Corollary 3.1 for From Theorem 3.1, Corollary 3.1, and a standard application of the Delta method, one may easily get asymptotic properties of these estimators.

Theorem 4.1.

Suppose that Assumptions (a)-(c) hold and the true value is an inner point of . Then we have , , , and moreover


where and represent the Jacobian matrices and , respectively, and is defined in Corollary 3.1 with .

4.2 Test for the order of a PSD

The aforementioned estimation procedure requires that the order of the PSD be pre-specified. In general, this prior knowledge should be testified in advance. To deal with this problem, we consider the hypotheses


where is a given positive integer. These hypotheses can also be regarded as a generalization of the well-known sphericity hypotheses on covariance matrices, i.e. the case .

In Qin and Li (2016), a test procedure was outlined based on a moment matrix and its estimator which can be formulated as

Here we set and , as defined in (4.2), for . It has been proved that the determinant of is zero if the null hypothesis in (4.4) holds, otherwise is strictly positive (Li and Yao, 2014). Therefore, the determinant can serve as a test statistic for (4.4) and the null hypothesis shall be rejected if the statistic is large. Applying Theorem 4.1 and the main theorem in Qin and Li (2016), the asymptotic distribution of is obtained immediately.

Theorem 4.2.

Suppose that Assumptions (a)-(c) hold. Then the statistic is asymptotically normal, i.e.


where with , the vectorization of the adjoint matrix of . The first two rows and columns of the matrix consist of zero and the remaining sub-matrix is defined in (4.3). The matrix is a 0-1 matrix with only , , , where denotes the greatest integer not exceeding .

From Theorem 4.1, the limiting variance in (4.5) is a continuous function of . While, under the null hypothesis, this variance is a function of , denoted by . Let . Then it is a strongly consistent estimator of .

Corollary 4.1.

Suppose that Assumptions (a)-(c) hold. Then, under the null hypothesis,

as . In addition, the asymptotic power of tends to 1.

Corollary 4.1 follows directly from Theorem 4.2 and its proof is thus omitted. This corollary includes as a particular case the sphericity test. For this case, the test statistic reduces to and its null distribution is consistent with that in Paindaveine and Verdebout (2016) which is obtained by a direct and completely different method.

4.3 Simulation experiments

Simulations are carried out to evaluate the performance of the proposed estimation and test for discrete PSDs in (4.1). Samples are drawn from and all empirical statistics are calculated from 10,000 independent replications.

The estimation procedure is tested for the following two PSDs.

  • Model 1: and ;

  • Model 2: and .

The sample sizes are for Model 1 and

for Model 2, respectively. In addition to empirical means and standard deviations of all estimators, we also calculate 95% confidence intervals for all parameters and report their coverage probabilities. Results are collected in Tables

1 and 2. The consistency of all estimators is clearly demonstrated.

Mean St. D. C. P. Mean St. D. C. P. Mean St. D. C. P.
0.4839 0.1145 0.9375 0.4960 0.0550 0.9491 0.5000 0.0269 0.9486
0.4915 0.1135 0.9137 0.4968 0.0588 0.9423 0.4997 0.0292 0.9488
1.5030 0.1330 0.9288 1.4990 0.0668 0.9426 1.4998 0.0329 0.9487
0.5085 0.1135 0.9137 0.5032 0.0588 0.9423 0.5003 0.0292 0.9488
Table 1: Estimation for Model 1 with sample size = 100,200,400 and . The number of independent replications is 10,000 and the nominal coverage probability (C. P.) is fixed at .
Mean St. D. C. P. Mean St. D. C. P. Mean St. D. C. P.
0.1887 0.0429 0.9227 0.1988 0.0147 0.9358 0.2003 0.0071 0.9367
0.2824 0.0447 0.9403 0.2956 0.0184 0.9525 0.2990 0.0090 0.9483
0.9960 0.1347 0.9345 0.9924 0.0661 0.9486 0.9991 0.0337 0.9433
0.4064 0.0373 0.9453 0.4012 0.0209 0.9239 0.4002 0.0110 0.9351
1.7824 0.0856 0.9236 1.7919 0.0440 0.9413 1.7960 0.0227 0.9392
0.3113 0.0696 0.9221 0.3031 0.0365 0.9429 0.3008 0.0189 0.9420
Table 2: Estimation for Model 2 with sample size = 400,800,1600 and . The number of independent replications is 10,000 and the nominal coverage probability (C. P.) is fixed at .

Next we examine the test procedure for the order of a PSD. The following two models are employed in this experiment:

  • Model 3: ;

  • Model 4: .

Here the parameter represents the distance between the null and alternative hypotheses. In particular, Model 3 is used for testing (sphericity test) with ranging from 0 to 0.2 by step 0.18 and Model 4 is for testing with ranging from 0 to 0.45 by step 0.05. The sample size is , the dimension-sample size ratios are , and the significance level is fixed at . Results summarized in Table 3 show that the proposed test has accurate empirical size and its power tends to 1 as the parameter increases under the two models.

under Model 3
0 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18
5.24 5.81 9.13 17.91 34.86 62.30 87.31 98.01 99.90 100
5.33 5.92 8.43 18.09 35.62 63.12 88.14 98.69 99.96 100
4.76 6.39 9.69 17.39 35.23 63.57 88.15 98.67 99.97 100
under Model 4
0 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45
4.75 7.19 17.49 43.96 79.28 97.06 99.87 100 100 100
5.05 6.31 12.22 26.78 53.74 80.74 95.07 99.52 99.97 100
4.88 5.65 8.56 16.33 30.09 49.17 71.60 86.54 95.20 98.61
Table 3: Empirical size and power of in percentage under Model 3 and Model 4 with the sample size . The number of independent replications is 10,000 and the nominal significance level is .

5 Application to robust statistics

In this section we develop a few applications of the general theory of Section 3 to robust statistics using the sample SSCM .

5.1 Robustness

We examine the robustness of several estimators for the shape matrix when sample data include outliers. Four estimators derived from and are considered in this comparison. Let be the spectral decomposition of and :

where the ’s are diagonal matrices of eigenvalues, sorted in ascending order, and the ’s are matrices of corresponding eigenvectors, respectively. In addition, we define a regularization function as

for any matrix with non-zero trace. Obviously, this function normalizes such that . With the above notations, the four estimators of we examine are as follows:

  • Regularized SCM ,

  • Spectrum-corrected SCM ,

    where is a collection of ascendingly sorted estimators of population eigenvalues using a moment method developed in Li and Yao (2014);

  • Robust sample SSCM constructed from the procedures (C1)-(C2)-(C3) of Visuri et al. (2000),

    where with being the square of the MAD of the th row of for .

  • Spectrum-corrected sample SSCM ,

    where the correction is obtained following three steps:

    • Step 1: Estimate the PSD of from the ESD through the procedure in Section 4.1 to get, say, ;

    • Step 2: Estimate the eigenvalues of from using the correspondence between the eigenvalues of and as given in Lemma 7.1;

    • Step 3: Sort the obtained estimates of the eigenvalues in ascending order to obtain .

The performance of the four estimators are tested under two models below.

Model 1:

Contaminated normal distribution of elliptical form:

where the population shape matrix is a diagonal matrix,

This model implies there are about 100% outlying observations with large amplitude. The mixing parameter takes two values (uncontaminated) and (contaminated by 1% outliers).

Model 2:

Contaminated normal distribution of non-elliptical form:

where the population shape matrix is the same as in Model 1 and the mixing parameter takes values and . For outliers, their shape matrix is

The population dimension is , and the sample size is . All statistics are averaged from 1000 independent replications. The number of outliers is fixed at for a given .

For each estimator , we calculate three distances from to its target matrix , including the Frobenius distance, the Kullback-Leibler (KL) distance and the spectral distance. The KL distance is not applicable to and for cases with because their determinants are thus zero. Figures 1, 2 and 3 summarize the results. They show that, when there is no outlier, and are comparable and both suffers from large biases caused by the high-dimensional effect. Such bias can be much alleviated by means of spectral correction as demonstrated by and which have almost the same accuracy. Note that the remaining biais is clearly present and it is a pity that there is no effective way at present to remove this remaining bias entirely. Therefore, the performance of and with can serve as a benchmark for the four estimators when comparing their robustness against outliers. As shown in the figures, in the presence of outliers, the estimator is more robust than , but both of them are still heavily biased for large . This is explained by the fact that both of them are not adapted to high dimensions. For the two other estimators and with high-dimensional correction, the estimator