    On the variability of the sample covariance matrix under complex elliptical distributions

We derive the variance-covariance matrix of the sample covariance matrix (SCM) as well as its theoretical mean squared error (MSE) when sampling from complex elliptical distributions with finite fourth-order moments. We also derive the form of the variance-covariance matrix for any affine equivariant matrix-valued statistic. Finally, illustrative examples of the formulas are presented.

Authors

09/03/2021


I Introduction

Suppose we observe independent and identically distributed (i.i.d.) complex-valued p-variate random vectors x_1, …, x_n with mean μ = E[x_i] and positive definite covariance matrix Σ = cov(x_i). The (unbiased) estimators of Σ and μ are the sample covariance matrix (SCM) and the sample mean, defined by

 S = (1/(n−1)) ∑_{i=1}^{n} (x_i − x̄)(x_i − x̄)^H and x̄ = (1/n) ∑_{i=1}^{n} x_i. (1)
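As a quick numerical companion to (1), the unbiased complex SCM takes only a few lines; the sketch below (NumPy, function name our own choice) forms the estimate from an n × p data matrix whose rows are the observations.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased SCM of eq. (1); X is (n, p) complex with observations x_i as rows."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # subtract the sample mean from each row
    # sum_i (x_i - xbar)(x_i - xbar)^H, divided by n - 1
    return Xc.T @ Xc.conj() / (n - 1)
```

The result is Hermitian positive semidefinite and agrees with `np.cov(X, rowvar=False)`, which also conjugates the second factor and divides by n − 1 for complex inputs.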

The SCM is an integral part of many statistical signal processing methods, such as adaptive filtering (Wiener and Kalman filters), spectral estimation, and array processing (MUSIC algorithm, Capon beamformer).

In signal processing applications, a typical assumption is that the data follow a (circular) complex multivariate normal (MVN) distribution. However, a more general assumption is a Complex Elliptically Symmetric (CES) distribution [7, 8], a family that includes the MVN distribution as well as heavier-tailed distributions, such as the complex t-, K-, and inverse Gaussian distributions, which are commonly used in radar and array signal processing applications [9, 10, 8, 11].

In this paper, we study the complex-valued (unbiased) SCM, for which we derive the variance-covariance matrix as well as the theoretical mean squared error (MSE) when sampling from CES distributions. We also provide a general expression for the variance-covariance matrix of any affine equivariant matrix-valued statistic (of which the SCM is a particular case). The results regarding the SCM extend those of [12], where the variance-covariance matrix and the MSE of the SCM were derived for real-valued elliptical distributions, to the complex-valued case.

The structure of the paper is as follows. Section II introduces CES distributions. In Section III, we derive the variance-covariance matrix of any affine equivariant matrix-valued statistic when sampling from a CES distribution. In Section IV, we derive the variance-covariance matrix of the SCM. All proofs are given in the appendix.

Notation: The identity matrix is denoted by I and the vector of ones by 1. The Euclidean basis vector e_i has its ith coordinate equal to one and all other coordinates equal to zero. The notations (·)*, (·)⊤, and (·)^H denote the complex conjugate, the transpose, and the conjugate transpose, respectively. We also work with the sets of Hermitian, Hermitian positive semidefinite, and Hermitian positive definite p × p matrices. For a random matrix A, we use the shorthand notations var(A) = var(vec(A)) and pvar(A) = pvar(vec(A)) (see Section III for the definition of pvar), where vec(A) is the column-stacking vectorization of A. When there is a possibility for confusion, we denote by cov_{μ,Σ} or E_{μ,Σ} the covariance and expectation of a sample from an elliptical distribution with mean vector μ and covariance matrix Σ. The commutation matrix K_{p,p} is defined by K_{p,p} vec(A) = vec(A⊤), and ⊗ denotes the Kronecker product. We frequently use the identities vec(ABC) = (C⊤ ⊗ A) vec(B), tr(A⊤B) = vec(A)⊤ vec(B), and (A ⊗ B)(C ⊗ D) = AC ⊗ BD, where A, B, C, and D are matrices of appropriate dimensions. The notation =_d reads "has the same distribution as". The notation u ∼ U(CS^{p−1}) denotes the uniform distribution on the complex unit sphere CS^{p−1} = {u ∈ ℂ^p : u^H u = 1}. Lastly, ι = √−1 denotes the imaginary unit.
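The commutation matrix and the vec identities above are easy to check numerically; the helpers below (a NumPy sketch, names our own) build K_{p,p} for the column-stacking vec.

```python
import numpy as np

def vec(A):
    """Column-stacking vectorization of a matrix."""
    return A.reshape(-1, order="F")

def commutation_matrix(p):
    """K_{p,p} such that K vec(A) = vec(A^T)."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            # entry of vec(A^T) at position i*p + j is A[i, j],
            # which sits at position j*p + i of vec(A)
            K[i * p + j, j * p + i] = 1.0
    return K
```

The same helpers let one verify vec(ABC) = (C⊤ ⊗ A) vec(B) directly with `np.kron`.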

II Complex elliptically symmetric distributions

A random vector x is said to have a circular CES distribution if and only if it admits the stochastic representation

 x =_d μ + r Σ^{1/2} u, (2)

where μ is the mean vector, Σ^{1/2} is the unique Hermitian positive definite square root of Σ, u ∼ U(CS^{p−1}), and r ≥ 0 is a positive random variable called the modular variate. Furthermore, r and u are independent. If the cumulative distribution function of r is absolutely continuous, the probability density function of x exists and is, up to a constant, of the form

 |Σ|^{−1} g((x − μ)^H Σ^{−1} (x − μ)), (3)

where g is the density generator. We denote this case by x ∼ CE_p(μ, Σ, g). We assume that x has finite fourth-order moments, and thus we can assume without any loss of generality that Σ is equal to the covariance matrix cov(x) = E[(x − μ)(x − μ)^H]. This implies that the modular variate verifies E[r²] = p. As a consequence of circularity, writing x = x_R + ι x_I, we have pcov(x) = E[(x − μ)(x − μ)⊤] = 0, with cov(x_R) = cov(x_I) and cov(x_R, x_I) = −cov(x_I, x_R). Consequently, cov(x_R, x_I) is skew-symmetric. We refer the reader to [7, 8] for a comprehensive account of CES distributions.

The elliptical kurtosis of a CES distribution is defined as

 κ = E[r⁴] / (p(p+1)) − 1. (4)

The elliptical kurtosis shares properties similar to the kurtosis of a circular complex random variable. Specifically, if x ∼ CN_p(μ, Σ), then κ = 0. This follows by noticing that in the Gaussian case r² ∼ (1/2)χ²_{2p}, and hence E[r⁴] = p(p+1) and consequently κ = 0. The kurtosis of a complex circularly symmetric random variable x is defined as

 kurt(x) = E[|x − μ|⁴] / (E[|x − μ|²])² − 2, (5)

where μ = E[x]. Similar to the real-valued case, the elliptical kurtosis has a simple relationship with the (excess) kurtosis [14, Lemma 3]: kurt(x_i) = 2κ for any i ∈ {1, …, p}. We note that the lower bound for the elliptical kurtosis is κ ≥ −1/(p+1).
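The relationship between the marginal kurtosis (5) and κ suggests a simple moment-based plug-in estimate of the elliptical kurtosis: average the marginal sample kurtoses and halve. The sketch below is our own illustration, not part of the paper.

```python
import numpy as np

def elliptical_kurtosis(X):
    """Plug-in estimate of kappa from an (n, p) complex sample,
    using kurt(x_i) = 2 * kappa (an illustrative sketch)."""
    Xc = X - X.mean(axis=0)
    m2 = np.mean(np.abs(Xc) ** 2, axis=0)   # marginal second moments
    m4 = np.mean(np.abs(Xc) ** 4, axis=0)   # marginal fourth moments
    kurt = m4 / m2 ** 2 - 2.0               # complex kurtosis, eq. (5)
    return kurt.mean() / 2.0
```

For complex Gaussian data the estimate tends to 0 as n grows, matching κ = 0.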

Lastly, we define the scale and sphericity parameters

 η = tr(Σ)/p and γ = p tr(Σ²) / tr(Σ)². (6)

The scale η is equal to the mean of the eigenvalues of Σ. The sphericity γ measures how close the covariance matrix is to a scaled identity matrix: it takes the value 1 for a scaled identity matrix and the value p for a rank-one matrix.
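Both parameters in (6) are one-liners; the sketch below (names our own) also makes the two boundary cases above easy to check.

```python
import numpy as np

def scale_and_sphericity(Sigma):
    """Scale eta = tr(Sigma)/p and sphericity gamma = p tr(Sigma^2)/tr(Sigma)^2, eq. (6)."""
    p = Sigma.shape[0]
    t1 = np.trace(Sigma).real
    t2 = np.trace(Sigma @ Sigma).real
    return t1 / p, p * t2 / t1 ** 2
```

For 2I (any p) this returns η = 2, γ = 1; for a rank-one outer product vv^H it returns γ = p.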

III Radial distributions and covariance matrix estimates

In this section, we derive the variance-covariance matrix of any affine equivariant matrix-valued statistic.

We begin with some definitions. The covariance and pseudo-covariance of complex random vectors x_1 and x_2 are defined as

 cov(x_1, x_2) = E[(x_1 − E[x_1])(x_2 − E[x_2])^H] and pcov(x_1, x_2) = E[(x_1 − E[x_1])(x_2 − E[x_2])⊤],

and together they provide a complete second-order description of the associations between x_1 and x_2. Then cov(x) = cov(x, x) and pcov(x) = pcov(x, x) are called the covariance matrix and the pseudo-covariance matrix of x.

A random Hermitian (p × p) matrix A is said to have a radial distribution if A =_d QAQ^H for all unitary matrices Q (so QQ^H = Q^H Q = I). The following result extends the result of [12] to the complex-valued case.

Theorem 1.

Let a random matrix A have a radial distribution with finite second-order moments. Then, there exist real-valued constants τ_1 and τ_2 with τ_1 ≥ 0 and τ_1 + pτ_2 ≥ 0 such that E[A] = ηI with η ∈ ℝ and

 var(A) = τ_1 I + τ_2 vec(I)vec(I)⊤, (7)
 pvar(A) = τ_1 K_{p,p} + τ_2 vec(I)vec(I)⊤, (8)

where τ_1 = var(a_ij) and τ_2 = cov(a_ii, a_jj) for all i ≠ j.

A statistic ^Σ = ^Σ(X) based on an n × p data matrix X of n observations on p complex-valued variables is said to be affine equivariant if

 ^Σ(XA⊤ + 1a⊤) = A ^Σ(X) A^H (9)

holds for all nonsingular A ∈ ℂ^{p×p} and all a ∈ ℂ^p. Suppose that X is a random sample from a CES distribution CE_p(μ, Σ, g) and that ^Σ is an affine equivariant statistic. Then ^Σ(X) has the stochastic decomposition

 ^Σ(X) =_d Σ^{1/2} · ^Σ(Z) · Σ^{1/2}, (10)

where ^Σ(Z) denotes the value of the statistic based on a random sample Z from the spherical distribution CE_p(0, I, g). Affine equivariance together with the fact that ZQ⊤ =_d Z for all unitary matrices Q implies that ^Σ(Z) has a radial distribution. This leads to Theorem 2 stated below.

Theorem 2.

Let ^Σ be an affine equivariant statistic with finite second-order moments, based on a random sample X from a CES distribution CE_p(μ, Σ, g). Then E[^Σ] = ηΣ with η ∈ ℝ, and

 var(^Σ) = τ_1 (Σ* ⊗ Σ) + τ_2 vec(Σ)vec(Σ)^H, (11)
 pvar(^Σ) = τ_1 (Σ* ⊗ Σ)K_{p,p} + τ_2 vec(Σ)vec(Σ)⊤, (12)

where τ_1 and τ_2 are the constants of Theorem 1 applied to the radially distributed ^Σ(Z).

There are many statistics to which this theorem applies. Naturally, a prominent example is the SCM, which we examine in detail in the next section. Other examples are the weighted sample covariance matrices

 R = (1/n) ∑_{i=1}^{n} u(d_i)(x_i − x̄)(x_i − x̄)^H,

where d_i = (x_i − x̄)^H S^{−1} (x_i − x̄) and u is a nonnegative weight function. For instance, these include the complex M-estimators of scatter. In the special case when u(d) = d, we obtain the fourth moment matrix, which is used in the FOBI (fourth-order blind identification) method for blind source separation and in Invariant Coordinate Selection (ICS).

IV Variance-covariance of the SCM

We now use Theorem 2 to derive the covariance matrix and the pseudo-covariance matrix as well as the MSE of the SCM when sampling from a CES distribution. This result extends [12, Theorem 2 and Lemma 1] to the complex case.

Theorem 3.

Let the SCM S be computed from an i.i.d. random sample from a CES distribution with finite fourth-order moments and covariance matrix Σ. Then, the covariance matrix and the pseudo-covariance matrix of S are as stated in (11) and (12) with

 τ_1 = var_{0,I}(s_12) = 1/(n−1) + κ/n,
 τ_2 = cov_{0,I}(s_11, s_22) = κ/n,

where κ is the elliptical kurtosis in (4). The MSE is given by

 MSE(S) = E[‖S − Σ‖²_F] = (1/(n−1) + κ/n) tr(Σ)² + (κ/n) tr(Σ²),

and the normalized MSE is

 NMSE(S) = MSE(S)/‖Σ‖²_F = (p/γ)(1/(n−1) + κ/n) + κ/n, (13)

where γ is the sphericity parameter defined in (6).
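As a sanity check of (13), a short Monte Carlo experiment (our own sketch) for the complex Gaussian case (κ = 0) compares the empirical normalized MSE of the SCM against the formula.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 3, 20, 2000
Sigma = np.diag([3.0, 2.0, 1.0]).astype(complex)
A = np.linalg.cholesky(Sigma)                      # Sigma = A A^H

gamma = p * np.trace(Sigma @ Sigma).real / np.trace(Sigma).real ** 2
kappa = 0.0                                        # complex Gaussian case
nmse_theory = (p / gamma) * (1 / (n - 1) + kappa / n) + kappa / n

err = 0.0
for _ in range(reps):
    Z = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
    X = Z @ A.T                                    # rows x_i = A z_i ~ CN(0, Sigma)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc.conj() / (n - 1)                 # unbiased SCM, eq. (1)
    err += np.linalg.norm(S - Sigma) ** 2          # Frobenius norm squared
nmse_mc = err / reps / np.linalg.norm(Sigma) ** 2  # empirical NMSE
```

With these values the formula gives (18/7)/19 ≈ 0.135, and the Monte Carlo estimate should land within a few percent of it.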

Consider the simple shrinkage covariance matrix estimation problem

 β_o = argmin_{β ∈ ℝ} E[‖βS − Σ‖²_F].

Since the problem is convex, we can find β_o as the solution of d/dβ E[‖βS − Σ‖²_F] = 0, which yields

 β_o = ‖Σ‖²_F / (MSE(S) + ‖Σ‖²_F) = 1/(NMSE(S) + 1), (14)

where we used E[‖S‖²_F] = MSE(S) + ‖Σ‖²_F. As can be noted from (14), the optimal scaling term is always smaller than 1 since NMSE(S) > 0. Note that β_o is a function of κ and γ via (13). Next we show that the oracle estimator S_o = β_o S is uniformly more efficient than the SCM, i.e., MSE(S_o) < MSE(S). First note that

 E[‖β_o S − Σ‖²_F] = β_o² MSE(S) + (1 − β_o)² ‖Σ‖²_F. (15)

Then from (14) we notice that 1 − β_o = β_o NMSE(S). Substituting this into (15), we get

 MSE(S_o) = β_o² MSE(S) + β_o² NMSE(S)² ‖Σ‖²_F
 = β_o² MSE(S)(1 + NMSE(S)) = β_o MSE(S),

where the last identity follows from the fact that β_o (1 + NMSE(S)) = 1 due to (14). Since β_o < 1, it follows that S_o is more efficient than S. Efficiency in the case when κ and γ, and hence β_o, need to be estimated remains (to the best of our knowledge) an open problem.
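The algebra from (14) to MSE(S_o) = β_o MSE(S) is easy to confirm numerically: compute the ratio MSE(S_o)/MSE(S) directly from (15) and compare it with β_o (a small check of the derivation, in our own notation).

```python
def beta_oracle(nmse):
    """Optimal scaling beta_o = 1/(NMSE + 1), eq. (14)."""
    return 1.0 / (nmse + 1.0)

def mse_ratio_from_15(nmse):
    """MSE(S_o)/MSE(S) evaluated directly from eq. (15),
    using MSE(S) = NMSE * ||Sigma||_F^2."""
    b = beta_oracle(nmse)
    return b ** 2 + (1.0 - b) ** 2 / nmse
```

For any NMSE > 0 the two routes agree, confirming MSE(S_o) = β_o MSE(S) with β_o < 1.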

Consider the univariate case (p = 1), so that Σ = σ² is equal to the variance of the random variable x, and the SCM reduces to the sample variance s² = (1/(n−1)) ∑_{i=1}^{n} |x_i − x̄|². In this case, γ = 1, and the optimal scaling constant in (14) becomes

 β_o = n(n−1) / (kurt(x)(n−1) + n²).

A similar result has been noticed for the real-valued case. If the data are from a complex normal distribution (x ∼ CN(μ, σ²)), then kurt(x) = 0 and β_o = (n−1)/n, and hence β_o s² = (1/n) ∑_{i=1}^{n} |x_i − x̄|², which equals the maximum likelihood estimate (MLE) of σ². In the real case, the optimal scaling constant for Gaussian samples is (n−1)/(n+1). Note that when the kurtosis is large and positive and n is small, β_o can be substantially less than one, and the gain from using S_o can be significant.
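The closed form for β_o in the univariate case can be checked against the general expression β_o = 1/(NMSE + 1), which with p = γ = 1 and kurt(x) = 2κ reduces to NMSE = 1/(n−1) + kurt(x)/n (a small consistency check, our own sketch):

```python
def beta_univariate(n, kurt):
    """Closed-form optimal scaling for the sample variance (p = 1)."""
    return n * (n - 1) / (kurt * (n - 1) + n ** 2)

def beta_from_nmse(n, kurt):
    """Same quantity via beta_o = 1/(NMSE + 1), eq. (14), with p = gamma = 1."""
    nmse = 1.0 / (n - 1) + kurt / n
    return 1.0 / (nmse + 1.0)
```

At kurt = 0 (the complex Gaussian case) both give (n−1)/n, so β_o s² is the 1/n-scaled MLE of the variance.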

V Conclusion

We derived the variance-covariance matrix and the theoretical mean squared error of the sample covariance matrix when sampling from complex elliptical distributions with finite fourth-order moments. We also derived the form of the variance-covariance matrix for any affine equivariant matrix-valued statistic. We presented illustrative examples of the formulas in the context of shrinkage covariance estimation.

-A Proof of Theorem 1

The proof follows the same lines as the proof in [12] for the real-valued case. That E[A] = ηI with η ∈ ℝ is obvious, since E[A] = Q E[A] Q^H for all unitary Q. For any unitary matrix Q, we have

 var(A) = var(vec(A)) =_d var(vec(QAQ^H)) = var((Q* ⊗ Q)vec(A)) = (Q* ⊗ Q) var(vec(A)) (Q⊤ ⊗ Q^H).

Let {e_i e_j⊤ ⊗ e_k e_l⊤} be a basis for the set of p² × p² matrices. Then

 var(A) = ∑_{i,j,k,l} τ_ijkl e_i e_j⊤ ⊗ e_k e_l⊤ =_d ∑_{i,j,k,l} τ_ijkl q_i* q_j⊤ ⊗ q_k q_l^H,

where q_i = Qe_i. By choosing Q = diag(1, …, 1, ι, 1, …, 1) (where ι is the imaginary unit) and Q = diag(1, …, 1, −1, 1, …, 1) for some coordinate m, we must have τ_ijkl = 0 unless i = j and k = l, or i = k and j = l, or i = j = k = l. Denote τ_0 = τ_iiii, τ_1 = τ_iikk (i ≠ k), and τ_2 = τ_ijij (i ≠ j). Then

 var(A) = ∑_{i,j} τ_1 e_i e_i⊤ ⊗ e_j e_j⊤ + ∑_{i,j} τ_2 e_i e_j⊤ ⊗ e_i e_j⊤ + (τ_0 − τ_1 − τ_2) ∑_i e_i e_i⊤ ⊗ e_i e_i⊤.

Note that ∑_{i,j} e_i e_i⊤ ⊗ e_j e_j⊤ = I and ∑_{i,j} e_i e_j⊤ ⊗ e_i e_j⊤ = vec(I)vec(I)⊤. Furthermore,

 (Q* ⊗ Q) ∑_{i,j} e_i e_i⊤ ⊗ e_j e_j⊤ (Q⊤ ⊗ Q^H) = I,
 (Q* ⊗ Q) ∑_{i,j} e_i e_j⊤ ⊗ e_i e_j⊤ (Q⊤ ⊗ Q^H) = vec(I)vec(I)⊤,
 (Q* ⊗ Q) ∑_i e_i e_i⊤ ⊗ e_i e_i⊤ (Q⊤ ⊗ Q^H) ≠ ∑_i e_i e_i⊤ ⊗ e_i e_i⊤ in general.

From the last inequality, we must have τ_0 − τ_1 − τ_2 = 0, and (7) follows.

Regarding the pseudo-covariance, for any unitary Q,

 pvar(A) = pvar(vec(A)) =_d pvar((Q* ⊗ Q)vec(A)) = (Q* ⊗ Q) pvar(vec(A)) (Q^H ⊗ Q⊤),

which implies

 pvar(A) = ∑_{i,j,k,l} τ'_ijkl e_i e_j⊤ ⊗ e_k e_l⊤ =_d ∑_{i,j,k,l} τ'_ijkl q_i* q_j^H ⊗ q_k q_l⊤,

where q_i = Qe_i. By choosing Q = diag(1, …, 1, ι, 1, …, 1) and Q = diag(1, …, 1, −1, 1, …, 1) for some coordinate m, we must have τ'_ijkl = 0 except when i = k and j = l, or i = l and j = k, or i = j = k = l. Let τ'_0 = τ'_iiii, τ'_1 = τ'_ijji (i ≠ j), and τ'_2 = τ'_ijij (i ≠ j). Then,

 pvar(A) = ∑_{i,j} τ'_1 e_i e_j⊤ ⊗ e_j e_i⊤ + ∑_{i,j} τ'_2 e_i e_j⊤ ⊗ e_i e_j⊤ + (τ'_0 − τ'_1 − τ'_2) ∑_i e_i e_i⊤ ⊗ e_i e_i⊤ = τ'_1 K_{p,p} + τ'_2 vec(I)vec(I)⊤

by similar arguments as with var(A). Then note that τ'_1 = τ_1 and τ'_2 = τ_2: since A is Hermitian, vec(A)* = K_{p,p} vec(A), and hence pvar(A) = var(A) K_{p,p} = τ_1 K_{p,p} + τ_2 vec(I)vec(I)⊤, where we used vec(I)vec(I)⊤ K_{p,p} = vec(I)vec(I)⊤. Lastly, since var(A) is positive semidefinite and τ_1 = var(a_12) ≥ 0,

 |var(A)| = |τ_1 I + τ_2 vec(I)vec(I)⊤| = (τ_1 + pτ_2) τ_1^{p²−1} ≥ 0

implies τ_1 + pτ_2 ≥ 0. ∎

-B Proof of Theorem 2

Since ^Σ(Z) has a radial distribution, it follows from (10) that

 var(^Σ(X)) =_d var(Σ^{1/2} ^Σ(Z) Σ^{1/2}) = ((Σ^{1/2})* ⊗ Σ^{1/2}) var(^Σ(Z)) ((Σ^{1/2})* ⊗ Σ^{1/2}),

where we used that Σ^{1/2} is Hermitian. From Theorem 1, var(^Σ(Z)) is of the form (7). Since

 ((Σ^{1/2})* ⊗ Σ^{1/2}) I ((Σ^{1/2})* ⊗ Σ^{1/2}) = Σ* ⊗ Σ and
 ((Σ^{1/2})* ⊗ Σ^{1/2}) vec(I)vec(I)⊤ ((Σ^{1/2})* ⊗ Σ^{1/2}) = vec(Σ)vec(Σ)^H,

we obtain (11). Similarly,

 pvar(^Σ(X)) =_d pvar(Σ^{1/2} ^Σ(Z) Σ^{1/2}) = ((Σ^{1/2})* ⊗ Σ^{1/2}) pvar(^Σ(Z)) (Σ^{1/2} ⊗ (Σ^{1/2})*),

where pvar(^Σ(Z)) is of the form (8). Since

 ((Σ^{1/2})* ⊗ Σ^{1/2}) K_{p,p} (Σ^{1/2} ⊗ (Σ^{1/2})*) = (Σ* ⊗ Σ) K_{p,p} and
 ((Σ^{1/2})* ⊗ Σ^{1/2}) vec(I)vec(I)⊤ (Σ^{1/2} ⊗ (Σ^{1/2})*) = vec(Σ)vec(Σ)⊤,

we obtain (12). ∎

-C Proof of Theorem 3

The proof here is similar to the proof of [12, Theorem 2], which was derived for real-valued observations. First we recall that the SCM has the representation S = (1/(n−1)) X⊤ H X*, where H = I − (1/n)11⊤ is the centering matrix.

Write a = Xe_q and b = Xe_r for q ≠ r. Then note that s_qr = (n−1)^{−1} a⊤ H b*. Hence,

 τ_1 = var(s_qr) = var((n−1)^{−1} a⊤ H b*) = (n−1)^{−2} var(a⊤ H b*). (16)

Then note that

 var(a⊤ H b*) = var(tr(H b* a⊤)) (17)
 = vec(H)⊤ var(vec(b* a⊤)) vec(H). (18)

Recall that x_i has the stochastic representation x_i =_d μ + r_i Σ^{1/2} u_i, where r_i is independent of u_i ∼ U(CS^{p−1}). Since τ_1 = var_{0,I}(s_qr), we may set μ = 0 and Σ = I, so that a_i = r_i u_iq and similarly b_k = r_k u_kr. The (k, l)th element of the (i, j)th block of the matrix var(vec(b* a⊤)) is then

 cov(b_k* a_i, b_l* a_j) = E[r_k r_i r_l r_j u_kr* u_iq u_lr u_jq*],

where we used that E[b_k* a_i] = 0. Then note that

 E[|u_iq|² |u_ir|²] = 1/(p(p+1)), E[|u_iq|²] = 1/p, and E[|u_iq|⁴] = 2/(p(p+1)),

while all other moments up to fourth order vanish. This and the fact that E[r_i⁴] = (1 + κ) p(p+1) due to (4) allow us to conclude that the only non-zero elements of var(vec(b* a⊤)) are

 E[r_i⁴] E[|u_ir|² |u_iq|²] = 1 + κ for i = j = k = l, and
 E[r_i²] E[r_k²] E[|u_iq|²] E[|u_kr|²] = 1 for i = j ≠ k = l,

and hence

 var(vec(b* a⊤)) = I + κ ∑_{i=1}^{n} e_i e_i⊤ ⊗ e_i e_i⊤. (19)

This together with (16) and (18) yields

 τ_1 = (n−1)^{−2} vec(H)⊤ (I + κ ∑_{i=1}^{n} e_i e_i⊤ ⊗ e_i e_i⊤) vec(H) = 1/(n−1) + κ/n,

where we used vec(H)⊤ vec(H) = tr(H²) = tr(H) = n − 1 and

 ∑_{i=1}^{n} vec(H)⊤ (e_i e_i⊤ ⊗ e_i e_i⊤) vec(H) = ∑_{i=1}^{n} h_ii² = n(1 − 1/n)² = (n−1)²/n.