Covariance Matrix Estimation from Correlated Samples

Covariance matrix estimation concerns the problem of estimating the covariance matrix from a collection of samples, which is of fundamental importance in many applications. Classical results have shown that O(n) samples are sufficient to accurately estimate the covariance matrix from n-dimensional independent Gaussian samples. However, in many practical applications, the received signal samples might be correlated, which makes the classical analysis inapplicable. In this paper, we develop a non-asymptotic analysis for covariance matrix estimation from correlated Gaussian samples. Our theoretical results show that the error bounds are determined by the signal dimension n, the sample size m, and the shape parameter of the distribution of the correlated sample covariance matrix. In particular, when the shape parameter belongs to a class of Toeplitz matrices (which is of great practical interest), O(n) samples are also sufficient to faithfully estimate the covariance matrix from correlated samples. Simulations are provided to verify the correctness of the theoretical results.


I Introduction

Estimating covariance matrices is a fundamental problem in modern multivariate analysis, with applications in many fields ranging from signal processing [1, 2] to statistics [3] and finance [4]. In particular, important examples in signal processing include Capon’s estimator [5], MUltiple SIgnal Classification (MUSIC) [6], Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [7], and their variants [1].

During the past few decades, numerous works have studied the optimal sample size that suffices to estimate the covariance matrix from n-dimensional independent samples [8, 9, 10, 11, 12, 13, 14, 15]. For instance, Vershynin [10] has shown that O(n) samples are sufficient for independent sub-Gaussian samples, where O(n) denotes that the number of samples grows linearly in the dimension n; Vershynin [13] also illustrates that O(n log n) samples suffice for independent heavy-tailed samples; and Srivastava and Vershynin [14] have established that O(n) is the optimal bound for independent samples which obey log-concave distributions.

However, in many practical applications of interest, it is very hard to ensure that the received signal samples are independent of each other. For example, in signal processing, the signal sources might pass through a multipath channel [16, 17] or interfere with each other [18, 19], which causes the received samples to be correlated. In portfolio management and risk assessment, the returns of different assets are correlated on short time scales, i.e., the Epps effect [20, 21]. A natural question to ask is:

Is it possible to use correlated samples to estimate the covariance matrix? If possible, how many correlated samples do we need to obtain a good estimation of the covariance matrix?

This paper focuses on the above question and provides some related theoretical results. More precisely, we establish non-asymptotic error bounds for covariance matrix estimation from linearly-correlated Gaussian samples in both expectation and tail forms. These results show that the error bounds are determined by the signal dimension n, the sample size m, and the shape parameter B of the distribution of the correlated sample covariance matrix. In particular, if the shape parameter belongs to a class of Toeplitz matrices (see Section III-C, Example 2, for details) satisfying tr(B) = m, ‖B‖ = O(1), and ‖B‖_F = O(√m), our results reveal that the correlated case has the same order of error rate as the independent case, albeit with a larger multiplicative coefficient.

The remainder of this paper is organized as follows. The problem is formulated in Section II. The performance analysis of covariance matrix estimation from linearly-correlated Gaussian samples is presented in Section III. Simulations are provided in Section IV, and conclusions are drawn in Section V.

II Problem Formulation

Let x ∈ ℝⁿ be a centered Gaussian vector with covariance matrix Σ = E[xxᵀ], where Σ is a positive definite matrix. Let x₁, …, x_m be independent copies of x. Suppose we observe the linearly-correlated samples

 Y = XΛ,

where Y = [y₁, …, y_m] ∈ ℝ^{n×m}, X = [x₁, …, x_m] ∈ ℝ^{n×m}, and Λ ∈ ℝ^{m×m} is a fixed matrix. The objective is to estimate the covariance matrix Σ from the correlated samples {y_k}. Here we assume that m ≥ n.¹

¹When m < n, in order to estimate covariance matrices, we require some kind of prior information about them. During the past few decades, a large number of estimators have been proposed for this setting. One group is structured estimators, which impose additional structure on covariance matrices (see [3] and references therein); typical examples of structured covariance matrices include bandable covariance matrices [22], Toeplitz covariance matrices [23], sparse covariance matrices [24, 25], and so on. Another group shrinks the sample covariance matrix toward a “target” matrix by incorporating some regularization [26, 27, 28, 29]. The general form of shrinkage estimators is

 Σ̂_sh = (1 − ρ)Σ̂ + ρT,

where T is the shrinkage “target” matrix with positive definite structure, Σ̂ denotes the sample covariance matrix, and ρ ∈ [0, 1] is an absolute constant. The third group of estimators is based on spectrum correction: spectrum-correction approaches are utilized to infer a mapping from the sample eigenvalues to corrected eigenvalue estimates, which yields a superior covariance matrix; see, e.g., [30, 31, 32].
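The shrinkage family in the footnote admits a very short sketch. The convex-combination form below and the concrete choices (identity target, ρ = 0.1) are illustrative assumptions, not the tuned estimator of any particular reference:

```python
import numpy as np

def shrinkage_estimator(samples, target, rho):
    """Shrink the sample covariance toward a positive definite "target":
    Sigma_sh = (1 - rho) * Sigma_hat + rho * target."""
    n, m = samples.shape                    # columns are the samples
    sigma_hat = samples @ samples.T / m     # sample covariance (centered data)
    return (1 - rho) * sigma_hat + rho * target

rng = np.random.default_rng(0)
n, m = 5, 20
X = rng.standard_normal((n, m))
# Shrinking toward the identity guarantees the estimate is well conditioned
# even when m is small: every eigenvalue is at least rho.
sigma_sh = shrinkage_estimator(X, np.eye(n), rho=0.1)
```

Because (1 − ρ)Σ̂ is positive semidefinite, the smallest eigenvalue of the shrunk estimate is at least ρ, which is the reason such estimators remain invertible when m < n.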

The standard approach under correlated samples utilizes the correlated sample covariance matrix to approximate the actual one (see, e.g., [33, 34, 35]):

 Σ̂ = (1/m) ∑_{k=1}^{m} y_k y_kᵀ = (1/m) XΛΛᵀXᵀ := (1/m) XBXᵀ. (1)

Our problem then becomes to investigate how many correlated samples are enough to estimate Σ accurately from Σ̂. It is not hard to find that the correlated sample covariance matrix is a compound Wishart matrix² with shape parameter B = ΛΛᵀ and scale parameter Σ [36].

²Let x₁, …, x_m ∼ N(0, Σ) be independent Gaussian vectors, and let B be an arbitrary real m × m matrix. We say that a random matrix W is a compound Wishart matrix with shape parameter B and scale parameter Σ if W = (1/m) XBXᵀ, where X = [x₁, …, x_m].
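The identity in (1) is easy to verify numerically. The dimensions and the random choice of Λ below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 30
X = rng.standard_normal((n, m))      # columns x_1, ..., x_m ~ N(0, I_n)
Lam = rng.standard_normal((m, m))    # a fixed correlation-inducing matrix
Y = X @ Lam                          # correlated samples y_1, ..., y_m

# Sigma_hat = (1/m) sum_k y_k y_k^T  =  (1/m) X B X^T  with  B = Lam Lam^T.
sigma_hat_sum = sum(np.outer(Y[:, k], Y[:, k]) for k in range(m)) / m
B = Lam @ Lam.T                      # shape parameter; symmetric by construction
sigma_hat_compound = X @ B @ X.T / m
```

Note that B = ΛΛᵀ is automatically symmetric positive semidefinite, which is the structure exploited throughout Section III.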

For convenience of comparison, we restate a typical result for covariance matrix estimation from independent Gaussian samples as follows. This result indicates that O(n) samples are sufficient to accurately estimate the covariance matrix from independent Gaussian samples. It is natural to expect that we require at least O(n) samples to estimate the covariance matrix from correlated samples.

Proposition 1 (Theorem 4, [15]).

Let x be a centered n-dimensional Gaussian vector with covariance matrix Σ, and let x₁, …, x_m be independent copies of x. Then the sample covariance matrix Σ̃ = (1/m) ∑_{k=1}^{m} x_k x_kᵀ satisfies

 E‖Σ̃ − Σ‖ ≤ C(√(n/m) + n/m)‖Σ‖,

where ‖·‖ denotes the spectral norm and C is an absolute constant.
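The √(n/m) rate in Proposition 1 can be checked by Monte Carlo: quadrupling m should roughly halve the average spectral error. The dimensions, trial counts, and tolerance below are illustrative choices:

```python
import numpy as np

def avg_spectral_error(n, m, trials, rng):
    """Average spectral-norm error of the sample covariance for N(0, I_n)."""
    errs = []
    for _ in range(trials):
        X = rng.standard_normal((n, m))
        sigma_tilde = X @ X.T / m
        errs.append(np.linalg.norm(sigma_tilde - np.eye(n), 2))
    return float(np.mean(errs))

rng = np.random.default_rng(2)
e1 = avg_spectral_error(n=10, m=200, trials=100, rng=rng)
e2 = avg_spectral_error(n=10, m=800, trials=100, rng=rng)
ratio = e1 / e2  # should be roughly sqrt(800/200) = 2 in the sqrt(n/m) regime
```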

III Covariance Matrix Estimation from Linearly-Correlated Gaussian Samples

In this section, we present our main results for covariance matrix estimation from linearly-correlated Gaussian samples. Our proof strategy is divided into two steps. First, we establish a key theorem which illustrates that the correlated sample covariance matrix

 Σ̂ = (1/m) YYᵀ = (1/m) XBXᵀ

concentrates around its mean EΣ̂ = (tr(B)/m)Σ with high probability. We then establish the non-asymptotic error bounds for the estimated covariance matrix in both expectation and tail forms.

III-A Concentration of the Linearly-Correlated Sample Covariance Matrix

Theorem 1.

Let x₁, …, x_m ∼ N(0, Σ) be independent Gaussian vectors, where Σ is an n × n real positive definite matrix. Let B be a fixed symmetric m × m real matrix. Consider the compound Wishart matrix W = (1/m) XBXᵀ with X = [x₁, …, x_m]. Then for any δ > 0, the event

 ‖W − EW‖ ≤ (32‖B‖_F δ + 64‖B‖δ²)/m · ‖Σ‖

holds with probability at least 1 − 2exp(−2δ² + 2n log 3), where ‖·‖_F denotes the Frobenius norm. Furthermore,

 E‖W − EW‖ ≤ (72‖B‖_F √n + 282‖B‖n)/m · ‖Σ‖. (2)
Proof.

See Appendix A. ∎

Remark 1.

It follows from Theorem 1 that the error bounds depend on the signal dimension n, the sample size m, and the shape parameter B. In particular, if ‖B‖_F = O(√m) and ‖B‖ = O(1), then this result reveals that O(n) samples are sufficient to estimate the compound Wishart matrix accurately.

Remark 2 (Symmetric B).

The fact that the shape matrix B of the correlated sample covariance matrix is symmetric plays a key role in the proof of Theorem 1. This property enables the compound Wishart matrix to be expressed as a weighted sum of independent rank-one matrices. Thus we can employ standard techniques from [10] (e.g., the ε-net method and Bernstein’s inequality) to establish the error bounds in both expectation and tail forms.

Remark 3 (General B).

For general B, however, the compound Wishart matrix cannot be expressed as a weighted sum of independent terms, which makes the theoretical analysis much harder.

In [37], Soloveychik closely follows a sophisticated strategy developed by Levina and Vershynin [38] and establishes the following expectation bound:

 E‖W − EW‖ ≤ 24⌈log₂ n⌉² √n (4‖B‖ + √π ‖B‖_F/‖B‖)/m · ‖Σ‖.

It is not hard to see that if ‖B‖_F = O(√m) and ‖B‖ = O(1), then this bound shows that O(n log⁴ n) samples are sufficient to estimate the compound Wishart matrix accurately.

In [39], Paulin et al. employ the method of exchangeable pairs [40, 41] and establish the concentration of W in both expectation and tail forms for bounded sample matrices (i.e., each entry of X is bounded by an absolute positive constant L). The expectation bound in [39] is given by

 E‖W − EW‖ ≤ (2√(v(B) log n) + 32√3 L n log n ‖B‖)/m,

where v(B) is a variance parameter determined by B and the standard deviation of the entries of X. Clearly, if v(B) = O(m) and ‖B‖ = O(1), then this bound establishes that O(n log n) samples suffice to estimate the compound Wishart matrix W.

In contrast to the above two works, our proof strategy is entirely different because we exploit the symmetric structure of B. More importantly, our results improve theirs in the symmetric case. This improvement is critical to obtaining the optimal error rate for covariance matrix estimation from correlated samples.

Remark 4.

It is worth pointing out that there is a different line of research which studies the asymptotic behavior of the compound Wishart matrix (e.g., as n, m → ∞ with n/m converging to a constant). Please refer to [42] and references therein for a survey.

III-B Covariance Matrix Estimation from Linearly-Correlated Gaussian Samples

We then derive the error bounds for the covariance matrix estimation from linearly-correlated Gaussian samples.

Theorem 2.

Let x₁, …, x_m ∼ N(0, Σ) be independent random vectors, where Σ is an n × n real positive definite matrix. Let X = [x₁, …, x_m]. Consider the correlated samples Y = XΛ, where Λ is a fixed m × m matrix. Let the sample covariance matrix be Σ̂ = (1/m) YYᵀ. Then for any δ > 0, the event

 ‖Σ̂ − Σ‖ ≤ |tr(ΛΛᵀ)/m − 1| ‖Σ‖ + (32‖ΛΛᵀ‖_F δ + 64‖ΛΛᵀ‖δ²)/m · ‖Σ‖

holds with probability at least 1 − 2exp(−2δ² + 2n log 3). Furthermore,

 E‖Σ̂ − Σ‖ ≤ |tr(ΛΛᵀ)/m − 1| ‖Σ‖ + (72‖ΛΛᵀ‖_F √n + 282‖ΛΛᵀ‖n)/m · ‖Σ‖.

Proof:

By the triangle inequality, we have

 E‖Σ̂ − Σ‖ = E‖(1/m)YYᵀ − Σ‖ ≤ E‖(1/m)YYᵀ − (1/m)E[YYᵀ]‖ + ‖(1/m)E[YYᵀ] − Σ‖. (3)

The first term in (3) can be bounded by Theorem 1 with B = ΛΛᵀ, i.e.,

 E‖(1/m)YYᵀ − (1/m)E[YYᵀ]‖ ≤ (72‖ΛΛᵀ‖_F √n + 282‖ΛΛᵀ‖n)/m · ‖Σ‖. (4)

It suffices to bound the second term in (3). Since the columns of X are centered independent Gaussian vectors, direct calculation leads to

 E[(YYᵀ)_{ij}] = ∑_{l,k=1}^{m} (ΛΛᵀ)_{lk} E(X_{il}X_{jk}) = ∑_{l=1}^{m} (ΛΛᵀ)_{ll} E(X_{il}X_{jl}) = tr(ΛΛᵀ)Σ_{ij},

where X_{il} denotes the (i, l)-th entry of the matrix X, and the second equality holds because the columns of X are independent. Thus we have

 ‖(1/m)E[YYᵀ] − Σ‖ = |tr(ΛΛᵀ)/m − 1| ‖Σ‖. (5)

Combining (3), (4), and (5) establishes the expectation bound; the tail bound follows similarly from the tail form of Theorem 1. ∎

In particular, if the shape matrix satisfies tr(ΛΛᵀ) = m (see the examples in Section III-C), then we have the following corollary.

Corollary 1.

Let x₁, …, x_m ∼ N(0, Σ) be independent random vectors, where Σ is an n × n real positive definite matrix. Let X = [x₁, …, x_m]. Consider the correlated samples Y = XΛ, where Λ is a fixed m × m matrix such that tr(ΛΛᵀ) = m. Let the sample covariance matrix be Σ̂ = (1/m) YYᵀ. Then for any δ > 0, the event

 ‖Σ̂ − Σ‖ ≤ (32‖ΛΛᵀ‖_F δ + 64‖ΛΛᵀ‖δ²)/m · ‖Σ‖

holds with probability at least 1 − 2exp(−2δ² + 2n log 3). Furthermore,

 E‖Σ̂ − Σ‖ ≤ (72‖ΛΛᵀ‖_F √n + 282‖ΛΛᵀ‖n)/m · ‖Σ‖.

III-C Examples

In this subsection, we present some examples to illustrate our theoretical results.

Example 1 (Independent samples).

In this case, Λ = I_m, where I_m is the m-dimensional identity matrix. It is easy to verify that ‖ΛΛᵀ‖_F = √m, ‖ΛΛᵀ‖ = 1, and tr(ΛΛᵀ) = m. It then follows from Corollary 1 that

 E‖Σ̂ − Σ‖ ≤ (72√(n/m) + 282 n/m)‖Σ‖. (6)

It is clear that O(n) samples are sufficient to estimate the covariance matrix in this case. This result is consistent with Proposition 1.

Example 2 (Partially correlated samples).

In this case, a typical model for the shape parameter is the symmetric Toeplitz matrix

 ΛΛᵀ = T(θ), where T(θ) is the m × m symmetric Toeplitz matrix with (i, j)-th entry θ^{|i−j|},

with 0 < θ < 1. This model is very common in many applications. For instance, the lagged correlation between the returns in portfolio optimization [34] satisfies this model by setting θ = exp(−1/τ), where τ is the characteristic time. Obviously, tr(T(θ)) = m. By the Gershgorin circle theorem [43, Theorem 7.2.1], we have

 ‖T(θ)‖ ≤ 1 + 2∑_{k=1}^{∞} θ^k = 1 + 2θ/(1−θ) = (1+θ)/(1−θ), 0 < θ < 1.

Direct calculation also gives

 ‖T(θ)‖_F² = m + 2∑_{k=1}^{m−1} (m−k)θ^{2k} = m(1+θ²)/(1−θ²) + 2θ²(θ^{2m}−1)/(1−θ²)² ≤ m(1+θ²)/(1−θ²),

where the last inequality holds because θ^{2m} < 1. It then follows from Corollary 1 that

 E‖Σ̂ − Σ‖ ≤ (72√((1+θ²)/(1−θ²)) · √(n/m) + 282 · (1+θ)/(1−θ) · n/m)‖Σ‖. (7)

Therefore, we conclude that in this case, O(n) correlated samples are also sufficient to accurately estimate the covariance matrix. The difference between the correlated case and the independent case is that, for a given estimation accuracy, the former requires more samples than the latter, because the multiplicative coefficients in the error bound (7) are larger than those in (6). Furthermore, the larger the θ, the greater the multiplicative coefficients.

Example 3 (Totally correlated samples).

When the observed signal samples are totally correlated, for example, y₁ = y₂ = ⋯ = y_m (realized by Λ = (1/√m)𝟙𝟙ᵀ), the shape matrix ΛΛᵀ is the all-one matrix

 ΛΛᵀ = 𝟙𝟙ᵀ := Θ.

Standard calculation shows that ‖Θ‖_F = m, ‖Θ‖ = m, and tr(Θ) = m. By Corollary 1, we have

 E‖Σ̂ − Σ‖ ≤ (72√n + 282n)‖Σ‖.

This result indicates that when the samples are totally correlated, the error bound is independent of the sample size m, which means that increasing m will not reduce the estimation error.
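A minimal sketch of the totally correlated case, using the concrete realization Λ = (1/√m)𝟙𝟙ᵀ (one possible choice among many): all columns of Y collapse to a single vector, so the sample covariance is a rank-one matrix no matter how large m is.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 50
X = rng.standard_normal((n, m))          # independent columns, Sigma = I_n

# Total correlation: Lam = (1/sqrt(m)) * ones, so Lam @ Lam.T is the
# all-one matrix Theta and every y_k equals (1/sqrt(m)) * sum_i x_i.
Lam = np.ones((m, m)) / np.sqrt(m)
Y = X @ Lam
sigma_hat = Y @ Y.T / m                  # collapses to a rank-one matrix

B = Lam @ Lam.T                          # equals Theta = all-ones
```

Since every column of Y is the same vector y, Σ̂ = yyᵀ exactly, so drawing more "samples" adds no information.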

IV Simulation Results

In this section, we carry out some simulations to demonstrate our theoretical results.

Consider an n × m matrix X whose entries are independently drawn from the standard Gaussian distribution, so that Σ = I_n. Let the prescribed estimation accuracy η satisfy 0 < η < 1.

In the first experiment, we consider the case where the samples are independent but with time-variant scale factors, i.e., Λ is a diagonal matrix with different diagonal entries. Let Λ = diag(λ₁, …, λ_m), where the λ_k are independent Gaussian variables with mean 1 and a common standard deviation. We run simulations for four models corresponding to four values of the standard deviation; it is not hard to verify that the four models satisfy ‖ΛΛᵀ‖_F = O(√m) and ‖ΛΛᵀ‖ = O(1) on average. We fix η and increase n from 1 to 30. For each fixed n, we make 500 Monte Carlo trials and calculate the average of the minimum sample size m which satisfies

 ‖Σ̂ − Σ‖_F / ‖Σ‖_F ≤ η.

Fig. 1 shows the simulation results. It is not hard to find that the minimum sample size is proportional to the signal dimension n for all four models. As the standard deviation increases, the slope of the line also increases. This phenomenon can be explained by Theorem 2: when the standard deviation increases, the averages of both ‖ΛΛᵀ‖_F and ‖ΛΛᵀ‖ also increase, which leads to the increase of the slope.
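The error criterion of the first experiment can be sketched as follows. The dimension, sample sizes, standard deviation σ_dev = 0.2, and trial count are illustrative stand-ins, since the source does not fix them here:

```python
import numpy as np

def avg_rel_frob_error(n, m, sigma_dev, trials, rng):
    """Average relative Frobenius error of Sigma_hat = (1/m) Y Y^T, where
    Y = X Lam with Lam = diag(lam_1, ..., lam_m), lam_k ~ N(1, sigma_dev^2)."""
    errs = []
    for _ in range(trials):
        X = rng.standard_normal((n, m))
        lam = 1 + sigma_dev * rng.standard_normal(m)
        Y = X * lam                         # right-multiplication by diag(lam)
        sigma_hat = Y @ Y.T / m
        # ||Sigma||_F = ||I_n||_F = sqrt(n) for the identity covariance
        errs.append(np.linalg.norm(sigma_hat - np.eye(n), 'fro') / np.sqrt(n))
    return float(np.mean(errs))

rng = np.random.default_rng(4)
err_small_m = avg_rel_frob_error(n=10, m=100, sigma_dev=0.2, trials=50, rng=rng)
err_large_m = avg_rel_frob_error(n=10, m=1000, sigma_dev=0.2, trials=50, rng=rng)
```

The experiment in the text searches for the minimum m driving this error below η; the sketch only confirms the error decreases as m grows for fixed n.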

In the second experiment, we consider four correlated models ΛΛᵀ = T(θ) corresponding to four values of θ. We increase n and fix η as before. We again make 500 Monte Carlo trials and calculate the average minimum sample size for each fixed n, as in the first experiment.

Fig. 2 reports the simulation results. We can easily see that the number of samples in the four cases is a linear function of the signal dimension n, which agrees with the theoretical results (6) and (7). In addition, the larger the parameter θ, the steeper the slope, which demonstrates that for a given estimation accuracy, the more correlated case requires more samples than the less correlated one.

In the third experiment, we compare the divergence rate of the theoretical results (Theorem 2) and Monte Carlo simulation results for four correlated models ΛΛᵀ = T(θ). We fix n and increase m from 50 to 1000 in steps of 50. For each fixed sample size m, we make 500 Monte Carlo trials and calculate the logarithm (base 10) of the average estimation error ‖Σ̂ − Σ‖.

The results are presented in Fig. 3. For both the theoretical and simulation results, the curves of the four models are nearly parallel, which means that the four models have very similar error divergence rates. The results agree with Theorem 2. However, we also observe a sizable gap between the theoretical and simulated estimation errors. This is because we have made a number of loose estimates in order to keep the proofs transparent, so our theoretical bounds are not optimal in terms of multiplicative coefficients.

V Conclusion

In this paper, we have presented a non-asymptotic analysis for covariance matrix estimation from linearly-correlated Gaussian samples. Our theoretical results have shown that the error bounds depend on the signal dimension n, the sample size m, and the shape parameter of the distribution of the correlated sample covariance matrix. In particular, when the shape parameter belongs to a class of Toeplitz matrices (which is of great practical interest), O(n) samples are sufficient to faithfully estimate the covariance matrix from correlated samples. This result demonstrates that it is possible to estimate covariance matrices from moderately correlated samples.

For future work, it would be of great practical interest to extend the theoretical analysis for correlated samples from the Gaussian distribution to other distributions such as sub-Gaussian, heavy-tailed, and log-concave distributions. In addition, it is of great importance to investigate the performance of other estimators under correlated samples, including structured estimators, regularized estimators, and so on.

Appendix A Proof of Theorem 1

To prove Theorem 1, we require some useful definitions and facts. Without loss of generality, we assume Σ = I_n; otherwise we can use Σ^{−1/2}x_i instead of x_i to verify the general case.

Definition 1 (ε-net).

Let K ⊆ ℝⁿ and ε > 0. A subset N ⊆ K is called an ε-net of K if

 ∀ x ∈ K, ∃ x₀ ∈ N such that ‖x − x₀‖₂ ≤ ε.
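The definition is easy to instantiate. Below is a 1/4-net of the unit circle S¹ built from 32 equispaced points (an illustrative construction; any spacing with chord length at most ε works), checked against the cardinality bound (1 + 2/ε)ⁿ = 9² = 81 used later in the proof:

```python
import numpy as np

eps = 0.25
N = 32  # equispaced points: nearest-point chord length is 2*sin(pi/32) < eps
angles_net = 2 * np.pi * np.arange(N) / N
net = np.stack([np.cos(angles_net), np.sin(angles_net)], axis=1)

# Empirically verify the net property on random points of the circle.
rng = np.random.default_rng(5)
angles = rng.uniform(0, 2 * np.pi, size=1000)
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)
dists = np.min(np.linalg.norm(points[:, None, :] - net[None, :, :], axis=2), axis=1)
max_dist = float(dists.max())  # distance to the nearest net point
```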
Definition 2 (sub-Gaussian).

A random variable x is said to be sub-Gaussian with variance proxy σ² (denoted as x ∼ subG(σ²)) if E[x] = 0 and

 E exp(sx) ≤ exp(σ²s²/2), ∀ s ∈ ℝ.

Equivalent definitions of sub-Gaussian random variables can be found in [44, Proposition 2.5.2]. Typical sub-Gaussian random variables include Gaussian variables, Bernoulli variables, and any bounded random variables.

Definition 3 (sub-exponential).

A random variable x is said to be sub-exponential with parameter K (denoted as x ∼ subE(K)) if E[x] = 0 and

 E exp(sx) ≤ exp(K²s²/2), ∀ |s| ≤ 1/K.

Equivalent definitions of sub-exponential random variables can be found in [44, Proposition 2.7.1]. All sub-Gaussian random variables and their squares are sub-exponential. In addition, exponential and Poisson random variables belong to sub-exponential variables.

Fact 1 (Exercise 4.4.3 and Corollary 4.2.13, [44]).

Let A be an n × m matrix and ε ∈ [0, 1/2). Then for any ε-net N of the unit sphere Sⁿ⁻¹ and any ε-net M of the unit sphere Sᵐ⁻¹, we have

 ‖A‖ ≤ 1/(1 − 2ε) · sup_{x∈N, y∈M} ⟨Ax, y⟩,

where ⟨·,·⟩ denotes the inner product between two vectors, i.e., ⟨x, y⟩ = xᵀy. If m = n and A is symmetric, then

 ‖A‖ ≤ 1/(1 − 2ε) · sup_{x∈N} |⟨Ax, x⟩|.

Furthermore, there exist ε-nets N and M with cardinalities

 |N| ≤ (1 + 2/ε)ⁿ and |M| ≤ (1 + 2/ε)ᵐ.
Fact 2 (Lemma 1.4, [45]).

Let x ∼ subG(σ²). Then for any k ≥ 1,

 E|x|ᵏ ≤ (2σ²)^{k/2} k Γ(k/2),

where Γ denotes the Gamma function.

Fact 3 (Lemma 1.12, [45]).

Let x ∼ subG(σ²). Then the random variable x² − E[x²] is sub-exponential with x² − E[x²] ∼ subE(8σ²).

Fact 4 (Bernstein’s inequality, Theorem 2.8.2, [44]).

Let x₁, …, x_m be independent random variables with E[x_i] = 0 and x_i ∼ subE(K), and let a = (a₁, …, a_m)ᵀ ∈ ℝᵐ. Define S_m = (1/m) ∑_{i=1}^{m} a_i x_i. Then for any t ≥ 0, we have

 P(|S_m| ≥ t) ≤ 2exp(−(1/2) min{m²t²/(K²‖a‖₂²), mt/(K‖a‖_∞)}),

where ‖a‖₂ = (∑_i a_i²)^{1/2} and ‖a‖_∞ = max_i |a_i|.
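Bernstein’s inequality in this weighted form can be sanity-checked numerically. The sketch below uses x_i = g_i² − 1 with g_i ∼ N(0, 1), which is subE(8) by Fact 3; the weights a = 𝟙 and threshold t = 1 are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)
m, K, t, trials = 100, 8.0, 1.0, 20000
a = np.ones(m)

# S_m = (1/m) * sum_i a_i * x_i with x_i = g_i^2 - 1 (centered chi-square).
g = rng.standard_normal((trials, m))
S = np.mean(a * (g ** 2 - 1), axis=1)

empirical_tail = float(np.mean(np.abs(S) >= t))
bernstein_bound = 2 * np.exp(-0.5 * min(
    m ** 2 * t ** 2 / (K ** 2 * np.sum(a ** 2)),   # Gaussian-like branch
    m * t / (K * np.max(np.abs(a)))))              # exponential-tail branch
```

Because the constant K = 8 is conservative, the bound comfortably dominates the empirical tail frequency.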

Here Facts 3 and 4 are derived by slightly modifying the original results in [45] and [44], respectively. For the convenience of the reader, we include detailed proofs in Appendix B.

We are now in a position to prove Theorem 1. For clarity, the proof is divided into several steps.

• Problem reduction. Let B = U D_B Uᵀ be the spectral decomposition of the symmetric matrix B, where D_B is a diagonal matrix whose entries λ₁, …, λ_m are the eigenvalues of B, and U is an orthonormal matrix. Then we have

 P(‖W − EW‖ ≥ t) = P((1/m)‖X U D_B Uᵀ Xᵀ − E[X U D_B Uᵀ Xᵀ]‖ ≥ t) = P((1/m)‖X D_B Xᵀ − E[X D_B Xᵀ]‖ ≥ t) = P(‖(1/m) ∑_{i=1}^{m} λ_i(x_i x_iᵀ − I_n)‖ ≥ t), (8)

where the second equality holds because the Gaussian matrix X is orthogonally invariant.

• Approximation. Choose ε = 1/4. By Fact 1, we get

 ‖(1/m) ∑_{i=1}^{m} λ_i(x_i x_iᵀ − I_n)‖ ≤ 2 sup_{u∈N} |(1/m) ∑_{i=1}^{m} λ_i ⟨(x_i x_iᵀ − I_n)u, u⟩| = 2 sup_{u∈N} |(1/m) ∑_{i=1}^{m} λ_i(⟨x_i, u⟩² − 1)|,

where N is a 1/4-net of Sⁿ⁻¹ with |N| ≤ 9ⁿ. Thus we have

 P(‖(1/m) ∑_{i=1}^{m} λ_i(x_i x_iᵀ − I_n)‖ ≥ t) ≤ P(sup_{u∈N} |(1/m) ∑_{i=1}^{m} λ_i(⟨x_i, u⟩² − 1)| ≥ t/2).
• Concentration. Fix u ∈ N; we are going to bound

 P(|(1/m) ∑_{i=1}^{m} λ_i(⟨x_i, u⟩² − 1)| ≥ t/2).

By assumption, the ⟨x_i, u⟩ are independent Gaussian random variables with mean zero and variance 1, so ⟨x_i, u⟩ ∼ subG(1). From Fact 3, the ⟨x_i, u⟩² − 1 are independent sub-exponential variables with mean zero and ⟨x_i, u⟩² − 1 ∼ subE(8). By Bernstein’s inequality (Fact 4) with a = (λ₁, …, λ_m)ᵀ, we have

 P(|(1/m) ∑_{i=1}^{m} λ_i(⟨x_i, u⟩² − 1)| ≥ t/2) ≤ 2exp(−(1/2) min{m²t²/(256‖a‖₂²), mt/(16‖a‖_∞)}).

Since ‖a‖₂ = ‖B‖_F and ‖a‖_∞ = ‖B‖, we obtain

 P(|(1/m) ∑_{i=1}^{m} λ_i(⟨x_i, u⟩² − 1)| ≥ t/2) ≤ 2exp(−(1/2) min{m²t²/(256‖B‖_F²), mt/(16‖B‖)}).

• Tail bound. Taking a union bound over all u ∈ N yields

 P(sup_{u∈N} |(1/m) ∑_{i=1}^{m} λ_i(⟨x_i, u⟩² − 1)| ≥ t/2) ≤ 9ⁿ · 2exp(−(1/2) min{m²t²/(256‖B‖_F²), mt/(16‖B‖)}).

Assigning

 t = t₁ := (32‖B‖_F δ + 64‖B‖δ²)/m,

we obtain

 P(sup_{u∈N} |(1/m) ∑_{i=1}^{m} λ_i(⟨x_i, u⟩² − 1)| ≥ t₁/2) ≤ 9ⁿ · 2exp(−2δ²) = 2exp(−2δ² + 2n log 3).

Therefore, we show that, for any δ > 0,

 P(‖W − EW‖ ≥ (32‖B‖_F δ + 64‖B‖δ²)/m) ≤ 2exp(−2δ² + 2n log 3).

In particular, if δ ≥ √(2n log 3), we have

 P(‖W − EW‖ ≥ (32‖B‖_F δ + 64‖B‖δ²)/m) ≤ 2exp(−δ²),

which is useful in establishing the expectation bound.

• Expectation bound. We have

 E‖W − EW‖ = ∫₀^∞ P(‖W − EW‖ ≥ t) dt = (32/m) ∫₀^∞ P(‖W − EW‖ ≥ t₁)(‖B‖_F + 4δ‖B‖) dδ,

where the first equality follows from the integral identity for nonnegative random variables, and in the second equality we have substituted t = t₁ = (32‖B‖_F δ + 64‖B‖δ²)/m. Splitting the integral at δ₀ = √(2n log 3), bounding the probability by 1 on [0, δ₀] and by 2exp(−δ²) on (δ₀, ∞), and evaluating the resulting integrals yields the expectation bound (2). ∎

Appendix B Proof of Facts

B-A Proof of Fact 3

In this proof, we slightly improve the result of [45, Lemma 1.12]. By the Taylor expansion, we have

 E[exp(s(x² − E[x²]))] = 1 + ∑_{k=2}^{∞} sᵏ E[(x² − E[x²])ᵏ]/k!.

Since |x² − E[x²]| ≤ x² + E[x²] and, due to the convexity of t ↦ tᵏ for t ≥ 0 and k ≥ 2, it follows from Jensen’s inequality that

 ((x² − E[x²])/2)ᵏ ≤ ((x² + E[x²])/2)ᵏ ≤ (x^{2k} + (E[x²])ᵏ)/2.

By using the above inequality and Jensen’s inequality again, we obtain

 E exp(s(x² − E[x²])) ≤ 1 + ∑_{k=2}^{∞} sᵏ 2^{k−1}(E[x^{2k}] + (E[x²])ᵏ)/k! ≤ 1 + ∑_{k=2}^{∞} sᵏ 2ᵏ E[x^{2k}]/k!.

By using Fact 2, if |s| ≤ 1/(8σ²), we have

 E exp(s(x² − E[x²])) ≤ 1 + ∑_{k=2}^{∞} sᵏ 2ᵏ (2σ²)ᵏ k!/k! ≤ 1 + ∑_{k=2}^{∞} (4sσ²)ᵏ ≤ 1 + 32s²σ⁴ ≤ exp((8σ²)²s²/2).

According to the definition of a sub-exponential random variable, we have x² − E[x²] ∼ subE(8σ²).

B-B Proof of Fact 4

The proof is developed from [44, Theorem 2.8.2] and [45, Theorem 1.13] with explicit constants. Without loss of generality, we assume K = 1; otherwise we can replace x_i by x_i/K and t by t/K to verify the general result. By using the Chernoff bound, for all s ≥ 0, we have

 P(S_m ≥ t) ≤ exp(−smt) E exp(s ∑_{i=1}^{m} a_i x_i) = exp(−smt) ∏_{i=1}^{m} E exp(s a_i x_i).

According to the definition of sub-exponential variables, if |s a_i| ≤ 1, we have

 E exp(s a_i x_i) ≤ exp(s²a_i²/2).

In order for the above inequality to hold for all i, we require |s| ≤ 1/‖a‖_∞. So we have

 P(S_m ≥ t) ≤ exp(−smt) ∏_{i=1}^{m} exp(s²a_i²/2) = exp(‖a‖₂²s²/2 − smt).

Choosing

 s = min{mt/‖a‖₂², 1/‖a‖_∞}

yields

 P(S_m ≥ t) ≤ exp(−(1/2) min{m²t²/‖a‖₂², mt/‖a‖_∞}).

We obtain the same bound for P(S_m ≤ −t) by replacing x_i by −x_i, which completes the proof.

Acknowledgment

The authors thank the anonymous referees and the Associate Editor for useful comments which have helped to improve the presentation of this paper.

References

• [1] H. Krim and M. Viberg, “Two decades of array signal processing research: the parametric approach,” IEEE Signal Process. Mag., vol. 13, no. 4, pp. 67–94, 1996.
• [2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed.   New York, NY, USA: Springer, 2009, ch. 4 and 14.
• [3] T. T. Cai, Z. Ren, and H. H. Zhou, “Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation,” Electron. J. Stat., vol. 10, no. 1, pp. 1–59, 2016.
• [4] J. Fan, Y. Liao, and H. Liu, “An overview of the estimation of large covariance and precision matrices,” Econom. J., vol. 19, no. 1, pp. C1–C32, 2016.
• [5] J. Capon, “High-resolution frequency-wavenumber spectrum analysis,” Proc. IEEE, vol. 57, no. 8, pp. 1408–1418, 1969.
• [6] R. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276–280, 1986.
• [7] R. Roy and T. Kailath, “ESPRIT-estimation of signal parameters via rotational invariance techniques,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 7, pp. 984–995, 1989.
• [8] Z. Bai and Y. Yin, “Limit of the smallest eigenvalue of a large dimensional sample covariance matrix,” Ann. Probab., pp. 1275–1294, 1993.
• [9] G. Aubrun, “Sampling convex bodies: a random matrix approach,” Proc. Amer. Math. Soc., vol. 135, no. 5, pp. 1293–1303, 2007.
• [10] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” in Compressed Sensing, Theory and Applications, Y. Eldar and G. Kutyniok, Eds.   Cambridge, U.K.: Cambridge Univ. Press., 2012, pp. 201–268.
• [11] R. Adamczak, A. Litvak, A. Pajor, and N. Tomczak-Jaegermann, “Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles,” J. Amer. Math. Soc., vol. 23, no. 2, pp. 535–561, 2010.
• [12] R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann, “Sharp bounds on the rate of convergence of the empirical covariance matrix,” C.R. Math., vol. 349, no. 3, pp. 195–200, 2011.
• [13] R. Vershynin, “How close is the sample covariance matrix to the actual covariance matrix?” J. Theor. Probab., vol. 25, no. 3, pp. 655–686, 2012.
• [14] N. Srivastava and R. Vershynin, “Covariance estimation for distributions with 2+ε moments,” Ann. Probab., vol. 41, no. 5, pp. 3081–3111, 2013.
• [15] V. Koltchinskii and K. Lounici, “Concentration inequalities and moment bounds for sample covariance operators,” Bernoulli, vol. 23, no. 1, pp. 110–133, 2017.
• [16] D. Ramírez, J. Vía, I. Santamaría, and L. L. Scharf, “Detection of spatially correlated Gaussian time series,” IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5006–5015, 2010.
• [17] Y. Huang and X. Huang, “Detection of temporally correlated signals over multipath fading channels,” IEEE Trans. Wireless Commun., vol. 12, no. 3, pp. 1290–1299, 2013.
• [18] D.-S. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation and its effect on the capacity of multielement antenna systems,” IEEE Trans. Commun., vol. 48, no. 3, pp. 502–513, 2000.
• [19] Y. Liu, T. F. Wong, and W. W. Hager, “Training signal design for estimation of correlated MIMO channels with colored interference,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1486–1497, 2007.
• [20] T. W. Epps, “Comovements in stock prices in the very short run,” J. Amer. Stat. Assoc., vol. 74, no. 366a, pp. 291–298, 1979.
• [21] M. C. Münnix, R. Schäfer, and T. Guhr, “Impact of the tick-size on financial returns and correlations,” Physica A, vol. 389, no. 21, pp. 4828–4843, 2010.
• [22] P. J. Bickel and E. Levina, “Regularized estimation of large covariance matrices,” Ann. Stat., pp. 199–227, 2008.
• [23] T. T. Cai, Z. Ren, and H. H. Zhou, “Optimal rates of convergence for estimating Toeplitz covariance matrices,” Probab. Theory Relat. Fields, vol. 156, no. 1-2, pp. 101–143, 2013.
• [24] P. J. Bickel and E. Levina, “Covariance regularization by thresholding,” Ann. Stat., pp. 2577–2604, 2008.
• [25] A. J. Rothman, E. Levina, and J. Zhu, “Generalized thresholding of large covariance matrices,” J. Amer. Stat. Assoc., vol. 104, no. 485, pp. 177–186, 2009.
• [26] B. D. Carlson, “Covariance matrix estimation errors and diagonal loading in adaptive arrays,” IEEE Trans. Aerosp. Electron. Syst., vol. 24, no. 4, pp. 397–401, 1988.
• [27] O. Ledoit and M. Wolf, “A well-conditioned estimator for large-dimensional covariance matrices,” J. of Multivariate Anal., vol. 88, no. 2, pp. 365–411, 2004.
• [28] Y. Chen, A. Wiesel, Y. C. Eldar, and A. O. Hero, “Shrinkage algorithms for mmse covariance estimation,” IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5016–5029, 2010.
• [29] A. Coluccia, “Regularized covariance matrix estimation via empirical bayes,” IEEE Signal Process. Lett., vol. 22, no. 11, pp. 2127–2131, 2015.
• [30] C. Stein, “Lectures on the theory of estimation of many parameters,” J. Sov. Math., vol. 34, no. 1, pp. 1373–1403, 1986.
• [31] N. El Karoui, “Spectrum estimation for large dimensional covariance matrices using random matrix theory,” Ann. Stat., vol. 36, no. 6, pp. 2757–2790, 2008.
• [32] O. Ledoit and S. Péché, “Eigenvectors of some large sample covariance matrix ensembles,” Probab. Theory Relat. Fields, vol. 151, no. 1-2, pp. 233–264, 2011.
• [33] B. Collins, D. McDonald, and N. Saad, “Compound Wishart matrices and noisy covariance matrices: Risk underestimation,” 2013. [Online]. Available: https://arxiv.org/abs/1306.5510.
• [34] Z. Burda, A. Jarosz, M. A. Nowak, J. Jurkiewicz, G. Papp, and I. Zahed, “Applying free random variables to random matrix analysis of financial data. Part I: The Gaussian case,” Quant. Financ., vol. 11, no. 7, pp. 1103–1124, 2011.
• [35] C.-N. Chuah, D. N. C. Tse, J. M. Kahn, and R. A. Valenzuela, “Capacity scaling in MIMO wireless systems under correlated fading,” IEEE Trans. Inf. Theory, vol. 48, no. 3, pp. 637–650, 2002.
• [36] R. Speicher, Combinatorial Theory of the Free Product with Amalgamation and Operator-Valued Free Probability Theory.   Providence, RI, USA: American Mathematical Society, 1998.
• [37] I. Soloveychik, “Error bound for compound wishart matrices,” 2014. [Online]. Available: https://arxiv.org/abs/1402.5581.
• [38] E. Levina and R. Vershynin, “Partial estimation of covariance matrices,” Probab. Theory Relat. Fields, vol. 153, no. 3-4, pp. 405–419, 2012.
• [39] D. Paulin, L. Mackey, and J. A. Tropp, “Efron-Stein inequalities for random matrices,” Ann. Probab., vol. 44, no. 5, pp. 3431–3473, 2016.
• [40] C. Stein, “A bound for the error in the normal approximation to the distribution of a sum of dependent random variables,” in Proc. Berkeley Symposium Mathematical Statistics and Probability.   The Regents of the University of California, 1972, pp. 583–602.
• [41] ——, “Approximate computation of expectations,” Lecture Notes-Monograph Series, vol. 7, pp. 1–164, 1986.
• [42] W. Bryc, “Compound real Wishart and q-Wishart matrices,” Int. Math. Res. Notices, vol. 2008, 2008.
• [43] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed.   Maryland, USA: Johns Hopkins Univ. Press, 2013.
• [44] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science.   Cambridge, U.K.: Cambridge Univ. Press, 2018.
• [45] P. Rigollet, “High-dimensional statistics,” Lecture notes for course 18.S997, 2018.
