1 Introduction
In statistical signal processing, a critical problem is to estimate the covariance matrix, which has extensive applications in correlation analysis, portfolio optimization, and various signal processing tasks in radar and communication systems [1]–[5]. One key challenge is that when the dimensionality is large but the sample support is relatively low, the estimated covariance matrix $\widehat{\mathbf{\Sigma}}$, which may be obtained using a general method such as the sample covariance matrix (SCM) or ordinary least squares (OLS), becomes ill-conditioned or even singular and suffers from significant errors relative to the true covariance matrix $\mathbf{\Sigma}$. Consequently, signal processing tasks that rely on covariance matrix estimation may perform poorly or fail to apply. Regularization techniques have recently attracted tremendous attention for covariance matrix estimation. By imposing structural assumptions on the true covariance matrix, techniques such as banding [6], thresholding [7], and shrinkage [8]–[18] have demonstrated great potential for improving the performance of covariance matrix estimation. See [19]–[21] for recent surveys.
This paper is concerned with the linear shrinkage estimation of covariance matrices. Given an estimate $\widehat{\mathbf{\Sigma}}$ of the covariance matrix, a linear shrinkage estimate is constructed as

(1) $\widetilde{\mathbf{\Sigma}} = \alpha \widehat{\mathbf{\Sigma}} + \beta \mathbf{T},$
where $\mathbf{T}$ is the shrinkage target and $\alpha$ and $\beta$ are nonnegative shrinkage coefficients. In general, the shrinkage target $\mathbf{T}$ is better-conditioned, more parsimonious, or more structured, with lower variance but higher bias compared to the original estimate $\widehat{\mathbf{\Sigma}}$ [11]. The coefficients $\alpha$ and $\beta$ are chosen to provide a good tradeoff between bias and variance, such that an estimate outperforming both $\widehat{\mathbf{\Sigma}}$ and $\mathbf{T}$ is achieved and a better approximation of the true covariance matrix is obtained. Compared to other regularized estimators such as banding and thresholding, linear shrinkage estimators can easily be designed to guarantee positive definiteness. Such shrinkage designs have been employed in various applications that utilize covariance matrices and have demonstrated significant performance improvements. The linear shrinkage approach has also been generalized to nonlinear shrinkage estimation of covariance matrices [22, 23], and is closely related to several unitarily invariant covariance matrix estimators that shrink the eigenvalues of the SCM, such as those imposing condition-number constraints on the estimate
[24, 25]. There is also a body of studies on shrinkage estimation of the precision matrix (the inverse of the covariance matrix) [26]–[30] and on application-oriented design of shrinkage estimators; see [31]–[36] for example applications in array signal processing. Shrinkage has a Bayesian interpretation [2, 9]: the true covariance matrix can be assumed to lie within a neighborhood of the shrinkage target $\mathbf{T}$. There can be various approaches for constructing $\widehat{\mathbf{\Sigma}}$ and $\mathbf{T}$. For example, when a generative model of the observation exists, one may first estimate the model parameters and then construct $\widehat{\mathbf{\Sigma}}$ [20]. A typical example is the linear models seen in communication systems. Furthermore, different types of shrinkage targets, not necessarily limited to identity or diagonal targets, can be used to better exploit prior knowledge. For example, knowledge-aided space-time adaptive processing (KA-STAP) may set $\mathbf{T}$ using knowledge about the environment [3] or past covariance matrix estimates [37]. Multiple shrinkage targets can even be applied when distinct guesses about the true covariance matrix are available [17].
The choice of the shrinkage coefficients significantly influences the performance of linear shrinkage estimators, and various criteria and methods have been studied. Under the mean squared error (MSE) criterion, Ledoit and Wolf (LW) [2] derived closed-form solutions based on asymptotic estimates of the statistics needed for finding the optimal shrinkage coefficients, where $\widehat{\mathbf{\Sigma}}$ and $\mathbf{T}$ are assumed to be the SCM and the identity matrix, respectively. Later, the LW solution was extended to more general shrinkage targets [3, 17]. Chen et al. [4] assumed a Gaussian distribution and proposed an oracle approximating shrinkage (OAS) estimator, which achieves a near-optimal parameter choice for Gaussian data even with very low sample support. The determination of the shrinkage coefficients can also be cast as a model selection problem, so that generic model selection techniques such as cross-validation (CV)
[38]–[40] can be applied. In general, CV splits the training samples multiple times into disjoint subsets and then fits and assesses the models under different splits based on a properly chosen prediction loss. This has been explored, e.g., in [10, 13], where the Gaussian likelihood is used as the prediction loss. All these data-driven techniques achieve near-optimal parameter choices when the underlying assumptions hold. However, there are also limitations to their application: almost all existing analytical solutions for the shrinkage coefficients [2]–[4], [17] were derived under the assumption of the SCM and certain special forms of shrinkage targets. They need to be redesigned for other cases, which is generally nontrivial. The asymptotic-analysis-based methods [2, 3] may not perform well when the sample support is very low. Although the existing CV approaches [10, 13] have broader applicability, they assume a Gaussian distribution and employ a grid search to determine the shrinkage coefficients. The likelihood cost of [10, 13] must be computed for multiple data splits and multiple candidate shrinkage coefficients, which can be time-consuming.

In this paper, we further investigate data-driven techniques that automatically tune the linear shrinkage coefficients using leave-one-out cross-validation (LOOCV). We choose a simple quadratic loss as the prediction loss for LOOCV and derive analytical, computationally efficient solutions. The solutions do not need to specify the distribution of the data. Furthermore, the LOOCV treatment is applicable to different covariance matrix estimators, including the SCM and ordinary least squares (OLS)-based schemes. It can be used with general shrinkage targets and can easily be extended to incorporate multiple shrinkage targets. Numerical examples show that the proposed method can achieve oracle-approximating performance for covariance matrix estimation and can improve the performance of several array signal processing schemes.
The remainder of the paper is organized as follows. In Section 2, we present computationally efficient LOOCV methods for choosing the linear shrinkage coefficients for both SCM- and OLS-based covariance matrix estimators, and compare the proposed LOOCV methods with several existing methods that have attracted considerable attention recently. In Section 3, we extend our results to multi-target shrinkage. Section 4 reports numerical examples, and Section 5 concludes the paper.
2 LOOCV Choice of Linear Shrinkage Coefficients
This paper deals with the estimation of covariance matrices of zero-mean signals whose fourth-order moments exist. We study the LOOCV choice of the shrinkage coefficients for the linear shrinkage covariance matrix estimator (1), i.e., $\widetilde{\mathbf{\Sigma}} = \alpha \widehat{\mathbf{\Sigma}} + \beta \mathbf{T}$. The following assumptions are made:
Assumption 1: The true covariance matrix $\mathbf{\Sigma}$, the estimated covariance matrix $\widehat{\mathbf{\Sigma}}$, and the shrinkage target $\mathbf{T}$ are all Hermitian and positive semidefinite (PSD).

Assumption 2: $n$ independent, identically distributed (i.i.d.) samples $\{\mathbf{y}_m\}_{m=1}^{n}$ of the signal are available.

Assumption 3: The shrinkage coefficients are nonnegative, i.e.,

(2) $\alpha \ge 0, \quad \beta \ge 0.$
Assumption 3 follows the treatments in [2]–[4] and is sufficient but not necessary to guarantee that the shrinkage estimate is PSD when Assumption 1 holds¹. Two classes of shrinkage targets will be considered in this paper. One is constructed independently of the training samples used for generating $\widehat{\mathbf{\Sigma}}$, similarly to the knowledge-aided targets considered in [3]. The other is constructed from the training samples, but is highly structured with significantly fewer free parameters than $\widehat{\mathbf{\Sigma}}$. Examples of the second class include targets constructed using only the diagonal entries of $\widehat{\mathbf{\Sigma}}$ [4, 20] and Toeplitz approximations of $\widehat{\mathbf{\Sigma}}$ [17].

¹Imposing Assumption 3 may introduce a performance loss. Alternatively, one may remove the constraint and instead impose the constraint that $\widetilde{\mathbf{\Sigma}}$ is PSD, similar to a treatment in [5].
2.1 Oracle Choice
Different criteria may be used for evaluating covariance matrix estimators. In this paper, we use the squared Frobenius norm of the estimation error as the performance measure. Given $\mathbf{\Sigma}$, $\widehat{\mathbf{\Sigma}}$, and $\mathbf{T}$, the oracle shrinkage coefficients minimize

(3) $J(\alpha, \beta) = \left\| \alpha \widehat{\mathbf{\Sigma}} + \beta \mathbf{T} - \mathbf{\Sigma} \right\|_F^2,$

where $\| \cdot \|_F$ denotes the Frobenius norm. The cost function in (3) can be rewritten as a quadratic function of the shrinkage coefficients $\boldsymbol{\rho} = [\alpha, \beta]^T$:

(4) $J(\boldsymbol{\rho}) = \boldsymbol{\rho}^T \mathbf{A} \boldsymbol{\rho} - 2 \mathbf{b}^T \boldsymbol{\rho} + \mathrm{tr}(\mathbf{\Sigma}^2),$

(5) $\mathbf{A} = \begin{bmatrix} \mathrm{tr}(\widehat{\mathbf{\Sigma}}^2) & \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) \\ \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) & \mathrm{tr}(\mathbf{T}^2) \end{bmatrix},$

(6) $\mathbf{b} = \begin{bmatrix} \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{\Sigma}) \\ \mathrm{tr}(\mathbf{T} \mathbf{\Sigma}) \end{bmatrix},$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. As $\mathbf{A}$ is positive definite, we can find the minimizer of (4) by solving the above bivariate convex optimization problem. We can also apply the Karush-Kuhn-Tucker (KKT) conditions to find the solution analytically. From (4), letting $\partial J / \partial \alpha = \partial J / \partial \beta = 0$ leads to

(7) $\alpha \, \mathrm{tr}(\widehat{\mathbf{\Sigma}}^2) + \beta \, \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) = \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{\Sigma}),$

(8) $\alpha \, \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) + \beta \, \mathrm{tr}(\mathbf{T}^2) = \mathrm{tr}(\mathbf{T} \mathbf{\Sigma}).$

The oracle shrinkage coefficients can be obtained by solving (7) and (8):

(9) $\boldsymbol{\rho}_O = [\alpha_O, \beta_O]^T = \mathbf{A}^{-1} \mathbf{b}.$
Note that (9) may produce negative shrinkage coefficients, which may not lead to a positive-semidefinite estimate of the covariance matrix. In this case, we clip the negative coefficient to zero and then find the other coefficient using (7) (for $\beta_O = 0$) or (8) (for $\alpha_O = 0$) to guarantee positive semidefiniteness. This treatment is similar to [2]–[5] and provides a suboptimal yet simple solution. The oracle estimator requires knowledge of $\mathbf{\Sigma}$, which is unavailable in real applications, but its performance serves as an upper bound given the linear shrinkage structure.
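As a concrete illustration, the oracle solution (9) together with the clipping rule above can be prototyped in a few lines of NumPy. This is a real-valued sketch; the function and variable names are ours, not from the paper:

```python
import numpy as np

def oracle_shrinkage(S_hat, T, Sigma):
    """Oracle (alpha, beta) minimizing ||alpha*S_hat + beta*T - Sigma||_F^2
    subject to alpha, beta >= 0, with the clipping rule described in the text."""
    A = np.array([[np.trace(S_hat @ S_hat), np.trace(S_hat @ T)],
                  [np.trace(S_hat @ T),     np.trace(T @ T)]])
    b = np.array([np.trace(S_hat @ Sigma), np.trace(T @ Sigma)])
    alpha, beta = np.linalg.solve(A, b)            # unconstrained minimizer, cf. (9)
    if alpha < 0:                                  # clip and re-solve the remaining equation
        alpha, beta = 0.0, max(b[1] / A[1, 1], 0.0)
    elif beta < 0:
        alpha, beta = max(b[0] / A[0, 0], 0.0), 0.0
    return alpha, beta
```

For example, when $\widehat{\mathbf{\Sigma}}$ happens to equal $\mathbf{\Sigma}$, the solution is $(\alpha, \beta) = (1, 0)$ for any target, as expected.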
2.2 LOOCV Choice for General Cases
Let $\mathbf{M}$ denote a positive-definite, Hermitian matrix. It can be easily verified that the following cost

(10) $J(\mathbf{M}) = E\left[ \left\| \mathbf{y} \mathbf{y}^H - \mathbf{M} \right\|_F^2 \right]$

is minimized when $\mathbf{M} = \mathbf{\Sigma}$, where the expectation is taken over $\mathbf{y}$. In this paper, we apply LOOCV [38] to produce an estimate of (10) as a proxy for measuring the accuracy of $\widetilde{\mathbf{\Sigma}}$, based on which the shrinkage coefficients can be selected. With the LOOCV method, the length-$n$ training data $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_n]$ is repeatedly split into two sets with respect to time. For the $m$th split, where $1 \le m \le n$, the samples in $\mathbf{Y}_{(m)}$ (with the $m$th column omitted from $\mathbf{Y}$) are used to produce a covariance matrix estimate and the remaining sample $\mathbf{y}_m$ is spared for parameter validation. In total, $n$ splits of the training data are used and every training sample is used for validation once. Assuming shrinkage estimation with given shrinkage coefficients $(\alpha, \beta)$, we construct from each $\mathbf{Y}_{(m)}$ a shrinkage covariance matrix estimate as

(11) $\widetilde{\mathbf{\Sigma}}_m = \alpha \widehat{\mathbf{\Sigma}}_m + \beta \mathbf{T}_m,$

where $\widehat{\mathbf{\Sigma}}_m$ and $\mathbf{T}_m$ denote the covariance matrix estimate and the shrinkage target constructed from $\mathbf{Y}_{(m)}$.
We propose to use the following LOOCV cost function

(12) $J_{\mathrm{CV}}(\alpha, \beta) = \frac{1}{n} \sum_{m=1}^{n} \left\| \mathbf{y}_m \mathbf{y}_m^H - \widetilde{\mathbf{\Sigma}}_m \right\|_F^2$

(13) $= \frac{1}{n} \sum_{m=1}^{n} \left\| \mathbf{y}_m \mathbf{y}_m^H - \alpha \widehat{\mathbf{\Sigma}}_m - \beta \mathbf{T}_m \right\|_F^2$

to approximate the cost in (10) when $\mathbf{M}$ is chosen as $\widetilde{\mathbf{\Sigma}}$. For notational simplicity, define

(14) $\mathbf{S}_m = \mathbf{y}_m \mathbf{y}_m^H.$
After some manipulations, the above cost function can be written similarly to (4) as

(15) $J_{\mathrm{CV}}(\boldsymbol{\rho}) = \boldsymbol{\rho}^T \widehat{\mathbf{A}} \boldsymbol{\rho} - 2 \widehat{\mathbf{b}}^T \boldsymbol{\rho} + \frac{1}{n} \sum_{m=1}^{n} \| \mathbf{y}_m \|^4,$

where

(16) $\widehat{\mathbf{A}} = \frac{1}{n} \sum_{m=1}^{n} \begin{bmatrix} \mathrm{tr}(\widehat{\mathbf{\Sigma}}_m^2) & \mathrm{tr}(\widehat{\mathbf{\Sigma}}_m \mathbf{T}_m) \\ \mathrm{tr}(\widehat{\mathbf{\Sigma}}_m \mathbf{T}_m) & \mathrm{tr}(\mathbf{T}_m^2) \end{bmatrix},$

(17) $\widehat{\mathbf{b}} = \frac{1}{n} \sum_{m=1}^{n} \begin{bmatrix} \mathbf{y}_m^H \widehat{\mathbf{\Sigma}}_m \mathbf{y}_m \\ \mathbf{y}_m^H \mathbf{T}_m \mathbf{y}_m \end{bmatrix}.$
The shrinkage coefficients can then be found by solving the above bivariate, constant-coefficient quadratic program. Analytical solutions can be obtained under different conditions, as shown below.
2.2.1 Unconstrained shrinkage
For unconstrained $(\alpha, \beta)$, setting the partial derivatives of (15) to zero yields

(18) $\alpha \widehat{A}_{11} + \beta \widehat{A}_{12} = \widehat{b}_1,$

(19) $\alpha \widehat{A}_{12} + \beta \widehat{A}_{22} = \widehat{b}_2.$

Solving (18) and (19) produces the unconstrained solution

(20) $\widehat{\boldsymbol{\rho}} = [\widehat{\alpha}, \widehat{\beta}]^T = \widehat{\mathbf{A}}^{-1} \widehat{\mathbf{b}}.$
We choose (20) as the optimal shrinkage coefficients if both $\widehat{\alpha}$ and $\widehat{\beta}$ are nonnegative. Otherwise, we consider the optimal choices on the boundary of (2), specified by (19) or (18) for $\widehat{\alpha} = 0$ or $\widehat{\beta} = 0$ as

(21) $\widehat{\alpha} = 0, \quad \widehat{\beta} = \max\left( \widehat{b}_2 / \widehat{A}_{22}, \, 0 \right),$

or

(22) $\widehat{\beta} = 0, \quad \widehat{\alpha} = \max\left( \widehat{b}_1 / \widehat{A}_{11}, \, 0 \right).$
2.2.2 Constrained shrinkage
For a more parsimonious design using a convex linear combination, the following constraint is imposed:

(23) $\alpha + \beta = 1, \quad \alpha, \beta \ge 0.$
By plugging (23) into the cost function (12) and minimizing over the single free coefficient, we can easily find the optimal shrinkage coefficients as

(24) $\widehat{\alpha} = \frac{\widehat{A}_{22} - \widehat{A}_{12} + \widehat{b}_1 - \widehat{b}_2}{\widehat{A}_{11} - 2 \widehat{A}_{12} + \widehat{A}_{22}}, \quad \widehat{\beta} = 1 - \widehat{\alpha}.$

In case a negative shrinkage coefficient is produced, we set it to zero and the other to one according to (23). Note that although the closed-form solution involves multiple matrix operations, the quantities involved need to be computed only once. Furthermore, the computational complexity may be greatly reduced for a specific method of covariance matrix estimation. In the following two subsections, we show the simplified solutions for SCM- and OLS-based covariance matrix estimation.
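The constrained minimizer reduces to a scalar formula once the quadratic statistics are available. A minimal sketch, assuming the 2x2 statistics $\widehat{\mathbf{A}}$ and $\widehat{\mathbf{b}}$ of (16)-(17) have already been assembled (the function name is ours):

```python
import numpy as np

def convex_alpha(A, b):
    """Minimize the quadratic LOOCV cost under alpha + beta = 1, alpha, beta >= 0.
    A: 2x2 statistic matrix, b: 2-vector, as assembled from the LOOCV splits."""
    denom = A[0, 0] - 2 * A[0, 1] + A[1, 1]
    alpha = (A[1, 1] - A[0, 1] + b[0] - b[1]) / denom
    alpha = min(max(alpha, 0.0), 1.0)   # clip a negative coefficient, cf. the text
    return alpha, 1.0 - alpha
```

Substituting $\beta = 1 - \alpha$ into the quadratic and differentiating with respect to $\alpha$ gives exactly this ratio, so no matrix inverse is needed in the constrained case.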
2.3 LOOCV Choice for SCM-Based Estimation
We consider in this subsection that $\widehat{\mathbf{\Sigma}}$ is the SCM estimate of $\mathbf{\Sigma}$. In this case,

(25) $\widehat{\mathbf{\Sigma}} = \frac{1}{n} \sum_{m=1}^{n} \mathbf{y}_m \mathbf{y}_m^H,$

which is a sufficient statistic for Gaussian-distributed data when the mean vector is zero. For the $m$th split, the SCM constructed from all the samples except the $m$th is

(26) $\widehat{\mathbf{\Sigma}}_m = \frac{1}{n-1} \left( n \widehat{\mathbf{\Sigma}} - \mathbf{y}_m \mathbf{y}_m^H \right).$
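The identity (26) says that the leave-one-out SCM is a rank-one downdate of the full SCM, which is what makes the fast expressions below possible. A quick numerical check (real-valued, with illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 10
Y = rng.standard_normal((p, n))
S = (Y @ Y.T) / n                        # full SCM, cf. (25)

m = 3                                    # any split index
Y_m = np.delete(Y, m, axis=1)            # training data with the m-th column omitted
S_m_direct = Y_m @ Y_m.T / (n - 1)       # direct recomputation

y = Y[:, [m]]
S_m_update = (n * S - y @ y.T) / (n - 1) # rank-one downdate, cf. (26)
assert np.allclose(S_m_direct, S_m_update)
```

The downdate costs $O(p^2)$ per split instead of the $O(p^2 n)$ of a direct recomputation.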
We can then verify the following expressions for quickly computing the relevant quantities in (16) and (17) when the target $\mathbf{T}$ is held fixed across the splits:

(27) $\widehat{A}_{11} = \frac{1}{n} \sum_{m=1}^{n} \mathrm{tr}(\widehat{\mathbf{\Sigma}}_m^2) = \frac{1}{(n-1)^2} \left[ n(n-2) \, \mathrm{tr}(\widehat{\mathbf{\Sigma}}^2) + \frac{1}{n} \sum_{m=1}^{n} \| \mathbf{y}_m \|^4 \right],$

(28) $\widehat{A}_{12} = \frac{1}{n} \sum_{m=1}^{n} \mathrm{tr}(\widehat{\mathbf{\Sigma}}_m \mathbf{T}) = \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}),$

(29) $\widehat{b}_1 = \frac{1}{n} \sum_{m=1}^{n} \mathbf{y}_m^H \widehat{\mathbf{\Sigma}}_m \mathbf{y}_m = \frac{1}{n-1} \left[ n \, \mathrm{tr}(\widehat{\mathbf{\Sigma}}^2) - \frac{1}{n} \sum_{m=1}^{n} \| \mathbf{y}_m \|^4 \right],$

(30) $\widehat{b}_2 = \frac{1}{n} \sum_{m=1}^{n} \mathbf{y}_m^H \mathbf{T} \mathbf{y}_m = \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}).$

Plugging these into (16) and (17) and after some manipulations, we can rewrite the LOOCV cost function (15) as

(31) $J_{\mathrm{CV}}(\alpha, \beta) = \widehat{A}_{11} \alpha^2 + 2 \, \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) \, \alpha \beta + \mathrm{tr}(\mathbf{T}^2) \, \beta^2 - 2 \widehat{b}_1 \alpha - 2 \, \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) \, \beta + \frac{1}{n} \sum_{m=1}^{n} \| \mathbf{y}_m \|^4.$
The optimal shrinkage coefficients can then be obtained analytically from the SCM $\widehat{\mathbf{\Sigma}}$, the shrinkage target $\mathbf{T}$, and the training samples $\{\mathbf{y}_m\}$, as discussed below.
2.3.1 Unconstrained shrinkage
It can be verified from (19), (28), and (30) that the optimal shrinkage coefficients (ignoring the nonnegativity constraint) satisfy

(32) $\widehat{\alpha} \, \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) + \widehat{\beta} \, \mathrm{tr}(\mathbf{T}^2) = \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}).$

The closed-form solution to (18) and (32) is given by

(33) $\widehat{\alpha} = \frac{\widehat{b}_1 - \mathrm{tr}^2(\widehat{\mathbf{\Sigma}} \mathbf{T}) / \mathrm{tr}(\mathbf{T}^2)}{\widehat{A}_{11} - \mathrm{tr}^2(\widehat{\mathbf{\Sigma}} \mathbf{T}) / \mathrm{tr}(\mathbf{T}^2)}, \quad \widehat{\beta} = \left( 1 - \widehat{\alpha} \right) \frac{\mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T})}{\mathrm{tr}(\mathbf{T}^2)}.$
In case $\widehat{\alpha} < 0$ or $\widehat{\beta} < 0$, we apply (21) or (22), respectively, to determine the solution, using the expressions in (27)–(30).
Note that for the typical shrinkage target $\mathbf{T} = \frac{\mathrm{tr}(\widehat{\mathbf{\Sigma}})}{p} \mathbf{I}$, (32) reduces to $\widehat{\alpha} + \widehat{\beta} = 1$. This provides another justification for the convex linear combination design with an identity target, which has been widely adopted in the literature, e.g., [4]. It also shows that for such a special target the unconstrained solution coincides with the constrained solution, which does not hold for more general shrinkage targets.
2.3.2 Constrained shrinkage
For the widely considered convex linear combination with constraint $\alpha + \beta = 1$, the optimal $\widehat{\alpha}$ (ignoring the nonnegativity constraint) is computed as

(34) $\widehat{\alpha} = \frac{\mathrm{tr}(\mathbf{T}^2) - 2 \, \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) + \widehat{b}_1}{\widehat{A}_{11} - 2 \, \mathrm{tr}(\widehat{\mathbf{\Sigma}} \mathbf{T}) + \mathrm{tr}(\mathbf{T}^2)}, \quad \widehat{\beta} = 1 - \widehat{\alpha}.$
Similarly, in case a negative shrinkage coefficient is obtained, we set it to zero and let the other be one.
The above results show that the optimal shrinkage coefficients for the covariance matrix estimate (1) can be computed directly from the samples and the shrinkage target, without the need to specify any user parameters. The constrained shrinkage design may incur a certain performance loss compared to the unconstrained one.
2.4 LOOCV Choice for OLS-Based Covariance Estimation
One advantage of the LOOCV method is that it can be applied to different covariance matrix estimators. In this subsection, we discuss the LOOCV method for OLS-based covariance matrix estimation. Note that most existing analytical solutions for choosing the shrinkage coefficients assume the SCM and specific shrinkage targets, and need to be rederived for general cases. Also, in contrast to general applications of LOOCV, which require a grid search over the parameters and thus a high computational complexity, we have shown that for the SCM, fast analytical solutions can be obtained for choosing the shrinkage coefficients. This will also be the case for OLS-based covariance matrix estimation.
Consider the case with the observation modeled as

(35) $\mathbf{y}(t) = \mathbf{H} \mathbf{x}(t) + \mathbf{v}(t),$

where $\mathbf{H}$ is a deterministic channel matrix and $\mathbf{v}(t)$ zero-mean, white noise with covariance matrix $\sigma_v^2 \mathbf{I}$, which is uncorrelated with the zero-mean input signal $\mathbf{x}(t)$ with covariance matrix $\mathbf{\Sigma}_x$. If training samples of both $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are known, we may first estimate the channel matrix and the covariance matrix of the noise using the ordinary least squares (OLS) approach. Let the block of training data be $(\mathbf{X}, \mathbf{Y})$, where the input signal $\mathbf{X}$ can be designed to have certain properties such as orthogonality. The OLS estimates of the channel matrix and noise variance are then obtained as

(36) $\widehat{\mathbf{H}} = \mathbf{Y} \mathbf{X}^H \left( \mathbf{X} \mathbf{X}^H \right)^{-1},$

(37) $\widehat{\sigma}_v^2 = \frac{1}{np} \left\| \mathbf{Y} - \widehat{\mathbf{H}} \mathbf{X} \right\|_F^2,$

where $\widehat{(\cdot)}$ denotes the estimate of a quantity. In this case, the covariance matrix of $\mathbf{y}(t)$ can be estimated as

(38) $\widehat{\mathbf{\Sigma}} = \widehat{\mathbf{H}} \mathbf{\Sigma}_x \widehat{\mathbf{H}}^H + \widehat{\sigma}_v^2 \mathbf{I}.$
Such OLS-based covariance matrix estimation may be useful for designing signal estimation schemes in wireless communications. We can apply the linear shrinkage design (1) to enhance its accuracy and apply the LOOCV method (12) to choose the shrinkage coefficients. Note that in this case, in the $m$th split, we generate the covariance matrix estimate by applying the OLS estimator to the leave-one-out samples, i.e., the subset of $(\mathbf{X}, \mathbf{Y})$ with the pair $(\mathbf{x}_m, \mathbf{y}_m)$ omitted. The LOOCV cost is the same as (15). In this case, the leave-one-out estimate of the covariance matrix for the $m$th data split is

(39) $\widehat{\mathbf{\Sigma}}_m = \widehat{\mathbf{H}}_m \mathbf{\Sigma}_x \widehat{\mathbf{H}}_m^H + \widehat{\sigma}_{v,m}^2 \mathbf{I},$

where $\widehat{\mathbf{H}}_m$ and $\widehat{\sigma}_{v,m}^2$ denote the channel matrix and noise variance estimated from the leave-one-out samples, respectively. A direct computation of (16) and (17) for evaluating the LOOCV cost performs the OLS estimation $n$ times, which incurs significant complexity. The complexity can be greatly reduced by observing that the leave-one-out OLS estimate of the channel matrix is related to the OLS channel matrix estimate in (36) by a rank-one update:
(40) 
where
(41) 
(42) 
In the above,

(43) $c_m = \left[ \mathbf{X}^H \left( \mathbf{X} \mathbf{X}^H \right)^{-1} \mathbf{X} \right]_{mm}$

is the $m$th diagonal entry of

(44) $\mathbf{P} = \mathbf{X}^H \left( \mathbf{X} \mathbf{X}^H \right)^{-1} \mathbf{X}.$
Similarly, the leave-one-out estimate of the noise variance can be updated as
(45) 
where
(46) 
Note that both updates can be achieved with low complexity when a few matrices are computed in advance and reused. In this way, the leave-one-out covariance matrix estimate can be computed as
(47) 
where and are defined as
(48) 
(49) 
This shows that the leave-one-out OLS covariance matrix estimate $\widehat{\mathbf{\Sigma}}_m$ can be obtained from $\widehat{\mathbf{\Sigma}}$ by corrections involving a scaled identity matrix and two rank-one updates. Eqn. (47) can be exploited to compute the closed-form LOOCV solution quickly. From (47), the most involved computation for finding the solution of the optimization problem (15) can be implemented as
(50) 
where $\mathrm{Re}(\cdot)$ denotes the real part of a scalar. Once the quantities in (47) are computed, the right-hand side of (50) can be evaluated using inner products and matrix-vector products. Two of the terms are the same as those for the SCM. For the other two terms, we have
(51) 
(52) 
Note that the computational complexities of (50)–(52) are low because the major operations are matrix-vector products and inner products.
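The rank-one update logic of this subsection rests on the Sherman-Morrison identity applied to $(\mathbf{X}\mathbf{X}^H)^{-1}$. The following real-valued sketch (with illustrative dimensions and names of our choosing) verifies that the leave-one-out OLS channel estimate obtained via the update matches a direct recomputation:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 3, 2, 12
X = rng.standard_normal((q, n))                  # known input training block
H = rng.standard_normal((p, q))
Y = H @ X + 0.1 * rng.standard_normal((p, n))    # observations, cf. model (35)

G_inv = np.linalg.inv(X @ X.T)                   # computed once and reused
H_hat = Y @ X.T @ G_inv                          # full OLS estimate, cf. (36)

m = 5                                            # leave out the m-th training pair
x, y = X[:, [m]], Y[:, [m]]
c = (x.T @ G_inv @ x).item()                     # leverage of the m-th sample
# Sherman-Morrison downdate of (X X^T - x x^T)^{-1}:
G_inv_m = G_inv + (G_inv @ x @ x.T @ G_inv) / (1 - c)
H_m_update = (Y @ X.T - y @ x.T) @ G_inv_m       # leave-one-out estimate via update

X_m, Y_m = np.delete(X, m, axis=1), np.delete(Y, m, axis=1)
H_m_direct = Y_m @ X_m.T @ np.linalg.inv(X_m @ X_m.T)
assert np.allclose(H_m_update, H_m_direct)
```

The update costs $O(q^2)$ per split once $(\mathbf{X}\mathbf{X}^H)^{-1}$ is available, instead of a fresh $O(q^3)$ inversion.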
2.5 Comparisons with Alternative Choices of Linear Shrinkage Coefficients
In the above, we have introduced LOOCV methods with analytical solutions for choosing the coefficients of linear shrinkage covariance matrix estimators. We now discuss several alternative techniques that have received considerable attention recently and compare them with the LOOCV methods proposed in this paper.
In 2004, Ledoit and Wolf (LW) [2] studied estimators that shrink the SCM toward an identity target, i.e., $\mathbf{T} = \mathbf{I}$. Such estimators do not alter the eigenvectors but shrink the eigenvalues of the SCM, which is well supported by the fact that sample eigenvalues tend to be more spread out than population eigenvalues. The optimal shrinkage estimate under the MMSE criterion (3) can be written as

(53) $\widetilde{\mathbf{\Sigma}} = \frac{b^2}{d^2} \, \mu \, \mathbf{I} + \frac{a^2}{d^2} \, \widehat{\mathbf{\Sigma}},$

where the parameters $\mu = \mathrm{tr}(\mathbf{\Sigma})/p$, $a^2 = \| \mathbf{\Sigma} - \mu \mathbf{I} \|_F^2$, $b^2 = E[\| \widehat{\mathbf{\Sigma}} - \mathbf{\Sigma} \|_F^2]$, and $d^2 = E[\| \widehat{\mathbf{\Sigma}} - \mu \mathbf{I} \|_F^2]$ depend on the true covariance matrix and other unknown statistics. Ref. [2] shows that $d^2 = a^2 + b^2$ and proposes to approximate these quantities by their asymptotic estimates under $p \to \infty$, $n \to \infty$, $p/n \to c < \infty$, as

(54) $\widehat{\mu} = \mathrm{tr}(\widehat{\mathbf{\Sigma}})/p, \quad \widehat{d}^2 = \| \widehat{\mathbf{\Sigma}} - \widehat{\mu} \mathbf{I} \|_F^2,$

(55) $\widehat{b}^2 = \min\left( \frac{1}{n^2} \sum_{m=1}^{n} \left\| \mathbf{y}_m \mathbf{y}_m^H - \widehat{\mathbf{\Sigma}} \right\|_F^2, \ \widehat{d}^2 \right), \quad \widehat{a}^2 = \widehat{d}^2 - \widehat{b}^2,$

which can all be computed from the training samples. By substituting these into (53), estimators that significantly outperform the SCM are obtained, which also approach the oracle estimator when the training length is large enough.
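The LW recipe (53)-(55) is short enough to state as code. Below is a real-valued NumPy sketch (the function name is ours, and zero-mean columns are assumed):

```python
import numpy as np

def ledoit_wolf(Y):
    """Ledoit-Wolf shrinkage of the SCM toward a scaled identity, following [2].
    Y: p x n data matrix with zero-mean columns (real-valued sketch)."""
    p, n = Y.shape
    S = Y @ Y.T / n                                  # sample covariance matrix
    mu = np.trace(S) / p                             # projection of S on the identity
    d2 = np.linalg.norm(S - mu * np.eye(p), 'fro') ** 2
    b2_bar = sum(np.linalg.norm(np.outer(y, y) - S, 'fro') ** 2
                 for y in Y.T) / n ** 2              # estimate of E||S - Sigma||^2
    b2 = min(b2_bar, d2)                             # keeps the weights in [0, 1]
    a2 = d2 - b2                                     # uses d2 = a2 + b2
    return (b2 / d2) * mu * np.eye(p) + (a2 / d2) * S
```

Since both weights are nonnegative and the SCM is PSD, the returned estimate is PSD by construction.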
The above LW estimator was extended by Stoica et al. in 2008 [3] to complex-valued signals with general shrinkage targets $\mathbf{T}$, with applications to knowledge-aided space-time adaptive processing (KA-STAP) in radar. Several estimators with similar performance are derived there. For the general linear combination (GLC) design of [3], it is shown that the oracle shrinkage coefficients for (1) satisfy
(56) 
where
(57) 
The quantity is estimated in the same way as (55), and a computationally efficient expression for is given by
(58) 
Furthermore, the remaining unknown quantities are estimated from the training samples in a similar manner. This leads to the result given by Eqns. (34) and (35) of [3], which recovers the LW estimator [2] when the identity shrinkage target is assumed.
More recently, Chen et al. [4] derived the oracle approximating shrinkage (OAS) estimator, which assumes the SCM, real-valued Gaussian samples, and a scaled identity target $\mathbf{T} = \frac{\mathrm{tr}(\widehat{\mathbf{\Sigma}})}{p} \mathbf{I}$ with $\alpha = 1 - \rho$ and $\beta = \rho$. They first derive the oracle shrinkage coefficient for the SCM obtained from $n$ i.i.d. Gaussian samples. Then, they propose an iterative procedure to approach the oracle estimator, in which the unknown statistics at each iteration are estimated by plugging in the current covariance matrix estimate. It is further proved that the iteration converges to the OAS estimator with the following analytical expression for $\widehat{\rho}$:

(59) $\widehat{\rho}_{\mathrm{OAS}} = \min\left( 1, \ \frac{\left( 1 - \frac{2}{p} \right) \mathrm{tr}(\widehat{\mathbf{\Sigma}}^2) + \mathrm{tr}^2(\widehat{\mathbf{\Sigma}})}{\left( n + 1 - \frac{2}{p} \right) \left[ \mathrm{tr}(\widehat{\mathbf{\Sigma}}^2) - \mathrm{tr}^2(\widehat{\mathbf{\Sigma}})/p \right]} \right).$
This approach achieves superior performance for a (scaled) identity target and Gaussian data, and dominates the LW estimator [2] when $n$ is small. It was later generalized by Senneret et al. [20] to a shrinkage target built from the diagonal entries of the SCM. Other related techniques include [14], which also assumes the SCM, Gaussian data, and identity/diagonal shrinkage targets.
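The closed-form OAS coefficient is equally compact. A sketch for real-valued data (the function name is ours); the resulting estimate is then $\widetilde{\mathbf{\Sigma}} = (1 - \rho) \widehat{\mathbf{\Sigma}} + \rho \, \frac{\mathrm{tr}(\widehat{\mathbf{\Sigma}})}{p} \mathbf{I}$:

```python
import numpy as np

def oas_rho(S, n):
    """OAS shrinkage coefficient for an SCM S from n real Gaussian samples, cf. [4]."""
    p = S.shape[0]
    tr_S, tr_S2 = np.trace(S), np.trace(S @ S)
    num = (1 - 2.0 / p) * tr_S2 + tr_S ** 2
    den = (n + 1 - 2.0 / p) * (tr_S2 - tr_S ** 2 / p)
    return min(1.0, num / den)           # clip to the feasible range
```

Note that when the SCM is already proportional to the identity the denominator vanishes, in which case full shrinkage ($\rho = 1$) is the natural choice.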
All the above techniques provide analytical solutions and achieve near-oracle performance when the underlying assumptions (e.g., large dimensionality, large training size, identity/diagonal shrinkage targets) hold. However, they also have limitations. A common restriction is that all these analytical solutions assume the SCM and are not optimized for other types of covariance matrix estimators, such as model-based estimators. In particular, the LW and GLC methods [2, 3], which employ asymptotic approximations, may exhibit a noticeable gap to the oracle choice when the sample support is low, which is relevant in some applications. The OAS method [4] assumes an identity target, and its extensions to more general cases, e.g., with multiple or general shrinkage targets, are not trivial. By contrast, the LOOCV method proposed in this paper accommodates different designs and achieves near-oracle performance in general.
Cross-validation has also been applied previously for choosing the shrinkage coefficients of covariance matrix estimators. The key issues in applying this generic tool are finding appropriate predictive metrics for scoring the candidate estimators and fast computation schemes. In [10, 13], the Gaussian likelihood was chosen as such a proxy. The resulting computations are generally involved, as multiple matrix inverses and determinants are needed, and a grid search is required to find the optimal parameters. In this paper, we instead use the distribution-free Frobenius-norm loss in (12) as the metric, which leads to analytical solutions and is computationally more tractable.
3 Multi-Target Shrinkage
In Section 2, we considered linear shrinkage designs with a single target. Multiple shrinkage targets may be used to further enhance performance; such targets may be obtained from a priori knowledge, e.g., past covariance matrix estimates from older training samples or from neighboring frequencies. The proposed LOOCV method extends easily to multiple targets.
3.1 Oracle choice of shrinkage coefficients
Consider the multi-target shrinkage design

(60) $\widetilde{\mathbf{\Sigma}} = \alpha \widehat{\mathbf{\Sigma}} + \sum_{k=1}^{K} \beta_k \mathbf{T}_k,$

where all the shrinkage coefficients are nonnegative to guarantee PSD covariance matrix estimates, i.e.,

(61) $\alpha \ge 0, \quad \beta_k \ge 0, \quad k = 1, 2, \dots, K.$

The oracle multi-target shrinkage minimizes the squared Frobenius norm of the estimation error

(62) $J(\boldsymbol{\rho}) = \left\| \alpha \widehat{\mathbf{\Sigma}} + \sum_{k=1}^{K} \beta_k \mathbf{T}_k - \mathbf{\Sigma} \right\|_F^2,$
which can be rewritten as

(63) $J(\boldsymbol{\rho}) = \boldsymbol{\rho}^T \mathbf{A} \boldsymbol{\rho} - 2 \mathbf{b}^T \boldsymbol{\rho} + \mathrm{tr}(\mathbf{\Sigma}^2),$

where $\boldsymbol{\rho} = [\alpha, \beta_1, \dots, \beta_K]^T$ and, with $\mathbf{G}_1 = \widehat{\mathbf{\Sigma}}$ and $\mathbf{G}_{k+1} = \mathbf{T}_k$ for $k = 1, \dots, K$,

(64) $[\mathbf{A}]_{ij} = \mathrm{tr}(\mathbf{G}_i \mathbf{G}_j),$

(65) $[\mathbf{b}]_i = \mathrm{tr}(\mathbf{G}_i \mathbf{\Sigma}).$
The oracle shrinkage coefficients can then be obtained by minimizing the cost function in (63), which is a strictly convex quadratic program (SCQP) with $K+1$ variables.
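The SCQP can be handed to any standard QP solver. One convenient reduction, since the matrix $\mathbf{A}$ in (63) is positive definite, is to a nonnegative least-squares problem through a Cholesky factor; a sketch (the function name `solve_scqp` is ours):

```python
import numpy as np
from scipy.optimize import nnls

def solve_scqp(A, b):
    """Minimize rho^T A rho - 2 b^T rho subject to rho >= 0, cf. (63), by
    rewriting it as min ||C rho - d||^2 with C^T C = A and C^T d = b."""
    L = np.linalg.cholesky(A)        # A = L L^T, valid as A is positive definite
    C = L.T
    d = np.linalg.solve(L, b)        # then C^T C = A and C^T d = b
    rho, _ = nnls(C, d)              # nonnegative least squares
    return rho
```

Up to a constant, $\|\mathbf{C}\boldsymbol{\rho} - \mathbf{d}\|^2 = \boldsymbol{\rho}^T \mathbf{A} \boldsymbol{\rho} - 2\mathbf{b}^T \boldsymbol{\rho}$, so the NNLS solution is exactly the constrained minimizer of (63).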
3.2 LOOCV choice of shrinkage coefficients
We now extend the LOOCV method of Section 2 to multi-target shrinkage. Following the same treatment as in Section 2.2, in each split of the training data, $\mathbf{Y}_{(m)}$ and $\mathbf{y}_m$ are used to generate and validate the covariance matrix estimate, respectively. The multiple shrinkage coefficients are chosen to minimize the LOOCV cost

(66) $J_{\mathrm{CV}}(\boldsymbol{\rho}) = \frac{1}{n} \sum_{m=1}^{n} \left\| \mathbf{y}_m \mathbf{y}_m^H - \alpha \widehat{\mathbf{\Sigma}}_m - \sum_{k=1}^{K} \beta_k \mathbf{T}_{k,m} \right\|_F^2.$

The above cost function can be rewritten in a form similar to (15) as

(67) $J_{\mathrm{CV}}(\boldsymbol{\rho}) = \boldsymbol{\rho}^T \widehat{\mathbf{A}} \boldsymbol{\rho} - 2 \widehat{\mathbf{b}}^T \boldsymbol{\rho} + \frac{1}{n} \sum_{m=1}^{n} \| \mathbf{y}_m \|^4,$

with, letting $\mathbf{G}_{1,m} = \widehat{\mathbf{\Sigma}}_m$ and $\mathbf{G}_{k+1,m} = \mathbf{T}_{k,m}$ for $k = 1, \dots, K$,

(68) $[\widehat{\mathbf{A}}]_{ij} = \frac{1}{n} \sum_{m=1}^{n} \mathrm{tr}(\mathbf{G}_{i,m} \mathbf{G}_{j,m}),$

(69) $[\widehat{\mathbf{b}}]_i = \frac{1}{n} \sum_{m=1}^{n} \mathbf{y}_m^H \mathbf{G}_{i,m} \mathbf{y}_m.$
The constant entries of $\widehat{\mathbf{A}}$ and $\widehat{\mathbf{b}}$ can be computed in the same way as in the single-target case. When $K$ is small, which is typically the case, the solution minimizing the LOOCV cost can be found quickly using standard optimization tools. Alternatively, we may first find the global minimizer that ignores the nonnegativity constraint,

(70) $\widehat{\boldsymbol{\rho}} = \widehat{\mathbf{A}}^{-1} \widehat{\mathbf{b}},$

and check whether the nonnegativity condition is satisfied. If a negative shrinkage coefficient is produced, we then consider the boundaries of the feasible region, which is equivalent to removing a certain number of shrinkage targets from the shrinkage design. The solution can be found in exactly the same way as (70) but with fewer targets.
Similarly to the single-target case, we may also consider a constrained design, where the shrinkage targets are normalized to have the same trace as the estimated covariance matrix $\widehat{\mathbf{\Sigma}}$, and

(71) $\alpha + \sum_{k=1}^{K} \beta_k = 1.$
Then the LOOCV cost function can be rewritten as