# Linear Shrinkage Estimation of Covariance Matrices Using Low-Complexity Cross-Validation

Shrinkage can effectively improve the condition number and accuracy of covariance matrix estimation, especially for low-sample-support applications with the number of training samples smaller than the dimensionality. This paper investigates parameter choice for linear shrinkage estimators. We propose data-driven, leave-one-out cross-validation (LOOCV) methods for automatically choosing the shrinkage coefficients, aiming to minimize the Frobenius norm of the estimation error. A quadratic loss is used as the prediction error for LOOCV. The resulting solutions can be found analytically or by solving optimization problems of small sizes and thus have low complexities. Our proposed methods are compared with various existing techniques. We show that the LOOCV method achieves near-oracle performance for shrinkage designs using sample covariance matrix (SCM) and several typical shrinkage targets. Furthermore, the LOOCV method provides low-complexity solutions for estimators that use general shrinkage targets, multiple targets, and/or ordinary least squares (OLS)-based covariance matrix estimation. We also show applications of our proposed techniques to several different problems in array signal processing.


## I Introduction

In statistical signal processing, one critical problem is to estimate the covariance matrix, which has extensive applications in correlation analysis, portfolio optimization, and various signal processing tasks in radar and communication systems [1]-[5]. One key challenge is that when the dimensionality is large but the sample support is relatively low, the estimated covariance matrix R, which may be obtained using a general method such as the sample covariance matrix (SCM) or ordinary least squares (OLS), becomes ill-conditioned or even singular and suffers from significant errors relative to the true covariance matrix Σ. Consequently, signal processing tasks that rely on covariance matrix estimation may perform poorly or fail to apply. Regularization techniques have recently attracted tremendous attention for covariance matrix estimation. By imposing structural assumptions on the true covariance matrix Σ, techniques such as banding [6], thresholding [7], and shrinkage [8]-[18] have demonstrated great potential for improving the performance of covariance matrix estimation. See [19]-[21] for recent surveys.

This paper is concerned with the linear shrinkage estimation of covariance matrices. Given an estimate R of the covariance matrix, a linear shrinkage estimate is constructed as

$$\hat{\Sigma}_{\rho,\tau} = \rho R + \tau T_0, \tag{1}$$

where T_0 is the shrinkage target and ρ and τ are nonnegative shrinkage coefficients. In general, the shrinkage target T_0 is better-conditioned, more parsimonious, or more structured, with lower variance but higher bias compared to the original estimate R [11]. The coefficients ρ and τ are chosen to provide a good tradeoff between bias and variance, such that an estimate outperforming both R and T_0 is achieved and a better approximation to the true covariance matrix Σ can be obtained. Compared to other regularized estimators such as banding and thresholding, linear shrinkage estimators can be easily designed to guarantee positive-definiteness. Such shrinkage designs have been employed in various applications which utilize covariance matrices and have demonstrated significant performance improvements. The linear shrinkage approach has also been generalized to nonlinear shrinkage estimation of covariance matrices [22, 23], and is closely related to several unitarily invariant covariance matrix estimators that shrink the eigenvalues of the SCM, such as those imposing condition number constraints on the estimate [24, 25]. There is also a body of studies on shrinkage estimation of the precision matrix (the inverse of the covariance matrix) [26]-[30] and on application-oriented design of shrinkage estimators. See [31]-[36] for example applications in array signal processing.

Shrinkage has a Bayesian interpretation [2, 9]: the true covariance matrix can be assumed to lie within a neighborhood of the shrinkage target T_0. There are various approaches for constructing R and T_0. For example, when a generative model of the observation exists, one may first estimate the model parameters and then construct R from them [20]. A typical example is the linear models seen in communication systems. Furthermore, different types of shrinkage targets, not necessarily limited to identity or diagonal targets, can be used to better utilize prior knowledge. For example, knowledge-aided space-time adaptive processing (KA-STAP) may set T_0 using knowledge about the environment [3] or past covariance matrix estimates [37]. Multiple shrinkage targets can even be applied when distinct guesses about the true covariance matrix are available [17].

The choice of shrinkage coefficients significantly influences the performance of linear shrinkage estimators, and various criteria and methods have been studied. Under the mean squared error (MSE) criterion, Ledoit and Wolf (LW) [2] derived closed-form solutions based on asymptotic estimates of the statistics needed for finding the optimal shrinkage coefficients, where R and T_0 are taken as the SCM and the identity matrix, respectively. Later, the LW solution was extended to more general shrinkage targets [3, 17]. Chen et al. [4] assumed a Gaussian distribution and proposed an oracle approximating shrinkage (OAS) estimator, which achieves near-optimal parameter choice for Gaussian data even with very low sample support. The determination of the shrinkage coefficients can also be cast as a model selection problem, so generic model selection techniques such as cross-validation (CV) [38]-[40] can be applied. In general, CV splits the training samples multiple times into disjoint subsets and then fits and assesses the models under the different splits based on a properly chosen prediction loss. This has been explored, e.g., in [10, 13], where the Gaussian likelihood is used as the prediction loss.

All these data-driven techniques achieve near-optimal parameter choice when the underlying assumptions hold. However, there are also limitations to their applications. Almost all existing analytical solutions for the shrinkage coefficients [2]-[4], [17] were derived under the assumption of the SCM and certain special forms of shrinkage targets; they need to be re-derived when applied to other cases, which is generally nontrivial. The asymptotic analysis-based methods [2, 3] may not perform well when the sample support is very low. Although the existing CV approaches [10, 13] have broader applications, they assume a Gaussian distribution and employ grid search to determine the shrinkage coefficients. The likelihood cost of [10, 13] must be computed for multiple data splits and multiple candidates of the shrinkage coefficients, which can be time-consuming.

In this paper, we further investigate data-driven techniques that automatically tune the linear shrinkage coefficients using leave-one-out cross-validation (LOOCV). We choose a simple quadratic loss as the prediction loss for LOOCV, and derive analytical and computationally efficient solutions. The solutions do not need to specify the distribution of the data. Furthermore, the LOOCV treatment is applicable to different covariance matrix estimators including the SCM- and ordinary least squares (OLS)-based schemes. It can be used together with general shrinkage targets and can also be easily extended to incorporate multiple shrinkage targets. The numerical examples show that the proposed method can achieve oracle-approximating performance for covariance matrix estimation and can improve the performance of several array signal processing schemes.

The remainder of the paper is organized as follows. In Section 2, we present computationally efficient LOOCV methods for choosing the linear shrinkage coefficients for both SCM- and OLS-based covariance matrix estimators, and also compare the proposed LOOCV methods with several existing methods which have attracted considerable attention recently. In Section 3, we extend our results to multi-target shrinkage. Section 4 reports numerical examples, and finally Section 5 gives conclusions.

## II LOOCV Choice of Linear Shrinkage Coefficients

This paper deals with the estimation of covariance matrices of zero-mean signals whose fourth-order moments exist. We study the LOOCV choice of the shrinkage coefficients (ρ, τ) for the linear shrinkage covariance matrix estimator (1), i.e., ˆΣ_ρ,τ = ρR + τT_0. The following assumptions are made:

1. The true covariance matrix Σ, the estimated covariance matrix R, and the shrinkage target T_0 are all Hermitian and positive-semidefinite (PSD).

2. T independent, identically distributed (i.i.d.) samples y_1, y_2, …, y_T of the signal are available.

3. The shrinkage coefficients are nonnegative, i.e.,

$$\rho \ge 0, \quad \tau \ge 0. \tag{2}$$

Assumption 3 follows the treatments in [2]-[4] and is sufficient but not necessary to guarantee that the shrinkage estimate is PSD when Assumption 1 holds.[^1] Two classes of shrinkage targets will be considered in this paper. One is constructed independently of the training samples used for generating R, similarly to the knowledge-aided targets considered in [3]. The other is constructed from R, but is highly structured with significantly fewer free parameters as compared to R. Examples of the second class include targets constructed using only the diagonal entries of R [4, 20] and Toeplitz approximations of R [17].

[^1]: Imposing Assumption 3 may introduce a performance loss. Alternatively, one may remove the constraint (2) and instead impose the constraint that the estimate itself is PSD, similar to a treatment in [5].

### II-A Oracle Choice

Different criteria may be used for evaluating covariance matrix estimators. In this paper, we use the squared Frobenius norm of the estimation error as the performance measure. Given R, T_0, and Σ, the oracle shrinkage coefficients minimize

$$J_O(\rho,\tau) = \|\hat{\Sigma}_{\rho,\tau} - \Sigma\|_F^2 = \|\rho R + \tau T_0 - \Sigma\|_F^2, \tag{3}$$

where ‖·‖_F denotes the Frobenius norm. The cost function in (3) can then be rewritten as a quadratic function of the shrinkage coefficients:

$$J_O(\rho,\tau) = \begin{bmatrix}\rho \\ \tau\end{bmatrix}^T A_O \begin{bmatrix}\rho \\ \tau\end{bmatrix} - 2\begin{bmatrix}\rho \\ \tau\end{bmatrix}^T b_O + \mathrm{tr}(\Sigma^2), \tag{4}$$

$$A_O = \begin{bmatrix}\mathrm{tr}(R^2) & \mathrm{tr}(RT_0) \\ \mathrm{tr}(RT_0) & \mathrm{tr}(T_0^2)\end{bmatrix}, \tag{5}$$

$$b_O = \begin{bmatrix}\mathrm{tr}(R\Sigma) \\ \mathrm{tr}(T_0\Sigma)\end{bmatrix}, \tag{6}$$

where tr(·) denotes the trace of a matrix. As A_O is positive-definite, we can find the minimizer of (4) by solving the above bivariate convex optimization problem. We can also apply the Karush-Kuhn-Tucker (KKT) conditions to find the solution analytically. From (4), letting ∂J_O/∂ρ = ∂J_O/∂τ = 0 leads to

$$\frac{\mathrm{tr}(R^2)}{\mathrm{tr}(R\Sigma)}\,\rho + \frac{\mathrm{tr}(RT_0)}{\mathrm{tr}(R\Sigma)}\,\tau = 1, \tag{7}$$

$$\frac{\mathrm{tr}(RT_0)}{\mathrm{tr}(T_0\Sigma)}\,\rho + \frac{\mathrm{tr}(T_0^2)}{\mathrm{tr}(T_0\Sigma)}\,\tau = 1. \tag{8}$$

The oracle shrinkage coefficients can be obtained by solving (7) and (8):

$$\begin{bmatrix}\rho_O^\star \\ \tau_O^\star\end{bmatrix} = A_O^{-1} b_O. \tag{9}$$

Note that (9) may produce negative shrinkage coefficients, which may not lead to a positive-definite estimate of the covariance matrix. In this case, we clip the negative coefficient to zero and then find the other coefficient using (7) or (8), for τ = 0 or ρ = 0, respectively, to guarantee positive-definiteness. This treatment is similar to [2]-[5] and provides a suboptimal yet simple solution. The oracle estimator requires knowledge of Σ, which is unavailable in real applications, but its performance serves as an upper bound given the linear shrinkage structure.
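As a concrete illustration, the oracle solution (5)-(9) with the clipping treatment just described can be sketched in a few lines of NumPy. This is a minimal sketch for real-valued data; the function name and test setup are ours, not from the paper:

```python
import numpy as np

def oracle_shrinkage_coeffs(R, T0, Sigma):
    # Build A_O and b_O from eqs. (5)-(6) and solve (9).
    A = np.array([[np.trace(R @ R),  np.trace(R @ T0)],
                  [np.trace(R @ T0), np.trace(T0 @ T0)]])
    b = np.array([np.trace(R @ Sigma), np.trace(T0 @ Sigma)])
    rho, tau = np.linalg.solve(A, b)
    if rho < 0:    # clip to the boundary rho = 0 and re-solve (8)
        rho, tau = 0.0, max(b[1] / A[1, 1], 0.0)
    elif tau < 0:  # clip to the boundary tau = 0 and re-solve (7)
        rho, tau = max(b[0] / A[0, 0], 0.0), 0.0
    return rho, tau

# Demo: low sample support (T < N), scaled identity target.
rng = np.random.default_rng(0)
N, T = 8, 5
L = np.diag(np.linspace(0.5, 2.0, N))
Sigma = L @ L.T                        # true covariance
Y = L @ rng.standard_normal((N, T))    # T samples with covariance Sigma
R = Y @ Y.T / T                        # SCM
T0 = np.trace(R) / N * np.eye(N)       # scaled identity target
rho, tau = oracle_shrinkage_coeffs(R, T0, Sigma)
err_shrunk = np.linalg.norm(rho * R + tau * T0 - Sigma, 'fro')
err_scm = np.linalg.norm(R - Sigma, 'fro')
```

Since the oracle minimizes (3) over all feasible (ρ, τ), including (ρ, τ) = (1, 0), the shrunk estimate can never have larger Frobenius error than the SCM itself.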

### II-B LOOCV Choice for General Cases

Let ˆΣ denote a positive-definite, Hermitian matrix. It can be easily verified that the following cost

$$J_S(\hat{\Sigma}) \triangleq E\left[\|\hat{\Sigma} - yy^\dagger\|_F^2\right] \tag{10}$$

is minimized when ˆΣ = Σ, where the expectation is taken over y. In this paper, we apply LOOCV [38] to produce an estimate of J_S as a proxy for measuring the accuracy of ˆΣ_ρ,τ, based on which the shrinkage coefficients can be selected. With the LOOCV method, the length-T training data is repeatedly split into two sets with respect to time. For the t-th split, where t = 1, 2, …, T, the samples in Y_t (the training data with the t-th column omitted) are used for producing a covariance matrix estimate R_t, and the remaining sample y_t is spared for parameter validation. In total, T splits of the training data are used and every training sample is used for validation once. Assuming shrinkage estimation with given shrinkage coefficients (ρ, τ), we construct from each Y_t a shrinkage covariance matrix estimator as

$$\hat{\Sigma}_{t,\rho,\tau} = \rho R_t + \tau T_0. \tag{11}$$

We propose to use the following LOOCV cost function

$$J_{CV}(\rho,\tau) = \frac{1}{T}\sum_{t=1}^T \left\|\hat{\Sigma}_{t,\rho,\tau} - y_t y_t^\dagger\right\|_F^2 \tag{12}$$

$$= \frac{1}{T}\sum_{t=1}^T \left\|\rho R_t + \tau T_0 - y_t y_t^\dagger\right\|_F^2 \tag{13}$$

to approximate the cost in (10) when ˆΣ is chosen as ˆΣ_ρ,τ. For notational simplicity, define

$$S_t \triangleq y_t y_t^\dagger. \tag{14}$$

After some manipulations, the above cost function can be written similarly to (4) as

$$J_{CV}(\rho,\tau) = \begin{bmatrix}\rho \\ \tau\end{bmatrix}^T A_{CV} \begin{bmatrix}\rho \\ \tau\end{bmatrix} - 2\begin{bmatrix}\rho \\ \tau\end{bmatrix}^T b_{CV} + \frac{1}{T}\sum_{t=1}^T \mathrm{tr}(S_t^2), \tag{15}$$

where

$$A_{CV} = \begin{bmatrix}\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_t^2) & \frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_tT_0) \\ \frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_tT_0) & \mathrm{tr}(T_0^2)\end{bmatrix}, \tag{16}$$

$$b_{CV} = \begin{bmatrix}\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_tS_t) \\ \frac{1}{T}\sum_{t=1}^T \mathrm{tr}(T_0S_t)\end{bmatrix}. \tag{17}$$

The shrinkage coefficients can then be found by solving the above bivariate, constant-coefficient quadratic program. Analytical solutions can be obtained under different conditions, as shown below.

#### II-B1 Unconstrained shrinkage

For unconstrained (ρ, τ), setting the partial derivatives of (15) to zero yields

$$\frac{\sum_{t=1}^T \mathrm{tr}(R_t^2)}{\sum_{t=1}^T \mathrm{tr}(R_tS_t)}\,\rho + \frac{\sum_{t=1}^T \mathrm{tr}(R_tT_0)}{\sum_{t=1}^T \mathrm{tr}(R_tS_t)}\,\tau = 1, \tag{18}$$

$$\frac{\sum_{t=1}^T \mathrm{tr}(R_tT_0)}{\sum_{t=1}^T \mathrm{tr}(T_0S_t)}\,\rho + \frac{T\,\mathrm{tr}(T_0^2)}{\sum_{t=1}^T \mathrm{tr}(T_0S_t)}\,\tau = 1. \tag{19}$$

Solving (18) and (19) produces the unconstrained solution

$$\begin{bmatrix}\rho_{CV}^\star \\ \tau_{CV}^\star\end{bmatrix} = A_{CV}^{-1} b_{CV}. \tag{20}$$

We choose (20) as the optimal shrinkage coefficients if both ρ⋆_CV and τ⋆_CV are nonnegative. Otherwise, we consider the optimal choices on the boundary of (2), obtained from (18) or (19) with τ = 0 or ρ = 0, as

$$\rho_{CV}^\star = \frac{\sum_{t=1}^T \mathrm{tr}(R_tS_t)}{\sum_{t=1}^T \mathrm{tr}(R_t^2)}, \quad \tau_{CV}^\star = 0, \tag{21}$$

or

$$\rho_{CV}^\star = 0, \quad \tau_{CV}^\star = \frac{\sum_{t=1}^T \mathrm{tr}(T_0S_t)}{T\,\mathrm{tr}(T_0^2)}. \tag{22}$$
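The recipe above — build A_CV and b_CV from (16)-(17), try the unconstrained solution (20), and fall back to the boundary solutions (21)-(22) — is straightforward to implement. A minimal NumPy sketch for real-valued data (our own naming; R_list holds the leave-one-out estimates R_t from any estimator):

```python
import numpy as np

def loocv_shrinkage_coeffs(R_list, Y, T0):
    # Entries of A_CV (16) and b_CV (17); note tr(R_t S_t) = y_t^T R_t y_t.
    T = Y.shape[1]
    tr_Rt2  = np.mean([np.trace(Rt @ Rt) for Rt in R_list])
    tr_RtT0 = np.mean([np.trace(Rt @ T0) for Rt in R_list])
    tr_RtSt = np.mean([Y[:, t] @ R_list[t] @ Y[:, t] for t in range(T)])
    tr_T0St = np.mean([Y[:, t] @ T0 @ Y[:, t] for t in range(T)])
    A = np.array([[tr_Rt2, tr_RtT0], [tr_RtT0, np.trace(T0 @ T0)]])
    b = np.array([tr_RtSt, tr_T0St])
    rho, tau = np.linalg.solve(A, b)   # unconstrained solution (20)
    if tau < 0:                        # boundary solution (21)
        rho, tau = b[0] / A[0, 0], 0.0
    elif rho < 0:                      # boundary solution (22)
        rho, tau = 0.0, b[1] / A[1, 1]
    return rho, tau

# Demo with SCM leave-one-out estimates (26) and a scaled identity target.
rng = np.random.default_rng(1)
N, T = 6, 10
L = np.diag(np.linspace(0.5, 2.0, N))
Y = L @ rng.standard_normal((N, T))    # true covariance is L @ L.T
R = Y @ Y.T / T
R_list = [(T * R - np.outer(Y[:, t], Y[:, t])) / (T - 1) for t in range(T)]
T0 = np.trace(R) / N * np.eye(N)
rho, tau = loocv_shrinkage_coeffs(R_list, Y, T0)
```

For the scaled identity target used in the demo, the stationarity condition (19) forces ρ + τ = 1, matching the observation made for such targets later in this section.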

#### II-B2 Constrained shrinkage

For the more parsimonious design using convex linear combination, the following constraint is imposed:

$$\rho = 1 - \tau. \tag{23}$$

By plugging (23) into the cost function (12) and taking the minimizer, we can also easily find the optimal shrinkage coefficients using

$$\rho_{CV}^\star = \frac{\sum_{t=1}^T \left(\mathrm{tr}(T_0^2) - \mathrm{tr}(R_tT_0) - \mathrm{tr}(T_0S_t) + \mathrm{tr}(S_tR_t)\right)}{\sum_{t=1}^T \left(\mathrm{tr}(R_t^2) - 2\,\mathrm{tr}(R_tT_0) + \mathrm{tr}(T_0^2)\right)}. \tag{24}$$

In case a negative shrinkage coefficient is produced, we set it to zero and let the other be one according to (23). Note that although the closed-form solution involves multiple matrix operations, the quantities involved need to be computed only once. Furthermore, the computational complexity may be greatly reduced given a specific method of covariance matrix estimation. In the following two subsections, we will show the simplified solutions for SCM- and OLS-based covariance matrix estimation.
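A sketch of the constrained solution (24) in NumPy (again our own illustrative code, real-valued data); as a sanity check, the closed form is compared against a brute-force search over the LOOCV cost (12) with τ = 1 − ρ:

```python
import numpy as np

def loocv_convex_coeff(R_list, Y, T0):
    # Closed form (24) for the convex combination rho*R_t + (1-rho)*T0,
    # clipped to [0, 1] as described in the text.
    T = Y.shape[1]
    num = den = 0.0
    for t in range(T):
        Rt, yt = R_list[t], Y[:, t]
        num += (np.trace(T0 @ T0) - np.trace(Rt @ T0)
                - yt @ T0 @ yt + yt @ Rt @ yt)
        den += (np.trace(Rt @ Rt) - 2 * np.trace(Rt @ T0)
                + np.trace(T0 @ T0))
    return min(max(num / den, 0.0), 1.0)

rng = np.random.default_rng(2)
N, T = 5, 8
Y = rng.standard_normal((N, T))
R = Y @ Y.T / T
R_list = [(T * R - np.outer(Y[:, t], Y[:, t])) / (T - 1) for t in range(T)]
T0 = np.diag(np.diag(R))                   # diagonal target
rho = loocv_convex_coeff(R_list, Y, T0)

def J_cv(r):                               # direct LOOCV cost (12), tau = 1 - r
    return np.mean([np.linalg.norm(r * R_list[t] + (1 - r) * T0
                    - np.outer(Y[:, t], Y[:, t]), 'fro')**2 for t in range(T)])
```

Because the cost is convex in ρ, the clipped closed form minimizes it over [0, 1], which the grid comparison confirms.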

### II-C LOOCV Choice for SCM-Based Estimation

We consider in this subsection the case where R is the SCM estimate of Σ. In this case,

$$R = \frac{1}{T}\sum_{t=1}^T y_t y_t^\dagger = \frac{1}{T}\sum_{t=1}^T R_t = \frac{1}{T}\sum_{t=1}^T S_t, \tag{25}$$

which is a sufficient statistic for Gaussian-distributed data when the mean vector is zero. For the t-th split, the SCM constructed from all the samples except the t-th one is

$$R_t = \frac{1}{T-1}\sum_{j\neq t} y_j y_j^\dagger = \frac{T}{T-1}R - \frac{1}{T-1}S_t. \tag{26}$$

We can then verify the following expressions for quickly computing the relevant quantities in (16) and (17):

$$\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_t^2) = \frac{T(T-2)}{(T-1)^2}\mathrm{tr}(R^2) + \frac{1}{T(T-1)^2}\sum_{t=1}^T \|y_t\|_F^4, \tag{27}$$

$$\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_tS_t) = \frac{T}{T-1}\mathrm{tr}(R^2) - \frac{1}{T(T-1)}\sum_{t=1}^T \|y_t\|_F^4, \tag{28}$$

$$\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_tT_0) = \mathrm{tr}(RT_0), \tag{29}$$

$$\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(S_tT_0) = \mathrm{tr}(RT_0). \tag{30}$$

Plugging these into (16) and (17) and after some manipulations, we can rewrite the LOOCV cost function (15) as

$$J_{CV}(\rho,\tau) = \frac{\rho T(\rho T - 2\rho - 2T + 2)}{(T-1)^2}\mathrm{tr}(R^2) + 2\tau(\rho-1)\,\mathrm{tr}(RT_0) + \tau^2\,\mathrm{tr}(T_0^2) + \frac{1}{T}\left(\frac{\rho}{T-1} + 1\right)^2 \sum_{t=1}^T \|y_t\|_F^4. \tag{31}$$

The optimal shrinkage coefficients can then be obtained analytically from the SCM R, the shrinkage target T_0, and the training samples {y_t}, as discussed below.
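The shortcut identities (26)-(28) are easy to verify numerically. Below is a small self-contained check (our own code, real-valued data) comparing direct leave-one-out computation against the closed forms:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 4, 7
Y = rng.standard_normal((N, T))
R = Y @ Y.T / T                                         # SCM (25)
S = [np.outer(Y[:, t], Y[:, t]) for t in range(T)]      # S_t from (14)
R_loo = [(Y @ Y.T - S[t]) / (T - 1) for t in range(T)]  # direct leave-one-out SCM
R_fast = [T / (T - 1) * R - S[t] / (T - 1) for t in range(T)]  # update (26)

s4 = sum(np.linalg.norm(Y[:, t])**4 for t in range(T))  # sum of ||y_t||^4
lhs27 = np.mean([np.trace(Rt @ Rt) for Rt in R_loo])
rhs27 = T * (T - 2) / (T - 1)**2 * np.trace(R @ R) + s4 / (T * (T - 1)**2)
lhs28 = np.mean([np.trace(R_loo[t] @ S[t]) for t in range(T)])
rhs28 = T / (T - 1) * np.trace(R @ R) - s4 / (T * (T - 1))
```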

#### II-C1 Unconstrained shrinkage

It can be verified from (19) and (30) that the optimal shrinkage coefficients (ignoring the nonnegativity constraint (2)) satisfy

$$\tau = (1-\rho)\,\frac{\mathrm{tr}(RT_0)}{\mathrm{tr}(T_0^2)}. \tag{32}$$

The closed-form solution for ρ is given by

$$\rho^\star_{CV,SCM} = \frac{\frac{T}{T-1}\mathrm{tr}(R^2) - \frac{1}{T(T-1)}\sum_{t=1}^T\|y_t\|_F^4 - \frac{\mathrm{tr}^2(RT_0)}{\mathrm{tr}(T_0^2)}}{\frac{T(T-2)}{(T-1)^2}\mathrm{tr}(R^2) + \frac{1}{T(T-1)^2}\sum_{t=1}^T\|y_t\|_F^4 - \frac{\mathrm{tr}^2(RT_0)}{\mathrm{tr}(T_0^2)}}, \tag{33}$$

which follows from substituting (32) into (31) and setting the derivative with respect to ρ to zero.

In case τ⋆ < 0 or ρ⋆ < 0, we apply (21) or (22), respectively, to determine the solution, using the expressions in (27)-(30).

Note that for the typical scaled identity shrinkage target T_0 = (tr(R)/N)I, (32) reduces to ρ + τ = 1. This provides another justification for the convex linear combination design with an identity target, which has been widely adopted in the literature, e.g., [4]. It also shows that for such a special target the unconstrained solution is equivalent to the constrained solution, which does not hold for more general shrinkage targets.

#### II-C2 Constrained shrinkage

For the widely considered convex linear combination with the constraint ρ + τ = 1, the optimal ρ (ignoring the nonnegativity constraint) is computed as

$$\rho^\star_{CV,SCM} = \frac{\frac{T}{T-1}\mathrm{tr}(R^2) - 2\,\mathrm{tr}(RT_0) + \mathrm{tr}(T_0^2) - \frac{1}{T(T-1)}\sum_{t=1}^T\|y_t\|_F^4}{\frac{T^2-2T}{(T-1)^2}\mathrm{tr}(R^2) - 2\,\mathrm{tr}(RT_0) + \mathrm{tr}(T_0^2) + \frac{1}{T(T-1)^2}\sum_{t=1}^T\|y_t\|_F^4}. \tag{34}$$

Similarly, in case a negative shrinkage coefficient is obtained, we set it to zero and let the other be one.

The above results show that the optimal shrinkage coefficients for the covariance matrix estimate (1) can be computed directly from the samples and shrinkage target, without the need of specifying any user parameters. The constrained shrinkage design may lead to certain performance loss as compared to the unconstrained one.
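For illustration, the constrained SCM solution (34) can be implemented directly from the samples and checked against a grid search over the LOOCV cost (a hedged sketch with our own naming, real-valued data):

```python
import numpy as np

def loocv_scm_convex_coeff(Y, T0):
    # Closed form (34), clipped to [0, 1] per the text.
    N, T = Y.shape
    R = Y @ Y.T / T
    s4 = sum(np.linalg.norm(Y[:, t])**4 for t in range(T))
    trR2, trRT0, trT02 = np.trace(R @ R), np.trace(R @ T0), np.trace(T0 @ T0)
    num = T / (T - 1) * trR2 - 2 * trRT0 + trT02 - s4 / (T * (T - 1))
    den = ((T**2 - 2 * T) / (T - 1)**2 * trR2 - 2 * trRT0 + trT02
           + s4 / (T * (T - 1)**2))
    return min(max(num / den, 0.0), 1.0)

rng = np.random.default_rng(4)
N, T = 10, 6                                  # low sample support: T < N
L = np.diag(np.linspace(0.5, 2.0, N))
Y = L @ rng.standard_normal((N, T))
R = Y @ Y.T / T
T0 = np.trace(R) / N * np.eye(N)              # scaled identity target
rho = loocv_scm_convex_coeff(Y, T0)

def J_cv(r):                                  # direct LOOCV cost, tau = 1 - r
    S = [np.outer(Y[:, t], Y[:, t]) for t in range(T)]
    R_loo = [(Y @ Y.T - S[t]) / (T - 1) for t in range(T)]
    return np.mean([np.linalg.norm(r * R_loo[t] + (1 - r) * T0 - S[t], 'fro')**2
                    for t in range(T)])
```

No user parameters are needed; the coefficient comes directly from the samples and the target, as stated above.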

### II-D LOOCV Choice for OLS-Based Covariance Estimation

One advantage of the LOOCV method is that it can be applied to different covariance matrix estimators. In this subsection, we discuss the LOOCV method for OLS-based covariance matrix estimation. Note that most existing analytical solutions for choosing the shrinkage coefficients assume SCM and specific shrinkage targets and need to be re-derived for general cases. Also, in contrast to general applications of LOOCV which require a grid search of the parameters and thus a high computational complexity, we have shown that for SCM, fast analytical solutions can be obtained for choosing the shrinkage coefficients. This will also be the case for the OLS-based covariance matrix estimation.

Consider the case with the observation y modeled as

$$y = Hx + z, \tag{35}$$

where H is a deterministic channel matrix and z is zero-mean, white noise with covariance matrix σ²I, uncorrelated with the zero-mean input signal x, which is assumed to have an identity covariance matrix. If training samples of both x and y are known, we may first estimate the channel matrix and the noise covariance matrix using the ordinary least squares (OLS) approach. Let the block of training data be (X, Y), where the input signal X can be designed to have certain properties such as orthogonality. The OLS estimates of the channel matrix and noise variance are then obtained as

$$\hat{H} = YX^\dagger(XX^\dagger)^{-1}, \tag{36}$$

$$\hat{\sigma}^2 = \frac{1}{TN}\left\|Y - \hat{H}X\right\|_F^2 = \frac{1}{TN}\mathrm{tr}\left((Y-\hat{H}X)(Y-\hat{H}X)^\dagger\right) = \frac{1}{TN}\mathrm{tr}\left(Y\left(I - X^\dagger(XX^\dagger)^{-1}X\right)Y^\dagger\right), \tag{37}$$

where ˆ(·) denotes the estimate of a quantity. In this case, the covariance matrix of y can be estimated as

$$R = \hat{H}\hat{H}^\dagger + \hat{\sigma}^2 I. \tag{38}$$

Such OLS-based covariance matrix estimation may be useful for designing signal estimation schemes in wireless communications. We can apply the linear shrinkage design (1) to enhance its accuracy and apply the LOOCV method (12) to choose the shrinkage coefficients. Note that in the t-th split, we generate the covariance matrix estimate R_t by applying the OLS estimator to the leave-one-out samples (X_t, Y_t), which are the subset of (X, Y) with the pair (x_t, y_t) omitted. The LOOCV cost is the same as (15), and the leave-one-out estimate of the covariance matrix for the t-th data split is

$$R_t = \hat{H}_t\hat{H}_t^\dagger + \hat{\sigma}_t^2 I, \tag{39}$$

where ˆH_t and ˆσ_t² denote the channel matrix and noise variance estimated from (X_t, Y_t), respectively. A direct computation of (16) and (17) for evaluating the LOOCV cost performs the OLS estimation T times, which incurs significant complexity. The complexity can be greatly reduced by observing that the leave-one-out OLS estimate of the channel matrix is related to the OLS channel matrix estimate in (36) by a rank-one update:

$$\hat{H}_t = Y_tX_t^\dagger(X_tX_t^\dagger)^{-1} = \hat{H} - e_tf_t^\dagger, \tag{40}$$

where

$$e_t \triangleq y_t - \hat{H}x_t, \tag{41}$$

$$f_t \triangleq \frac{1}{1-\Phi_t}(XX^\dagger)^{-1}x_t. \tag{42}$$

In the above,

$$\Phi_t \triangleq x_t^\dagger(XX^\dagger)^{-1}x_t \tag{43}$$

is the t-th diagonal entry of

$$\Phi = X^\dagger(XX^\dagger)^{-1}X. \tag{44}$$

Similarly, the leave-one-out estimate of the noise variance can be updated as

$$\hat{\sigma}_t^2 = \frac{1}{N(T-1)}\mathrm{tr}\left((Y_t - \hat{H}_tX_t)(Y_t - \hat{H}_tX_t)^\dagger\right) = \hat{\sigma}^2 - \delta_t, \tag{45}$$

where

$$\delta_t = \frac{\|e_t\|_F^2}{N(T-1)(1-\Phi_t)} - \frac{\hat{\sigma}^2}{T-1}. \tag{46}$$

Note that both updates can be achieved with low complexity when a few matrices are computed in advance and reused. In this way, the covariance matrix estimate R_t can be computed as

$$R_t = R - \delta_tI - e_t\phi_t^\dagger - \psi_te_t^\dagger, \tag{47}$$

where ϕ_t and ψ_t are defined as

$$\phi_t = \hat{H}f_t, \tag{48}$$

$$\psi_t = \phi_t - \|f_t\|_F^2\, e_t. \tag{49}$$

This shows that the leave-one-out OLS covariance matrix estimate R_t can be obtained from R by corrections involving a scaled identity matrix and two rank-one updates. Eqn. (47) can be exploited to compute the closed-form LOOCV solution quickly. From (47), the most involved computation for finding the solution of the optimization problem (15) can be implemented as

$$\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_t^2) = \mathrm{tr}(R^2) + \frac{N}{T}\sum_{t=1}^T \delta_t^2 - \frac{2}{T}\sum_{t=1}^T \delta_t\,\mathrm{tr}(R) + \frac{1}{T}\sum_{t=1}^T \|e_t\|_F^2\left(\|\phi_t\|_F^2 + \|\psi_t\|_F^2\right) - \frac{2}{T}\sum_{t=1}^T \Re\left(e_t^\dagger(R-\delta_tI)(\phi_t+\psi_t) - e_t^\dagger\psi_t\, e_t^\dagger\phi_t\right), \tag{50}$$

where ℜ(·) denotes the real part of a scalar. When R is already computed, the right-hand side of (50) can be evaluated using inner products and matrix-vector products. The terms tr(T_0²) and (1/T)∑_t tr(T_0S_t) are the same as those for the SCM case. For the other two terms, we have

$$\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_tS_t) = \frac{1}{T}\mathrm{tr}(RYY^\dagger) - \frac{1}{T}\sum_{t=1}^T \Re\left(\delta_t\|y_t\|_F^2 + y_t^\dagger e_t\,\phi_t^\dagger y_t + y_t^\dagger\psi_t\, e_t^\dagger y_t\right), \tag{51}$$

$$\frac{1}{T}\sum_{t=1}^T \mathrm{tr}(R_tT_0) = \mathrm{tr}(RT_0) - \frac{1}{T}\sum_{t=1}^T \delta_t\,\mathrm{tr}(T_0) - \frac{1}{T}\sum_{t=1}^T \Re\left(\phi_t^\dagger T_0 e_t + e_t^\dagger T_0 \psi_t\right). \tag{52}$$

Note that the computational complexities of (50)-(52) are low because the major operations are matrix-vector products and inner products.
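The rank-one-update route (40)-(47) can be validated end-to-end by comparing it against re-running the OLS fit with one training pair removed. A compact real-valued check (our own code and naming):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, T = 4, 3, 12                       # output dim, input dim, samples
H_true = rng.standard_normal((N, M))
X = rng.standard_normal((M, T))
Y = H_true @ X + 0.1 * rng.standard_normal((N, T))

G = np.linalg.inv(X @ X.T)               # (X X^H)^{-1}, computed once and reused
H_hat = Y @ X.T @ G                      # OLS channel estimate (36)
sig2 = np.trace((Y - H_hat @ X) @ (Y - H_hat @ X).T) / (T * N)   # (37)
R = H_hat @ H_hat.T + sig2 * np.eye(N)   # OLS covariance estimate (38)

t = 2                                    # leave out the t-th pair
xt, yt = X[:, t], Y[:, t]
Phi_t = xt @ G @ xt                      # (43)
e_t = yt - H_hat @ xt                    # (41)
f_t = G @ xt / (1 - Phi_t)               # (42)
H_t = H_hat - np.outer(e_t, f_t)         # rank-one update (40)
delta_t = np.dot(e_t, e_t) / (N * (T - 1) * (1 - Phi_t)) - sig2 / (T - 1)  # (46)
phi_t = H_hat @ f_t                      # (48)
psi_t = phi_t - np.dot(f_t, f_t) * e_t   # (49)
R_t = R - delta_t * np.eye(N) - np.outer(e_t, phi_t) - np.outer(psi_t, e_t)  # (47)

# Direct recomputation without the t-th pair, for comparison.
Xr, Yr = np.delete(X, t, axis=1), np.delete(Y, t, axis=1)
H_dir = Yr @ Xr.T @ np.linalg.inv(Xr @ Xr.T)
sig2_dir = np.trace((Yr - H_dir @ Xr) @ (Yr - H_dir @ Xr).T) / ((T - 1) * N)
R_dir = H_dir @ H_dir.T + sig2_dir * np.eye(N)
```

The updates agree with the direct leave-one-out fit up to floating-point error while avoiding the repeated matrix inversions.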

### II-E Comparisons with Alternative Choices of Linear Shrinkage Coefficients

In the above, we have introduced LOOCV methods with analytical solutions for choosing the coefficients of linear shrinkage covariance matrix estimators. We now discuss several alternative techniques which have received considerable attention recently and compare them with the LOOCV methods proposed in this paper.

In 2004, Ledoit and Wolf (LW) [2] studied estimators that shrink the SCM toward an identity target, i.e., T_0 = I. Such estimators do not alter the eigenvectors but shrink the eigenvalues of the SCM, which is well supported by the fact that sample eigenvalues tend to be more spread out than population eigenvalues. The optimal shrinkage coefficients under the MMSE criterion (3) can be written as

$$\rho^\star = \frac{\alpha^2}{\delta^2}, \quad \tau^\star = \frac{\beta^2}{\delta^2}\,\mu, \tag{53}$$

where the parameters α², β², δ², and μ depend on the true covariance matrix and other unknown statistics. Ref. [2] shows that α² + β² = δ² and proposes to approximate these quantities by their asymptotic estimates, valid as T, N → ∞, given by

$$\hat{\mu} = \frac{\mathrm{tr}(R)}{N}, \quad \hat{\delta}^2 = \|R - \hat{\mu}I\|_F^2, \tag{54}$$

$$\hat{\beta}^2 = \min\left(\hat{\delta}^2,\ \frac{1}{T^2}\sum_{t=1}^T \left\|y_ty_t^\dagger - R\right\|_F^2\right), \quad \hat{\alpha}^2 = \hat{\delta}^2 - \hat{\beta}^2, \tag{55}$$

which can all be computed from the training samples. By substituting these into (53), estimators that significantly outperform SCM are obtained, which also approach the oracle estimators when the training length is large enough.
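For reference, the LW recipe (53)-(55) is only a few lines for real-valued data (an illustrative sketch of the estimates above; the function name is ours):

```python
import numpy as np

def lw_shrinkage(Y):
    N, T = Y.shape
    R = Y @ Y.T / T                                     # SCM
    mu = np.trace(R) / N                                # (54)
    delta2 = np.linalg.norm(R - mu * np.eye(N), 'fro')**2
    beta2 = min(delta2,                                 # (55)
                sum(np.linalg.norm(np.outer(Y[:, t], Y[:, t]) - R, 'fro')**2
                    for t in range(T)) / T**2)
    alpha2 = delta2 - beta2
    rho, tau = alpha2 / delta2, beta2 / delta2 * mu     # (53)
    return rho * R + tau * np.eye(N), rho, tau

rng = np.random.default_rng(6)
N, T = 8, 5                           # low sample support: the SCM is singular
Y = rng.standard_normal((N, T))
Sig_lw, rho, tau = lw_shrinkage(Y)
```

Since τ > 0 whenever the per-sample outer products deviate from the SCM, the LW estimate is positive-definite even when T < N.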

The above LW estimator was extended by Stoica et al. in 2008 [3] to complex-valued signals with general shrinkage targets T_0, with applications to knowledge-aided space-time adaptive processing (KA-STAP) in radar. Several estimators with similar performance are derived there. For the general linear combination (GLC) design of [3], it is shown that the oracle shrinkage coefficients for (1) satisfy

$$\rho^\star = 1 - \frac{\tau^\star}{\nu}, \tag{56}$$

where

$$\nu = \frac{\mathrm{tr}(T_0\Sigma)}{\|T_0\|_F^2}, \quad \tau^\star = \frac{\nu\beta^2}{E\left[\|R - \nu T_0\|_F^2\right]}. \tag{57}$$

The quantity β² is estimated in the same way as in (55), and a computationally efficient expression for the estimate is given by

$$\hat{\beta}^2 = \frac{1}{T^2}\sum_{t=1}^T \|y_t\|_F^4 - \frac{1}{T}\|R\|_F^2. \tag{58}$$

Furthermore, ν and E[‖R − νT_0‖_F²] are estimated as ν̂ = tr(T_0R)/‖T_0‖_F² and ‖R − ν̂T_0‖_F², respectively. This leads to the result given by Eqns. (34) and (35) of [3], which recovers the LW estimator [2] when the identity shrinkage target is assumed.

More recently, Chen et al. [4] derived the oracle approximating shrinkage (OAS) estimator, which assumes the SCM, real-valued Gaussian samples, and a scaled identity target with T_0 = (tr(R)/N)I and ρ + τ = 1. They first derive the oracle shrinkage coefficients for the SCM obtained from T i.i.d. Gaussian samples, which are determined by tr(Σ²) and tr²(Σ). Then, they propose an iterative procedure to approach the oracle estimator, in which these unknown statistics are estimated by plugging in the covariance matrix estimate at the current iteration. It is further proved that the iteration converges to the OAS estimator with the following analytical expression for τ:

$$\tau_{OAS}^\star = \min\left(1,\ \frac{\left(1-\frac{2}{N}\right)\mathrm{tr}(R^2) + \mathrm{tr}^2(R)}{\left(T+1-\frac{2}{N}\right)\left[\mathrm{tr}(R^2) - \frac{\mathrm{tr}^2(R)}{N}\right]}\right). \tag{59}$$

This approach achieves superior performance for (scaled) identity targets and Gaussian data, and dominates the LW estimator [2] when T is small. It was later generalized by Senneret et al. [20] to a shrinkage target consisting of the diagonal entries of the SCM. Other related techniques include [14], which also assumes the SCM, Gaussian data, and identity/diagonal shrinkage targets.
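In code, the closed-form OAS coefficient (59) is essentially a one-liner (our sketch, real-valued data):

```python
import numpy as np

def oas_tau(R, T):
    # Closed-form OAS coefficient from eq. (59); N is the dimension of R.
    N = R.shape[0]
    tr_R2, tr_R = np.trace(R @ R), np.trace(R)
    num = (1 - 2 / N) * tr_R2 + tr_R**2
    den = (T + 1 - 2 / N) * (tr_R2 - tr_R**2 / N)
    return min(1.0, num / den)

rng = np.random.default_rng(7)
N, T = 8, 4
Y = rng.standard_normal((N, T))
R = Y @ Y.T / T                                   # SCM, singular since T < N
tau = oas_tau(R, T)
Sig_oas = (1 - tau) * R + tau * np.trace(R) / N * np.eye(N)
```

With very low sample support the coefficient approaches one, pulling the estimate strongly toward the scaled identity and guaranteeing positive-definiteness.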

All the above techniques provide analytical solutions and achieve near-oracle performance when the underlying assumptions (e.g., large dimensionality, large size of training data, identity/diagonal shrinkage targets) hold. However, they also have limitations. A common restriction is that all these analytical solutions assume SCM and are not optimized for other types of covariance matrix estimators such as model-based estimators. In particular, the LW and GLC methods [2, 3], which employ asymptotic approximations, may exhibit a noticeable gap to the oracle choice when the sample support is low, which may be relevant in some applications. The OAS method [4] assumed identity target, but its extensions to more general cases, e.g., with multiple/general shrinkage targets, are not trivial. By contrast, the LOOCV method proposed in this paper allows different designs and achieves near-oracle performance in general.

Cross-validation has also been applied previously for choosing shrinkage coefficients for covariance matrix estimation. The key issues for applying this generic tool include finding appropriate predictive metrics for scoring the different estimators and fast computation schemes. In [10, 13], the Gaussian likelihood was chosen as such a proxy. The computations with likelihood are generally involved as multiple matrix inverses/determinants are required, and a grid search is required for finding the optimal parameters. In this paper, we use the distribution-free, Frobenius norm loss in (12) as the metric, which leads to analytical solutions and is computationally more tractable.

## III Multi-Target Shrinkage

In Section 2, we have considered linear shrinkage designs with a single target. Multiple shrinkage targets may be used to further enhance performance, which may be obtained from a priori knowledge, e.g., a past covariance matrix estimate from older training samples or from neighboring frequencies. We can easily extend our proposed LOOCV method to multiple targets.

### III-A Oracle choice of shrinkage coefficients

Consider the multi-target shrinkage design

$$\hat{\Sigma}_{\rho,\boldsymbol{\tau}} = \rho R + \sum_{k=1}^K \tau_k T_k, \tag{60}$$

where all the shrinkage coefficients are nonnegative to guarantee PSD covariance matrix estimates, i.e.,

$$\rho \ge 0; \quad \tau_k \ge 0, \ \forall k. \tag{61}$$

The oracle multi-target shrinkage minimizes the squared Frobenius norm of the estimation error

$$J_{O,MT}(\rho,\boldsymbol{\tau}) = \left\|\rho R + \sum_{k=1}^K \tau_k T_k - \Sigma\right\|_F^2, \tag{62}$$

which can be rewritten as

$$J_{O,MT}(\rho,\boldsymbol{\tau}) = \begin{bmatrix}\rho \\ \boldsymbol{\tau}\end{bmatrix}^T A_{O,MT} \begin{bmatrix}\rho \\ \boldsymbol{\tau}\end{bmatrix} - 2\begin{bmatrix}\rho \\ \boldsymbol{\tau}\end{bmatrix}^T b_{O,MT} + \mathrm{tr}(\Sigma^2), \tag{63}$$

where τ = [τ_1, τ_2, …, τ_K]^T,

$$A_{O,MT} = \begin{bmatrix}\mathrm{tr}(R^2) & \mathrm{tr}(RT_1) & \cdots & \mathrm{tr}(RT_K) \\ \mathrm{tr}(T_1R) & \mathrm{tr}(T_1^2) & \cdots & \mathrm{tr}(T_1T_K) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{tr}(T_KR) & \mathrm{tr}(T_KT_1) & \cdots & \mathrm{tr}(T_K^2)\end{bmatrix}, \tag{64}$$

$$b_{O,MT} = \begin{bmatrix}\mathrm{tr}(R\Sigma) \\ \mathrm{tr}(T_1\Sigma) \\ \vdots \\ \mathrm{tr}(T_K\Sigma)\end{bmatrix}. \tag{65}$$

The oracle shrinkage coefficients can then be obtained by minimizing the cost function (63), which is a strictly convex quadratic program (SCQP) with K + 1 variables.
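The SCQP (63) can be handed to any QP solver; for small K it is also easy to solve directly. A minimal NumPy sketch (real-valued data, our own naming) that exploits the fact that minimizing (62) is a linear least-squares problem in the coefficients, and, if a coefficient comes out negative, drops that component and re-solves (mirroring the boundary treatment used throughout this paper; a general QP solver would give the exact constrained optimum):

```python
import numpy as np

def oracle_multitarget(R, targets, Sigma):
    # Columns of V are the vectorized R and targets; (62) = ||V x - vec(Sigma)||^2.
    mats = [R] + list(targets)
    active = list(range(len(mats)))
    while True:
        V = np.column_stack([mats[i].reshape(-1) for i in active])
        c, *_ = np.linalg.lstsq(V, Sigma.reshape(-1), rcond=None)
        if np.all(c >= 0) or len(active) == 1:
            break
        active = [i for i, ci in zip(active, c) if ci >= 0]  # drop components
    x = np.zeros(len(mats))
    x[active] = np.maximum(c, 0.0)
    return x                        # [rho, tau_1, ..., tau_K]

# Demo: Sigma lies in the span of R and the targets with known coefficients.
rng = np.random.default_rng(8)
N = 6
Y = rng.standard_normal((N, 12))
R = Y @ Y.T / 12
T1, T2 = np.eye(N), np.diag(np.diag(R))
Sigma = 0.5 * R + 0.3 * T1 + 0.2 * T2
x = oracle_multitarget(R, [T1, T2], Sigma)
```

In the demo the oracle coefficients are recovered exactly, since the true covariance is constructed inside the span with nonnegative weights.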

### III-B LOOCV choice of shrinkage coefficients

We now extend the LOOCV method of Section 2 to the multi-target shrinkage considered here. Following the same treatment as in Section II-B, in each split of the training data, Y_t and y_t are used to generate and validate the covariance matrix estimate, respectively. The multiple shrinkage coefficients are chosen to minimize the LOOCV cost

$$J_{CV,MT}(\rho,\boldsymbol{\tau}) = \frac{1}{T}\sum_{t=1}^T \left\|\rho R_t + \sum_{k=1}^K \tau_k T_k - S_t\right\|_F^2. \tag{66}$$

The above cost function can be rewritten in a form similar to (15) as

$$J_{CV,MT}(\rho,\boldsymbol{\tau}) = \begin{bmatrix}\rho \\ \boldsymbol{\tau}\end{bmatrix}^T A_{CV,MT} \begin{bmatrix}\rho \\ \boldsymbol{\tau}\end{bmatrix} - 2\begin{bmatrix}\rho \\ \boldsymbol{\tau}\end{bmatrix}^T b_{CV,MT} + \frac{1}{T}\sum_{t=1}^T \mathrm{tr}(S_t^2) \tag{67}$$

with

$$A_{CV,MT} = \begin{bmatrix}\frac{1}{T}\sum_{t=1}^T\mathrm{tr}(R_t^2) & \frac{1}{T}\sum_{t=1}^T\mathrm{tr}(R_tT_1) & \cdots & \frac{1}{T}\sum_{t=1}^T\mathrm{tr}(R_tT_K) \\ \frac{1}{T}\sum_{t=1}^T\mathrm{tr}(T_1R_t) & \mathrm{tr}(T_1^2) & \cdots & \mathrm{tr}(T_1T_K) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{T}\sum_{t=1}^T\mathrm{tr}(T_KR_t) & \mathrm{tr}(T_KT_1) & \cdots & \mathrm{tr}(T_K^2)\end{bmatrix}, \tag{68}$$

$$b_{CV,MT} = \begin{bmatrix}\frac{1}{T}\sum_{t=1}^T\mathrm{tr}(R_tS_t) \\ \frac{1}{T}\sum_{t=1}^T\mathrm{tr}(T_1S_t) \\ \vdots \\ \frac{1}{T}\sum_{t=1}^T\mathrm{tr}(T_KS_t)\end{bmatrix}. \tag{69}$$

The constant entries of A_CV,MT and b_CV,MT can be computed in the same way as in the single-target case. When K is small, which is typically the case, the solution minimizing the LOOCV cost can be found quickly using standard optimization tools. Alternatively, we may first find the global optimizer that ignores the nonnegativity constraint,

$$\begin{bmatrix}\rho_{CV,MT}^\star \\ \boldsymbol{\tau}_{CV,MT}^\star\end{bmatrix} = A_{CV,MT}^{-1}\, b_{CV,MT}, \tag{70}$$

and check whether the nonnegativity condition is satisfied. If a negative shrinkage coefficient is produced, we then consider the boundaries of (61), which are equivalent to removing a certain number of shrinkage targets from the shrinkage design. The solution can be found in exactly the same way as (70), but with fewer targets.
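A sketch of this procedure for SCM-based estimation (our own code, real-valued data): build A_CV,MT and b_CV,MT from (68)-(69) using the leave-one-out update (26), solve (70), and remove targets with negative coefficients:

```python
import numpy as np

def loocv_multitarget(Y, targets):
    N, T = Y.shape
    R = Y @ Y.T / T
    S = [np.outer(Y[:, t], Y[:, t]) for t in range(T)]
    R_loo = [(T * R - S[t]) / (T - 1) for t in range(T)]   # update (26)
    mats = [None] + list(targets)                          # slot 0 <-> rho
    K1 = len(mats)
    A, b = np.zeros((K1, K1)), np.zeros(K1)
    A[0, 0] = np.mean([np.trace(Rt @ Rt) for Rt in R_loo])
    b[0] = np.mean([Y[:, t] @ R_loo[t] @ Y[:, t] for t in range(T)])
    for k in range(1, K1):
        A[0, k] = A[k, 0] = np.mean([np.trace(Rt @ mats[k]) for Rt in R_loo])
        b[k] = np.trace(R @ mats[k])     # (1/T) sum_t tr(T_k S_t) = tr(T_k R)
        for j in range(1, K1):
            A[k, j] = np.trace(mats[k] @ mats[j])
    active = list(range(K1))
    while True:
        c = np.linalg.solve(A[np.ix_(active, active)], b[active])  # (70)
        if np.all(c >= 0) or len(active) == 1:
            break
        active = [i for i, ci in zip(active, c) if ci >= 0]  # remove targets
    x = np.zeros(K1)
    x[active] = np.maximum(c, 0.0)
    return x                             # [rho, tau_1, ..., tau_K]

rng = np.random.default_rng(9)
N, T = 6, 10
L = np.diag(np.linspace(0.5, 2.0, N))
Y = L @ rng.standard_normal((N, T))
R = Y @ Y.T / T
T1 = np.trace(R) / N * np.eye(N)         # scaled identity target
T2 = np.diag(np.diag(R))                 # diagonal target
x = loocv_multitarget(Y, [T1, T2])
Sig_hat = x[0] * R + x[1] * T1 + x[2] * T2
```

Since the returned coefficients are nonnegative and all components are PSD, the resulting estimate is always PSD.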

Similarly to the single-target case, we may also consider a constrained design, where the shrinkage targets have the same trace as the estimated covariance matrix R and

$$\rho + \sum_{k=1}^K \tau_k = 1. \tag{71}$$

Then the LOOCV cost function can be rewritten as

 J_CV,MT(τ)=1T∑_t=1T∥∥∥∑_k=1Kτ_kA_kt