I Introduction
Covariance or scatter matrix estimation is a fundamental problem in statistical signal processing [1, 2], with applications ranging from wireless communications [3] to financial engineering [4] and biology [5]. Historically, the sample covariance matrix (SCM) $\frac{1}{n}\sum_{i=1}^{n} y_i y_i^*$, where $y_1,\ldots,y_n$ are zero-mean data samples, has been a particularly appealing choice among possible estimators. The SCM is known to be the maximum likelihood estimator (MLE) of the covariance matrix when the $y_i$ are independent, identically distributed zero-mean Gaussian observations, and its simple structure makes it easy to implement. Nonetheless, the SCM is known to suffer from three major drawbacks: first, it is not resilient to outliers or samples of an impulsive nature; second, it is a poor estimate of the true covariance matrix whenever the number of samples $n$ and the number of variables $N$ are of similar order; lastly, it is not invertible for $n < N$. The sensitivity to outliers is a particularly important issue in radar-related applications [6, 7], where the background noise usually follows a heavy-tailed distribution, often modeled as a complex elliptical distribution [8, 9]. In such cases, the MLE of the covariance matrix is no longer the SCM. On the other hand, data scarcity is a relevant issue in an ever-growing number of signal processing applications where $N$ and $n$ are generally of similar order, possibly with $n < N$ [4, 5, 10, 11]. New, improved covariance estimators are needed to account for both potential data anomalies and high-dimensional scenarios.
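To make the under-sampling issue concrete, the following sketch (our own illustration; the dimensions and variable names are not from the paper) computes the SCM and checks its rank in the well-sampled and under-sampled regimes:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_covariance(Y):
    """Sample covariance matrix (SCM) of the zero-mean columns of Y.

    Y is an (N, n) array whose n columns are the observations;
    the SCM is (1/n) * sum_i y_i y_i^H.
    """
    n = Y.shape[1]
    return (Y @ Y.conj().T) / n

# Well-sampled case (n > N): the SCM is full rank, hence invertible.
Y_well = rng.standard_normal((10, 1000))
scm_well = sample_covariance(Y_well)
print(np.linalg.matrix_rank(scm_well))  # 10

# Under-sampled case (n < N): the SCM has rank at most n < N,
# so it is singular and cannot be inverted.
Y_under = rng.standard_normal((50, 20))
scm_under = sample_covariance(Y_under)
print(np.linalg.matrix_rank(scm_under))  # 20
```

The rank deficiency in the second case is exactly the invertibility problem that shrinkage (Section III) addresses.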
In order to mitigate the effect of outliers and thus provide a better inference of the true covariance matrix, robust estimators known as M-estimators have been designed [12, 13, 14, 9]. Their structure is nontrivial, involving matrix fixed-point equations, and their analysis is challenging. Nonetheless, significant progress towards understanding these estimators has been made in large-dimensional settings [15, 16, 17, 18, 19], motivated by the increasing number of applications where $N$ and $n$ are both large and comparable. Salient messages of these works are: (i) outliers or impulsive data can be handled by these estimators, if appropriately designed (the choice of the specific form of the estimator is important to handle different types of outliers) [19]; (ii) in the absence of outliers, robust M-estimators essentially behave as the SCM and, therefore, are still subject to the data scarcity issue [16].
To alleviate the issue of scarce data, regularized versions of the SCM were originally proposed [20, 21]. Such estimators consist of a linear combination of the SCM and a shrinkage target (often the identity matrix), which guarantees their invertibility and often provides a large improvement in accuracy over the SCM when $N$ and $n$ are of the same order. Nevertheless, the regularized SCM (RSCM) inherits the sensitivity of the SCM to outliers and heavy-tailed data. To alleviate both the data scarcity issue and the sensitivity to data anomalies, regularized M-estimators have been proposed [1, 15, 2, 22]. Such estimators are similar in spirit to the RSCM, in that they consist of a combination of a robust M-estimator and a shrinkage target. However, unlike the RSCM, but similar to the estimators studied in [16, 19], these estimators are only defined implicitly as the solution to a matrix fixed-point equation, which makes their analysis particularly challenging.

In this article, we propose to study these robust regularized estimators in the double-asymptotic regime where $N$ and $n$ grow large together. Building upon recent works [17, 23], we will make use of random matrix theory tools to understand the yet-unknown asymptotic behavior of these estimators and subsequently to establish design principles aimed at choosing appropriate estimators in different scenarios. To do so, we will first study the behavior of these regularized M-estimators in an outlier-free scenario. In this setting, we will show that, upon optimally choosing the regularization parameter, most M-estimators perform asymptotically the same, meaning that the form of the underlying M-estimator does not impact the performance of its regularized counterpart on clean data. Second, we will investigate the effect of the introduction of outliers in the data, under different contamination models. Initial insights were obtained in [19] for non-regularized estimators, focusing on the weights given by the M-estimator to outlying and legitimate data. The current study, by proposing an intuitive measure of robustness, takes a more formal approach to qualifying the robustness of these estimators.
In particular, we will demonstrate which form of M-estimator is preferable under a given contamination model, first in the non-regularized setting, and then for regularized estimators.
Notation: $\|A\|$, $\|A\|_F$ and $\mathrm{tr}(A)$ denote the spectral norm, the Frobenius norm and the trace of the matrix $A$, respectively. The superscript $(\cdot)^*$ stands for Hermitian transpose. Thereafter, we will use $\lambda_1(A),\ldots,\lambda_N(A)$ to denote the ordered eigenvalues of the $N \times N$ square matrix $A$. The statement $A \succ 0$ (resp. $A \succeq 0$) means that the symmetric matrix $A$ is positive definite (resp. positive semidefinite). The arrow $\xrightarrow{\text{a.s.}}$ designates almost sure convergence, while $\delta_x$ denotes the Dirac measure at point $x$.

II Review of the large-dimensional behavior of non-regularized M-estimators
II-A General form of non-regularized M-estimators
In the non-regularized case, robust M-estimators of covariance matrices are defined as the solution $\hat{C}_N$ (when it exists) to the following equation [13]
$$\hat{C}_N = \frac{1}{n}\sum_{i=1}^{n} u\left(\frac{1}{N}\, y_i^*\, \hat{C}_N^{-1}\, y_i\right) y_i\, y_i^* \qquad (1)$$
where $Y = [y_1,\ldots,y_n]$ represents the data matrix, and where $u$ satisfies the following properties:

- $u$ is a nonnegative, nonincreasing, bounded, and continuous function on $[0,\infty)$;
- $\phi(s) \triangleq s\,u(s)$ is increasing and bounded, with $\phi_\infty \triangleq \lim_{s\to\infty} \phi(s) > 1$.
If well-defined, the solution of (1) can be obtained via an iterative procedure (see, for example, [24, 16]). Intuitively, the $i$th data sample is given a weight $u\big(\frac{1}{N}\, y_i^* \hat{C}_N^{-1} y_i\big)$, which should be smaller for outlying samples than for legitimate ones. The choice of the function $u$ thus determines the degree of robustness of the M-estimator. As a rule of thumb, the more strongly $u$ downweights samples with large quadratic forms, the more robust the underlying M-estimator to potential extreme outliers [13]. However, such increased robustness is usually achieved at the expense of accuracy [9].
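The iterative procedure above can be sketched as follows. This is a minimal illustration: the weight function `u` below is a Student-t-type choice satisfying the stated conditions, with a hypothetical tuning parameter `t`; it is not necessarily the exact parameterization used in (2) and (3).

```python
import numpy as np

def maronna_m_estimator(Y, u, tol=1e-8, max_iter=500):
    """Iterate C <- (1/n) sum_i u((1/N) y_i^H C^{-1} y_i) y_i y_i^H
    until convergence, starting from the identity (cf. (1))."""
    N, n = Y.shape
    C = np.eye(N)
    for _ in range(max_iter):
        # Quadratic forms (1/N) y_i^H C^{-1} y_i, all samples at once.
        s = np.einsum('ij,ij->j', Y.conj(), np.linalg.solve(C, Y)).real / N
        C_next = (Y * u(s)) @ Y.conj().T / n
        if np.linalg.norm(C_next - C) <= tol * np.linalg.norm(C):
            C = C_next
            break
        C = C_next
    return C

# A bounded, nonincreasing weight with s*u(s) increasing and bounded
# (a Student-t-type choice; t > 0 is a hypothetical tuning parameter).
t = 1.0
u = lambda s: (1.0 + t) / (t + s)

rng = np.random.default_rng(1)
Y = rng.standard_normal((20, 200))  # N = 20 variables, n = 200 samples
C_hat = maronna_m_estimator(Y, u)
print(C_hat.shape)  # (20, 20)
```

Note that large-norm samples receive small weights `u(s)`, which is the downweighting mechanism discussed above.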
A related and commonly used estimator is Tyler's estimator [14], which is associated with the unbounded function $u(s) = 1/s$. Recent papers dealing with the application of Tyler's estimator in engineering include [25, 26, 27]. We remark, however, that for such a function the existence of a solution to (1) depends on the data at hand (see, e.g., [14, 28, 22]). To avoid this issue, we here focus on a class of estimators with bounded $u$ functions (as prescribed above). Examples of practical interest, which we study in some detail, include
(2)  
(3) 
for some $t > 0$. For a specific $t$, the estimator associated with (2) is known to be the MLE of the true covariance matrix when the $y_i$ are independent, zero-mean, multivariate Student vectors [19], whereas (3) corresponds to a modified form of the so-called Huber estimator [29]. Observe that, for these functions, the robustness of the associated M-estimator to extreme outliers is controlled by both $t$ and the scale factor. In what follows, with a slight abuse of terminology, we will refer to these estimators as “Tyler's” and “Huber's” estimators, respectively.

II-B Asymptotic equivalent form under outlier-free data model
Assume now the following “outlier-free” data model: let $y_1,\ldots,y_n$ be $N$-dimensional data samples drawn from $y_i = C_N^{1/2} x_i$, where $C_N$ is deterministic and $x_1,\ldots,x_n$ are random vectors, the entries of which are independent with zero mean, unit variance and finite moments of sufficiently high order. With this model, we now recall the main result from [16].

Theorem 1.
This shows that, up to a multiplying constant and regardless of the choice of $u$, Maronna's M-estimators behave (asymptotically) like the SCM. As such, in the absence of outliers, no information is lost.
However, Theorem 1 excludes the “under-sampled” case $n < N$. Regularized versions of Maronna's M-estimators have been proposed to alleviate this issue, in most cases considering regularized versions of Tyler's estimator ($u(s) = 1/s$) [1, 15, 28, 2], the behavior of which has been studied in [18, 17]. Recently, a regularized M-estimator which accounts for a wider class of $u$ functions was introduced in [22], but its large-dimensional behavior remains unknown. We address this in the next section. Moreover, note that Theorem 1 does not tell us anything about the behavior of different estimators, associated with different $u$ functions, in the presence of outlying or contaminating data. While progress toward better understanding the effect of outliers was recently made in [19], that study focused on non-regularized estimators. In this work, a new measure characterizing the robustness of different M-estimators will be proposed, allowing us to study both non-regularized and regularized estimators (Sections IV and V).
III Regularized M-estimators: Large-dimensional analysis and calibration
III-A General form of regularized M-estimators
We consider the class of regularized M-estimators introduced in [22], given as the unique solution to
$$\hat{C}_N(\rho) = (1-\rho)\,\frac{1}{n}\sum_{i=1}^{n} u\left(\frac{1}{N}\, y_i^*\, \hat{C}_N(\rho)^{-1}\, y_i\right) y_i\, y_i^* + \rho\, I_N \qquad (4)$$
where $\rho$ is a regularization (or shrinkage) parameter, and where $I_N$ denotes the $N \times N$ identity matrix. The introduction of a regularization parameter allows a solution to exist even when $n < N$. The structure of (4) strongly resembles that of the RSCM, defined as
$$\hat{C}_{\mathrm{RSCM}}(\rho) = (1-\rho)\,\frac{1}{n}\, Y Y^* + \rho\, I_N \qquad (5)$$
where $\rho \in [0,1]$. This estimator is also referred to as a linear shrinkage estimator [30], a linear combination estimator [31], diagonal loading [21], or ridge regression [32]. Regularized M-estimators are robust versions of the RSCM.

III-B Asymptotic equivalent form under outlier-free data model
Based on recent random matrix theory results, we now characterize the asymptotic behavior of these M-estimators. Under the same data model as that of Section II, we answer the basic question of whether (and to what extent) different regularized estimators, associated with different $u$ functions, are asymptotically equivalent. We need the following assumption on the growth regime and the underlying covariance matrix $C_N$:
Assumption 1.

- $N/n \to c \in (0,\infty)$ as $N, n \to \infty$.
- $C_N \succeq 0$ with $\limsup_N \|C_N\| < \infty$.
- The empirical spectral distribution of $C_N$ converges weakly to a limiting distribution that is not degenerate at zero.
Assumption 1 differs slightly from the assumptions of Theorem 1. In particular, the introduction of a regularization parameter now allows $N > n$. Likewise, $C_N$ is now only required to be positive semidefinite.
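A minimal sketch of the corresponding fixed-point iteration, under the parameterization written in (4) above (this is our reading of [22]; the exact normalization there may differ), shows why the under-sampled case is no longer problematic:

```python
import numpy as np

def regularized_m_estimator(Y, u, rho, tol=1e-8, max_iter=500):
    """Iterate the regularized fixed point (cf. (4)):
    C <- (1-rho)*(1/n) sum_i u((1/N) y_i^H C^{-1} y_i) y_i y_i^H + rho*I."""
    N, n = Y.shape
    C = np.eye(N)
    for _ in range(max_iter):
        s = np.einsum('ij,ij->j', Y.conj(), np.linalg.solve(C, Y)).real / N
        C_next = (1 - rho) * (Y * u(s)) @ Y.conj().T / n + rho * np.eye(N)
        if np.linalg.norm(C_next - C) <= tol * np.linalg.norm(C):
            C = C_next
            break
        C = C_next
    return C

t = 1.0
u = lambda s: (1.0 + t) / (t + s)  # hypothetical Student-t-type weight

rng = np.random.default_rng(2)
Y = rng.standard_normal((50, 20))  # under-sampled: N = 50 > n = 20
C_hat = regularized_m_estimator(Y, u, rho=0.5)
# The rho * I term keeps every eigenvalue at least rho, so the
# estimator is invertible even though n < N.
print(np.linalg.matrix_rank(C_hat))  # 50
```

The shrinkage term acts as a floor on the spectrum, which is exactly what the SCM lacks in the rank-deficient regime.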
For each $\rho$, we denote by $\hat{C}_N(\rho)$ the unique solution to (4). We first characterize its behavior in the large-$N,n$ regime. To this end, we need the following assumption:
Assumption 2.
.
We now introduce an additional function, which will be useful in characterizing a matrix equivalent of $\hat{C}_N(\rho)$.
Definition. Let Assumption 2 hold. Define as where denotes the inverse function of , which maps onto .
The function is continuous, nonincreasing and onto. We remark that Assumption 2 guarantees that this function (and thus the matrix equivalent below) is properly defined.¹

¹ Assumption 2 could in fact be relaxed by considering instead a weaker inequality, which enforces a constraint on the choice of both the function $u$ (through $\phi_\infty$) and the regularization parameter $\rho$. The proof of Theorem 2 (provided in the Appendix) considers this more general case; nevertheless, for simplicity of exposition, we avoid this technical aspect in the core of the paper.

Note that, importantly, $\phi_\infty$ does not have to be lower bounded by 1, as opposed to the non-regularized setting. With this in hand, we have the following theorem:
Theorem 2.
Define a compact set $\mathcal{R}$ included in the set of admissible regularization parameters. Let $\hat{C}_N(\rho)$ be the unique solution to (4). Then, as $N, n \to \infty$, under Assumptions 1–2,
where
with the unique positive solution to the equation
(6) 
Furthermore, the function is bounded, continuous on and greater than zero.
The proof of Theorem 2 (as well as those of the other technical results in this section) is provided in Appendix A-A.
Remark 1.
The uniform convergence in Theorem 2 will be important for finding the optimal regularization parameter of a given estimator. As a matter of fact, the set , required to establish such uniform convergence, can be taken as in the oversampled case (provided that ).
Remark 2.
Note that Theorem 2 (and Propositions 1 and 2 below) resembles similar results obtained in [17]. The key difference in the present work is that our results apply to a wide class of estimators (associated with $u$ functions satisfying the assumptions prescribed above), while [17] focused only on Tyler's estimator (associated with $u(s) = 1/s$).
Theorem 2 shows that, for every $u$ function, the estimator asymptotically behaves (uniformly on $\mathcal{R}$) like the RSCM, with certain weights in lieu of the parameters $1-\rho$ and $\rho$ in (5). Importantly, the relative weight given to the SCM depends on the underlying $u$ function, which entails that, for a fixed $\rho$, two different estimators may have different asymptotic behaviors. While this is indeed the case, in the following it will be shown that, upon properly choosing the regularization parameter, all regularized M-estimators share the same, optimal asymptotic performance, at least with respect to a quadratic loss.
III-C Optimized regularization and asymptotic equivalence of different regularized M-estimators
First, we demonstrate that any trace-normalized regularized M-estimator is in fact asymptotically equivalent to the RSCM, up to a simple transformation of the regularization parameter. The result is as follows:
Proposition 1.
Let Assumptions 1–2 hold. For each $\rho$, the parameter defined as
(7) 
is such that
(8) 
where we recall that .
Proposition 1 implies that, in the absence of outliers, any (trace-normalized) regularized M-estimator is equal to a trace-normalized RSCM with a regularization parameter depending on $\rho$ and on the underlying $u$ function. From Theorem 2, it then follows that the estimator asymptotically behaves like the RSCM with this transformed parameter.
Thanks to Proposition 1, we may thus look for asymptotically optimal choices of $\rho$. Given an estimator $\hat{C}$ of $C_N$, define the quadratic loss of the associated trace-normalized estimator as:
We then have the following proposition:
Proposition 2.
(Optimal regularization)
Let Assumptions 1 and 2 hold. Define
where and . Then,
Furthermore, for a solution to ,
(Optimal regularization parameter estimate)
The solution to
satisfies
Proposition 2 states that, irrespective of the choice of $u$, there exists some $\rho$ for which the quadratic loss of the corresponding regularized M-estimator is minimal, this minimum being the same as the minimum achieved by an optimally regularized RSCM. The last result of Proposition 2 provides a simple way to estimate this optimal parameter.
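For intuition, the following oracle experiment (our own sketch, with a hypothetical Toeplitz covariance and one natural choice of trace-normalized quadratic loss; the paper's exact loss and the data-driven estimator of Proposition 2 are not reproduced here) grid-searches the RSCM shrinkage parameter:

```python
import numpy as np

rng = np.random.default_rng(3)

def quad_loss(A, B):
    """Frobenius loss between trace-normalized matrices (one natural
    normalization; the paper's exact definition may scale differently)."""
    return np.linalg.norm(A / np.trace(A) - B / np.trace(B), 'fro') ** 2

# Hypothetical ground truth: exponentially decaying Toeplitz covariance.
N, n = 40, 60
C_true = 0.7 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
Y = np.linalg.cholesky(C_true) @ rng.standard_normal((N, n))

scm = (Y @ Y.T) / n
rhos = np.linspace(0.01, 0.99, 99)
losses = [quad_loss((1 - r) * scm + r * np.eye(N), C_true) for r in rhos]
rho_star = rhos[int(np.argmin(losses))]
print(rho_star)  # oracle shrinkage level for this particular draw
```

In practice $C_{\mathrm{true}}$ is of course unknown, which is precisely why a consistent estimate of the optimal parameter, as in Proposition 2, is needed.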
In the following, we validate these theoretical findings through simulation, considering the functions specified in (2) and (3). Fig. 1 depicts the expected quadratic loss associated with the solution of (4) and that associated with the RSCM (lines), along with the expected quadratic loss associated with the random equivalents of Tyler's and Huber's estimators (markers).
For both functions and all $\rho$, there is a close match between the quadratic loss of the estimator and that of its random equivalent. This confirms the accuracy of the (asymptotic) equivalence described in Theorem 2. As suggested by our analysis, while the estimators associated with different $u$ functions have different performances for a given $\rho$, they have the same performance when $\rho$ is optimized, with a quadratic loss approaching the optimal value for $N$ large. Furthermore, the optimal regularization parameter for a given $u$ function is accurately estimated, as shown by the arrows in Fig. 1.
IV Large-dimensional robustness: Non-regularized case
In this section, we turn to the case where the data are contaminated by random outliers and study the robustness of M-estimators for distinct $u$ functions. Some initial insight was previously provided in [19] for the non-regularized case. Specifically, that study focused on comparing the weights given by the estimator to outlying and legitimate samples. Albeit insightful, the analysis in [19] did not directly assess robustness, understood as the impact of outliers on the estimator's performance. Moreover, it is not clear how these weights translate into the Frobenius loss, or any other standard metric measuring the estimator's performance. Here we propose a different approach to analyzing robustness, introducing and evaluating a robustness metric which measures the bias induced by data contamination.
We start by studying non-regularized estimators (thereby excluding the case $n < N$), which are technically easier to handle. This will provide insight into the capabilities of different M-estimators to handle outlying samples. Then, in the following section, we will investigate how this study translates to the regularized case. The proofs of the technical results in this section are provided in Appendix A-B.
IV-A Asymptotic equivalent form under outlier data model
We focus on a particular type of contamination model where outlying samples follow a distribution different from that of the legitimate samples. Similar to [19], the data matrix is constructed with the first samples being the legitimate data, following the same distribution as in Sections II and III. The remaining “contaminating” data samples are assumed to be random, independent of the legitimate samples, drawn with a deterministic positive definite covariance matrix distinct from that of the legitimate data, from independent random vectors with i.i.d. entries having zero mean, unit variance, and finite moments of sufficiently high order. This outlier data model is effectively a cluster contamination model [33].
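This cluster contamination model can be simulated directly; in the sketch below, the covariance choices are hypothetical placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def contaminated_data(N, n, eps, C_leg, C_out):
    """Draw n zero-mean samples of dimension N: a fraction eps from the
    outlier covariance C_out, the rest from the legitimate covariance C_leg."""
    n_out = int(round(eps * n))
    Y_leg = np.linalg.cholesky(C_leg) @ rng.standard_normal((N, n - n_out))
    Y_out = np.linalg.cholesky(C_out) @ rng.standard_normal((N, n_out))
    return np.concatenate([Y_leg, Y_out], axis=1)

N, n = 20, 200
C_leg = np.eye(N)                           # legitimate covariance (placeholder)
C_out = np.diag(np.linspace(1.0, 10.0, N))  # outlier covariance (placeholder)
Y = contaminated_data(N, n, eps=0.05, C_leg=C_leg, C_out=C_out)
print(Y.shape)  # (20, 200)
```

Both clusters are zero mean; only the covariance (scatter) of the contaminating cluster differs, as in the model above.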
To characterize the asymptotic behavior of M-estimators under this data model, we require the following assumptions on the growth regime and on the covariance matrices of the legitimate and outlying samples:
Assumption 3.

- $N/n \to c \in (0,\infty)$ and the outlier proportion converges as $N, n \to \infty$.
- The spectral norm of the legitimate covariance matrix remains bounded.
- The spectral norm of the outlier covariance matrix remains bounded.
Let us consider a $u$ function as above, with a suitably adapted condition on $\phi_\infty$. For such a function, the equation in
(9) 
has a unique solution [13], hereafter denoted $\hat{C}_N$. Moreover, define the companion function as in the Definition of Section III, via the inverse function of $\phi$.
In this setting, we have the following result:
Theorem 3.
Let Assumption 3 hold and let $\hat{C}_N$ be the unique solution to (9). Then, as $N, n \to \infty$,
where
with and the unique positive solutions to:
with
Theorem 3 shows that $\hat{C}_N$ behaves similarly to a weighted SCM, with different weights applied to the legitimate and the outlying samples. This result generalizes [19, Corollary 3]. Indeed, the applicability of the results of [19] was limited by an inequality constraint jointly involving $\phi_\infty$ and the proportion of outliers. This constraint can easily be violated in practice: even a small proportion of outliers can rule out every admissible choice of $\phi_\infty$.
A scenario of particular interest in the following concerns the case of a vanishingly small proportion of outliers. For this scenario, the weights given to the legitimate and outlying data are
(10)  
(11) 
respectively.
In the following, we exploit this asymptotic equivalent to characterize the effect of random outliers on the estimator.
IV-B Robustness analysis
Let $\hat{C}_N(0)$ denote the solution to (1) (no outliers), and $\hat{C}_N$ the solution to (9) (with outliers). We propose the following metric, termed the measure of influence, to assess the robustness of a given estimator to a contamination of the data:
Definition 1.
For a given contamination proportion, the measure of influence is given by
Corollary 1.
As $N, n \to \infty$,
where
(12) 
Note that the measure of influence vanishes in the absence of outliers, as expected. The result (12) shows that the influence grows with the mismatch between the legitimate and outlier covariance matrices, which is also intuitive: the more “different” the outlier distribution is from the legitimate one, the higher the influence of the outliers on the estimator. To get a clearer insight into the effect of a small proportion of outliers, we compute
(13) 
which we will refer to as the infinitesimal measure of influence (IMI).²

² As defined, the IMI is reminiscent of a standard robust statistical metric known as the influence function [34]. A key distinction is that the influence function measures the effect on the estimator of an infinitesimal mass contamination at an arbitrary point, whereas the IMI focuses on the impact of an infinitesimal random contamination.
For particular $u$ functions, these general results reduce to even simpler forms: for example, for functions such as (2) or (3), (14) further yields
Further, for small $t$, the IMI associated with Tyler's and Huber's estimators can be approximated as
(15) 
and
(16) 
Hence, in one regime the influence of an infinitesimal fraction of outliers is higher for Tyler's estimator than for Huber's, whereas in the opposite regime both Huber's and Tyler's estimators exhibit the same IMI.
For comparison, the measure of influence of the SCM can be written as
which is linear in the outlier proportion. It follows immediately that
(17) 
The fact that the measure of influence of the SCM is bounded may seem surprising, since it is known that a single arbitrary outlier can arbitrarily bias the SCM [34]; however, we recall that the current model focuses on a particular random outlier scenario. From (12), the SCM is more affected by the introduction of outliers than a given M-estimator if and only if
This further legitimizes the study in [19], which focused on these weights to assess the robustness of a given M-estimator. However, in the regularized case it will be shown that the relationship between the relative weights and robustness is more complex (see Section V-B).
Fig. 2 depicts the measure of influence for different $u$ functions as the proportion of outlying samples increases. In addition, we show the measure of influence of the SCM, as well as the linear approximation (computed using (15), (16) and (17)) of the measure of influence in the neighborhood of zero contamination. We first consider one ordering of the legitimate and outlier covariance matrices, and then swap their roles.
In the first case, Fig. 2 confirms that the measure of influence of both Tyler's and Huber's estimators is lower than that of the SCM (see (15), (16)). This shows that the considered M-estimators are more robust to the introduction of outliers than the SCM. Furthermore, both Tyler's and Huber's estimators exhibit the same robustness for small contamination proportions. In the opposite case, however, Tyler's estimator is much less robust than both Huber's estimator and the SCM, which are equally robust (for small contamination proportions). Since the covariance matrices of the legitimate and outlying samples are both unknown in practice, this suggests that Huber's estimator is preferable to Tyler's.
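The comparison in Fig. 2 can be mimicked empirically. The sketch below uses a Monte-Carlo proxy for the measure of influence (the Frobenius distance between average trace-normalized estimates with and without contamination); the weight function, covariance matrices, and sample sizes are our own assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(5)

def m_estimator(Y, u, tol=1e-6, max_iter=200):
    # Fixed-point iteration for the non-regularized M-estimator (1).
    N, n = Y.shape
    C = np.eye(N)
    for _ in range(max_iter):
        s = np.einsum('ij,ij->j', Y, np.linalg.solve(C, Y)) / N
        C_next = (Y * u(s)) @ Y.T / n
        if np.linalg.norm(C_next - C) <= tol * np.linalg.norm(C):
            C = C_next
            break
        C = C_next
    return C

def normalized(A):
    return A / np.trace(A)

def empirical_influence(estimator, eps, C_leg, C_out, n=400, trials=20):
    """Monte-Carlo proxy for the measure of influence: distance between
    the average trace-normalized estimates with and without outliers."""
    N = C_leg.shape[0]
    L_leg, L_out = np.linalg.cholesky(C_leg), np.linalg.cholesky(C_out)
    n_out = int(round(eps * n))
    mean_clean = np.zeros((N, N))
    mean_cont = np.zeros((N, N))
    for _ in range(trials):
        Y_clean = L_leg @ rng.standard_normal((N, n))
        Y_cont = np.concatenate(
            [L_leg @ rng.standard_normal((N, n - n_out)),
             L_out @ rng.standard_normal((N, n_out))], axis=1)
        mean_clean += normalized(estimator(Y_clean)) / trials
        mean_cont += normalized(estimator(Y_cont)) / trials
    return np.linalg.norm(mean_cont - mean_clean, 'fro')

N = 20
C_leg = np.eye(N)
C_out = np.diag(np.linspace(1.0, 10.0, N))  # placeholder outlier covariance
t = 1.0
u = lambda s: (1.0 + t) / (t + s)           # hypothetical weight function

mi_m = empirical_influence(lambda Y: m_estimator(Y, u), 0.05, C_leg, C_out)
mi_scm = empirical_influence(lambda Y: (Y @ Y.T) / Y.shape[1], 0.05, C_leg, C_out)
print(mi_m, mi_scm)
```

The relative values will vary with the contamination geometry, mirroring the two regimes discussed above when the roles of the covariance matrices are swapped.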
V Large-dimensional robustness: Regularized case
We now turn to the regularized case which, in particular, allows $N > n$. The proofs of the technical results in this section are provided in Appendix A-C.
V-A Asymptotic equivalent form under outlier data model
To facilitate the robustness study of regularized M-estimators, we start by analyzing the large-dimensional behavior of these estimators in the presence of outliers.
For an admissible regularization parameter $\rho$, we define the regularized estimator associated with the function $u$ as the unique solution to the equation in $\hat{C}_N(\rho)$:
$$\hat{C}_N(\rho) = (1-\rho)\,\frac{1}{n}\sum_{i=1}^{n} u\left(\frac{1}{N}\, y_i^*\, \hat{C}_N(\rho)^{-1}\, y_i\right) y_i\, y_i^* + \rho\, I_N \qquad (18)$$
Remark 3.
If we assume that , then the range of admissible is . Furthermore, if , we can in fact take . In the following, we assume that . Note that, similar to the outlier-free scenario, the introduction of a regularization parameter allows us to relax the assumption of .
Theorem 4.
V-B Robustness analysis
Similar to the non-regularized case, we next make use of the asymptotic equivalent to study the robustness of the regularized estimator. Importantly, introducing a regularization parameter entails an additional variable to consider when studying the robustness of M-estimators.
We denote by $\hat{C}_N(\rho, 0)$ the solution to (18) for a given $\rho$ when there are no outliers. Similar to the non-regularized case, we define
By Theorem 4, we have the following corollary:
Corollary 2.
Unlike in the non-regularized case, the form of the asymptotic equivalent renders the analysis difficult in general. In a specific limiting case, however, the measure of influence decreases as the regularization strengthens. This is intuitive, and reflects the fact that the more we regularize an estimator, the more robust it becomes (in the limit, it boils down to taking $\hat{C}_N = I_N$). This extreme regularization, however, leads to a significant bias, and is therefore not desirable.
In the following, we focus on the infinitesimal measure of influence associated with , which is defined in a similar way as for the non-regularized case: assume for some (