## I Introduction

Estimating the covariance matrix is a ubiquitous problem that arises in various fields such as signal processing, wireless communication, bioinformatics, and financial engineering [2, 3, 4]. It has been noticed that the covariance matrix in some applications naturally possesses special structure. Exploiting this structural information in the estimation process usually reduces the number of parameters to be estimated, and is thus beneficial to the estimation accuracy [5]. Various types of structures have been studied. For example, the Toeplitz structure with applications in time series analysis and array signal processing was considered in [6, 5, 7]. A sparse graphical model was studied in [8], where sparsity was imposed on the inverse of the covariance matrix. Banding or tapering the sample covariance matrix was proposed in [3]. A spiked covariance structure, which is closely related to the problem of component analysis and subspace estimation, was introduced in [9]. Other structures such as group symmetry and the Kronecker structure were considered in [10, 11, 12].

While the previously mentioned works have shown that enforcing a prior structure on the covariance estimator improves its performance in many applications, most of them either assume that the samples follow a Gaussian distribution or attempt to regularize the sample covariance matrix. It has been realized that the sample covariance matrix, which turns out to be the maximum likelihood estimator of the covariance matrix when the samples are assumed to be independent identically normally distributed, performs poorly in many real-world applications. A major factor that causes the problem is that the distribution of a real-world data set is often heavy-tailed or contains outliers. In this case, a single erroneous observation can lead to a completely unreliable estimate [13].

A way to address the aforementioned problem is to find a robust structured covariance matrix estimator that performs well even if the underlying distribution deviates from the Gaussian assumption. One approach is to invoke the minimax principle and seek the “best” estimate of the covariance for the worst-case noise. To be precise, the underlying probability distribution of the samples is assumed to belong to an uncertainty set of distributions that contains the Gaussian distribution, and the desired minimax robust estimator is the one whose maximum asymptotic variance over the set is less than that of any other estimator. Two types of uncertainty sets, namely the $\varepsilon$-contamination class and the Kolmogorov class, were considered in [14], where a structured maximum likelihood type estimate (M-estimate) was derived as the solution of a constrained optimization problem. The uncertainty set that we are interested in is the family of elliptically symmetric distributions. It was proved by Tyler in [15] that, given independent and identically distributed (i.i.d.) samples drawn from an elliptical distribution, Tyler’s estimator, defined as the solution to a fixed-point equation, is a minimax robust estimator. Additionally, it is “distribution-free” in the sense that its asymptotic variance does not depend on the parametric form of the underlying distribution.

The problem of obtaining a structured Tyler’s estimator was investigated in the recent works [16] and [17]. In particular, the authors of [16] focused on the group symmetry structure and proved that it is geodesically convex. As Tyler’s estimator can be defined alternatively as the minimizer of a cost function that is also geodesically convex, it follows that any local minimum of the cost function on a group symmetry constraint set is a global minimum. A numerical algorithm was also proposed to solve the constrained minimization problem. In [17], a convex structural constraint set was studied and a generalized method of moments type covariance estimator, COCA, was proposed, together with a numerical algorithm based on semidefinite relaxation. It was proved that COCA is an asymptotically consistent estimator. However, the algorithm suffers from the drawback that its computational cost increases as either the problem dimension or the number of samples grows.

In this paper, we formulate the structured covariance estimation problem as the minimization of Tyler’s cost function under a structural constraint. Our work generalizes [16] by considering a much larger family of structures, which includes the group symmetry structure. Instead of attempting to obtain a globally optimal solution, which is a challenging task due to the non-convexity of the objective function, we focus on devising algorithms that converge to a stationary point of the problem. We first work out an algorithmic framework for a general convex structural constraint based on the majorization minimization (MM) framework, in which a sequence of convex programs is solved. Then we consider several special cases that appear frequently in practical applications; by exploiting the specific problem structure, the algorithm is particularized, significantly reducing the computational load. We further discuss in the end two types of widely studied non-convex structures that turn out to be computationally tractable under the MM framework: one of them is the Kronecker structure and the other is the spiked covariance structure. Although theoretically the algorithms can only be proved to converge to a stationary point, for all the specific structures considered in this paper, it is observed in the numerical simulations that the proposed algorithms converge to a unique point with random initialization.

The paper is organized as follows. In Section II, we introduce the robust covariance estimation problem with a structural constraint and review the majorization minimization framework. A general algorithm for convex structural constraints is derived in Section III. Several special cases are considered in Section IV, where the algorithm is particularized to achieve higher efficiency by exploiting the specific form of the structure. Section V discusses the Kronecker structure and the spiked covariance structure, which are non-convex but algorithmically tractable. Numerical results and conclusions are presented in the remaining sections.

## II Tyler’s Estimator with Structural Constraint

Consider $n$ samples $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^K$ drawn independently from an elliptical underlying distribution with density function as follows:

$$f(\mathbf{x}) \propto \det(\boldsymbol{\Sigma})^{-1/2}\, g\!\left(\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}\right) \qquad (1)$$

where $\boldsymbol{\Sigma} \succ \mathbf{0}$ is the scatter parameter, which is proportional to the covariance matrix if the latter exists, and $g(\cdot)$ characterizes the shape of the distribution. Tyler’s estimator for $\boldsymbol{\Sigma}$ is defined as the solution to the following fixed-point equation:

$$\boldsymbol{\Sigma} = \frac{K}{n} \sum_{i=1}^{n} \frac{\mathbf{x}_i \mathbf{x}_i^T}{\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i} \qquad (2)$$

which can be interpreted as a weighted sum of rank-one matrices, with the weight decreasing as $\mathbf{x}_i$ gets farther from the center. It is known that if $\mathbf{x}$ is elliptically distributed, then the normalized random variable $\mathbf{s} = \mathbf{x} / \|\mathbf{x}\|_2$ follows an angular central Gaussian distribution with the probability density function (pdf) taking the form

$$p(\mathbf{s}; \boldsymbol{\Sigma}) \propto \det(\boldsymbol{\Sigma})^{-1/2} \left(\mathbf{s}^T \boldsymbol{\Sigma}^{-1} \mathbf{s}\right)^{-K/2}. \qquad (3)$$

Tyler’s estimator coincides with the maximum likelihood estimator (MLE) of $\boldsymbol{\Sigma}$ obtained by fitting the normalized samples $\mathbf{s}_i = \mathbf{x}_i / \|\mathbf{x}_i\|_2$ to (3). In other words, the estimator is the minimizer of the following cost function

$$L(\boldsymbol{\Sigma}) = \log\det(\boldsymbol{\Sigma}) + \frac{K}{n} \sum_{i=1}^{n} \log\!\left(\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i\right) \qquad (4)$$

on the positive definite cone $\boldsymbol{\Sigma} \succ \mathbf{0}$. The estimator is proved to be consistent and asymptotically normal with a variance that does not depend on $g$. Furthermore, it is a minimax robust covariance estimator when the underlying distribution is elliptically symmetric [18].
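To make the iteration concrete, here is a minimal NumPy sketch of the fixed-point iteration (2), with a trace normalization added for numerical convenience; the function name and stopping rule are our own:

```python
import numpy as np

def tyler_estimator(X, max_iter=200, tol=1e-8):
    """Fixed-point iteration (2): Sigma = (K/n) * sum_i x_i x_i^T / (x_i^T Sigma^{-1} x_i).

    X : (n, K) array of centered samples (one sample per row).
    Returns a scatter estimate normalized so that trace(Sigma) = K.
    """
    n, K = X.shape
    Sigma = np.eye(K)
    for _ in range(max_iter):
        Sinv = np.linalg.inv(Sigma)
        # q_i = x_i^T Sigma^{-1} x_i for every sample
        q = np.einsum('ij,jk,ik->i', X, Sinv, X)
        Sigma_new = (K / n) * (X / q[:, None]).T @ X
        Sigma_new *= K / np.trace(Sigma_new)      # fix the (arbitrary) scale
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```

Because the cost (4) is scale-invariant, the trace normalization only fixes the arbitrary scale of the solution without changing the objective.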

It has been noticed that in some applications the covariance matrix possesses a certain structure, and taking this information into account in the estimation yields a better estimate [11, 10, 14, 12]. Motivated by this idea, we focus on the problem of incorporating prior structural information into Tyler’s estimator to improve its estimation accuracy. To formulate the problem, we assume that $\boldsymbol{\Sigma}$ is constrained to lie in a non-empty set $\mathcal{S}$, defined as the intersection of a closed set that characterizes the covariance structure and the positive semidefinite cone, and then proceed to solve the optimization problem:

$$\begin{array}{ll} \underset{\boldsymbol{\Sigma}}{\text{minimize}} & L(\boldsymbol{\Sigma}) \\ \text{subject to} & \boldsymbol{\Sigma} \in \mathcal{S}. \end{array} \qquad (5)$$

The minimizer of the above problem is the matrix in the structural set $\mathcal{S}$ that maximizes the likelihood of the normalized samples $\mathbf{s}_i$.

Throughout the paper, we make the following assumption.

Assumption 1: The cost function $L(\boldsymbol{\Sigma}) \to +\infty$ whenever the sequence $\{\boldsymbol{\Sigma}^{(t)}\}$ tends to a singular limit point of the constraint set $\mathcal{S}$.

Under this assumption, the case that $\boldsymbol{\Sigma}$ is singular can be excluded in the analysis of the algorithms hereafter.

Note that the assumption

Assumption 2: The samples are drawn from a continuous probability distribution, and $n > K$,

implies that $L(\boldsymbol{\Sigma}) \to +\infty$ whenever $\boldsymbol{\Sigma}$ tends to the boundary of the positive semidefinite cone, with probability one [19]. It is therefore also a sufficient condition for Assumption 1 to hold.

Problem (5) is difficult to solve for two reasons. First, the constraint set $\mathcal{S}$ is too general to tackle directly. Second, even if $\mathcal{S}$ possesses a nice property such as convexity, the objective function is still non-convex. Instead of trying to find the global minimizer, which appears to be too ambitious for the reasons pointed out above, we aim at devising efficient algorithms that are capable of finding a stationary point of (5). We rely on the MM framework to derive the algorithms, which is briefly reviewed next for completeness.

### II-A The Majorization Minimization Algorithm

For a general optimization problem

$$\begin{array}{ll} \underset{\mathbf{x}}{\text{minimize}} & f(\mathbf{x}) \\ \text{subject to} & \mathbf{x} \in \mathcal{X}, \end{array} \qquad (6)$$

where $\mathcal{X}$ is a closed convex set, the MM algorithm finds a stationary point of (6) by successively solving a sequence of simpler optimization problems. The iterative algorithm starts at some arbitrary feasible initial point $\mathbf{x}^{(0)}$, and at the $t$-th iteration the update of $\mathbf{x}$ is given by

$$\mathbf{x}^{(t+1)} \in \arg\min_{\mathbf{x} \in \mathcal{X}} \; u\!\left(\mathbf{x}, \mathbf{x}^{(t)}\right)$$

with the surrogate function $u(\cdot, \cdot)$ satisfying the following assumptions:

$$\begin{aligned} & u(\mathbf{x}, \mathbf{y}) \ge f(\mathbf{x}), \quad \forall\, \mathbf{x}, \mathbf{y} \in \mathcal{X}, \\ & u(\mathbf{y}, \mathbf{y}) = f(\mathbf{y}), \quad \forall\, \mathbf{y} \in \mathcal{X}, \\ & u'(\mathbf{x}, \mathbf{y}; \mathbf{d})\big|_{\mathbf{x}=\mathbf{y}} = f'(\mathbf{y}; \mathbf{d}), \quad \forall\, \mathbf{d} \text{ with } \mathbf{y} + \mathbf{d} \in \mathcal{X}, \end{aligned} \qquad (7)$$

where $f'(\mathbf{y}; \mathbf{d})$ stands for the directional derivative of $f$ at $\mathbf{y}$ along the direction $\mathbf{d}$, and $u(\mathbf{x}, \mathbf{y})$ is continuous in both $\mathbf{x}$ and $\mathbf{y}$.

It is proved in [20] that any limit point of the sequence generated by the MM algorithm is a stationary point of problem (6). If it is further assumed that the initial level set $\left\{\mathbf{x} \in \mathcal{X} : f(\mathbf{x}) \le f\!\left(\mathbf{x}^{(0)}\right)\right\}$ is compact, then the following stronger statement can be made:

$$\lim_{t \to \infty} \operatorname{dist}\!\left(\mathbf{x}^{(t)}, \mathcal{X}^\star\right) = 0,$$

where $\mathcal{X}^\star$ stands for the set of all stationary points of (6).

The idea of majorizing $f$ by a surrogate function can also be applied blockwise. Specifically, $\mathbf{x}$ is partitioned into $m$ blocks as $\mathbf{x} = \left(\mathbf{x}_1, \ldots, \mathbf{x}_m\right)$, where each block $\mathbf{x}_i$ is $n_i$-dimensional and $\sum_{i=1}^{m} n_i = n$. At the $t$-th iteration, the $i$-th block is updated by solving the following problem:

$$\begin{array}{ll} \underset{\mathbf{x}_i}{\text{minimize}} & u_i\!\left(\mathbf{x}_i, \mathbf{x}^{(t)}\right) \\ \text{subject to} & \mathbf{x}_i \in \mathcal{X}_i, \end{array} \qquad (8)$$

with the other blocks kept fixed and the continuous surrogate function $u_i$ satisfying blockwise analogues of the properties in (7). In short, at each iteration the block MM applies the ordinary MM algorithm to one block while keeping the values of the other blocks fixed. The blocks are updated in cyclic order.
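As a toy illustration of the mechanism (our own example, not from the paper), consider the scalar robust-location problem $\min_x \sum_i \log\!\left(1 + (x - y_i)^2\right)$. Majorizing the concave $\log$ by its tangent line, exactly as in (13) below, turns each iteration into a weighted least-squares problem whose minimizer is a weighted mean:

```python
import numpy as np

def objective(x, y):
    """f(x) = sum_i log(1 + (x - y_i)^2): a non-convex robust location cost."""
    return np.sum(np.log1p((x - y) ** 2))

def robust_location_mm(y, iters=50):
    """MM iteration: linearize log at the current iterate, then minimize
    the resulting weighted least-squares surrogate in closed form."""
    x = np.median(y)                        # any feasible starting point
    for _ in range(iters):
        w = 1.0 / (1.0 + (x - y) ** 2)      # tangent-line weights downweight outliers
        x = np.sum(w * y) / np.sum(w)       # closed-form minimizer of the surrogate
    return x
```

Each update can only decrease the objective, which is the defining property of the MM scheme.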

In the rest of this paper, we derive the specific form of the surrogate function based on a detailed characterization of various kinds of constraint sets $\mathcal{S}$. In addition, we show how the algorithm can be particularized at a lower computational cost when a finer structure of $\mathcal{S}$ is available. Before moving to the algorithmic part, we first compare our formulation with several related works in the literature.

### II-B Related Works

In [14], the authors derived a minimax robust covariance estimator under the assumption that the underlying distribution is a corrupted Gaussian distribution, with the noise belonging to the $\varepsilon$-contamination class or the Kolmogorov class. The estimator is defined as the solution of a constrained optimization problem similar to (5), but with a different cost function. Apart from the distinction that the family of distributions we consider is the set of elliptical distributions, the focus of our work, which completely differs from [14], is on developing efficient numerical algorithms for different types of structural constraint sets $\mathcal{S}$.

Two other closely related works are [16] and [17]. In [16], the authors investigated a special case of (5), where $\mathcal{S}$ is the set of all positive semidefinite matrices with a group symmetry structure. It was shown that both the cost function and the group symmetry constraint are geodesically convex, and therefore any local minimizer of (5) is global. Several examples, including the circulant and persymmetric structures, were proven to be special cases of the group symmetry constraint. A numerical algorithm was also provided that decreases the cost function monotonically. Our work includes the group symmetry structure as a special case since the constraint is linear, and provides an alternative algorithm to solve the problem.

In [17], the authors considered imposing a convex constraint on Tyler’s estimator. A generalized method of moments type estimator based on semidefinite relaxation, defined as the solution of the following problem:

(9)

subject to

was proposed and proved to be asymptotically consistent. Nevertheless, the number of constraints grows linearly with the number of samples and, as pointed out in the paper, the algorithm becomes computationally demanding when either the problem dimension or the number of samples is large. On the contrary, our algorithm based on formulation (5) is less affected by the number of samples and is therefore more computationally tractable.

## III Tyler’s Estimator with Convex Structural Constraint

In this section, we derive a general algorithm for problem (5) with $\mathcal{S}$ being a closed convex subset of the positive semidefinite cone, a setting that enjoys a wide range of applications. For instance, the Toeplitz structure can be imposed on the covariance matrix of the received signal in direction-of-arrival (DOA) estimation problems. Banding is also considered as a way of regularizing a covariance matrix whose entries decay fast as they move away from the main diagonal.

Since $\mathcal{S}$ is closed and convex, constructing a convex surrogate function for $L$ is a natural idea, since then $\boldsymbol{\Sigma}^{(t+1)}$ can be found via

$$\boldsymbol{\Sigma}^{(t+1)} = \arg\min_{\boldsymbol{\Sigma} \in \mathcal{S}} \; u\!\left(\boldsymbol{\Sigma}, \boldsymbol{\Sigma}^{(t)}\right), \qquad (10)$$

which is a convex program.

###### Proposition 1.

At any $\boldsymbol{\Sigma}^{(t)} \succ \mathbf{0}$, the objective function $L(\boldsymbol{\Sigma})$ can be upperbounded by the convex surrogate function

$$u\!\left(\boldsymbol{\Sigma}, \boldsymbol{\Sigma}^{(t)}\right) = \operatorname{tr}\!\left(\left(\boldsymbol{\Sigma}^{(t)}\right)^{-1} \boldsymbol{\Sigma}\right) + \frac{K}{n} \sum_{i=1}^{n} \frac{\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i}{\mathbf{x}_i^T \left(\boldsymbol{\Sigma}^{(t)}\right)^{-1} \mathbf{x}_i} + \text{const} \qquad (11)$$

with equality achieved at $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}^{(t)}$.

###### Proof:

Since $\log\det(\cdot)$ is concave, it can be upperbounded by its first-order Taylor expansion at $\boldsymbol{\Sigma}^{(t)}$:

$$\log\det(\boldsymbol{\Sigma}) \le \log\det\!\left(\boldsymbol{\Sigma}^{(t)}\right) + \operatorname{tr}\!\left(\left(\boldsymbol{\Sigma}^{(t)}\right)^{-1}\left(\boldsymbol{\Sigma} - \boldsymbol{\Sigma}^{(t)}\right)\right) \qquad (12)$$

with equality achieved at $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}^{(t)}$.

Also, by the concavity of the function $\log(\cdot)$ we have

$$\log t \le \log t_0 + \frac{t - t_0}{t_0}, \quad \forall\, t, t_0 > 0, \qquad (13)$$

which leads to the bound

$$\log\!\left(\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i\right) \le \log\!\left(\mathbf{x}_i^T \left(\boldsymbol{\Sigma}^{(t)}\right)^{-1} \mathbf{x}_i\right) + \frac{\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i}{\mathbf{x}_i^T \left(\boldsymbol{\Sigma}^{(t)}\right)^{-1} \mathbf{x}_i} - 1,$$

with equality achieved at $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}^{(t)}$. ∎

By the convergence result of the MM algorithm, it can be concluded that every limit point of the sequence $\left\{\boldsymbol{\Sigma}^{(t)}\right\}$ is a stationary point of problem (5). Note that all of the structural constraint sets considered in this work possess the property that

$$\boldsymbol{\Sigma} \in \mathcal{S} \;\Longrightarrow\; c\,\boldsymbol{\Sigma} \in \mathcal{S}, \quad \forall\, c > 0. \qquad (14)$$

Since the cost function is scale-invariant in the sense that $L(c\,\boldsymbol{\Sigma}) = L(\boldsymbol{\Sigma})$ for any $c > 0$, we can add a trace-normalization step after the update of $\boldsymbol{\Sigma}$ without affecting the value of the objective function. The algorithm for a general convex structural set is summarized in Algorithm 1:

$$\tilde{\boldsymbol{\Sigma}}^{(t+1)} = \arg\min_{\boldsymbol{\Sigma} \in \mathcal{S}} \; u\!\left(\boldsymbol{\Sigma}, \boldsymbol{\Sigma}^{(t)}\right) \qquad (15)$$

$$\boldsymbol{\Sigma}^{(t+1)} = K\, \tilde{\boldsymbol{\Sigma}}^{(t+1)} \Big/ \operatorname{tr}\!\left(\tilde{\boldsymbol{\Sigma}}^{(t+1)}\right) \qquad (16)$$

###### Proposition 2.

###### Proof:

### III-A General Linear Structure

In this subsection we further assume that the set $\mathcal{S}$ is the intersection of the positive semidefinite cone and an affine set $\mathcal{A}$. The following lemma shows that in this case, the update of $\boldsymbol{\Sigma}$ (eqn. (15)) can be recast as a semidefinite program (SDP).

###### Lemma 3.

###### Proof:

Problem (15) can be written equivalently as

$$\begin{array}{ll} \underset{\boldsymbol{\Sigma},\,\{t_i\}}{\text{minimize}} & \operatorname{tr}\!\left(\left(\boldsymbol{\Sigma}^{(t)}\right)^{-1} \boldsymbol{\Sigma}\right) + \dfrac{K}{n} \displaystyle\sum_{i=1}^{n} \dfrac{t_i}{\mathbf{x}_i^T \left(\boldsymbol{\Sigma}^{(t)}\right)^{-1} \mathbf{x}_i} \\ \text{subject to} & \mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i \le t_i, \quad i = 1, \ldots, n, \\ & \boldsymbol{\Sigma} \in \mathcal{S}. \end{array}$$

Now we relax the constraint $\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i \le t_i$ to allow $\boldsymbol{\Sigma} \succeq \mathbf{0}$. By the Schur complement lemma for a positive semidefinite matrix, if $\boldsymbol{\Sigma} \succ \mathbf{0}$, then the constraint is equivalent to

$$\begin{bmatrix} \boldsymbol{\Sigma} & \mathbf{x}_i \\ \mathbf{x}_i^T & t_i \end{bmatrix} \succeq \mathbf{0}.$$

The relaxation is tight since, at any optimal solution with $\boldsymbol{\Sigma} \succ \mathbf{0}$, we have $t_i = \mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i$. ∎

## IV Tyler’s Estimator with Special Convex Structures

Having introduced the general algorithm framework for a convex structure in the previous section, we are going to discuss in detail some convex structures that arise frequently in signal processing related fields, and show that by exploiting the problem structure the algorithm can be particularized with a significant reduction in the computational load.

### IV-A Sum of Rank-One Matrices Structure

The structure set that we study in this part is

$$\mathcal{S} = \Big\{ \boldsymbol{\Sigma} = \sum_{j=1}^{r} w_j \mathbf{b}_j \mathbf{b}_j^T \,:\, w_j \ge 0 \Big\}, \qquad (21)$$

where the $\mathbf{b}_j$’s are known vectors in $\mathbb{R}^K$. The matrix $\boldsymbol{\Sigma}$ can be interpreted as a weighted sum of the given rank-one matrices $\mathbf{b}_j \mathbf{b}_j^T$. As an example application where structure (21) appears, consider the following signal model

$$\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{n}, \qquad (22)$$

where $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_m]$. Assuming that the signal $\mathbf{s}$ and the noise $\mathbf{n}$ are zero-mean random variables and any two elements of them are uncorrelated, the covariance matrix of $\mathbf{x}$ takes the form

$$\boldsymbol{\Sigma} = \sum_{k=1}^{m} \sigma_k^2\, \mathbf{a}_k \mathbf{a}_k^H + \boldsymbol{\Sigma}_n, \qquad (23)$$

where $\sigma_k^2$ is the variance of the $k$-th signal component and $\boldsymbol{\Sigma}_n$ is the (diagonal) noise covariance matrix.

Collecting both the vectors $\mathbf{a}_k$ and the basis vectors associated with the diagonal noise covariance into a single dictionary, define

$$\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_r], \qquad \mathbf{w} = (w_1, \ldots, w_r), \qquad (24)$$

so that $\boldsymbol{\Sigma} = \mathbf{B}\operatorname{diag}(\mathbf{w})\,\mathbf{B}^H = \sum_{j=1}^{r} w_j \mathbf{b}_j \mathbf{b}_j^H$. Therefore, without loss of generality, we can focus on the expression $\boldsymbol{\Sigma} = \sum_{j=1}^{r} w_j \mathbf{b}_j \mathbf{b}_j^H$, assuming that every $K$ columns of $\mathbf{B}$ are linearly independent and $r \ge K$.

Note that in example (22), $\mathbf{x}$ is complex-valued, and problem (5), which is formulated based on a real-valued elliptical distribution, needs to be modified to

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \log\det\Big(\displaystyle\sum_{j=1}^{r} w_j \mathbf{b}_j \mathbf{b}_j^H\Big) + \dfrac{K}{n} \displaystyle\sum_{i=1}^{n} \log\Big(\mathbf{x}_i^H \Big(\sum_{j=1}^{r} w_j \mathbf{b}_j \mathbf{b}_j^H\Big)^{-1} \mathbf{x}_i\Big) \\ \text{subject to} & \mathbf{w} \ge \mathbf{0}, \end{array} \qquad (25)$$

where the $\mathbf{x}_i$’s are assumed to follow a complex-valued elliptical distribution instead.

Since $\boldsymbol{\Sigma}$ is linear in the $w_j$’s, Algorithm 1 can be applied. In the following, we provide a more efficient algorithm by substituting $\boldsymbol{\Sigma} = \sum_j w_j \mathbf{b}_j \mathbf{b}_j^H$ into the objective function and applying the MM procedure with $\mathbf{w}$ as the variable.

###### Proposition 4.

At any $\mathbf{w}^{(t)} > \mathbf{0}$, the objective function

$$L(\mathbf{w}) = \log\det\Big(\sum_{j=1}^{r} w_j \mathbf{b}_j \mathbf{b}_j^H\Big) + \frac{K}{n} \sum_{i=1}^{n} \log\Big(\mathbf{x}_i^H \Big(\sum_{j=1}^{r} w_j \mathbf{b}_j \mathbf{b}_j^H\Big)^{-1} \mathbf{x}_i\Big) \qquad (26)$$

can be upperbounded by the surrogate function

$$u\!\left(\mathbf{w}, \mathbf{w}^{(t)}\right) = \sum_{j=1}^{r} \left( a_j^{(t)} w_j + c_j^{(t)} w_j^{-1} \right) + \text{const} \qquad (27)$$

with equality achieved at $\mathbf{w} = \mathbf{w}^{(t)}$, where $\mathbf{w}^{-1}$ stands for the element-wise inverse of $\mathbf{w}$, and

$$\begin{aligned} \boldsymbol{\Sigma}_t &= \sum_{j=1}^{r} w_j^{(t)} \mathbf{b}_j \mathbf{b}_j^H, \qquad \mathbf{M}^{(t)} = \frac{K}{n} \sum_{i=1}^{n} \frac{\mathbf{x}_i \mathbf{x}_i^H}{\mathbf{x}_i^H \boldsymbol{\Sigma}_t^{-1} \mathbf{x}_i}, \\ a_j^{(t)} &= \mathbf{b}_j^H \boldsymbol{\Sigma}_t^{-1} \mathbf{b}_j, \qquad c_j^{(t)} = \big(w_j^{(t)}\big)^2\, \mathbf{b}_j^H \boldsymbol{\Sigma}_t^{-1} \mathbf{M}^{(t)} \boldsymbol{\Sigma}_t^{-1} \mathbf{b}_j. \end{aligned} \qquad (28)$$

###### Proof:

Assume that $\mathbf{w} > \mathbf{0}$. From the identity

$$\begin{bmatrix} w_j & w_j^{(t)} \\ w_j^{(t)} & \big(w_j^{(t)}\big)^2 / w_j \end{bmatrix} \succeq \mathbf{0}$$

we know that $\begin{bmatrix} \boldsymbol{\Sigma}(\mathbf{w}) & \boldsymbol{\Sigma}_t \\ \boldsymbol{\Sigma}_t & \boldsymbol{\Sigma}\big((\mathbf{w}^{(t)})^2 / \mathbf{w}\big) \end{bmatrix} \succeq \mathbf{0}$. By the Schur complement, this is equivalent to

$$\boldsymbol{\Sigma}(\mathbf{w})^{-1} \preceq \boldsymbol{\Sigma}_t^{-1}\, \boldsymbol{\Sigma}\big((\mathbf{w}^{(t)})^2 / \mathbf{w}\big)\, \boldsymbol{\Sigma}_t^{-1}. \qquad (30)$$

Since $\log(\cdot)$ is concave (cf. (13)), we have

$$\log\big(\mathbf{x}_i^H \boldsymbol{\Sigma}(\mathbf{w})^{-1} \mathbf{x}_i\big) \le \log\big(\mathbf{x}_i^H \boldsymbol{\Sigma}_t^{-1} \mathbf{x}_i\big) + \frac{\mathbf{x}_i^H \boldsymbol{\Sigma}_t^{-1}\, \boldsymbol{\Sigma}\big((\mathbf{w}^{(t)})^2/\mathbf{w}\big)\, \boldsymbol{\Sigma}_t^{-1} \mathbf{x}_i}{\mathbf{x}_i^H \boldsymbol{\Sigma}_t^{-1} \mathbf{x}_i} - 1 \qquad (31)$$

with equality achieved at $\mathbf{w} = \mathbf{w}^{(t)}$. Combining (31) with the tangent-plane bound (12) on the $\log\det$ term yields (27). ∎

The update of $\mathbf{w}$ then can be found in closed form as

$$w_j^{(t+1)} = \sqrt{c_j^{(t)} \big/ a_j^{(t)}}, \quad j = 1, \ldots, r. \qquad (32)$$

The algorithm is summarized in Algorithm 2.
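A compact NumPy sketch of the resulting weight iteration (real-valued case; the constants follow our reconstruction of (28) and (32), so details may differ from the original statement):

```python
import numpy as np

def mm_weights(X, B, iters=100):
    """MM iteration for Sigma = sum_j w_j b_j b_j^T.

    X : (n, K) samples; B : (K, r) matrix with the known b_j as columns.
    Each step minimizes the separable surrogate (27) in closed form.
    """
    n, K = X.shape
    w = np.ones(B.shape[1])
    for _ in range(iters):
        Sigma = (B * w) @ B.T                       # sum_j w_j b_j b_j^T
        Sinv = np.linalg.inv(Sigma)
        q = np.einsum('ij,jk,ik->i', X, Sinv, X)    # x_i^T Sigma^{-1} x_i
        M = (K / n) * (X / q[:, None]).T @ X        # weighted sample matrix
        a = np.einsum('kj,kl,lj->j', B, Sinv, B)            # b_j^T Sinv b_j
        c = w ** 2 * np.einsum('kj,kl,lj->j', B, Sinv @ M @ Sinv, B)
        w = np.sqrt(c / a)                          # closed-form update (32)
    return w
```

Since each step is a valid MM update, the objective (26) is non-increasing along the iterations, and a trace normalization can be appended as in Algorithm 1 if a fixed scale is desired.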

### Iv-B Toeplitz Structure

Consider the constraint set being the class of real-valued positive
semidefinite Toeplitz matrices . If , then
it can be completely determined by its first row^{1}^{1}1Following the convention, the indices for the Toeplitz structure start
from . .

In this subsection, we are going to show that based on the technique of circulant embedding, Algorithm 2 can be adopted to solve the Toeplitz structure constrained problem at a lower cost than applying the sequential SDP algorithm (Algorithm 1).

The idea of embedding a Toeplitz matrix as the upper-left part of a larger circulant matrix has been discussed in [5, 7, 22]. It was proved in [6] that any positive definite Toeplitz matrix of size $K \times K$ can be embedded in a positive definite circulant matrix of larger size $L = 2K$, parametrized by its first row of the form

$$\left(t_0, t_1, \ldots, t_{K-1}, \alpha, t_{K-1}, \ldots, t_1\right),$$

where $\alpha$ denotes some real number. $\boldsymbol{\Sigma}$ then can be written as

$$\boldsymbol{\Sigma} = \begin{bmatrix} \mathbf{I}_K & \mathbf{0} \end{bmatrix} \mathbf{C} \begin{bmatrix} \mathbf{I}_K \\ \mathbf{0} \end{bmatrix}. \qquad (33)$$

Clearly, for any fixed $\alpha$, if $\mathbf{C}$ is positive semidefinite, so is $\boldsymbol{\Sigma}$. However, the statement is false the other way around. In other words, the set

$$\mathcal{S}_c = \left\{ \begin{bmatrix} \mathbf{I}_K & \mathbf{0} \end{bmatrix} \mathbf{C} \begin{bmatrix} \mathbf{I}_K \\ \mathbf{0} \end{bmatrix} \,:\, \mathbf{C} \in \mathcal{C}_+^{L} \right\}, \qquad (34)$$

where $\mathcal{C}_+^{L}$ denotes the set of real-valued positive semidefinite circulant matrices of size $L \times L$, is a subset of the positive semidefinite Toeplitz matrices.
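The embedding and the diagonalization property behind it can be checked numerically (an illustrative example with hypothetical numbers):

```python
import numpy as np
from scipy.linalg import circulant, toeplitz

K, L = 4, 8                                  # Toeplitz size and circulant size L = 2K
t = np.array([2.0, 1.0, 0.5, 0.25])          # first row of the Toeplitz matrix
alpha = 0.1                                  # the free real parameter
# first column of the circulant: (t_0, ..., t_{K-1}, alpha, t_{K-1}, ..., t_1)
c = np.concatenate([t, [alpha], t[:0:-1]])
C = circulant(c)

# the upper-left K x K block of C is exactly the Toeplitz matrix built from t
upper_left = C[:K, :K]

# circulant matrices realize circular convolution, diagonalized by the DFT
x = np.arange(L, dtype=float)
y_direct = C @ x
y_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
```

The FFT identity is what makes the circulant parametrization computationally attractive: matrix-vector products and eigenvalues are available in $O(L \log L)$ time.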

Instead of the full Toeplitz set, we restrict the feasible set to be $\mathcal{S}_c$ with $L = 2K$. Since a symmetric circulant matrix can be diagonalized by the Fourier matrix, if $\mathbf{C} \in \mathcal{C}_+^{L}$ then it can be written as

$$\mathbf{C} = \mathbf{F}^H \operatorname{diag}(\mathbf{p})\, \mathbf{F}, \qquad (35)$$

where

$$[\mathbf{F}]_{kl} = \frac{1}{\sqrt{L}}\, e^{-\mathrm{i} 2\pi k l / L}, \quad k, l = 0, \ldots, L-1, \qquad (36)$$

with $\mathbf{F}$ being the normalized Fourier transform matrix of size $L \times L$ and $\mathbf{p} \ge \mathbf{0}$. The robust covariance estimation problem over the restricted set of Toeplitz matrices then takes the form

$$\begin{array}{ll} \underset{\mathbf{p}}{\text{minimize}} & L\!\left(\begin{bmatrix} \mathbf{I}_K & \mathbf{0} \end{bmatrix} \mathbf{F}^H \operatorname{diag}(\mathbf{p})\, \mathbf{F} \begin{bmatrix} \mathbf{I}_K \\ \mathbf{0} \end{bmatrix}\right) \\ \text{subject to} & \mathbf{p} \ge \mathbf{0}, \quad p_l = p_{L-l}, \; l = 1, \ldots, L-1, \end{array} \qquad (37)$$

which is the same as (25) except for the last equality constraint on the $p_l$’s.

By Proposition 4, the inner minimization problem takes the form

$$\begin{array}{ll} \underset{\mathbf{p}}{\text{minimize}} & \displaystyle\sum_{l=0}^{L-1} \left( a_l^{(t)} p_l + c_l^{(t)} p_l^{-1} \right) \\ \text{subject to} & p_l = p_{L-l}, \; l = 1, \ldots, L-1. \end{array} \qquad (38)$$

Note that by the property of the Fourier transform matrix, we have $\mathbf{f}_{L-l} = \overline{\mathbf{f}_l}$, where the upper bar stands for element-wise complex conjugate. As a result, for real-valued data,

$$a_{L-l}^{(t)} = a_l^{(t)}, \qquad c_{L-l}^{(t)} = c_l^{(t)}, \qquad (39)$$

which implies that the equality constraint will be satisfied automatically by the unconstrained update (32).
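This conjugate-symmetry property is easy to verify numerically (our own illustrative sketch): for a real vector, the DFT coefficients satisfy $X_{L-k} = \overline{X_k}$, so quantities that depend only on magnitudes take equal values at indices $k$ and $L-k$.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 8
x = rng.standard_normal(L)      # real-valued data vector
X = np.fft.fft(x)               # DFT coefficients

# conjugate symmetry of the DFT of a real vector: X[L-k] = conj(X[k]),
# so the per-frequency energies |X[k]|^2 come in equal pairs (k, L-k)
energy = np.abs(X) ** 2
paired = np.allclose(energy[1:], energy[1:][::-1])
```

Hence a weight update driven only by these paired energies automatically preserves the symmetry $p_l = p_{L-l}$.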

### IV-C Banded Toeplitz Structure

In addition to imposing the Toeplitz structure on the covariance matrix, in some applications we can further require that the Toeplitz matrix is $q$-banded, i.e., $t_k = 0$ if $k > q$. For example, the covariance matrix of a stationary moving average process of order $q$ satisfies this assumption. One may also consider banding the covariance matrix if it is known a priori that the correlation between $x_s$ and $x_{s+k}$ decreases as the lag $k$ increases.
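As a quick sanity check of the banding claim (our own illustrative computation): for the MA(1) process $x_t = e_t + \theta e_{t-1}$ with unit-variance white noise, the autocovariances are $\gamma_0 = 1 + \theta^2$, $\gamma_1 = \theta$, and $\gamma_k = 0$ for $k \ge 2$, so the covariance matrix is a 1-banded positive definite Toeplitz matrix:

```python
import numpy as np
from scipy.linalg import toeplitz

def ma1_covariance(theta, K):
    """Covariance of an MA(1) process x_t = e_t + theta * e_{t-1}
    (unit-variance white noise): a 1-banded Toeplitz matrix."""
    gamma = np.zeros(K)
    gamma[0] = 1.0 + theta ** 2     # lag-0 autocovariance
    gamma[1] = theta                # lag-1 autocovariance; higher lags vanish
    return toeplitz(gamma)

Sigma = ma1_covariance(0.6, 6)
```

The same construction with an MA($q$) process yields a $q$-banded Toeplitz covariance.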

Based on the circulant embedding technique introduced in the last subsection, the problem can be formulated as

(40)

subject to

By Proposition 4, the inner minimization problem becomes

(41)

subject to

which can be rewritten compactly as

(42)

subject to

Define real-valued quantities

(43)

(44)

(45)

we have the equivalent problem

(46)

subject to

where the two sets of variables are related by

(47)

Compared to (42), the equivalent problem has a lower computational cost, as both the number of variables and the number of constraints are reduced. The algorithm for the banded Toeplitz structure is summarized in Algorithm 4.

### IV-D Convergence Analysis

As Proposition 4 requires $\mathbf{w}^{(t)} > \mathbf{0}$, we consider the following $\epsilon$-approximation of problem (25):

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & L(\mathbf{w}) \\ \text{subject to} & \mathbf{w} \ge \epsilon \mathbf{1}, \end{array} \qquad (48)$$

with $\epsilon > 0$, where the upperbound derived in Proposition 4 can be applied for $\mathbf{w} \ge \epsilon \mathbf{1}$. Algorithm 2 can be easily modified to solve problem (48), and under Assumption 1, any limit point of the sequence generated by the modified Algorithm 2 is a stationary point of (48). That is, if $\mathbf{w}^\star$ is a limit point of $\left\{\mathbf{w}^{(t)}\right\}$, then

$$\nabla L(\mathbf{w}^\star)^T \mathbf{d} \ge 0 \qquad (49)$$

for any feasible direction $\mathbf{d}$ at $\mathbf{w}^\star$, where $\nabla L(\mathbf{w}^\star)$ is the gradient of the objective function at $\mathbf{w}^\star$.

###### Proposition 5.

Under Assumption 1, let $\{\epsilon_k\}$ be a positive sequence with $\epsilon_k \downarrow 0$; then any limit point of the sequence of corresponding solutions $\{\mathbf{w}^\star(\epsilon_k)\}$ is a stationary point of problem (25).

###### Proof:

The conclusion follows from the continuity of $\nabla L$ in $\mathbf{w}$ and $\epsilon$, together with Assumption 2. ∎

## V Tyler’s Estimator with Non-Convex Structure

In the previous sections we proposed algorithms for Tyler’s estimator with a general convex structural constraint and discussed several special cases in detail. For non-convex structures, the problem is more difficult to handle. In this section, we introduce two popular non-convex structures that remain tractable under the MM algorithm, namely the spiked covariance structure and the Kronecker structure.

### V-A The Spiked Covariance Structure

The term “spiked covariance” was introduced in [23] and refers to the covariance matrix model

$$\boldsymbol{\Sigma} = \sigma^2 \mathbf{I} + \sum_{i=1}^{r} w_i \mathbf{u}_i \mathbf{u}_i^T, \qquad (50)$$

where $r$ is some integer less than $K$, the $w_i$’s are nonnegative weights, and the $\mathbf{u}_i$’s are unknown orthonormal basis vectors. Note that although (50) and (23) share a similar form, they differ essentially, since the $\mathbf{a}_k$’s in (23) are known and are not necessarily orthogonal. The model is directly related to principal component analysis and subspace estimation, and also plays an important role in sensor array applications [4, 9]. This model, referred to as the factor model, is also very popular in financial time series analysis [24].
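To build intuition for the structure (50) (a classical eigendecomposition exercise of our own, not the robust estimator developed below): if $\boldsymbol{\Sigma}$ exactly follows (50), its top $r$ eigenvalues are $w_i + \sigma^2$ and the remaining $K - r$ all equal $\sigma^2$, so the parameters can be read off an eigendecomposition.

```python
import numpy as np

def fit_spiked(Sigma, r):
    """Recover (sigma^2, w, U) from a matrix of the exact form
    Sigma = sigma^2 * I + sum_{i<=r} w_i u_i u_i^T with orthonormal u_i."""
    K = Sigma.shape[0]
    lam, U = np.linalg.eigh(Sigma)           # eigenvalues in ascending order
    sigma2 = lam[:K - r].mean()              # noise level from the flat bulk
    w = lam[:K - r - 1:-1] - sigma2          # top-r spike strengths, descending
    U_top = U[:, :K - r - 1:-1]              # associated eigenvectors
    return sigma2, w, U_top
```

The closed-form update in this section is likewise built from sorted eigenvalues and their eigenvectors, applied to a weighted data matrix rather than to $\boldsymbol{\Sigma}$ itself.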

The constrained optimization problem is formulated as

$$\begin{array}{ll} \underset{\sigma^2,\, \mathbf{w},\, \mathbf{U}}{\text{minimize}} & L\!\left(\sigma^2 \mathbf{I} + \mathbf{U} \operatorname{diag}(\mathbf{w})\, \mathbf{U}^T\right) \\ \text{subject to} & \mathbf{U}^T \mathbf{U} = \mathbf{I}, \quad \mathbf{w} \ge \mathbf{0}, \quad \sigma^2 > 0, \end{array} \qquad (51)$$

where $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_r]$.

Applying the upperbound (13) to the second term in the objective function yields the following inner minimization problem:

$$\begin{array}{ll} \underset{\sigma^2,\, \mathbf{w},\, \mathbf{U}}{\text{minimize}} & \log\det(\boldsymbol{\Sigma}) + \operatorname{tr}\!\left(\boldsymbol{\Sigma}^{-1} \mathbf{M}^{(t)}\right) \\ \text{subject to} & \boldsymbol{\Sigma} = \sigma^2 \mathbf{I} + \mathbf{U} \operatorname{diag}(\mathbf{w})\, \mathbf{U}^T, \quad \mathbf{U}^T \mathbf{U} = \mathbf{I}, \quad \mathbf{w} \ge \mathbf{0}. \end{array} \qquad (52)$$

Although the problem is non-convex, a global minimizer can be found in closed form as

$$\sigma^2 = \frac{1}{K - r} \sum_{i=r+1}^{K} \lambda_i, \qquad w_i = \max\!\left(\lambda_i - \sigma^2,\, 0\right), \quad i = 1, \ldots, r, \qquad (53)$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_K$ are the sorted eigenvalues of the matrix $\mathbf{M}^{(t)}$ and the $\mathbf{u}_i$’s are the associated eigenvectors [25]. The algorithm for the spiked covariance structure is summarized in Algorithm 5.

As the feasible set is not convex, the convergence statement of the MM algorithm in [20] needs to be modified as follows.

###### Proposition 6.

Any limit point $\boldsymbol{\Sigma}^\star$ of the sequence generated by the algorithm satisfies

$$L'\!\left(\boldsymbol{\Sigma}^\star; \mathbf{d}\right) \ge 0, \quad \forall\, \mathbf{d} \in T_{\mathcal{S}}\!\left(\boldsymbol{\Sigma}^\star\right),$$

where $T_{\mathcal{S}}\!\left(\boldsymbol{\Sigma}^\star\right)$ stands for the tangent cone of $\mathcal{S}$ at $\boldsymbol{\Sigma}^\star$.

###### Proof:

The result follows by combining the standard convergence proof of the MM algorithm [20] with the necessary optimality condition for the update being a global minimizer of the surrogate problem over $\mathcal{S}$. ∎
