Robust Estimation of Structured Covariance Matrix for Heavy-Tailed Elliptical Distributions

06/17/2015
by   Ying Sun, et al.

This paper considers the problem of robustly estimating a structured covariance matrix with an underlying elliptical distribution and known mean. In applications where the covariance matrix naturally possesses a certain structure, taking the prior structure information into account in the estimation procedure improves the estimation accuracy. We propose incorporating the prior structure information into Tyler's M-estimator, formulating the problem as minimizing the cost function of Tyler's estimator under the prior structural constraint. First, estimation under a general convex structural constraint is introduced, and an efficient algorithm for finding the estimator is derived within the majorization-minimization (MM) framework. Then, the algorithm is tailored to several special structures that enjoy a wide range of applications in signal processing related fields, namely, sum of rank-one matrices, Toeplitz, and banded Toeplitz structures. In addition, two types of non-convex structures, namely the Kronecker structure and the spiked covariance structure, are also discussed, and it is shown that simple algorithms can be derived for them under the guidelines of MM. Numerical results show that the proposed estimator achieves a smaller estimation error than the benchmark estimators at a lower computational cost.



I Introduction

Estimating the covariance matrix is a ubiquitous problem that arises in various fields such as signal processing, wireless communication, bioinformatics, and financial engineering [2, 3, 4]. It has been noticed that the covariance matrix in some applications naturally possesses some special structures. Exploiting the structure information in the estimation process usually implies a reduction in the number of parameters to be estimated, and thus is beneficial to improving the estimation accuracy [5]. Various types of structures have been studied. For example, the Toeplitz structure with applications in time series analysis and array signal processing was considered in [6, 5, 7]. A sparse graphical model was studied in [8], where sparsity was imposed on the inverse of the covariance matrix. Banding or tapering the sample covariance matrix was proposed in [3]. A spiked covariance structure, which is closely related to the problem of component analysis and subspace estimation, was introduced in [9]. Other structures such as group symmetry and the Kronecker structure were considered in [10, 11, 12].

While the previously mentioned works have shown that enforcing a prior structure on the covariance estimator improves its performance in many applications, most of them either assume that the samples follow a Gaussian distribution or attempt to regularize the sample covariance matrix. It has been realized that the sample covariance matrix, which is the maximum likelihood estimator of the covariance matrix when the samples are independent and identically normally distributed, performs poorly in many real-world applications. A major cause of this problem is that the distribution of a real-world data set is often heavy-tailed or contains outliers, in which case a single erroneous observation can lead to a completely unreliable estimate [13].

A way to address the aforementioned problem is to find a robust structured covariance matrix estimator that performs well even if the underlying distribution deviates from the Gaussian assumption. One approach is to invoke the minimax principle and seek the “best” estimate of the covariance for the worst-case noise. To be precise, the underlying probability distribution of the samples is assumed to belong to an uncertainty set of distributions that contains the Gaussian distribution, and the desired minimax robust estimator is the one whose maximum asymptotic variance over the set is less than that of any other estimator. Two types of uncertainty sets, namely the ε-contamination class and the Kolmogorov class, were considered in [14], where a structured maximum likelihood type estimate (M-estimate) was derived as the solution of a constrained optimization problem. The uncertainty set that we are interested in is the family of elliptically symmetric distributions. It was proved by Tyler in [15] that, given K-dimensional independent and identically distributed (i.i.d.) samples drawn from an elliptical distribution, Tyler’s estimator, defined as the solution to the fixed-point equation (2) below, is a minimax robust estimator. Additionally, it is “distribution-free” in the sense that its asymptotic variance does not depend on the parametric form of the underlying distribution.

The problem of obtaining a structured Tyler’s estimator was investigated in the recent works [16] and [17]. In particular, the authors of [16] focused on the group symmetry structure and proved that it is geodesically convex. As Tyler’s estimator can be defined alternatively as the minimizer of a cost function that is also geodesically convex, it follows that any local minimum of the cost function on a group symmetry constraint set is a global minimum. A numerical algorithm was also proposed to solve the constrained minimization problem. In [17], a convex structural constraint set was studied and a generalized method of moments type covariance estimator, COCA, was proposed, together with a numerical algorithm based on semidefinite relaxation. It was proved that COCA is an asymptotically consistent estimator. However, the algorithm suffers from the drawback that its computational cost increases as either the problem dimension K or the number of samples N grows.

In this paper, we formulate the structured covariance estimation problem as the minimization of Tyler’s cost function under a structural constraint. Our work generalizes [16] by considering a much larger family of structures, which includes the group symmetry structure. Instead of attempting to obtain a globally optimal solution, which is a challenging task due to the non-convexity of the objective function, we focus on devising algorithms that converge to a stationary point of the problem. We first work out an algorithmic framework for a general convex structural constraint based on the majorization-minimization (MM) framework, in which a sequence of convex programs is solved. Then we consider several special cases that appear frequently in practical applications. By exploiting the specific problem structures, the algorithm is particularized, significantly reducing the computational load. We further discuss in the end two types of widely studied non-convex structures that turn out to be computationally tractable under the MM framework: one is the Kronecker structure and the other is the spiked covariance structure. Although the algorithms can only be proved to converge to a stationary point, for all of the specific structures considered in this paper, the proposed algorithms are observed in the numerical simulations to converge to a unique point under random initialization.

The paper is organized as follows. In Section II, we introduce the robust covariance estimation problem with a structural constraint and review the majorization-minimization algorithm framework. In Section III, a general algorithm is derived for a convex structural constraint. Several special cases are considered in Section IV, where the algorithm is particularized to obtain higher efficiency by exploiting the specific form of the structure. Section V discusses the Kronecker structure and the spiked covariance structure, which are non-convex but algorithmically tractable. Numerical results are presented in Section VI, and we conclude in Section VII.

II Tyler’s Estimator with Structural Constraint

Consider N samples x_i ∈ R^K, i = 1, …, N, drawn independently from an elliptical underlying distribution with known mean (assumed zero without loss of generality) and density function

f(x) ∝ det(R)^{-1/2} g(x^T R^{-1} x),   (1)

where R is the scatter parameter, which is proportional to the covariance matrix if the latter exists, and g(·) characterizes the shape of the distribution. Tyler’s estimator for R is defined as the solution to the following fixed-point equation:

R = (K/N) Σ_{i=1}^N x_i x_i^T / (x_i^T R^{-1} x_i),   (2)

which can be interpreted as a weighted sum of rank-one matrices, with the weight decreasing as x_i gets farther from the center. It is known that if x is elliptically distributed, then the normalized random variable s = x/‖x‖_2 follows an angular central Gaussian distribution with the probability density function (pdf) taking the form

f(s) ∝ det(R)^{-1/2} (s^T R^{-1} s)^{-K/2}.   (3)

Tyler’s estimator coincides with the maximum likelihood estimator (MLE) of R obtained by fitting the normalized samples s_i = x_i/‖x_i‖_2 to (3). In other words, the estimator is the minimizer of the cost function

L(R) = log det(R) + (K/N) Σ_{i=1}^N log(x_i^T R^{-1} x_i)   (4)

on the positive definite cone. The estimator is proved to be consistent and asymptotically normal, with an asymptotic variance that does not depend on g. Furthermore, it is a minimax robust covariance estimator when the underlying distribution is elliptically symmetric [18].
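For concreteness, the following minimal sketch implements the fixed-point iteration (2) in NumPy, with a trace normalization to resolve the inherent scale ambiguity of the scatter parameter. The function name tyler_fixed_point, the initialization, and the stopping rule are illustrative choices, not taken from the paper.

import numpy as np

def tyler_fixed_point(X, max_iter=200, tol=1e-9):
    """X: (N, K) array of zero-mean samples; returns a (K, K) scatter
    estimate with trace normalized to K."""
    N, K = X.shape
    R = np.eye(K)
    for _ in range(max_iter):
        # q_i = x_i^T R^{-1} x_i for every sample
        q = np.einsum('nk,kn->n', X, np.linalg.solve(R, X.T))
        # weighted sum of rank-one matrices x_i x_i^T / q_i, as in (2)
        R_new = (K / N) * (X.T / q) @ X
        R_new *= K / np.trace(R_new)   # fix the scale ambiguity
        if np.linalg.norm(R_new - R, 'fro') <= tol * np.linalg.norm(R, 'fro'):
            return R_new
        R = R_new
    return R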

It has been noticed that in some applications the covariance matrix possesses a certain structure, and taking this information into account in the estimation yields a better estimate of R [11, 10, 14, 12]. Motivated by this idea, we focus on the problem of including prior structure information in Tyler’s estimator to improve its estimation accuracy. To formulate the problem, we assume that R is constrained to a non-empty set S that is the intersection of a closed set, which characterizes the covariance structure, and the positive semidefinite cone, and then proceed to solve the optimization problem:

minimize_R L(R)   (5)
subject to R ∈ S.

The minimizer of the above problem is the matrix in the structural set S that maximizes the likelihood of the normalized samples s_i.

Throughout the paper, we make the following assumption.
Assumption 1: The cost function L(R^j) → +∞ whenever a sequence {R^j} ⊂ S tends to a singular limit point of the constraint set S.

Under this assumption, the case that the limit point R is singular can be excluded in the analysis of the algorithms hereafter.

Note that the assumption
Assumption 2: x follows a continuous probability distribution, and the number of samples N is larger than the dimension K,

implies that L(R^j) → +∞ whenever {R^j} tends to the boundary of the positive semidefinite cone, with probability one [19]. It is therefore also a sufficient condition for Assumption 1 to hold, since any singular limit point of S lies on the boundary of this cone.

Problem (5) is difficult to solve for two reasons. First, the constraint set S is too general to tackle. Second, even if S possesses a nice property such as convexity, the objective function is still non-convex. Instead of trying to find the global minimizer, which appears to be too ambitious for the reasons pointed out above, we aim at devising efficient algorithms capable of finding a stationary point of (5). We rely on the MM framework to derive the algorithms, which is briefly described next for completeness.

II-A The Majorization Minimization Algorithm

For a general optimization problem

minimize_x f(x)   (6)
subject to x ∈ X,

where X is a closed convex set, the MM algorithm finds a stationary point of (6) by successively solving a sequence of simpler optimization problems. The iterative algorithm starts at some arbitrary feasible initial point x^0, and at the t-th iteration the update of x is given by

x^{t+1} ∈ arg min_{x ∈ X} u(x; x^t),

with the surrogate function u(x; x^t) satisfying the following assumptions:

u(x; x^t) ≥ f(x) ∀ x ∈ X,  u(x^t; x^t) = f(x^t),  u′(x^t; x^t; d) = f′(x^t; d) ∀ d with x^t + d ∈ X,   (7)

where f′(x; d) stands for the directional derivative of f at x along the direction d, and u(x; x^t) is continuous in both x and x^t.

It is proved in [20] that any limit point of the sequence {x^t} generated by the MM algorithm is a stationary point of problem (6). If it is further assumed that the initial level set {x ∈ X : f(x) ≤ f(x^0)} is compact, then a stronger statement, as follows, can be made:

lim_{t→∞} dist(x^t, X*) = 0,

where X* stands for the set of all stationary points of (6).

The idea of majorizing f by a surrogate function can also be applied blockwise. Specifically, x is partitioned into m blocks as x = (x_1, …, x_m), where each n_k-dimensional block x_k ∈ X_k and X = X_1 × ⋯ × X_m.

At the t-th iteration, one block x_k is updated by solving the following problem:

minimize_{x_k} u_k(x_k; x^t)   (8)
subject to x_k ∈ X_k,

with the other blocks held fixed, and the continuous surrogate function u_k satisfying, blockwise, the same properties as in (7). In short, at each iteration, the block MM applies the ordinary MM algorithm to one block while keeping the values of the other blocks fixed. The blocks are updated in cyclic order.

In the rest of this paper, we derive the specific form of the surrogate function based on a detailed characterization of various kinds of S. In addition, we show how the algorithm can be particularized, at a lower computational cost, when a finer structure of S is available. Before moving to the algorithmic part, we first compare our formulation with several related works in the literature.

II-B Related Works

In [14], the authors derived a minimax robust covariance estimator assuming that the samples follow a corrupted Gaussian distribution with noise belonging to the ε-contamination class or the Kolmogorov class. The estimator is defined as the solution of a constrained optimization problem similar to (5), but with a different cost function. Apart from the distinction that the family of distributions we consider is the set of elliptical distributions, the focus of our work, which completely differs from [14], is on developing efficient numerical algorithms for different types of structural constraint sets S.

Two other closely related works are [16] and [17]. In [16], the authors investigated a special case of (5), where S is the set of all positive semidefinite matrices with a group symmetry structure. It was shown that both the cost function and the group symmetry constraint are geodesically convex, so any local minimizer of (5) is global. Several examples, including the circulant and persymmetric structures, were proven to be special cases of the group symmetry constraint. A numerical algorithm that decreases the cost function monotonically was also provided. Our work includes the group symmetry structure as a special case, since the constraint is linear, and provides an alternative algorithm to solve the problem.

In [17], the authors considered imposing a convex constraint on Tyler’s estimator. A generalized method of moments type estimator based on semidefinite relaxation, defined as the solution of the following problem:

(9)
subject to

was proposed and proved to be asymptotically consistent. Nevertheless, the number of constraints grows linearly with N and, as pointed out in that paper, the algorithm becomes computationally demanding when either the problem dimension K or the number of samples N is large. In contrast, our algorithm based on formulation (5) is less affected by the number of samples and is therefore more computationally tractable.

III Tyler’s Estimator with Convex Structural Constraint

In this section, we derive a general algorithm for problem (5) with S being a closed convex subset of the positive semidefinite cone, a setting that covers a wide range of applications. For instance, the Toeplitz structure can be imposed on the covariance matrix of the received signal in direction-of-arrival (DOA) estimation problems. Banding is also considered as a way of regularizing a covariance matrix whose entries decay fast away from the main diagonal.

Since S is closed and convex, constructing a convex surrogate function u(R; R^t) for L(R) is a natural idea, since then R^{t+1} can be found via

R^{t+1} = arg min_{R ∈ S} u(R; R^t),   (10)

which is a convex program.

Proposition 1.

At any R^t ≻ 0, the objective function L(R) can be upperbounded by the convex surrogate function

u(R; R^t) = Tr((R^t)^{-1} R) + (K/N) Σ_{i=1}^N (x_i^T R^{-1} x_i)/(x_i^T (R^t)^{-1} x_i) + const.,   (11)

with equality achieved at R = R^t.

Proof:

Since log det(·) is concave, log det(R) can be upperbounded by its first-order Taylor expansion at R^t:

log det(R) ≤ log det(R^t) + Tr((R^t)^{-1}(R − R^t)),   (12)

with equality achieved at R = R^t.

Also, by the concavity of the function log(·) we have

log(x_i^T R^{-1} x_i) ≤ log(x_i^T (R^t)^{-1} x_i) + (x_i^T R^{-1} x_i)/(x_i^T (R^t)^{-1} x_i) − 1,   (13)

which leads to the bound L(R) ≤ u(R; R^t), with equality achieved at R = R^t. ∎
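As a sanity check, the following sketch numerically verifies the majorization of Proposition 1: the surrogate (11), with its constant fixed so that equality holds at R = R^t, upper-bounds the cost (4) at randomly drawn positive definite matrices. All names and the test setup are illustrative.

import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 50
X = rng.standard_normal((N, K))

def cost(R):                      # L(R) in (4)
    q = np.einsum('nk,kn->n', X, np.linalg.solve(R, X.T))
    return np.log(np.linalg.det(R)) + (K / N) * np.sum(np.log(q))

def surrogate(R, Rt):             # u(R; Rt) in (11), constant made explicit
    qt = np.einsum('nk,kn->n', X, np.linalg.solve(Rt, X.T))
    q = np.einsum('nk,kn->n', X, np.linalg.solve(R, X.T))
    const = cost(Rt) - 2 * K      # ensures surrogate(Rt, Rt) == cost(Rt)
    return np.trace(np.linalg.solve(Rt, R)) + (K / N) * np.sum(q / qt) + const

Rt = np.eye(K)
for _ in range(5):
    A = rng.standard_normal((K, K))
    R = A @ A.T + np.eye(K)       # a random positive definite test point
    assert surrogate(R, Rt) >= cost(R) - 1e-9
assert abs(surrogate(Rt, Rt) - cost(Rt)) < 1e-9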

The variable R can then be updated via (10) with the surrogate function (11).

By the convergence result of the MM algorithm, it can be concluded that every limit point of the sequence {R^t} is a stationary point of problem (5). Note that for all of the structural constraints that we are going to consider in this work, the set S possesses the property that

R ∈ S ⟹ cR ∈ S for all c > 0.   (14)

Since the cost function is scale-invariant in the sense that L(cR) = L(R) for any c > 0, we can add a trace normalization step after the update of R without affecting the value of the objective function. The algorithm for a general convex structural set is summarized in Algorithm 1.

1: Set t = 0, initialize R^0 to be any positive definite matrix in S.
2: repeat
3:     Compute the weights q_i^t = x_i^T (R^t)^{-1} x_i, i = 1, …, N.
4:     Update R as
R^{t+1/2} = arg min_{R ∈ S} Tr((R^t)^{-1} R) + (K/N) Σ_{i=1}^N (x_i^T R^{-1} x_i)/q_i^t   (15)
R^{t+1} = K R^{t+1/2}/Tr(R^{t+1/2})   (16)
5:     t ← t + 1.
6: until some convergence criterion is met
Algorithm 1 Robust covariance estimation under convex structure
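For concreteness, one MM iteration (15)-(16) can be prototyped with an off-the-shelf convex solver. The sketch below uses the CVXPY package and, purely as an example of a convex set S, imposes a Toeplitz constraint; the function name mm_step and the choice of structure are illustrative, not the paper’s implementation.

import cvxpy as cp
import numpy as np

def mm_step(X, R_t):
    """One update (15)-(16) with S = {positive semidefinite Toeplitz matrices}."""
    N, K = X.shape
    q_t = np.einsum('nk,kn->n', X, np.linalg.solve(R_t, X.T))
    R = cp.Variable((K, K), PSD=True)
    # surrogate (11): tr(R_t^{-1} R) + (K/N) sum_i x_i^T R^{-1} x_i / q_i^t,
    # where matrix_frac(x, R) models x^T R^{-1} x and is convex in R
    obj = cp.trace(np.linalg.inv(R_t) @ R) \
        + (K / N) * sum(cp.matrix_frac(X[i], R) / q_t[i] for i in range(N))
    toeplitz = [R[i, j] == R[i + 1, j + 1]
                for i in range(K - 1) for j in range(K - 1)]
    cp.Problem(cp.Minimize(obj), toeplitz).solve()
    R_half = R.value
    return K * R_half / np.trace(R_half)   # trace normalization (16)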
Proposition 2.

If the set S satisfies (14), then the sequence {R^t} generated by Algorithm 1 satisfies

lim_{t→∞} dist(R^t, S*) = 0,   (17)

where S* is the set of stationary points of problem (5).

Proof:

Since the objective function is scale-invariant, and the constraint set satisfies (14), solving (5) is equivalent to solving

minimize_R L(R)
subject to R ∈ S, Tr(R) = K.

The conclusion follows by an argument similar to that of Proposition 17 in [21]. ∎

III-A General Linear Structure

In this subsection we further assume that the set S is the intersection of the positive semidefinite cone and an affine set. The following lemma shows that in this case, the update of R (eqn. (15)) can be recast as a semidefinite program (SDP).

Lemma 3.

Problem (15) is equivalent to

minimize_{R ∈ S, t} Tr((R^t)^{-1} R) + (K/N) Σ_{i=1}^N t_i/q_i^t   (18)
subject to [ R, x_i; x_i^T, t_i ] ⪰ 0, i = 1, …, N,

in the sense that if (R*, t*) solves (18), then R* solves (15).

Proof:

Problem (15) can be written equivalently as

minimize_{R ∈ S, t} Tr((R^t)^{-1} R) + (K/N) Σ_{i=1}^N t_i/q_i^t
subject to t_i = x_i^T R^{-1} x_i, i = 1, …, N.

Now we relax the constraint as t_i ≥ x_i^T R^{-1} x_i. By the Schur complement lemma for a positive semidefinite matrix, if R ≻ 0, then t_i ≥ x_i^T R^{-1} x_i is equivalent to

[ R, x_i; x_i^T, t_i ] ⪰ 0.

Therefore (18) is a convex relaxation of (15).

The relaxation is tight since the objective is strictly increasing in each t_i, so t_i = x_i^T R^{-1} x_i at any optimal solution. ∎

Lemma 3 reveals that for a linear structural constraint, Algorithm 1 can be particularized as solving a sequence of SDPs.

An application is the case where R can be parametrized as

R = Σ_{j=1}^r θ_j B_j,   (19)

with θ = (θ_1, …, θ_r) being the variable, the B_j’s being the corresponding given basis matrices, and θ constrained to be in a set Θ. Using expression (19), the minimization problem (18) can be simplified as

minimize_{θ ∈ Θ, t} Σ_{j=1}^r θ_j Tr((R^t)^{-1} B_j) + (K/N) Σ_{i=1}^N t_i/q_i^t   (20)
subject to [ Σ_{j=1}^r θ_j B_j, x_i; x_i^T, t_i ] ⪰ 0, i = 1, …, N.

IV Tyler’s Estimator with Special Convex Structures

Having introduced the general algorithmic framework for a convex structure in the previous section, we now discuss in detail some convex structures that arise frequently in signal processing related fields, and show that by exploiting the problem structure the algorithm can be particularized with a significant reduction in the computational load.

IV-A Sum of Rank-One Matrices Structure

The structure set that we study in this part is

S = { R : R = Σ_{k=1}^r w_k a_k a_k^H, w_k ≥ 0 },   (21)

where the a_k’s are known vectors in C^K. The matrix R can be interpreted as a weighted sum of the given rank-one matrices a_k a_k^H.

As an example application where structure (21) appears, consider the following signal model

x = A s + n,   (22)

where A = [a_1, …, a_r]. Assuming that the signal s and the noise n are zero-mean random variables and that any two elements of them are uncorrelated, the covariance matrix of x takes the form

R = Σ_{k=1}^r σ_k^2 a_k a_k^H + Σ_n,   (23)

where σ_k^2 is the variance of the k-th signal component and Σ_n is the (diagonal) noise covariance matrix.

Define w = (σ_1^2, …, σ_r^2) and W = diag(w); then the signal part of R can be written compactly as A W A^H. Further define the augmented quantities

Ā = [A, I_K],  w̄ = (σ_1^2, …, σ_r^2, [Σ_n]_{11}, …, [Σ_n]_{KK}),   (24)

so that R = Ā diag(w̄) Ā^H. Therefore, without loss of generality, we can focus on the expression R = A diag(w) A^H, assuming that every K columns of A are linearly independent and r ≥ K.

Note that in example (22), x is complex-valued, and problem (5), which was formulated for a real-valued elliptical distribution, needs to be modified to

minimize_w log det(R) + (K/N) Σ_{i=1}^N log(x_i^H R^{-1} x_i)   (25)
subject to R = A diag(w) A^H, w ≥ 0,

where the x_i’s are assumed to follow a complex-valued elliptical distribution instead.

Since R is linear in the w_k’s, Algorithm 1 can be applied. In the following, we provide a more efficient algorithm by substituting R = A diag(w) A^H into the objective function and applying the MM procedure with w as the variable.

Proposition 4.

At any w^t > 0, the objective function

L(w) = log det(A diag(w) A^H) + (K/N) Σ_{i=1}^N log(x_i^H (A diag(w) A^H)^{-1} x_i)   (26)

can be upperbounded by the surrogate function

u(w; w^t) = (b^t)^T w + (c^t ∘ (w^t)^2)^T w^{-1} + const.,   (27)

with equality achieved at w = w^t, where w^{-1} stands for the element-wise inverse of w, ∘ denotes the element-wise product, and

b_k^t = a_k^H (R^t)^{-1} a_k,  c_k^t = (K/N) Σ_{i=1}^N |a_k^H (R^t)^{-1} x_i|^2/(x_i^H (R^t)^{-1} x_i),  k = 1, …, r,   (28)

with R^t = A diag(w^t) A^H.
Proof:

First, observe that inequalities (12) and (13) imply that

L(w) ≤ Tr((R^t)^{-1} A diag(w) A^H) + (K/N) Σ_{i=1}^N (x_i^H R^{-1} x_i)/(x_i^H (R^t)^{-1} x_i) + const.   (29)

with equality achieved at w = w^t, where R = A diag(w) A^H; the first term equals (b^t)^T w.

Assume that w > 0. From the identity A diag(w^t) A^H (R^t)^{-1} x = x we know that the point z^t = diag(w^t) A^H (R^t)^{-1} x satisfies A z^t = x. By the Schur complement, for R = A diag(w) A^H ≻ 0 and any z with A z = x,

x^H R^{-1} x ≤ z^H diag(w)^{-1} z.   (30)

Choosing z = z^t, we have

x^H R^{-1} x ≤ Σ_{k=1}^r (w_k^t)^2 |a_k^H (R^t)^{-1} x|^2 / w_k,   (31)

with equality achieved at w = w^t.

Since the left-hand side of (31) is finite, (31) is also valid for w ≥ 0. Substituting (31) into (29) yields the surrogate function (27). ∎

The update of w can then be found in closed form as

w_k^{t+1} = w_k^t √(c_k^t/b_k^t),  k = 1, …, r.   (32)

The algorithm is summarized in Algorithm 2.

Compared to Algorithm 1, in which the minimization problem (10) has no closed-form solution and typically requires an iterative algorithm, the new algorithm only requires a single loop over t and is expected to converge faster.

1: Set t = 0, initialize w^0 to be any positive vector.
2: repeat
3:     R^t = A diag(w^t) A^H,
4:     Compute b_k^t, c_k^t, k = 1, …, r, with (28)
5:     w_k^{t+1} = w_k^t √(c_k^t/b_k^t), k = 1, …, r
6:     t ← t + 1.
7: until some convergence criterion is met
Algorithm 2 Robust covariance estimation under sum of rank-one matrices structure
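A minimal NumPy sketch of Algorithm 2 follows, assuming the reconstruction of (28) and (32) given above; the routine name structured_tyler_rank_one, the initialization, and the stopping rule are illustrative.

import numpy as np

def structured_tyler_rank_one(X, A, max_iter=500, tol=1e-10):
    """X: (N, K) samples, A: (K, r) known dictionary with columns a_k.
    Returns w >= 0 such that R = A diag(w) A^H."""
    N, K = X.shape
    r = A.shape[1]
    w = np.ones(r)
    for _ in range(max_iter):
        R = (A * w) @ A.conj().T
        RinvX = np.linalg.solve(R, X.T)                       # (K, N)
        q = np.real(np.einsum('nk,kn->n', X.conj(), RinvX))   # x_i^H R^{-1} x_i
        G = A.conj().T @ RinvX                                # G[k, i] = a_k^H R^{-1} x_i
        c = (K / N) * np.sum(np.abs(G) ** 2 / q, axis=1)      # c_k^t in (28)
        b = np.real(np.einsum('kr,kr->r', A.conj(), np.linalg.solve(R, A)))  # b_k^t
        w_new = w * np.sqrt(c / b)                            # update (32)
        w_new /= w_new.sum()     # optional rescaling; the cost is scale-invariant
        if np.linalg.norm(w_new - w) <= tol * np.linalg.norm(w):
            return w_new
        w = w_new
    return w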

IV-B Toeplitz Structure

Consider the constraint set S being the class of real-valued positive semidefinite Toeplitz matrices. If R ∈ S, then it is completely determined by its first row (r_0, r_1, …, r_{K−1}). (Following the convention, the indices for the Toeplitz structure start from 0.)

In this subsection, we are going to show that based on the technique of circulant embedding, Algorithm 2 can be adopted to solve the Toeplitz structure constrained problem at a lower cost than applying the sequential SDP algorithm (Algorithm 1).

The idea of embedding a Toeplitz matrix as the upper-left part of a larger circulant matrix has been discussed in [5, 7, 22]. It was proved in [6] that any positive definite Toeplitz matrix of size K × K can be embedded in a positive definite circulant matrix C of larger size M, parametrized by its first row of the form

(r_0, r_1, …, r_{K−1}, α_K, …, α_{M−K}, r_{K−1}, …, r_1),

where the α_j’s denote some real numbers. R then can be written as

R = E C E^T,   (33)

where E = [I_K, 0] selects the upper-left K × K block of C.

Clearly, for any fixed M, if C is positive semidefinite, so is R. However, the statement is false the other way around. In other words, the set

S_M = { R : R = E C E^T, C ∈ C_M },   (34)

where C_M denotes the set of real-valued positive semidefinite circulant matrices of size M × M, is a subset of S.

Instead of S, we restrict the feasible set to be S_M with M ≥ 2K − 1. Since a symmetric circulant matrix can be diagonalized by the Fourier matrix, if C ∈ C_M then it can be written as

C = F diag(w) F^H,   (35)

where

[F]_{mn} = (1/√M) e^{j2πmn/M},  m, n = 0, …, M − 1,   (36)

with F being the normalized Fourier transform matrix of size M × M and w ≥ 0.

The robust covariance estimation problem over the restricted set of Toeplitz matrices then takes the form

minimize_w log det(R) + (K/N) Σ_{i=1}^N log(x_i^T R^{-1} x_i)   (37)
subject to R = A diag(w) A^H, w ≥ 0, w_m = w_{M−m}, m = 1, …, M − 1,

where A = E F contains the first K rows of F; this is the same as (25) except for the last equality constraint on the w_m’s.

By Proposition 4, the inner minimization problem takes the form

(38)
subject to

Note that by the property of the Fourier transform matrix, we have , where the upper bar stands for element-wise complex conjugate. As a result, for ,

(39)

which implies that the constraint will be satisfied automatically.

The algorithm for the Toeplitz structure based on circulant embedding is summarized in Algorithm 3. Notice that Algorithm 3 can be easily generalized to noisy observations via the augmented representation (24).

1: Set M to be an integer such that M ≥ 2K − 1.
2: Construct the matrix A = E F.
3: Call Algorithm 2.
Algorithm 3 Robust covariance estimation under the Toeplitz structure (Circulant Embedding)
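Algorithm 3 thus reduces to constructing A = E F and reusing the sum-of-rank-one solver. The sketch below assumes the structured_tyler_rank_one routine sketched earlier; names and defaults are illustrative.

import numpy as np

def toeplitz_tyler(X, M=None):
    """Robust Toeplitz-constrained scatter estimate via circulant embedding."""
    N, K = X.shape
    M = 2 * K - 1 if M is None else M            # embedding size, M >= 2K - 1
    n = np.arange(M)
    F = np.exp(2j * np.pi * np.outer(n, n) / M) / np.sqrt(M)
    A = F[:K, :]                                 # A = E F: first K rows of F
    w = structured_tyler_rank_one(X.astype(complex), A)
    R = (A * w) @ A.conj().T                     # upper-left K x K block of C
    return np.real(R)                            # real Toeplitz up to rounding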

IV-C Banded Toeplitz Structure

In addition to imposing the Toeplitz structure on the covariance matrix, in some applications we can further require that the Toeplitz matrix is b-banded, i.e., r_j = 0 if j > b. For example, the covariance matrix of a stationary moving average process of order b satisfies this assumption. One may also consider banding the covariance matrix if it is known a priori that the correlation between the m-th and n-th entries of x decreases as |m − n| increases.

Based on the circulant embedding technique introduced in the last subsection, the problem can be formulated as

minimize_w log det(R) + (K/N) Σ_{i=1}^N log(x_i^T R^{-1} x_i)   (40)
subject to R = A diag(w) A^H, w ≥ 0, [A diag(w) A^H]_{0,j} = 0, j = b + 1, …, K − 1.
By Proposition 4, the inner minimization problem becomes

minimize_w Σ_{m=0}^{M−1} ( b_m^t w_m + c_m^t (w_m^t)^2/w_m )   (41)
subject to w ≥ 0, [A diag(w) A^H]_{0,j} = 0, j = b + 1, …, K − 1,

which can be rewritten compactly, with the banding constraints expressed as linear equalities in w, as

(42)
subject to
Define the real-valued quantities

(43)
(44)
(45)

exploiting the conjugate symmetry of the Fourier coefficients; we then have the equivalent problem

(46)
subject to

where the variables of (42) and (46) are related by

(47)

Compared to (42), the equivalent problem has a lower computational cost, as both the number of variables and the number of constraints are reduced. The algorithm for the banded Toeplitz structure is summarized in Algorithm 4.

1: Set M to be an integer such that M ≥ 2K − 1.
2: Construct the matrix A = E F and the quantities in (43).
3: Set t = 0, initialize w^0 to be any positive vector.
4: repeat
5:     R^t = A diag(w^t) A^H,
6:     Compute b_m^t, c_m^t, m = 0, …, M − 1, with (28).
7:     Compute the quantities in (44) and (45), and update the reduced variable as the minimizer of (46).
8:     Recover w^{t+1} with (47),
9:     t ← t + 1
10: until some convergence criterion is met
Algorithm 4 Robust covariance estimation under the banded Toeplitz structure (Circulant Embedding)

IV-D Convergence Analysis

We consider Algorithm 2; the arguments for Algorithms 3 and 4 are similar.

As Proposition 4 requires w^t > 0, we consider the following ε-approximation of problem (25):

minimize_w log det(R) + (K/N) Σ_{i=1}^N log(x_i^H R^{-1} x_i)   (48)
subject to R = A diag(w) A^H, w ≥ ε1,

with ε > 0, for which the upperbound derived in Proposition 4 applies at every iterate. Algorithm 2 can be easily modified to solve problem (48), and under Assumption 1, the limit points of the sequence generated by the modified Algorithm 2 belong to the set of stationary points of (48).

That is, if w̄ is a limit point of {w^t}, then

∇L(w̄)^T d ≥ 0   (49)

for any feasible direction d at w̄, where ∇L(w̄) is the gradient of the objective function at w̄.

Proposition 5.

Under Assumption 1, let {ε_j} be a positive sequence with ε_j → 0; then any limit point of a sequence {w̄_j} of stationary points of the corresponding ε_j-approximation problems is a stationary point of problem (25).

Proof:

The conclusion follows from the continuity of ∇L(w) in w, which holds under Assumption 2. ∎

In practice, since ε can be chosen arbitrarily small, directly applying Algorithms 2, 3 and 4 or adapting them to solve the ε-approximation problem makes virtually no difference.
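Concretely, the ε-modification of Algorithm 2 amounts to clipping the multiplicative update (32) at the floor ε: since the surrogate (27) is separable and each term first decreases and then increases in w_k, projecting the unconstrained minimizer onto {w ≥ ε1} gives the exact constrained minimizer. A one-line sketch, reusing the illustrative names from the earlier routine:

# inside the loop of structured_tyler_rank_one (illustrative names),
# replace the update (32) by its clipped version:
w_new = np.maximum(w * np.sqrt(c / b), eps)   # exact minimizer over {w >= eps}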

V Tyler’s Estimator with Non-Convex Structure

In the previous sections we have proposed algorithms for Tyler’s estimator with a general convex structural constraint and discussed in detail some special cases. For the non-convex structure, the problem is more difficult to handle. In this section, we are going to introduce two popular non-convex structures that are tractable by applying the MM algorithm, namely the spiked covariance structure and the Kronecker structure.

V-A The Spiked Covariance Structure

The term “spiked covariance” was introduced in [23] and refers to the covariance matrix model

R = σ^2 I_K + Σ_{i=1}^r λ_i v_i v_i^T,   (50)

where r is some integer less than K, the λ_i’s are nonnegative, and the v_i’s are unknown orthonormal basis vectors. Note that although (50) and (23) share a similar form, they differ from each other essentially, since the a_k’s in (23) are known and are not necessarily orthogonal. The model is directly related to principal component analysis and subspace estimation, and also plays an important role in sensor array applications [4, 9]. This model, referred to as the factor model, is also very popular in financial time series analysis [24].

The constrained optimization problem is formulated as

minimize_R L(R)
subject to R ∈ S,   (51)

where S = { R : R = σ^2 I_K + Σ_{i=1}^r λ_i v_i v_i^T, σ^2 > 0, λ_i ≥ 0, v_i^T v_j = δ_{ij} }.

Applying the upperbound (13) to the second term of the objective function yields the following inner minimization problem:

minimize_{R ∈ S} log det(R) + Tr(R^{-1} M^t),  where M^t = (K/N) Σ_{i=1}^N x_i x_i^T/(x_i^T (R^t)^{-1} x_i).   (52)

Although the problem is non-convex, a global minimizer can be found in closed form as

σ̂^2 = (1/(K − r)) Σ_{i=r+1}^K d_i,  λ_i = max(d_i − σ̂^2, 0),  v_i = u_i,  i = 1, …, r,   (53)

where d_1 ≥ ⋯ ≥ d_K are the sorted eigenvalues of the matrix M^t and the u_i’s are the associated eigenvectors [25]. The algorithm for the spiked covariance structure is summarized in Algorithm 5.

1: Initialize R^0 to be an arbitrary feasible positive definite matrix.
2: repeat
3:     M^t = (K/N) Σ_{i=1}^N x_i x_i^T/(x_i^T (R^t)^{-1} x_i).
4:     Eigendecompose M^t as M^t = U D U^T, where D = diag(d_1, …, d_K) with d_1 ≥ ⋯ ≥ d_K.
5:     Compute σ̂^2 and λ_i, i = 1, …, r, with (53)
6:     R^{t+1} = σ̂^2 I_K + Σ_{i=1}^r λ_i u_i u_i^T
7:     R^{t+1} ← K R^{t+1}/Tr(R^{t+1})
8:     t ← t + 1.
9: until some convergence criterion is met.
Algorithm 5 Robust covariance estimation under the spiked covariance structure
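One iteration of Algorithm 5 can be sketched in NumPy as follows, assuming the closed-form solution (53) as reconstructed above; the function name spiked_mm_step and the final normalization are illustrative.

import numpy as np

def spiked_mm_step(X, R_t, r):
    """One MM update for the spiked structure (50): solve (52) in closed form."""
    N, K = X.shape
    q = np.einsum('nk,kn->n', X, np.linalg.solve(R_t, X.T))
    M = (K / N) * (X.T / q) @ X              # weighted sample covariance M^t
    d, U = np.linalg.eigh(M)                 # eigenvalues in ascending order
    d, U = d[::-1], U[:, ::-1]               # sort descending, as in (53)
    sigma2 = d[r:].mean()                    # noise level from the tail eigenvalues
    lam = np.maximum(d[:r] - sigma2, 0.0)    # spike weights
    R = sigma2 * np.eye(K) + (U[:, :r] * lam) @ U[:, :r].T
    return K * R / np.trace(R)               # trace normalization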

As the feasible set is not convex, the convergence statement of the MM algorithm in [20] needs to be modified as follows.

Proposition 6.

Any limit point R̄ of the sequence generated by the algorithm satisfies

L′(R̄; d) ≥ 0, ∀ d ∈ T_S(R̄),

where T_S(R̄) stands for the tangent cone of S at R̄.

Proof:

The result follows by combining the standard convergence proof of the MM algorithm [20] and the necessary condition for R^{t+1} being the global minimizer of