Estimation of population covariance matrices from samples of multivariate data has drawn considerable attention in the last decade owing to its fundamental importance in multivariate analysis. With dramatic advances in technology in recent years, various research fields, such as genetics, brain imaging, spectroscopic imaging and climate science, now routinely deal with massive high-dimensional data sets whose sample sizes can be very small relative to the dimension. In such settings, the standard sample covariance matrix often performs poorly [1, 2, 11].
Fortunately, regularization has recently emerged as a class of methods for estimating covariance matrices that overcomes the shortcomings of the traditional sample covariance matrix. These methods take several specific forms, for instance banding [1, 6, 17], tapering [4, 10] and thresholding [2, 5, 8, 16].
Moreover, there are many cases where the model is known to be structured in several ways at the same time. In recent years, one line of research has been to estimate a covariance matrix that is both sparse and positive definite. For instance, Rothman  gave the following model:
where is the sample covariance matrix, is the Frobenius norm, is the element-wise ℓ1-norm and . From the optimization viewpoint, (1) is similar to the graphical lasso criterion , which also has a log-determinant part and the element-wise ℓ1-penalty. Rothman  derived an iterative procedure to solve (1). Xue, Ma and Zou , by contrast, omitted the log-determinant part and imposed the positive definite constraint for some arbitrarily small :
They utilized an efficient alternating direction method (ADM) to solve the challenging problem (2) and established its convergence properties.
Most of the literature, e.g., [12, 15, 18], requires the population covariance matrix to be positive definite, so there is no point in pursuing a low-rank estimator. By contrast, a more recent research topic considers sparsity and low rank simultaneously in a structured model, which means that the population covariance matrix is no longer restricted to the positive definite cone and may lie in the positive semidefinite cone. In addition, models that are simultaneously sparse and low-rank are widely applied in practice, for example in sparse signal recovery from quadratic measurements and sparse phase retrieval; see  for example. Moreover, Richard et al.  showed that a model that is both sparse and low-rank arises for a covariance matrix when the random variables are highly correlated in groups, which means that the covariance matrix has a block diagonal structure.
Motivated by these ideas, we construct the following convex model, which combines the ℓ1-norm and the nuclear norm, for estimating the covariance matrix:
where are tuning parameters. The ℓ1-norm penalty, also called a lasso-type penalty, is used to encourage sparse solutions. The nuclear norm , with
being the eigenvalues of , is the trace norm when and encourages low-rank solutions of (3). Here we introduce the approximate rank to interpret the low-rank property, which is defined as the smallest number such that
where are the singular values of with , and could be chosen as needed; throughout our paper we fix for simplicity.
The contributions of this paper center on two aspects. First, unlike [13, 14], we establish statistical theory under different assumptions rather than giving a general error bound for the estimator. In particular, we obtain the estimation rate under the Frobenius norm, which improves on the optimal rate obtained when the low-rank property of the estimator is not considered [7, 15, 18], where is the sample dimension with . Second, we use the alternating direction method of multipliers (ADMM), see also [18, 19], to solve problem (3).
The organization of this paper is as follows. In Section 2 we present some theoretical properties of the estimator derived from the proposed model (3). The alternating direction method of multipliers (ADMM) is then introduced to solve the problem in Section 3, and numerical experiments illustrating the performance of this method are reported in Section 4. We conclude in the last section.
2 A Sparse and Low-Rank Covariance Estimator
Before the main part, we introduce some notation. and denote the expectation of and the probability of the event occurring, respectively. is the number of elements of the set
. The normal distribution with mean and covariance is written as . We say if for every , there is a such that for all , and we say if . If there are two constants such that , we write .
Given observed independent and identically distributed (i.i.d.) -variate random variables with covariance matrix and , the goal is to estimate the unknown matrix based on the sample . This problem, called covariance matrix estimation, is of fundamental importance in multivariate analysis.
Given a random sample from (without loss of generality) and a population covariance matrix , the sample covariance matrix is
where . Denote the support set of the population covariance matrix as
Assumption 2.1. For all , , where is a constant.
Lemma 2.2. Let be i.i.d. and suppose that Assumption 2.1 holds (i.e., for all , ). Then, if ,
where constants and depend on only.
Another lemma which plays an important role in our main results is stated below.
Lemma 2.3. Suppose that Assumption 2.1 holds, . Then for sufficiently small,
where is defined as .
Proof. First, write the eigenvalue decomposition of () as
where is the matrix composed of eigenvectors, and is a diagonal matrix formed from the eigenvalues with and .
By denoting with and , which implies , we consider the model
Clearly, from (3) we have which implies . For a given sufficiently small and any (i.e., ), we compute
For convenience we denote . Then for I, it holds
For II, we obtain by noting and that,
From the Hölder inequality, one can prove that
For III, combining with (8) we get that
Since and (9),
Hence we prove that if and , for any , it holds
In addition, from (2), we have
which implies that . Otherwise, suppose ; then . Since for any , it follows that , which contradicts the fact that is a convex function and , because
Finally indicates that due to . Hence the desired result is obtained.
To obtain the rate of the estimator, we need to introduce the following two commonly used assumptions; see also [1, 15]. Assumption 2.4 holds, for example, if are Gaussian.
Assumption 2.4. hold for all and , where and are two constants.
Assumption 2.5. hold for all , where some and is a constant.
Building on these two assumptions, we give our main results on the rates of the estimator defined by (3).
where and are some constants. We then apply the bound and Lemma 2.3 with
to obtain that
Evidently, can be arbitrarily close to one by choosing sufficiently large.
Clearly, if in Assumption 2.1, the better rate reduces to . It is worth mentioning that under Assumption 2.1, the minimax optimal rate of convergence under the Frobenius norm in Theorem 4 of  is , which has also been obtained by . However, to attain the same rate in the presence of the log-determinant barrier term in (1), Rothman  instead requires that , the minimal eigenvalue of the true covariance matrix, be bounded away from zero by some positive constant, and that the barrier parameter be bounded by some positive quantity.  pointed out that a theory requiring a lower bound on is not very appealing.
3 Alternating Direction Method of Multipliers
The constraint can be put into the objective function by using an indicator function:
This leads to the following equivalent reformulation of (14):
where the augmented Lagrangian function is defined as
where denotes the projection of a matrix onto the convex positive semidefinite cone . Namely, , where and
The solution of the second subproblem (21) is given by the ℓ1-shrinkage operation
where and is the sign function.
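The two operators used in the subproblems can be sketched as follows, assuming NumPy. `project_psd` clips the negative eigenvalues of a symmetric matrix at zero, which is the standard Frobenius-norm projection onto the positive semidefinite cone, and `soft_threshold` applies entry-wise shrinkage. The function names and the threshold argument `kappa` are illustrative choices, not notation from the paper.

```python
import numpy as np

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone (Frobenius norm)."""
    A = (A + A.T) / 2.0                   # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(A)  # A = V diag(w) V^T
    clipped = np.maximum(eigvals, 0.0)    # discard the negative eigenvalues
    return (eigvecs * clipped) @ eigvecs.T

def soft_threshold(A, kappa):
    """Entry-wise shrinkage: sign(a) * max(|a| - kappa, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)
```

For instance, the matrix with rows (1, 2) and (2, 1) has eigenvalues 3 and -1, so its projection keeps only the component along the eigenvector of eigenvalue 3.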
Table 1. ADMM: Alternating Direction Method of Multipliers
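Since the steps of the table are not reproduced here, the following is only a hedged sketch of one standard scaled-form ADMM iteration for a problem of this type, assuming NumPy. It minimizes 0.5*||Theta - S||_F^2 + lam*||Theta||_1 + tau*tr(Theta) over positive semidefinite Theta, using the trace in place of the nuclear norm. The splitting (which variable carries which term), the penalty parameter `rho`, the initialization at `S` and the stopping rule are all assumptions made for illustration and may differ from the authors' algorithm.

```python
import numpy as np

def admm_sparse_lowrank_cov(S, lam, tau, rho=1.0, max_iter=500, tol=1e-8):
    """Scaled-form ADMM sketch for a sparse, low-rank PSD covariance estimate.

    ASSUMED splitting: Theta carries the PSD constraint and the trace term,
    Sigma carries the quadratic loss and the l1 penalty, with Theta = Sigma.
    """
    p = S.shape[0]
    Sigma = S.copy()
    Theta = S.copy()
    U = np.zeros((p, p))                  # scaled dual variable
    for _ in range(max_iter):
        # Theta-step: projection of a shifted matrix onto the PSD cone
        w, V = np.linalg.eigh(Sigma - U - (tau / rho) * np.eye(p))
        Theta = (V * np.maximum(w, 0.0)) @ V.T
        # Sigma-step: entry-wise soft-thresholding of a weighted average
        B = (S + rho * (Theta + U)) / (1.0 + rho)
        Sigma_new = np.sign(B) * np.maximum(np.abs(B) - lam / (1.0 + rho), 0.0)
        # Dual update on the constraint residual Theta - Sigma
        U = U + Theta - Sigma_new
        change = np.linalg.norm(Sigma_new - Sigma)
        primal = np.linalg.norm(Theta - Sigma_new)
        Sigma = Sigma_new
        # ASSUMED stopping rule: small iterate change and small residual
        if max(change, primal) <= tol * max(1.0, np.linalg.norm(Sigma)):
            break
    return Theta, Sigma
```

The function returns both blocks: `Theta` is positive semidefinite by construction, while `Sigma` is exactly sparse; the two agree up to the tolerance once the iteration has converged.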
To end this section, we prove that the sequence produced by the alternating direction method of multipliers (Table 1) converges to , where is an optimal solution of (15) and is the optimal dual variable. We first introduce some notation for ease of presentation. Let be a matrix defined as
the weighted norm stands for and the corresponding inner product is . Before presenting the main theorem with regard to the global convergence of ADMM, we introduce the following lemma.
Assume that is an optimal solution of and is the corresponding optimal dual variable associated with the equality constraint . Then the sequence produced by ADMM satisfies
where and .
Based on the lemma above, the convergence theorem can be derived immediately.
The sequence generated by Algorithm 1 from any starting point converges to an optimal solution of .
4 Numerical Simulations
In this section we apply the proposed ADMM to two examples, one with a block-structured population covariance matrix and the other with a banded population covariance matrix. In fact, owing to the constraint , our proposed model (3) is equivalent to
So, similarly to the method in , one can compute the soft-thresholding estimator
to initialize . If the derived then the recovered sparse and low-rank semidefinite estimator . In our simulation, we uniformly initialize as the matrix with all entries equal to 1,
as the zero matrix and respectively. Unlike and , does not change the final covariance estimator, so we fix for simplicity, and the stopping criterion is set as
For the sample dimensions, we always take and .
4.1 Example I: Block Structure
Following the model in , slightly modified here, which synthesized samples for a block diagonal population covariance matrix , we use blocks of random sizes, and each block is generated by where the entries of
are drawn i.i.d. from the uniform distribution on . Evidently, the rank of produced in this way is . The corresponding MATLAB code for generating is , thereby yielding the sample covariance matrix
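As an illustration of this construction, here is a minimal NumPy sketch rather than the authors' MATLAB code. It assumes each block is a rank-one outer product v v^T, that the entries of v are uniform on [-1, 1] (the interval is not recoverable from the text above), that samples are drawn from a zero-mean Gaussian, and that the sample covariance uses a 1/n normalization. The names `block_lowrank_covariance`, `p`, `K` and `n`, and the values chosen for them, are hypothetical.

```python
import numpy as np

def block_lowrank_covariance(p, K, rng):
    """Block-diagonal covariance made of K rank-one blocks of random sizes."""
    # K - 1 distinct cut points in {1, ..., p-1} give K non-empty blocks
    cuts = np.sort(rng.choice(np.arange(1, p), size=K - 1, replace=False))
    edges = np.concatenate(([0], cuts, [p]))
    V = np.zeros((p, K))
    for k in range(K):
        lo, hi = edges[k], edges[k + 1]
        # ASSUMPTION: entries of v are uniform on [-1, 1]
        V[lo:hi, k] = rng.uniform(-1.0, 1.0, size=hi - lo)
    # Columns of V have disjoint supports, so V V^T is block diagonal
    return V @ V.T, V

rng = np.random.default_rng(0)
p, K, n = 30, 4, 20                       # illustrative sizes only
Sigma, V = block_lowrank_covariance(p, K, rng)
# Gaussian samples with covariance Sigma: x_i = V z_i with z_i ~ N(0, I_K)
X = rng.standard_normal((n, K)) @ V.T
# ASSUMPTION: mean-centred sample covariance with 1/n normalization
S_n = np.cov(X, rowvar=False, bias=True)
```

Storing the factor `V` alongside `Sigma` makes it possible to draw samples without factorizing a singular covariance matrix.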