Recent results in the theory of compressed sensing have generated immense interest in sparse vector estimation problems, resulting in a multitude of successful practical signal recovery algorithms. In several applications, such as the processing of natural images, audio, and speech, signals are not exactly sparse, but compressible, i.e., the magnitudes of the sorted coefficients of the vector follow a power law decay . In  and 
, the authors show that random vectors drawn from a special class of probability distribution functions (pdfs) known as compressible priors result in compressible vectors. Assuming that the vector to be estimated (henceforth referred to as the unknown vector) has a compressible prior distribution enables one to formulate the compressible vector recovery problem in the Bayesian framework, thus allowing the use of Sparse Bayesian Learning (SBL) techniques . In his seminal work, Tipping proposed an SBL algorithm for estimating the unknown vector, based on the Expectation Maximization (EM) and MacKay updates . Since these update rules are known to be slow, fast update techniques have been proposed in . A duality-based algorithm for solving the SBL cost function is proposed in , and based reweighting schemes are explored in . Such algorithms have been successfully employed for image/visual tracking , neuro-imaging [9, 10], beamforming , and joint channel estimation and data detection for OFDM systems .
Many of the aforementioned papers study the complexity, convergence and support recovery properties of SBL based estimators (e.g., [5, 6]). In , the general conditions required for the so-called instance optimality of such estimators are derived. However, it is not known whether these recovery algorithms are optimal in terms of the Mean Square Error (MSE) in the estimate or by how much their performance can be improved. In the context of estimating sparse signals, Cramér-Rao lower bounds on the MSE performance are derived in [13, 14, 15]. However, to the best of our knowledge, none of the existing works provide a lower bound on the MSE performance of compressible vector estimation. Such bounds are necessary, as they provide absolute yardsticks for comparative analysis of estimators, and may also be used as a criterion for minimization of MSE in certain problems . In this paper, we close this gap in theory by providing Cramér-Rao type lower bounds on the MSE performance of estimators in the SBL framework.
As our starting point, we consider a linear Single Measurement Vector (SMV) SBL model given by
where the observations and the measurement matrix are known, and is the unknown sparse/compressible vector to be estimated . Each component of the additive noise
is white Gaussian, distributed as, where the variance may be known or unknown. The SMV-SBL system model in (1) can be generalized to a linear Multiple Measurement Vector (MMV) SBL model given by
Here, represents the observation vectors, the columns of are the sparse/compressible vectors with a common underlying distribution, and
each column of is modeled similar to in (1) .
In typical compressible vector estimation problems, is underdetermined (), rendering the problem ill-posed. Bayesian techniques circumvent this problem by using a prior distribution on the compressible vector as a regularization, and computing the corresponding posterior estimate. To incorporate a compressible prior in (1) and (2), SBL uses a two-stage hierarchical model on the unknown vector, as shown in Fig. 1. Here, , where the diagonal matrix contains the hyperparameters as its diagonal elements. Further, an Inverse Gamma (IG) hyperprior is assumed for itself, because it leads to a Student- prior on the vector , which is known to be compressible (the IG hyperprior is conjugate to the Gaussian pdf ). In scenarios where the noise variance is unknown and random, an IG prior is used for the distribution of the noise variance as well. For the system model in (2), every compressible vector , i.e., the compressible vectors are governed by a common .
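To make the hierarchy concrete, the following Python sketch (NumPy/SciPy; the shape and scale values a and b are arbitrary illustrative choices, not values from the paper) draws variances from an IG hyperprior and then conditionally Gaussian entries. The resulting marginal is a Student-t distribution with 2a degrees of freedom and scale sqrt(b/a):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, a, b = 20_000, 1.5, 1.0   # number of samples; IG shape and scale (illustrative)

# Stage 1: variances gamma_i ~ IG(a, b), i.e., 1/gamma_i ~ Gamma(shape=a, rate=b)
gamma = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=n)

# Stage 2: x_i | gamma_i ~ N(0, gamma_i); marginally, x_i is Student-t distributed
x = rng.normal(0.0, np.sqrt(gamma))

# Kolmogorov-Smirnov check against a t distribution with 2a dof and scale sqrt(b/a)
ks = stats.kstest(x, "t", args=(2 * a, 0.0, np.sqrt(b / a)))
```

Sorting the magnitudes of the samples in decreasing order exhibits the power-law decay profile characteristic of compressible vectors.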
It is well known that the Cramér-Rao Lower Bound (CRLB) provides a fundamental limit on the MSE performance of unbiased estimators for deterministic parameter estimation. For the estimation problem in SBL, an analogous bound known as the Bayesian Cramér-Rao Bound (BCRB) is used to obtain lower bounds , by incorporating the prior distribution on the unknown vector. If the unknown vector consists of both deterministic and random components, Hybrid Cramér-Rao Bounds (HCRB) are derived .
In SBL, the unknown vector estimation problem can also be viewed as a problem involving nuisance parameters. Since the assumed hyperpriors are conjugate to the Gaussian likelihood, the marginalized distributions have a closed form and the Marginalized Cramér-Rao Bounds (MCRB)  can be derived. For example, in the SBL hyperparameter estimation problem,
itself can be considered a nuisance variable and marginalized from the joint distribution to obtain the log likelihood as
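This marginalization has the standard closed form: with x | γ ~ N(0, Γ) and Gaussian noise of variance σ², the observations satisfy y | γ ~ N(0, σ²I + ΦΓΦᵀ). A minimal NumPy sketch (the function and variable names are ours, not the paper's):

```python
import numpy as np

def smv_log_marginal(y, Phi, gamma, sigma2):
    """Type-II log likelihood log p(y | gamma), with x marginalized out of the
    joint distribution p(y, x | gamma)."""
    M = len(y)
    Sigma_y = sigma2 * np.eye(M) + (Phi * gamma) @ Phi.T  # Phi @ diag(gamma) @ Phi.T
    _, logdet = np.linalg.slogdet(Sigma_y)
    return -0.5 * (M * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(Sigma_y, y))
```

This is the quantity that type-II (evidence) maximization in SBL optimizes over γ.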
The goal of this paper is to derive Cramér-Rao type lower bounds on the MSE performance of estimators based on the SBL framework. Our contributions are as follows:
Under the assumption of known noise variance, we derive the HCRB and the BCRB for the unknown vector , as indicated in the left half of Fig. 2.
When the noise variance is known, we marginalize nuisance variables ( or ) and derive the corresponding MCRB, as indicated in the right half of Fig. 2. Since the MCRB is a function of the parameters of the hyperprior (and hence is an offline bound), it yields insights into the relationship between the MSE performance of the estimators and the compressibility of .
In the unknown noise variance case, we derive the BCRB, HCRB and MCRB for the unknown vector , as indicated in Fig. 3.
We derive the MCRB for a general parametric form of the compressible prior  and deduce lower bounds for two of the well-known compressible priors, namely, the Student- and generalized double Pareto distributions.
Similar to the SMV-SBL case, we derive the BCRB, HCRB and MCRB for the MMV-SBL model in (2).
Through numerical simulations, we show that the MCRB on the compressible vector is the tightest lower bound, and that the MSE performance of the EM algorithm achieves this bound at high SNR and as . The techniques used to derive the bounds can be extended to handle different compressible prior pdfs used in the literature . These results provide a convenient and easy-to-compute benchmark for comparing the performance of existing estimators, and in some cases, for establishing their optimality in terms of MSE performance.
The rest of this paper is organized as follows. In Sec. II, we provide the basic definitions and describe the problem setup. In Secs. III and IV, we derive the lower bounds for the cases shown in Figs. 2 and 3, respectively. The bounds are extended to the MMV-SBL signal model in Sec. V. The efficacy of the lower bounds is graphically illustrated through simulation results in Sec. VI. We provide some concluding remarks in Sec. VII. In the Appendix, we provide proofs for the Propositions and Theorems stated in the paper.
Notation: In the sequel, boldface small letters denote vectors and boldface capital letters denote matrices. The symbols and denote the transpose and determinant of a matrix, respectively. The empty set is represented by , and denotes the Gamma function. The function
represents the pdf of the random variable evaluated at its realization . Also, stands for a diagonal matrix with entries on the diagonal given by the vector . The symbol is the gradient with respect to (w.r.t.) the vector . The expectation w.r.t. a random variable is denoted as . Also, denotes that is positive semidefinite, and is the Kronecker product of the two matrices and .
As a precursor to the sections that follow, we define the MSE matrix and the Fisher Information Matrix (FIM) , and state the assumptions under which we derive the lower bounds in this paper. Consider a general estimation problem where the unknown vector can be split into sub-vectors , where consists of random parameters distributed according to a known pdf, and consists of deterministic parameters. Let denote the estimator of as a function of the observations . The MSE matrix is defined as
where denotes the random parameters to be estimated, whose realization is given by . The first step in obtaining Cramér-Rao type lower bounds is to derive the FIM . Typically, is expressed in terms of the individual blocks of submatrices, where the block is given by
In this paper, we use the notation to represent the FIM under the different modeling assumptions. For example, when and , represents a Hybrid Information Matrix (HIM). When and , represents a Bayesian Information matrix (BIM). Assuming that the MSE matrix exists and the FIM is non-singular, a lower bound on the MSE matrix is given by the inverse of the FIM:
It is easy to verify that the underlying pdfs considered in the SBL model satisfy the regularity conditions required for computing the FIM (see Sec. 5.2.3 in ).
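As a sanity check of the bound "MSE matrix ⪰ inverse FIM", consider a purely deterministic toy instance (not the SBL model itself: here the unknown vector is deterministic and the measurement matrix is overdetermined, with sizes chosen arbitrarily for illustration). For y = Φx + n with n ~ N(0, σ²I), the FIM is ΦᵀΦ/σ², and least squares is unbiased and efficient, so its empirical error covariance should match the CRLB:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma2 = 50, 5, 0.1        # overdetermined toy sizes (illustration only)
Phi = rng.normal(size=(M, N))
x_true = rng.normal(size=N)

# FIM for deterministic x in y = Phi x + n, n ~ N(0, sigma2 I): J = Phi^T Phi / sigma2
J = Phi.T @ Phi / sigma2
crlb = np.linalg.inv(J)          # MSE matrix of any unbiased estimator dominates this

# Monte Carlo error covariance of the least-squares estimator
errs = np.empty((2000, N))
for t in range(2000):
    y = Phi @ x_true + rng.normal(scale=np.sqrt(sigma2), size=M)
    errs[t] = np.linalg.lstsq(Phi, y, rcond=None)[0] - x_true
emp_cov = errs.T @ errs / len(errs)
```

In this Gaussian linear model the least-squares estimator attains the CRLB exactly, so the empirical and theoretical traces agree up to Monte Carlo error.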
We conclude this section by making one useful observation about the FIM in the SBL problem. An assumption in the SMV-SBL framework is that and are independent of each other (for the MMV-SBL model, and are independent). This assumption is reflected in the graphical model in Fig. 1, where the compressible vector (and its attribute ) and the noise component (and its attribute ) are on unconnected branches. Due to this, a submatrix of the FIM is of the form
where there are no terms in which both and are jointly present. Hence, the corresponding terms in the above mentioned submatrix are always zero. This is formally stated in the following Lemma.
When and , the block matrix of the FIM given by (5) simplifies to , i.e., to an all-zero vector.
III SMV-SBL: Lower Bounds when the Noise Variance is Known
In this section, we derive lower bounds for the system model in (1) for the scenarios in Fig. 2, where the unknown vector is . We examine different modeling assumptions on and derive the corresponding lower bounds.
III-A Bounds from the Joint pdf
III-A1 HCRB for
In this subsection, we consider the unknown variables as a hybrid of a deterministic vector and a random vector distributed according to a Gaussian distribution parameterized by . Using the assumptions and notation in the previous section, we obtain the following proposition.
For the signal model in (1), the HCRB on the MSE matrix of the unknown vector with the parameterized distribution of the compressible signal given by , and with modeled as unknown and deterministic, is given by , where
Proof: See Appendix A.
Note that the lower bound on the estimate of depends on the prior information through the diagonal matrix . In the SBL problem, the realization of the random parameter has to be used to compute the bound above, and hence, it is referred to as an online bound. Also, the lower bound on the MSE matrix of is , which is the same as the lower bound on the error covariance of the Bayes vector estimator for a linear model (see Theorems 10.2 and 10.3 in ), and is achievable by the MMSE estimator when is known.
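For concreteness, here is a small NumPy sketch of that MMSE (posterior-mean) estimator and its error covariance (function names and matrix sizes are ours). Averaging the squared error over Monte Carlo trials reproduces the trace of the bound, illustrating achievability:

```python
import numpy as np

def mmse_estimate(y, Phi, gamma, sigma2):
    """Posterior-mean (MMSE) estimate of x when Gamma = diag(gamma) is known:
    x_hat = Gamma Phi^T (sigma2 I + Phi Gamma Phi^T)^{-1} y."""
    Sigma_y = sigma2 * np.eye(len(y)) + (Phi * gamma) @ Phi.T
    return gamma * (Phi.T @ np.linalg.solve(Sigma_y, y))

def mmse_error_cov(Phi, gamma, sigma2):
    """Error covariance (Gamma^{-1} + Phi^T Phi / sigma2)^{-1}; the MMSE
    estimator attains this, so it serves as the benchmark for the bound on x."""
    return np.linalg.inv(np.diag(1.0 / gamma) + Phi.T @ Phi / sigma2)
```

Note that the estimator is well defined even in the underdetermined case, since the prior covariance regularizes the inversion.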
III-A2 BCRB for
For deriving the BCRB, a hyperprior distribution is considered on , and the resulting is viewed as being drawn from a compressible prior distribution. The most commonly used hyperprior distribution in the literature is the IG distribution , where are distributed as , given by
where . Using the definitions and notation in the previous section, we state the following proposition.
For the signal model in (1), the BCRB on the MSE matrix of the unknown random vector , where the conditional distribution of the compressible signal is , and the hyperprior distribution on is , is given by , where
Proof: See Appendix B.
It can be seen from that the lower bound on the MSE of is a function of the parameters of the IG prior on , i.e., a function of and , and it can be computed without knowledge of the realization of . Thus, it is an offline bound.
III-B Bounds from Marginalized Distributions
III-B1 MCRB for
Here, we derive the MCRB for , where is an unknown deterministic parameter. This requires the marginalized distribution , which is obtained by considering as a nuisance variable and marginalizing it out of the joint distribution , to obtain (3). Since is a deterministic parameter, the pdf must satisfy the regularity condition in . We have the following theorem.
Proof: See Appendix C.
To intuitively understand (11), we consider a special case of , and use the Woodbury formula to simplify , to obtain the entry of the matrix as
Hence, the error in is bounded as . As , the bound reduces to , which is the same as the lower bound on the estimate of obtained as the lower-right submatrix in (8). For finite , the MCRB is tighter than the HCRB.
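The Woodbury identity invoked in this simplification can be checked numerically; the dimensions and diagonal entries below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, sigma2 = 8, 12, 0.5
Phi = rng.normal(size=(M, N))
gamma = rng.uniform(0.5, 2.0, size=N)   # diagonal of Gamma

# Woodbury: (sigma2 I + Phi G Phi^T)^{-1}
#         = (1/sigma2) [ I - Phi (sigma2 G^{-1} + Phi^T Phi)^{-1} Phi^T ]
lhs = np.linalg.inv(sigma2 * np.eye(M) + (Phi * gamma) @ Phi.T)
inner = np.linalg.inv(sigma2 * np.diag(1.0 / gamma) + Phi.T @ Phi)
rhs = (np.eye(M) - Phi @ inner @ Phi.T) / sigma2
```

The identity trades an M-by-M inversion for an N-by-N one, which is how the bound expression is reduced to per-entry form.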
III-B2 MCRB for
In this subsection, we assume a hyperprior on , which leads to a joint distribution of and , from which can be marginalized. Further, assuming specific forms for the hyperprior distribution can lead to a compressible prior on . For example, assuming an IG hyperprior on leads to an with a Student- distribution. Sampling from a Student- distribution with parameters and results in a -compressible . The Student- prior is given by
where represents the number of degrees of freedom and represents the inverse variance of the distribution. Using the notation developed so far, we state the following theorem.
Proof: See Appendix D.
We see that the bound derived depends on the parameters of the Student- pdf. From , the prior is “somewhat” compressible for , and (14) is nonnegative and bounded for , i.e., the bound is meaningful in the range of used in practice. Note that, by choosing to be large (or the variance of to be small), the bound is dominated by the prior information, rather than the information from the observations, as expected in Bayesian bounds .
It is conjectured in  that, in general, the MCRB is tighter than the BCRB. Analytically comparing the MCRB (14) with the BCRB (8), we see that for the SBL problem of estimating a compressible vector, the MCRB is indeed tighter than the BCRB, since
The techniques used to derive the bounds in this subsection can be applied to any family of compressible distributions. In , the authors propose a parametric form of the Generalized Compressible Prior (GCP) and prove that such a prior is compressible for certain values of . In the following subsection, we derive the MCRB for the GCP.
III-C General Marginalized Bounds
In this subsection, we derive MCRBs for the parametric form of the GCP. The GCP encompasses the double Pareto shrinkage type prior  and the Student- prior (13) as its special cases. We consider the GCP on as follows
where , and the normalizing constant . When , (15) reduces to the Student- prior in (13), and when , it reduces to a generalized double Pareto shrinkage prior [24, 25]. Also, the expression for the GCP in  can be obtained from (15) by setting , and defining . The following theorem provides the MCRB for the GCP.
In Fig. 4, we plot the expression in (16). We observe that, in general, the bounds predict an increase in MSE for higher values of . Also, for a given value of , the lower bounds at different signal-to-noise ratios (SNRs) converge as the value of increases, indicating that increasing renders the bound insensitive to the SNR. The lower bounds also predict a smaller value of MSE for a lower value of .
Thus far, we have presented the lower bounds on the MSE in estimating the unknown parameters of the SBL problem when the noise variance is known. In the next section, we extend the results to the case of unknown noise variance.
IV SMV-SBL: Lower Bounds when the Noise Variance is Unknown
Let us denote the unknown noise variance as . In the Bayesian formulation, the noise variance is associated with a prior, and since the IG prior is conjugate to the Gaussian likelihood , it is assumed that , i.e., is distributed as
Under this assumption, one can marginalize the unknown noise variance and obtain the likelihood as
which is a multivariate Student- distribution. It turns out that the straightforward approach of using the above multivariate likelihood to directly compute lower bounds for the various cases given in the previous section is analytically intractable, and that the lower bounds cannot be computed in closed form. Hence, we compute lower bounds from the joint pdf, i.e., we derive the HCRB and BCRBs for the unknown vector with the MSE matrix defined by (4). (We use the subscript to indicate that the error matrices and bounds are obtained for the case of unknown noise variance.) Using the assumptions and notation from the previous sections, we obtain the following proposition.
For the signal model in (1), the HCRB on the MSE matrix of the unknown vector , where , with the distribution of the compressible vector given by , where is modeled as a deterministic or as a random parameter distributed as , and is modeled as a deterministic parameter, is given by , where
In the above expression, with a slight abuse of notation, is the FIM given by (8) when is modeled as unknown and deterministic, and by (10) when it is modeled as random.
Proof: See Appendix F.
The lower bound on the estimation of matches known lower bounds on noise variance estimation (see Sec. 3.5 in ). One disadvantage of such a bound on is that knowledge of the noise variance is essential to compute the bound, and hence, it cannot be computed offline. Instead, assigning a hyperprior to would result in a lower bound that depends only on the parameters of the hyperprior, which are assumed to be known, allowing the bound to be computed offline. We state the following proposition in this context.
For the signal model in (1), the HCRB on the MSE matrix of the unknown vector , where , with the distribution of the vector given by , where is modeled as a deterministic parameter or as a random parameter distributed as , and with the random parameter distributed as , is given by , where
In SBL problems, a non-informative prior on is typically preferred, i.e., the distribution of the noise variance is modeled to be as flat as possible. In , it was observed that a non-informative prior is obtained when . However, as , the bound in (21) is indeterminate. In Sec. VI, we illustrate the performance of the lower bound in (21) for practical values of and .
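As an aside, the noise-variance marginalization used at the start of this section, i.e., integrating the Gaussian likelihood against the IG prior on σ², can be verified numerically for a toy case. In the sketch below (our notation, not the paper's), r denotes the squared residual ||y − Φx||² and u = σ²:

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln

def log_evidence_closed_form(r, M, c, d):
    """log of the integral of N(y; Phi x, u I) * IG(u; c, d) over u, where
    r = ||y - Phi x||^2. Carrying out the integral gives
        Gamma(c + M/2)/Gamma(c) * d^c * (2 pi)^(-M/2) * (d + r/2)^(-(c + M/2)),
    the kernel of a multivariate Student-t density in y."""
    return (gammaln(c + M / 2.0) - gammaln(c) + c * np.log(d)
            - (M / 2.0) * np.log(2.0 * np.pi) - (c + M / 2.0) * np.log(d + r / 2.0))

def log_evidence_numeric(r, M, c, d):
    """Same quantity by direct numerical integration over u = sigma^2."""
    def integrand(u):
        gauss = (2.0 * np.pi * u) ** (-M / 2.0) * np.exp(-r / (2.0 * u))
        ig = d**c / np.exp(gammaln(c)) * u ** (-c - 1.0) * np.exp(-d / u)
        return gauss * ig
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return np.log(val)
```

The agreement of the two routines confirms the multivariate Student-t form of the marginalized likelihood.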
IV-A Marginalized Bounds
In this subsection, we obtain lower bounds on the MSE of the estimator , in the presence of nuisance variables in the joint distribution. To start with, we consider the marginalized distributions of and , i.e., where both and are deterministic variables. Since the unknowns are deterministic, the regularity condition has to be satisfied for . We state the following theorem.
Proof: See Appendix H.
Remark: From the graphical model in Fig. 1, it can be seen that the branches consisting of and are independent conditioned on . However, when is marginalized, the nodes and are connected, and hence, Lemma 1 is no longer valid. Due to this, the lower bound on depends on and vice versa, i.e., and depend on both and through .
Thus far, we have presented several bounds for the MSE performance of the estimators , and in the SMV-SBL framework. In the next section, we derive Cramér-Rao type lower bounds for the MMV-SBL signal model.
V Lower Bounds for the MMV-SBL
In this section, we provide Cramér-Rao type lower bounds for the estimation of unknown parameters in the MMV-SBL model given in (2). We consider the estimation of the compressible vector from the vector of observations , which contain the stacked columns of and , respectively. In the MMV-SBL model, each column of is distributed as , for , and the likelihood is given by , where and . The modeling assumptions on and are the same as in the SMV-SBL case, given by (9) and (18), respectively .
Using the notation developed in Sec. II, we derive the bounds for the MMV-SBL case similar to the SMV-SBL cases considered in Secs. III and IV. Since the derivations of these bounds follow along the same lines as in the previous sections, we simply state the results in Table I.
We see that the lower bounds on and are reduced by a factor of compared to the SMV case, which is intuitively satisfying. It turns out that it is not possible to obtain the MCRB on in the MMV-SBL setting, since closed form expressions for the FIM are not available.
In the next section, we consider two popular algorithms for SBL and graphically illustrate the utility of the lower bounds.
VI Simulations and Discussion
The vector estimation problem in the SBL framework typically involves the joint estimation of the hyperparameter and the unknown compressible vector . Since the hyperparameter estimation problem cannot be solved in closed form, iterative estimators are employed . In this section, we consider the iterative updates based on the EM algorithm first proposed in . We also consider the algorithm proposed in , based on the Automatic Relevance Determination (ARD) framework. We plot the MSE performance in estimating , and with the linear models in (1) and (2), for the EM algorithm, labeled EM, and the ARD-based reweighted algorithm, labeled ARD-SBL. We compare the performance of the estimators against the derived lower bounds.
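As a reference for what these iterative updates look like, here is a compact sketch of EM-style SBL updates for the SMV model with known noise variance (our implementation, with a small floor on the hyperparameters for numerical stability; not the exact code used in the experiments):

```python
import numpy as np

def sbl_em(y, Phi, sigma2, n_iter=200):
    """EM updates for SMV-SBL with known noise variance (Tipping-style sketch).

    E-step: p(x | y, gamma) = N(mu, Sigma) with
        Sigma = (Phi^T Phi / sigma2 + Gamma^{-1})^{-1},  mu = Sigma Phi^T y / sigma2.
    M-step: gamma_i <- mu_i^2 + Sigma_ii.
    """
    N = Phi.shape[1]
    gamma = np.ones(N)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ Phi.T @ y / sigma2
        gamma = np.maximum(mu**2 + np.diag(Sigma), 1e-12)  # floor for stability
    return mu, gamma
```

On a toy underdetermined problem with a few nonzero coefficients and low noise, the posterior mean typically locks onto the true support within tens of iterations.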
We simulate the lower bounds for a random underdetermined () measurement matrix , whose entries are i.i.d. standard Bernoulli distributed. A compressible signal of dimension is generated by sampling from a Student- distribution with the value of ranging from to , which is the range in which the signal is “somewhat” compressible for high-dimensional signals . Figure 5 shows the decay profile of the sorted magnitudes of i.i.d. samples drawn from a Student- distribution for different and with the value of fixed at .
VI-A Lower Bounds on the MSE Performance of
In this subsection, we compare the MSE performance of the ARD-SBL estimator and the EM-based estimator . Figure 6 depicts the MSE performance of for different SNRs and and , with . We compare it with the HCRB/BCRB derived in (8), which is obtained by assuming knowledge of the realization of the hyperparameters . We see that the MCRB derived in (14) is a tight lower bound on the MSE performance at high SNR and .
Figure 7 shows the comparative MSE performance of the ARD-SBL estimator and the EM-based estimator as a function of the degrees of freedom , at an SNR of dB and and . As expected, the MSE performance of the algorithms is better at low values of , since the signal is more compressible, and the MCRB and BCRB also reflect this behavior. The MCRB is a tight lower bound, especially for high values of . Figure 8 shows the MSE performance of the ARD-SBL estimator and the EM-based estimator as a function of , at an SNR of dB and for two different values of . The MSE performance of the EM algorithm converges to that of the MCRB at higher .
VI-B Lower Bounds on the MSE Performance of
In this subsection, we compare the different lower bounds on the MSE of the estimator for the SMV-SBL and MMV-SBL system models. Figure 9 shows the MSE performance of as a function of SNR and , when is a random parameter, and . In this case, it turns out that there is a large gap between the performance of the EM-based estimate and the lower bound.
When is deterministic, we first note that the EM-based ML estimator for is asymptotically optimal and the lower bounds are practical for large data samples . The results are listed in Table II. We see that for and , the MCRB and BCRB are tight lower bounds, with the MCRB being marginally tighter than the BCRB. However, as increases, the gap between the MSE and the lower bounds increases.
VI-C Lower Bounds on the MSE Performance of
When is deterministic, the EM-based ML estimator for is asymptotically optimal and the lower bounds are practical for large data samples . Table III lists the MSE values of , and the corresponding HCRB and MCRB for deterministic but unknown noise variance, with the true noise variance fixed at . We see that for and , the MCRB is marginally tighter than the HCRB. However, when the noise variance is random, we see from Fig. 10 that there is a large gap between the MSE performance and the HCRB.
VII Conclusions
In this work, we derived Cramér-Rao type lower bounds on the MSE, namely, the HCRB, BCRB and MCRB, for the SMV-SBL and MMV-SBL problems of estimating compressible signals. We used a hierarchical model for the compressible priors to obtain the bounds under various assumptions on the unknown parameters. The bounds derived by assuming a hyperprior distribution on the hyperparameters themselves provided key insights into the MSE performance of SBL and the values of the parameters that govern these hyperpriors. We derived the MCRB for the generalized compressible prior distribution, of which the Student- and generalized double Pareto prior distributions are special cases. We showed that the MCRB is tighter than the BCRB. We compared the lower bounds with the MSE performance of the ARD-SBL and EM algorithms using Monte Carlo simulations. The numerical results illustrated the near-optimality of the EM-based updates for SBL, which makes them attractive for practical implementations.