I. Introduction
Recent results in the theory of compressed sensing have generated immense interest in sparse vector estimation problems, resulting in a multitude of successful practical signal recovery algorithms. In several applications, such as the processing of natural images, audio, and speech, signals are not exactly sparse, but compressible, i.e., the magnitudes of the sorted coefficients of the vector follow a power law decay [1]. In [2] and [3]
, the authors show that random vectors drawn from a special class of probability distribution functions (pdf) known as
compressible priors result in compressible vectors. Assuming that the vector to be estimated (henceforth referred to as the unknown vector) has a compressible prior distribution enables one to formulate the compressible vector recovery problem in the Bayesian framework, thus allowing the use of Sparse Bayesian Learning (SBL) techniques [4]. In his seminal work, Tipping proposed an SBL algorithm for estimating the unknown vector, based on the Expectation Maximization (EM) and MacKay updates [4]. Since these update rules are known to be slow, fast update techniques are proposed in [5]. A duality based algorithm for solving the SBL cost function is proposed in [6], and reweighting based schemes are explored in [7]. Such algorithms have been successfully employed for image/visual tracking [8], neuroimaging [9, 10], beamforming [11], and joint channel estimation and data detection for OFDM systems [12]. Many of the aforementioned papers study the complexity, convergence and support recovery properties of SBL based estimators (e.g., [5, 6]). In [3], the general conditions required for the so-called instance optimality of such estimators are derived. However, it is not known whether these recovery algorithms are optimal in terms of the Mean Square Error (MSE) in the estimate, or by how much their performance can be improved. In the context of estimating sparse signals, Cramér-Rao lower bounds on the MSE performance are derived in [13, 14, 15]. However, to the best of our knowledge, none of the existing works provide a lower bound on the MSE performance of compressible vector estimation. Such bounds are necessary, as they provide absolute yardsticks for comparative analysis of estimators, and may also be used as a criterion for minimization of the MSE in certain problems [16]. In this paper, we close this gap in the theory by providing Cramér-Rao type lower bounds on the MSE performance of estimators in the SBL framework.
As our starting point, we consider a linear Single Measurement Vector (SMV) SBL model given by
(1) 
where the observations and the measurement matrix are known, and is the unknown sparse/compressible vector to be estimated [17]. Each component of the additive noise
is white Gaussian, distributed as
, where the variance may be known or unknown. The SMV-SBL system model in (1) can be generalized to a linear Multiple Measurement Vector (MMV) SBL model given by (2)
Here, represents the observation vectors, the columns of are the sparse/compressible vectors with a common underlying distribution, and
each column of is modeled similarly to in (1) [18].
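As a concrete illustration, the SMV model in (1) can be simulated as follows; the symbol names (Phi, x, w) and the dimensions are our own illustrative choices, since the paper's notation applies to any conforming matrix and vectors.

```python
import numpy as np

# Hedged sketch of the SMV measurement model: the observations are a known
# underdetermined matrix times an unknown compressible vector, plus white
# Gaussian noise.  All names and sizes here are illustrative.
rng = np.random.default_rng(0)

N, M = 20, 50                              # N measurements, M unknowns (N < M)
Phi = rng.standard_normal((N, M))          # known measurement matrix
x = rng.standard_t(df=3, size=M)           # unknown compressible vector
sigma2 = 0.01                              # noise variance (known or unknown)
w = rng.normal(0.0, np.sqrt(sigma2), N)    # white Gaussian noise
y = Phi @ x + w                            # observation vector
```

For the MMV model in (2), one would draw several such vectors with a common hyperparameter vector and stack the corresponding observations column-wise.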
In typical compressible vector estimation problems, is underdetermined (), rendering the problem ill-posed. Bayesian techniques circumvent this problem by
using a prior distribution on the compressible vector as a regularization, and computing the corresponding posterior estimate. To incorporate a compressible prior in
(1) and (2), SBL uses a two-stage hierarchical model on the unknown vector, as shown in Fig. 1. Here, , where the diagonal matrix contains the hyperparameters as its diagonal elements. Further, an Inverse
Gamma (IG) hyperprior is assumed for itself, because it leads to a Student-t prior on the vector , which is known to be compressible [4]. (The IG hyperprior is conjugate to the Gaussian pdf [4].) In scenarios where the noise variance is unknown and random, an IG prior is used
for the distribution of the noise variance as well. For the system model in (2), every compressible vector , i.e., the
compressible vectors are governed by a common .
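A minimal sketch of this two-stage hierarchical prior, with illustrative hyperprior parameters a and b of our own choosing: each variance hyperparameter is drawn from an Inverse Gamma hyperprior, and each signal coefficient is then drawn as a zero-mean Gaussian with that variance, so that the coefficients are marginally Student-t distributed.

```python
import numpy as np

# Two-stage hierarchical sampling: gamma_i ~ IG(a, b), x_i | gamma_i ~ N(0, gamma_i).
# Marginally, x_i is Student-t distributed, hence compressible.
# The hyperprior parameters a, b are illustrative, not the paper's values.
rng = np.random.default_rng(1)

a, b, M = 1.5, 1.0, 1000
# An Inverse-Gamma(a, b) draw is the reciprocal of a Gamma(a, rate=b) draw.
gamma = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=M)
x = rng.normal(0.0, np.sqrt(gamma))        # x_i ~ N(0, gamma_i)

# The sorted magnitudes of a compressible vector exhibit power-law-like decay.
mags = np.sort(np.abs(x))[::-1]
```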
It is well known that the Cramér-Rao Lower Bound (CRLB) provides a fundamental limit on the MSE performance of unbiased estimators [19] for deterministic parameter estimation. For the estimation problem in SBL, an analogous bound known as the Bayesian Cramér-Rao Bound (BCRB) is used to obtain lower bounds [20], by incorporating the prior distribution on the unknown vector. If the unknown vector consists of both deterministic and random components, Hybrid Cramér-Rao Bounds (HCRB) are derived [21]. In SBL, the unknown vector estimation problem can also be viewed as a problem involving nuisance parameters. Since the assumed hyperpriors are conjugate to the Gaussian likelihood, the marginalized distributions have a closed form and the Marginalized Cramér-Rao Bounds (MCRB) [22] can be derived. For example, in the SBL hyperparameter estimation problem,
itself can be considered a nuisance variable and marginalized from the joint distribution,
, to obtain the log likelihood as (3)
where [23].
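In code, this marginalized (type-II) log likelihood is a zero-mean Gaussian evaluation with covariance built from the noise variance and the weighted measurement matrix; the sketch below assumes the standard SBL evidence expression, with our own variable names.

```python
import numpy as np

def sbl_marginal_loglik(y, Phi, gamma, sigma2):
    """Marginal log likelihood log p(y | gamma) after the compressible
    vector is integrated out: y ~ N(0, sigma2*I + Phi diag(gamma) Phi^T).
    A sketch of the standard SBL evidence; variable names are ours."""
    N = y.size
    Sigma = sigma2 * np.eye(N) + (Phi * gamma) @ Phi.T   # marginal covariance
    _, logdet = np.linalg.slogdet(Sigma)
    quad = y @ np.linalg.solve(Sigma, y)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad)
```

Type-II estimation of the hyperparameters then amounts to maximizing this function over gamma (and over sigma2, when the noise variance is unknown).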
The goal of this paper is to derive Cramér-Rao type lower bounds on the MSE performance of estimators based on the SBL framework. Our contributions are as follows:

Under the assumption of known noise variance, we derive the HCRB and the BCRB for the unknown vector , as indicated in the left half of Fig. 2.

When the noise variance is known, we marginalize nuisance variables ( or ) and derive the corresponding MCRB, as indicated in the right half of Fig. 2. Since the MCRB is a function of the parameters of the hyperprior (and hence is an offline bound), it yields insights into the relationship between the MSE performance of the estimators and the compressibility of .

In the unknown noise variance case, we derive the BCRB, HCRB and MCRB for the unknown vector , as indicated in Fig. 3.

We derive the MCRB for a general parametric form of the compressible prior [3] and deduce lower bounds for two well-known compressible priors, namely, the Student-t and generalized double Pareto distributions.

Similar to the SMV-SBL case, we derive the BCRB, HCRB and MCRB for the MMV-SBL model in (2).
Through numerical simulations, we show that the MCRB on the compressible vector is the tightest lower bound, and that the MSE performance of the EM algorithm achieves this bound at high SNR and as . The techniques used to derive the bounds can be extended to handle different compressible prior pdfs used in the literature [2]. These results provide a convenient and easy-to-compute benchmark for comparing the performance of existing estimators, and in some cases, for establishing their optimality in terms of the MSE performance.
The rest of this paper is organized as follows. In Sec. II, we provide the basic definitions and describe the problem setup. In Secs. III and IV, we derive the lower bounds for the cases shown in Figs. 2 and 3, respectively. The bounds are extended to the MMV-SBL signal model in Sec. V. The efficacy of the lower bounds is graphically illustrated through simulation results in Sec. VI. We provide some concluding remarks in Sec. VII. In the Appendix, we provide proofs for the propositions and theorems stated in the paper.
Notation: In the sequel, boldface small letters denote vectors and boldface capital letters denote matrices. The symbols and denote the transpose and determinant of a matrix, respectively. The empty set is represented by , and denotes the Gamma function. The function
represents the pdf of the random variable
evaluated at its realization . Also, stands for a diagonal matrix with entries on the diagonal given by the vector . The symbol is the gradient with respect to (w.r.t.) the vector . The expectation w.r.t. a random variable is denoted as . Also, denotes that is positive semidefinite, and is the Kronecker product of the two matrices and .

II. Preliminaries
As a precursor to the sections that follow, we define the MSE matrix and the Fisher Information Matrix (FIM) [19], and state the assumptions under which we derive the lower bounds in this paper. Consider a general estimation problem where the unknown vector can be split into subvectors , where consists of random parameters distributed according to a known pdf, and consists of deterministic parameters. Let denote the estimator of as a function of the observations . The MSE matrix is defined as
(4) 
where denotes the random parameters to be estimated, whose realization is given by . The first step in obtaining CramérRao type lower bounds is to derive the FIM [19]. Typically, is expressed in terms of the individual blocks of submatrices, where the block is given by
(5) 
In this paper, we use the notation to represent the FIM under the different modeling assumptions. For example, when and , represents a Hybrid Information Matrix (HIM). When and , represents a Bayesian Information Matrix (BIM). Assuming that the MSE matrix exists and the FIM is nonsingular, a lower bound on the MSE matrix is given by the inverse of the FIM:
(6) 
It is easy to verify that the underlying pdfs considered in the SBL model satisfy the regularity conditions required for computing the FIM (see Sec. 5.2.3 in [22]).
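As a simple numerical instance (a toy setting of our own, not one from the paper): for an overdetermined linear Gaussian model with a deterministic unknown vector and known noise variance, the FIM in (5) evaluates to Phi^T Phi / sigma2, and the bound in (6) is its inverse.

```python
import numpy as np

# Toy CRLB computation for y = Phi x + w, w ~ N(0, sigma2 I), with x
# deterministic: FIM J = Phi^T Phi / sigma2, and the MSE matrix of any
# unbiased estimator is lower bounded by J^{-1}.
rng = np.random.default_rng(2)
N, M, sigma2 = 100, 5, 0.5
Phi = rng.standard_normal((N, M))

J = Phi.T @ Phi / sigma2        # Fisher Information Matrix, cf. (5)
crlb = np.linalg.inv(J)         # lower bound on the MSE matrix, cf. (6)
```

In this classical setting the least-squares estimator attains the bound, consistent with the linear-model results in [19].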
We conclude this section by making a useful observation about the FIM in the SBL problem. An assumption in the SMV-SBL framework is that and are independent of each other (for the MMV-SBL model, and are independent). This assumption is reflected in the graphical model in Fig. 1,
where the compressible vector (and its attribute ) and the noise component (and its attribute ) are on unconnected branches. Due to this, a
submatrix of the FIM is of the form
(7) 
where there are no terms in which both and are jointly present. Hence, the corresponding terms in the above-mentioned submatrix are always zero. This is formally stated in the following lemma.
Lemma 1
When and , the block matrix of the FIM given by (5) simplifies to , i.e., to an all-zero vector.
III. SMV-SBL: Lower Bounds when the Noise Variance is Known
In this section, we derive lower bounds for the system model in (1) for the scenarios in Fig. 2, where the unknown vector is . We examine different modeling assumptions on and derive the corresponding lower bounds.
III-A Bounds from the Joint pdf
III-A1 HCRB for
In this subsection, we consider the unknown variables as a hybrid of a deterministic vector and a random vector distributed according to a Gaussian distribution parameterized by . Using the assumptions and notation in the previous section, we obtain the following proposition.
Proposition 1
For the signal model in (1), the HCRB on the MSE matrix of the unknown vector with the parameterized distribution of the compressible signal given by , and with modeled as unknown and deterministic, is given by , where
(8) 
Proof: See Appendix A.
Note that the lower bound on the estimate of depends on the prior information through the diagonal matrix . In the SBL problem, the realization of the random parameter has to be used to compute the bound above, and hence, it is referred to as an online bound. Also, the lower bound on the MSE matrix of is , which is the same as the lower bound on the error covariance of the Bayes estimator for a linear model (see Theorems 10.2 and 10.3 in [19]), and is achievable by the MMSE estimator when is known.
III-A2 BCRB for
For deriving the BCRB, a hyperprior distribution is considered on , and the resulting is viewed as being drawn from a compressible prior distribution. The most commonly used hyperprior distribution in the literature is the IG distribution [4], where are distributed as , given by
(9) 
where . Using the definitions and notation in the previous section, we state the following proposition.
Proposition 2
For the signal model in (1), the BCRB on the MSE matrix of the unknown random vector , where the conditional distribution of the compressible signal is , and the hyperprior distribution on is , is given by , where
(10) 
Proof: See Appendix B.
It can be seen from that the lower bound on the MSE of is a function of the parameters of the IG prior on , i.e., a function of and , and it can be computed without knowledge of the realization of . Thus, it is an offline bound.
III-B Bounds from Marginalized Distributions
III-B1 MCRB for
Here, we derive the MCRB for , where is an unknown deterministic parameter. This requires the marginalized distribution , which is obtained by considering as a nuisance variable and marginalizing it out of the joint distribution , to obtain (3). Since is a deterministic parameter, the pdf must satisfy the regularity condition in [19]. We have the following theorem.
Theorem 1
Proof: See Appendix C.
To intuitively understand (11), we consider a special case of , and use the Woodbury formula to simplify , to obtain the entry of the matrix as
(12) 
Hence, the error in is bounded as . As , the bound reduces to , which is the same as the lower bound on the estimate of obtained as the lower-right submatrix in (8). For finite , the MCRB is tighter than the HCRB.
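The Woodbury step used above is easy to verify numerically; the identity below is stated for a generic diagonal matrix of hyperparameters, with illustrative dimensions and values of our own choosing.

```python
import numpy as np

# Numerical check of the matrix inversion (Woodbury) identity:
# (s*I + Phi G Phi^T)^{-1} = I/s - Phi (G^{-1} + Phi^T Phi / s)^{-1} Phi^T / s^2,
# with G diagonal and s playing the role of the noise variance.
rng = np.random.default_rng(3)
N, M, s = 8, 12, 0.3
Phi = rng.standard_normal((N, M))
g = rng.uniform(0.5, 2.0, M)               # diagonal entries of G

direct = np.linalg.inv(s * np.eye(N) + (Phi * g) @ Phi.T)
inner = np.linalg.inv(np.diag(1.0 / g) + Phi.T @ Phi / s)
woodbury = np.eye(N) / s - Phi @ inner @ Phi.T / s**2

assert np.allclose(direct, woodbury)       # both sides agree
```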
III-B2 MCRB for
In this subsection, we assume a hyperprior on , which leads to a joint distribution of and , from which can be marginalized. Further, assuming specific forms for the hyperprior distribution can lead to a compressible prior on . For example, assuming an IG hyperprior on leads to an with a Student-t distribution. Sampling from a Student-t distribution with parameters and results in a compressible [2]. The Student-t prior is given by
(13) 
where ,
represents the number of degrees of freedom and
represents the inverse variance of the distribution. Using the notation developed so far, we state the following theorem.

Theorem 2
Proof: See Appendix D.
We see that the derived bound depends on the parameters of the Student-t pdf. From [3], the prior is “somewhat” compressible for , and (14) is nonnegative and bounded for , i.e., the bound is meaningful in the range of used in practice. Note that, by choosing to be large (or the variance of to be small), the bound is dominated by the prior information, rather than the information from the observations, as expected in Bayesian bounds [19].
It is conjectured in [22] that, in general, the MCRB is tighter than the BCRB. Analytically comparing the MCRB (14) with the BCRB (8), we see that for the SBL problem of estimating a compressible vector, the MCRB is indeed tighter than the BCRB, since
The techniques used to derive the bounds in this subsection can be applied to any family of compressible distributions. In [3], the authors propose a parametric form of the Generalized Compressible Prior (GCP) and prove that such a prior is compressible for certain values of . In the following subsection, we derive the MCRB for the GCP.
III-C General Marginalized Bounds
In this subsection, we derive MCRBs for the parametric form of the GCP. The GCP encompasses the double Pareto shrinkage type prior [24] and the Student-t prior (13) as its special cases. We consider the GCP on as follows
(15) 
where , and the normalizing constant . When , (15) reduces to the Student-t prior in (13), and when , it reduces to a generalized double Pareto shrinkage prior [24, 25]. Also, the expression for the GCP in [3] can be obtained from (15) by setting , and defining . The following theorem provides the MCRB for the GCP.
Theorem 3
Proof:
See Appendix E.
It is straightforward to verify that for , (16) reduces to the MCRB derived in (14) for the Student-t distribution. For ,
the inverse of the MCRB can be reduced to
(17) 
In Fig. 4, we plot the expression in (16). We observe that, in general, the bounds predict an increase in the MSE for higher values of . Also, for a given value of , the lower bounds at different signal to noise ratios (SNRs) converge as the value of increases, indicating that increasing renders the bound insensitive to the SNR. The lower bounds also predict a smaller value of the MSE for a lower value of .
Thus far, we have presented the lower bounds on the MSE in estimating the unknown parameters of the SBL problem when the noise variance is known. In the next section, we extend the results to the case of unknown noise variance.
IV. SMV-SBL: Lower Bounds when the Noise Variance is Unknown
Let us denote the unknown noise variance as . In the Bayesian formulation, the noise variance is associated with a prior, and since the IG prior is conjugate to the Gaussian likelihood , it is assumed that [4], i.e., is distributed as
(18) 
Under this assumption, one can marginalize the unknown noise variance and obtain the likelihood as
(19) 
which is a multivariate Student-t distribution. It turns out that the straightforward approach of using the above multivariate likelihood to directly compute lower bounds for the various cases given in the previous section is analytically intractable, and that the lower bounds cannot be computed in closed form. Hence, we compute lower bounds from the joint pdf, i.e., we derive the HCRB and BCRBs for the unknown vector with the MSE matrix defined by (4). (We use the subscript to indicate that the error matrices and bounds are obtained for the case of unknown noise variance.) Using the assumptions and notation from the previous sections, we obtain the following proposition.
Proposition 3
For the signal model in (1), the HCRB on the MSE matrix of the unknown vector , where , with the distribution of the compressible vector given by , where is modeled as a deterministic or as a random parameter distributed as , and is modeled as a deterministic parameter, is given by , where
(20) 
In the above expression, with a slight abuse of notation, is the FIM given by (8) when is unknown deterministic and by (10) when
is random.
Proof: See Appendix F.
The lower bound on the estimation of matches with known lower bounds on noise variance estimation (see Sec. 3.5 in [19]). One disadvantage of such a
bound on is that the knowledge of the noise variance is essential to compute the bound, and hence, it cannot be computed offline. Instead, assigning a hyperprior
to would result in a lower bound that only depends on the parameters of the hyperprior, which are assumed to be known, allowing the bound to be computed offline. We state the
following proposition in this context.
Proposition 4
For the signal model in (1), the HCRB on the MSE matrix of the unknown vector , where , with the distribution of the vector given by , where is modeled as a deterministic parameter or as a random parameter distributed as , and with the random parameter distributed as , is given by , where
(21) 
In (21), is the FIM given in (8) when is unknown deterministic and by (10) when is random.
Proof:
See Appendix G.
In SBL problems, a non-informative prior on is typically preferred, i.e., the distribution of the noise variance is modeled to be as flat as possible. In [4], it was observed that a non-informative prior is obtained when . However, as , the bound in (21) is indeterminate. In Sec. VI, we illustrate the performance of the lower bound in (21) for practical values of and .
IV-A Marginalized Bounds
In this subsection, we obtain lower bounds on the MSE of the estimator , in the presence of nuisance variables in the joint distribution. To start with, we consider the marginalized distributions of and , i.e., where both and are deterministic variables. Since the unknowns are deterministic, the regularity condition has to be satisfied for . We state the following theorem.
Theorem 4
Proof: See Appendix H.
Remark: From the graphical model in Fig. 1, it can be seen that the branches consisting of and are independent conditioned on . However, when is marginalized, the nodes and are connected, and hence, Lemma 1 is no longer valid. Due to this, the lower bound on depends on and vice versa, i.e., and depend on both and through .
Thus far, we have presented several bounds on the MSE performance of the estimators , and in the SMV-SBL framework. In the next section, we derive Cramér-Rao type lower bounds for the MMV-SBL signal model.
V. Lower Bounds for the MMV-SBL
In this section, we provide Cramér-Rao type lower bounds for the estimation of unknown parameters in the MMV-SBL model given in (2). We consider the estimation of the compressible vector from the vector of observations , which contain the stacked columns of and , respectively. In the MMV-SBL model, each column of is distributed as , for , and the likelihood is given by , where and . The modeling assumptions on and are the same as in the SMV-SBL case, given by (9) and (18), respectively [18].
Using the notation developed in Sec. II, we derive the bounds for the MMV-SBL case similarly to the SMV-SBL cases considered in Secs. III and IV. Since the derivations of these bounds follow along the same lines as in the previous sections, we simply state the results in Table I.
Table I: Expressions for the derived bounds in the MMV-SBL model, namely, the HCRB, BCRB and MCRB on the compressible vectors, the hyperparameters, and the noise variance.
We see that the lower bounds on and are reduced by a factor of compared to the SMV case, which is intuitively satisfying. It turns out that it is not possible to obtain the MCRB on in the MMV-SBL setting, since closed form expressions for the FIM are not available.
In the next section, we consider two popular algorithms for SBL and graphically illustrate the utility of the lower bounds.
VI. Simulations and Discussion
The vector estimation problem in the SBL framework typically involves the joint estimation of the hyperparameter and the unknown compressible vector . Since the hyperparameter estimation problem cannot be solved in closed form, iterative estimators are employed [4]. In this section, we consider the iterative updates based on the EM algorithm first proposed in [4]. We also consider the algorithm proposed in [6], based on the Automatic Relevance Determination (ARD) framework. We plot the MSE performance in estimating , and with the linear models in (1) and (2), for the EM algorithm, labeled EM, and the ARD based reweighted algorithm, labeled ARD-SBL. We compare the performance of the estimators against the derived lower bounds.
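For concreteness, a minimal sketch of the EM iterations for SBL with known noise variance is given below. The E-step computes the Gaussian posterior of the compressible vector, and the M-step applies the standard EM update for each variance hyperparameter; this omits hyperprior terms and any convergence test, so it is a toy illustration rather than the exact implementation used in these simulations.

```python
import numpy as np

def em_sbl(y, Phi, sigma2, n_iter=50):
    """Toy EM iterations for SBL with known noise variance sigma2.
    E-step: posterior covariance and mean of x given y and gamma.
    M-step: gamma_i = mu_i^2 + Sigma_ii (the standard EM rule [4])."""
    N, M = Phi.shape
    gamma = np.ones(M)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ Phi.T @ y / sigma2
        # Floor the update for numerical stability when gamma_i -> 0.
        gamma = np.maximum(mu**2 + np.diag(Sigma), 1e-12)
    return mu, gamma
```

The posterior mean mu serves as the estimate of the compressible vector, and its MSE is what the derived bounds are compared against.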
We simulate the lower bounds for a random underdetermined () measurement matrix , whose entries are i.i.d. and standard Bernoulli distributed. A compressible signal of dimension is generated by sampling from a Student-t distribution with the value of ranging from to , which is the range in which the signal is “somewhat” compressible for high dimensional signals [3]. Figure 5 shows the decay profile of the sorted magnitudes of i.i.d. samples drawn from a Student-t distribution for different and with the value of fixed at .
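The decay-profile experiment behind Fig. 5 can be reproduced in a few lines; the dimension and the degrees-of-freedom values below are our own illustrative choices, not necessarily those of the figure.

```python
import numpy as np

# Sort the magnitudes of i.i.d. Student-t samples for several degrees of
# freedom: heavier-tailed draws (smaller df) concentrate their energy in
# fewer large coefficients, i.e., they are more compressible.
rng = np.random.default_rng(4)
M = 1000

ratios = {}
for df in (2.2, 3.0, 10.0):
    x = rng.standard_t(df, size=M)
    mags = np.sort(np.abs(x))[::-1]          # decay profile, cf. Fig. 5
    ratios[df] = mags[0] / mags[M // 2]      # largest vs. median magnitude
```

Plotting mags on a logarithmic scale for each df reproduces the qualitative behavior of Fig. 5.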
VI-A Lower Bounds on the MSE Performance of
In this subsection, we compare the MSE performance of the ARD-SBL estimator and the EM based estimator . Figure 6 depicts the MSE performance of for different SNRs and and , with . We compare it with the HCRB/BCRB derived in (8), which is obtained by assuming knowledge of the realization of the hyperparameters . We see that the MCRB derived in (14) is a tight lower bound on the MSE performance at high SNR and .
Figure 7 shows the comparative MSE performance of the ARD-SBL estimator and the EM based estimator as a function of varying degrees of freedom , at an SNR of dB and and . As expected, the MSE performance of the algorithms is better at low values of , since the signal is more compressible, and the MCRB and BCRB also reflect this behavior. The MCRB is a tight lower bound, especially for high values of . Figure 8 shows the MSE performance of the ARD-SBL estimator and the EM based estimator as a function of , at an SNR of dB and for two different values of . The MSE performance of the EM algorithm converges to that of the MCRB at higher .
VI-B Lower Bounds on the MSE Performance of
In this subsection, we compare the different lower bounds on the MSE of the estimator for the SMV- and MMV-SBL system models. Figure 9 shows the MSE performance of as a function of the SNR and , when is a random parameter, and . In this case, it turns out that there is a large gap between the performance of the EM based estimate and the lower bound.
When is deterministic, we first note that the EM based ML estimator of is asymptotically optimal, and hence the lower bounds are practical for large data samples [19]. The results are listed in Table II. We see that for and , the MCRB and BCRB are tight lower bounds, with the MCRB being marginally tighter than the BCRB. However, as increases, the gap between the MSE and the lower bounds increases.
Table II: MSE of the EM based estimator, and the corresponding MCRB and BCRB, at different SNRs (dB).
VI-C Lower Bounds on the MSE Performance of
In Fig. 10, we compare the lower bounds on the MSE of the estimator in the SMV- and MMV-SBL settings, for different values of and . Here, is sampled from the IG pdf (18), with parameters and .
When is deterministic, the EM based ML estimator of is asymptotically optimal, and the lower bounds are practical for large data samples [19]. Table III lists the MSE values of , and the corresponding HCRB and MCRB for deterministic but unknown noise variance, with the true noise variance fixed at . We see that for and , the MCRB is marginally tighter than the HCRB. However, when the noise variance is random, we see from Fig. 10 that there is a large gap between the MSE performance and the HCRB.
Table III: MSE of the EM based estimator, and the corresponding MCRB and HCRB, for deterministic but unknown noise variance.
VII. Conclusion
In this work, we derived Cramér-Rao type lower bounds on the MSE, namely, the HCRB, BCRB and MCRB, for the SMV-SBL and MMV-SBL problems of estimating compressible signals. We used a hierarchical model for the compressible priors to obtain the bounds under various assumptions on the unknown parameters. The bounds derived by assuming a hyperprior distribution on the hyperparameters themselves provided key insights into the MSE performance of SBL and the values of the parameters that govern these hyperpriors. We derived the MCRB for the generalized compressible prior distribution, of which the Student-t and generalized double Pareto prior distributions are special cases. We showed that the MCRB is tighter than the BCRB. We compared the lower bounds with the MSE performance of the ARD-SBL and EM algorithms using Monte Carlo simulations. The numerical results illustrated the near-optimality of the EM based updates for SBL, which makes them attractive for practical implementations.