A classical and long-studied subject in signal processing is denoising. This task considers a given measurement signal obtained from a clear signal by an additive contamination of the form
. We shall restrict our discussion to zero mean i.i.d. Gaussian noise vectors
, with each entry drawn at random from the normal distribution. The denoising goal is to recover from .
An effective denoising algorithm assumes knowledge about the noise characteristics, like the above description, and introduces some assumptions about the class of signals to which belongs, that is, a-priori knowledge about the signal. There is a great number of algorithms today, corresponding to a variety of signal models. Among these, a recently emerging group of techniques relies on sparse and redundant representations for modeling the signals SIAM1 .
A signal is said to have a sparse representation over a known dictionary, , if there exists a sparse vector such that . The vector is the representation of , having a number of non-zeros, , which is much smaller than its length, . Thus, describes how to construct as a linear combination of a few columns (also referred to as atoms) of . In general, the dictionary may be redundant, containing more atoms than the signal dimension .
Assuming that with a sparse representation , how can one recover from the noisy measurement
? By posing a prior probability density function over, one can derive the exact Maximum-A’posteriori Probability (MAP) estimator for this task. This becomes a search for the support of the sparse representation
that maximizes the posterior probability. This problem is computationally complex, as it generally requires an exponential sweep over all the possible sparse supportsNPHARD . Therefore, approximation methods are often employed, such as the Orthogonal Matching Pursuit (OMP) OMP and the Basis Pursuit (BP) BP .
While MAP estimation promotes seeking a single sparse representation to explain the measurements, recent work has shown that better results222In the -error sense, which is often the measure used to assess performance. are possible using the Minimum Mean Square Error (MMSE) estimator MMSE0 ; MMSE1 ; MMSE2 . These works develop MMSE estimators, showing that they lead to a weighted average of all the possible representations that may explain the signal, with weights related to their probabilities. Just like MAP in the general setting, this estimation is infeasible to compute, and thus various approximations are proposed MMSE0 ; MMSE1 ; MMSE2 .
A well known and celebrated result in signal processing is the fact that the MAP estimator mentioned above admits a closed-form simple formula, in the special case where the dictionary is square and unitary Shrinkage1 ; Shrinkage2 ; Shrinkage3 . This formula, known as a shrinkage operation, yields the estimate by applying a simple 1D operation on the entries of the vector . The denoised signal is then obtained by multiplication by . Shrinkage tends to eliminate small entries, while leaving larger ones almost intact.
Our recent work reported in MMSE3 ; MMSE4 aimed to develop an MMSE closed-form formula for the unitary case. With a specific prior model on , a recursive formula for this task was developed. Thus, at least in principle, the implications from this work are that one need not turn to approximations, as this formula is easily computable, leading to the exact MMSE. While such a result is very encouraging, it does not provide a truly simple technique of the form that MAP enjoys. Furthermore, due to its recursive nature, this algorithm suffers from instability problems that hinder its use for high-dimensional signals.
In the present work, we propose a modified prior model for the sparse representation vector . We show that this change leads to a simplified MMSE formula, which, just as for the MAP, becomes a scalar shrinkage, albeit with a different response curve. As such, this exact MMSE denoising exhibits no numerical sensitivities as in MMSE3 ; MMSE4 , and thus it can operate easily in any dimension.
The core idea that MMSE estimation for the unitary case leads to a shrinkage algorithm has been observed before Shrinkage4 ; Shrinkage5 ; Shrinkage6 ; Shrinkage7 ; Shrinkage8 . Here we adopt a distinct approach in the derivation, which also gives us exact and simple expressions for MAP and MMSE shrinkage curves, and their expected -errors. We use these as a stepping-stone towards the development of upper bounds on the MAP and the MMSE estimation errors.
A fundamental and key question that has attracted attention in recent years is the proximity between practical pursuit333Pursuit is a generic name given to algorithms that aim to estimate . results and the oracle performance. The oracle is an estimator that knows the true support, thus giving an ultimate result which can be used as a gold-standard for assessing practical pursuit performance. For example, the work reported in Danzig shows that the Danzig Selector algorithm is a constant (and log) factor away from the oracle result. Similar claims for the BP, the OMP, and even the thresholding algorithms, are made in Zvika .
In both these papers, the analysis is deterministic and non-Bayesian, which is different from the point of view taken in this paper. In this work we tie the MAP and the MMSE errors for the unitary case to the error obtained by an oracle estimator. We establish worst-case gain-factors of the MAP and the MMSE errors relative to the oracle error. This gives a clear ranking of these algorithms, and states clearly their nearness to the ideal performance.
The paper is organized as follows. In section 2 we describe the signal model we shall use throughout this work. For completeness of the presentation, we also derive the MAP and MMSE estimators for the general case in this section. In Section 3 we turn to the unitary case and present the ideal MAP and MMSE estimators, showing how both lead to shrinkage operations. Section 4 is devoted to the development of the performance behavior of the MAP and MMSE estimates, and the upper bounds on their errors. Section 5 presents numerical experiments, demonstrating the proposed algorithms in action. In Section 6 we conclude the paper.
2.1 The Signal Model
We consider a generative signal model that resembles the one presented in MMSE1 . In this model, each atom has a prior probability of participating in the support of each signal, and of not appearing. One can think of the support selection stage as performing biased coin-tosses of coins, with the coin having a probability of “heads” and for “tails”. The coins that turn up (“heads”) constitute the support for this signal. Thus, the a priori probability for any support is given by
It is important to note that, as opposed to the model used in MMSE2 ; MMSE3 ; MMSE4 , here it is not possible to explicitly prescribe the cardinality of the support, nor is it possible to limit it (as even the empty and full supports may arise by chance). If, for some , equals 0, all the supports that contain element have zero probability. Similarly, if we have , then all the supports that do not select the atom also have zero probability. Hence, in our study we only need to consider values for all , and this is assumed henceforth.
We further assume that, given the support , the coefficients in
on this support are drawn as i.i.d. Gaussian random variables444In fact, we may suggest a broader model of the form , for an arbitrary function , thus keeping the model very general. It appears that with this change one can still obtain MMSE-shrinkage. Furthermore, one may also study the sensitivity of MMSE/MAP shrinkage-curves under perturbations of , and even find the worst choice of this function, that leads to the maximal expected error in MMSE – all these are left to future work, as we mainly focus here on the Gaussian model.
with zero mean and variance,
is the identity matrix of size.
We measure the vector , a noisy linear combination of atoms from with coefficients , namely, , where the noise is assumed to be white Gaussian with variance , i.e., , and the columns of are normalized.
¿From the model assumptions made above, it can be seen KAY that and are jointly Gaussians for a given support,
and is comprised of the columns of the matrix that appear in the support . Hence, the marginal p.d.f. is Gaussian and it is given by
Using properties of the Multivariate Gaussian p.d.f. (see (KAY, , p. 325)), we have that the likelihood and the posterior p.d.f. are also Gaussian, namely
where the sub-vector is comprised of the elements of whose indices are in the support , and
There is a direct link between the matrices and , expressed using the matrix inversion lemma,
2.2 MAP/MMSE Estimators – The General Case
2.2.1 The Oracle Estimator
The first estimator we derive is the oracle. This estimator assumes knowledge of the chosen support for , information that is unknown in the actual problem. Therefore it cannot be obtained in practice. Nevertheless, it gives us a reference performance quality to compare against. The oracle can target the minimization of the MSE555Or MAP – in fact, the two are the same in this case due to the Gaussianity of .. A well-known and classical result states that the MMSE estimator is equal to the conditional mean of the unknown, conditioned on the known parts, and thus in our case it is . As the support is known, we need to estimate , the sub-vector of non-zero entries of , so the estimator is given by
where this equality comes from the expectation of the probability distribution in (7).
2.2.2 Maximum A-Posteriori Estimator (MAP)
The MAP estimator proposes an estimate that maximizes the posterior probability. As the model mixes discrete probabilities with continuous ones , the MAP should be carefully formulated, otherwise, the most probable estimate would be the zero vector. Thus, we choose instead to maximize the posterior of the support,
and only then compute the corresponding estimate . We know from Equation (7), that behaves as a normal distribution, and thus the estimate is given by the oracle in (10) with the specific support . Using Bayes’s rule, Equation (11) leads to
Since does not depend on , it affects this expression only as a normalizing factor. Using the expressions of the probabilities in the numerator that are given by Equations (5) and (1), respectively, we obtain
where we have introduced the notation for brevity of later expressions. Returning to our MAP goal posed in Equation (11), applying a few simple algebraic steps on the expression for leads to the following penalty function, which should be maximized with respect to the support ,
over all possible supports. Once found, we obtain the MAP estimation by using the oracle formula from Equation (10), which computes for this support.
2.2.3 Minimum Mean Square Error Estimator (MMSE)
The MMSE estimate is given by the conditional expectation, ,
Marginalizing the posterior probability over all possible supports , we have
Equation (17) shows that the MMSE estimator is a weighted average of all the “oracle” solutions, each with a different support and weighted by its probability. Finally, we substitute the expression developed in Equation (13) into Equation (17), and get the formula for MMSE estimation,
where is the overall normalizing factor.
2.3 Estimator Performance – The General Case
We conclude this background section by discussing the expected Mean-Squared-Error (MSE) induced by each of the estimators developed above. Our goal is to obtain clear expressions for these errors, which will later serve when we develop similar and simpler expressions for the unitary case.
We start with the performance of the oracle estimator, as the oracle is central to the derivation of MAP and MMSE errors. The oracle’s expected MSE is given by
where we have used Equation (8), and the fact that .
Our analysis continues with the expected error for a general estimate , observing that it can be written as
where we have used the marginalization proposed in Equation (16). We add and subtract the oracle estimate that corresponds to the support into the norm term, yielding
Note that the integral over the cross-term vanishes, since the term is deterministic and can thus be moved outside the integration, while the expression remaining inside the integral is zero, since the oracle estimate is the expected over this domain and with this support.
Continuing with Equation (2.3), the first term represents the MSE of an oracle for a given support , as derived in Equation (19). In the second term, the norm factor does not depend on the integral variable , and thus it may be pulled outside the integration. The remaining part is equal to one. Therefore,
By plugging into this expression, we get the MMSE error. Note that if we minimize the above with respect to , we get the MMSE estimate formula exactly, as expected, since the MMSE is the solution that leads to the smallest error.
Observe that (23) can be written differently by adding and subtracting inside the norm term, giving
In this derivation, the cross-term drops out, since in this summation the term can be positioned outside the summation, and then, using Equation (18), it is easily shown that we are left with an expression that equals . We have then a general error formula for any estimator, given by equation (24). In particular, this means that the error for the MAP estimate can be calculated by
3 MAP & MMSE Estimators for a Unitary Dictionary
The derivation of MAP and MMSE for a general dictionary leads to prohibitive computational tasks. As we shall see next, when using unitary dictionaries, we are able to avoid these demanding computations, and instead obtain closed-form solutions for each one of the estimators. Furthermore, the two resulting algorithms are very similar, both having a shrinkage structure.
While this claim about MAP and MMSE leading to shrinkage is not new Shrinkage4 ; Shrinkage5 ; Shrinkage6 ; Shrinkage7 ; Shrinkage8 , our distinct development of the closed-form shrinkage formulae will lead to a simple computational process for the evaluation of the MAP and the MMSE, which will facilitate the performance analysis derived in Section 4.
3.1 The Oracle
Just as for the general dictionary case, we start by deriving an expression for the oracle estimation. In this case, we assume that the dictionary
is a unitary matrix, and thus. Moreover, it is easily seen that , which will simplify our expressions. We start by simplifying the matrix defined in (8),
The oracle solution, as given in Equation (10), becomes
where we have defined the constant and the vector . The oracle estimator has thus been reduced to a simple matrix by vector multiplication.
3.2 The MAP – Unitary Case
We turn to the MAP estimation, which requires to first find the optimal support based on Equations (13) and (14), and then plug it into the oracle expression as given in Equation (10) to get the estimate.
We proceed by simplifying the expression in Equations (13) and (14). The matrix is defined in Equation (4) as . Denoting by a diagonal matrix with ones and zeros on its main diagonal matching the support666 is if , and elsewhere. , we obtain
Taking into account that , we can rewrite this expression as
where we have defined
We further define (which implies that ), and substitute this into Equation (30). Adding now the necessary normalization factor we get
The following observation will facilitate a further simplification of this expression:
Let be the set of all possible subsets of indices, and let be values associated with each index, such that . Then,
[Proof.] Consider the following experiment: a set of independent coins are tossed, with the coin having a probability for “heads” and for “tails”. The probability of a specific set of coins turning up “heads” (and the rest turning up “tails”) is . For any one toss of the coins, exactly one of these combinations will be the outcome. Therefore, the sum of these probabilities over all the combinations must be .
Using this proposition, the normalization term in Equation (32) vanishes, as it is equal to 1 ( since and for every ). We therefore obtain
The optimization task (11) can now be written as
Interpreting this expression, we see that every element in the support influences the penalty in one of two ways:
If it is part of the support: Multiply the expression by , or
If it is not in the support: Multiply the expression by .
As we aim to maximize the expression in Equation (35), the support will contain all the elements such that . (In the case that no such element exists, the support should be empty and the solution is therefore .) Once these elements are found, all we have to do is to multiply their value by and this is the MAP estimate.
Stated differently, this means that after computing the transformed vector , we test each of its entries, and set the MAP estimate for the entry to be
This is the shrinkage algorithm mentioned earlier – each entry is handled independently of the others, passing through a scalar shrinkage curve that nulls small entries and keeps large ones intact (up to the multiplication by ). There is no trace of the exhaustive and combinatorial search that characterizes MAP in the general case, and this simple algorithm yields the exact MAP estimation.
3.3 The MMSE – The Unitary Case
where is the vector in the canonical basis, and is an indicator function ( if , and zero otherwise). While this may seem like a cumbersome change, it will prove valuable in later derivations. Starting from Equation (18), substituting the expression developed for in Equation (34) into Equation (18), and using Equation (37), we obtain the following expression for the unitary MMSE estimator,
We introduce now another observation, similar to the one posed in Proposition 1. This will be used to further simplify the above expression.
Let be the set of all possible subsets of indices, and let be values associated with each index, such that . Then,
[Proof.] In the spirit of the coin tossing interpretation described in the proof of Proposition 1, the multiplication by the expression implies that only toss outcomes where the coin turns up “heads” are included in the summation. Thus, the overall probability of those is exactly the probability that the coin turn up “heads”, which is as claimed. A somewhat more formal way to pose this rationale is by observing that
The last summation is over the set , that contains all the supports in and do not contain the entry. Thus, for the remaining elements, this summation is complete, just as posed in Proposition 1, and therefore the overall expression equals .
Returning to the MMSE expression in Equation (38), and using this equality, we get a far simpler MMSE expression of the form
This is an explicit formula for MMSE estimation. The estimation is computed by first calculating , and then simply multiplying each entry by (which is a function of as well). Explicitly, the MMSE estimate is given elementwise by
This operation has the form of a scalar shrinkage operation, just like MAP. For this formula leads to , whereas for the outcome is (just like the MAP). Thus, the expression multiplying here serves as a soft-shrinkage777This should not be confused with the term soft-thresholding obtained when minimizing an penalty. operation, which replaces the hard-shrinkage practiced in the MAP. Figure 1 shows the various shrinkage functions obtained for each estimator.
4 Performance Analysis
4.1 Deriving the Estimators’ MSE
Our main goal in this work is to develop error expressions for the different estimators in the unitary regime, exploiting the general derivations of section 2.3. We start by calculating the error for an oracle solution . Using Equation (26) we obtain
where the indicator function is the same as previously used in (37). The last equality will become useful for our later development.
Turning to the MMSE estimator, recall the general expected-MSE expression in Equation (23),
This implies that the MMSE error can be alternatively written as
suggesting that the error is composed of an “oracle” error888See the similarity between the first term here and the one posed in Equation (42)., and an additional part that is necessarily positive (since ). As an extreme example, if the elements of the vector tend to be either very high or very low (compared to ), then the tend to the extremes as well. In such a case, the second term nearly vanishes, and the performance is close to that of the oracle.