I Introduction
Data representation is a crucial operation in many signal and image processing applications. These applications include signal and image reconstruction [1, 2] , restoration [3, 4] and compression [5, 6]
. In this respect, many linear transforms have been proposed in order to obtain suitable signal representations in other domains than the original spatial or temporal ones. The traditional Fourier and discrete cosine transforms provide a good frequency localization, but at the expense of a poor spatial or temporal localization. To improve localization both in the spatial/temporal and frequency domains, the wavelet transform (WT) was introduced as a powerful tool in the
’s [7]. Many waveletlike basis decompositions have been subsequently proposed offering different features. For instance, we can mention the wavelet packets [8] or the grouplet bases [9]. To further improve signal representations, redundant linear decomposition families called frames have become the focus of many works during the last decade. For the sake of clarity, it must be pointed out that the term frame [10] is understood in the sense of Hilbert space theory and not in the sense of some recent works like [11].The main advantage of frames lies in their flexibility to capture local features of the signal. Hence, they may result in sparser representations as shown in the literature on curvelets [10], bandelets [12] or dualtrees [13] in image processing. However, a major difficulty when using frame representations in a statistical framework is to estimate the parameters of the frame coefficient probability distribution. Actually, since frame synthesis operators are generally not injective, even if the signal is perfectly known, the determination of its frame coefficients is an underdetermined problem.
This paper studies a hierarchical Bayesian approach to estimate the frame coefficients and their hyperparameters. Although this approach is conceptually able to deal with any desirable distribution for the frame coefficients, we focus in this paper on generalized Gaussian (GG) priors. Note however that we do not restrict our attention to logconcave GG prior probability density functions (pdf), which may be limited for providing accurate models of sparse signals
[14]. In addition, the proposed method can be applied to noisy data when imprecise measurements of the signal are only available. Our work takes advantage of the current developments in Markov Chain Monte Carlo (MCMC) algorithms [15, 16, 17] that have already been investigated for instance in image separation [18], image restoration [19] and brain activity detection in functional MRI [20, 21]. These algorithms have also been investigated for signal/image processing problems with sparsity constraints. These constraints may be imposed in the original space like in [22], where a sparse image reconstruction problem is assessed in the image domain. They may also be imposed on some redundant representation of the signal like in [23], where a timeseries sparse coding problem is addressed.Hybrid MCMC algorithms
[24, 25] are designed combining MetropolisHastings
(MH) [26] and Gibbs [27] moves to sample
according to the posterior distribution of interest. MCMC algorithms
and WT have been jointly investigated in some works dealing with
signal denoising in a Bayesian framework
[28, 29, 30, 18].
However, in contrast with the present work where overcomplete frame representations are considered, these works are limited to wavelet
bases for which the hyperparameter estimation problem is much easier to handle.
This paper is organized as follows. Section II
presents a brief overview on the concepts of frame and frame
representation. The hierarchical Bayesian model proposed for frame
representation is introduced in Section III. Two algorithms for
sampling the posterior distribution are proposed in Section IV.
To illustrate the effectiveness of these algorithms, experiments on both
synthetic and real world data are presented in Section
V. In this section, applications to image recovery problems are also considered. Finally some conclusions are drawn in Section VI.
Ii Problem Formulation
Iia The frame concept
In the following, we will consider realvalued digital signals of length as elements of the Euclidean space endowed with the usual scalar product and norm denoted as and , respectively. Let be an integer greater than or equal to
. A family of vectors
in the finitedimensional space is a frame when there exists a constant in such that^{1}^{1}1The classical upper bound condition is always satisfied in finite dimension. In this case, the frame condition is also equivalent to saying that the frame operator has full rank .(1) 
If the inequality (1) becomes an equality, is called a tight frame. The bounded linear frame analysis operator and the adjoint synthesis frame operator are defined as
(2)  
(3)  
Note that is injective whereas is surjective. When , is an orthonormal basis. A simple example of a redundant frame is the union of orthonormal bases. In this case, the frame is tight with and thus, we have where is the identity operator.
IiB Frame representation
An observed signal can be written according to its frame representation (FR) involving coefficients as follows
(4) 
where is the error between the observed signal and its FR . This error is modeled by imposing that belongs to the closed convex set
(5) 
where is some error bound and can be any norm on .
In signal/image recovery problems, is nothing but an additive noise that corrupts the measured data. By adopting a probabilistic approach,
and are assumed to be realizations of random
vectors and . In this context, our goal is to
characterize the probability distribution of , by
considering some parametric probabilistic model and by estimating
the associated hyperparameters.
A useful example where this characterization may be of great
interest is framebased signal/image denoising in a Bayesian
framework. Actually, denoising in the wavelet domain using wavelet
frame decompositions has already been investigated since the seminal work [31] as this kind of representation
provides sparse description of regular signals. The related
hyperparameters have then to be estimated.
When is bijective and , this estimation can be performed by inverting the transform so as to deduce from and by resorting to standard estimation techniques on . However, as mentioned in Section IIA, for redundant frames, is not bijective, which makes the hyperparameter estimation problem more difficult since deducing from is no longer unique. This paper presents hierarchical Bayesian algorithms to address this issue.
Iii Hierarchical Bayesian Model
In a Bayesian framework, we first need to define prior distributions for the frame coefficients. For instance, this prior may be chosen so as to promote the sparsity of the representation. In the following, denotes the pdf of the frame coefficients that depends on an unknown hyperparameter vector and is the a priori pdf of the hyperparameter vector . In compliance with the observation model (4),
is a uniform distribution on the closed convex set
defined as(6) 
where . Denoting by
the random variable associated with the hyperparameter vector
and using the hierarchical structure between and , the conditional distribution of given can be written as(7) 
where means proportional to.
In this work, we assume that frame coefficients are a priori independent with marginal GG distributions. This assumption has been successfully used in many studies [32, 33, 34, 35, 36, 37] and leads to the following frame coefficient prior
(8) 
where (with ) are the scale and shape parameters associated with , which is the th component of the frame coefficient vector and is the Gamma function. Note that small values of the shape parameters are appropriate for modelling sparse signals. When , , a Laplace prior is obtained, which was shown to play a central role in sparse signal recovery [38] and compressed sensing [39].
By introducing , , the frame prior can be rewritten as^{2}^{2}2The interest of this new parameterization will be clarified in Section IV.
(9) 
The distribution of a frame coefficient generally differs from one coefficient to another. However, some frame coefficients can have very similar distributions (that can be defined by the same hyperparameters and ). As a consequence, we propose to split the frame coefficients into different groups. The th group will be parameterized by a unique hyperparameter vector denoted as (after the reparameterization mentioned above). In this case, the frame prior can be expressed as
(10) 
where the summation covers the index set of the elements of the th group containing elements and . Note that in our simulations, each group will correspond to a given wavelet subband. A coarser classification may be made when using multiscale frame representations by considering that all the frame coefficients at a given resolution level belong to a same group.
The hierarchical Bayesian model for the frame decomposition is completed by the following improper hyperprior
(11) 
where for a set ,
(12) 
The motivations for using this kind of prior are summarized below:

the interval covers all possible values of encountered in practical applications. Moreover, there is no additional information about the parameter .

The prior for the parameter is a Jeffrey’s distribution that reflects the absence of knowledge about this parameter [40]. This kind of prior is often used for scale parameters.
The resulting posterior distribution is therefore given by
(13) 
The Bayesian estimators (e.g., the maximum a posteriori (MAP) or minimum mean square error (MMSE) estimators) associated with the posterior distribution (13) have no simple closedform expression. The next section studies different sampling strategies for generating samples distributed according to the posterior distribution (13). The generated samples will be used to estimate the unknown model parameter and hyperparameter vectors and .
Iv Sampling strategies
This section proposes different MCMC methods to generate samples distributed according to the posterior defined in (13).
Iva Hybrid Gibbs Sampler
A very standard strategy to sample according to (7) is provided by the Gibbs sampler. The Gibbs sampler iteratively generates samples distributed according to conditional distributions associated with the target distribution. More precisely, the basic Gibbs sampler iteratively generates samples distributed according to and .
IvA1 Sampling the frame coefficients
Straightforward calculations yield the following conditional distribution
(14) 
where is defined in (5). This conditional distribution is a product of GG distributions truncated on . Actually, sampling according to this truncated distribution is not always easy to perform since the adjoint frame operator is usually of large dimension. However, two alternative sampling strategies are detailed in what follows.
Naive sampling
This sampling method proceeds by sampling according to independent GG distributions
(15) 
and then accepting the proposed candidate only if . This method can be used for any frame decomposition and any norm. However, it can be quite inefficient because of a very low acceptance ratio, especially when takes small values.
Gibbs sampler
This sampling method is designed to sample more efficiently from the conditional distribution in
(14) when the considered frame is the union of orthonormal bases and
is the Euclidean norm. In this case, the analysis frame operator
and the corresponding adjoint can be written as
and , respectively, where , is the decomposition operator onto the
th orthonormal basis such as
.
Every with , can be decomposed as
where .
The Gibbs sampler for the generation of frame coefficients draws vectors according to the conditional distribution under the constraint , where is the reduced size vector of dimension built from by removing the th vector . If is the Euclidean norm, we have ,
(16) 
where
Having , it is thus easy to compute the vector . To sample each , we propose to use an MH step whose proposal distribution is supported on the ball defined by
(17) 
Random generation from a pdf defined on which has a simple expression is described in Appendix A. Having a closed form expression of this pdf is important to be able to calculate the acceptance ratio of the MH move. To take into account the value of obtained at the previous iteration , it may however be preferable to choose a proposal distribution supported on a restricted ball of radius containing . This strategy similar to the random walk MH algorithm [15, p. 287] results in a better exploration of regions associated with large values of the conditional distribution .
More precisely, we propose to choose a proposal distribution defined on , where and is the projection onto the ball defined as
(18) 
This choice of the center of the ball guarantees that . Moreover, any point of can be reached after consecutive draws in . Note that the radius has to be adjusted to ensure a good exploration of . In practice, it may also be interesting to fix a small enough value of so as to improve the acceptance ratio.
Remark:
Alternatively, a Gibbs sampler can be used to draw successively the
elements of under the
following constraint
where is the th element of the vector (see [41, p.133] for related strategies). However, this method is very timeconsuming since it proceeds sequentially for each component of the high dimensional vector .
IvA2 Sampling the hyperparameter vector
Instead of sampling according to , we propose to iteratively sample according to and . Straightforward calculations allow us to obtain the following results
(20)  
(21) 
Consequently, due to the new parameterization introduced in (9),
is the pdf of the inverse gamma distribution
that is easy to sample. Conversely, it is more difficult to sample according to the truncated pdf . This is achieved by using an MH move whose proposalis a Gaussian distribution truncated on the interval
with standard deviation
[42]. Note that the mode of this distribution is the value of the parameter at the previous iteration .The resulting method is the hybrid Gibbs sampler summarized in Algorithm .

Initialize with some and , and set .

Sampling
For to
Compute
and . 
Simulate as follows:

Generate where is defined on (see Appendix A).

Compute the ratio
and accept the proposed candidate with the probability .



Sampling
For to
Generate .

Simulate as follows:

Generate

Compute the ratio
and accept the proposed candidate with the probability .



Set and goto ➁ until convergence.
orthonormal bases. When this assumption does not hold, another algorithm proposed in the next section allows us to sample frame coefficients and the related hyperparameters by exploiting algebraic properties of frames.
Although this algorithm is intuitive and simple to implement, it must be pointed out that it was derived under the restrictive assumption that the considered frame is the union ofIvB Hybrid MH sampler using algebraic properties of frame representations
As a direct generation of samples according to is generally impossible, we propose here an alternative that replaces the Gibbs move by an MH move. This MH move aims at sampling globally a candidate according to a proposal distribution. This candidate is accepted or rejected with the standard MH acceptance ratio. The efficiency of the MH move strongly depends on the choice of the proposal distribution for . We denote as the th accepted sample of the algorithm and the proposal that is used to generate a candidate at iteration . The main difficulty for choosing stems from the fact that it must guarantee that (as mentioned in Section IIB) while yielding a tractable expression of .
For this reason, we propose to exploit the algebraic properties of frame representations. More precisely, any frame coefficient vector can be decomposed as , where and are realizations of random vectors taking their values in and , respectively.^{3}^{3}3We recall that the range of is and the null space of is . The proposal distribution used in this paper allows us to generate samples and . More precisely, the following separable form of the proposal pdf will be considered
(22) 
where , and . In other words, independent sampling of and will be performed.
If we consider the decomposition
, sampling
in is equivalent to sampling , where
is the inverse image of under .
Indeed, we can write
where and, since , . Sampling
in can be easily achieved, e.g., by
generating from a distribution supported on the ball
and by taking .
To make the sampling of at iteration more efficient, taking into account the sampled value at the previous iteration may be interesting.
Similarly to Section IVA1b), and to random walk generation techniques, we proceed by generating randomly in where and . This allows us
to draw a vector such that and . The generation of can then be performed as explained in
Appendix A provided that is an norm with .
Once we have simulated (which ensures that is in ), has to be sampled as an element of . Since , there is no information in about . As a consequence, and for simplicity reasons, we propose to sample by drawing according to the Gaussian distribution and by projecting onto , i.e.,
(23) 
where is the orthogonal projection operator onto .
Note here that using a tight frame makes the computation of both and much easier due to the relation .
Let us now derive the expression of the proposal pdf. It can be noticed that, if , there exists a linear operator from to which is semiorthogonal (i.e., ) and orthogonal to (i.e., ), such that
(24) 
and . Standard rules on bijective linear transforms of random vectors lead to
(25) 
where, due to the bijective linear mapping between and
(26) 
and is the pdf of the Gaussian distribution with mean . Recall that denotes a distribution on the ball as expressed in Appendix A. Due to the symmetry of the Gaussian distribution, it can be deduced that
(27) 
This expression remains valid in the degenerate case when
(yielding
). Finally, it is important to note
that, if can be chosen as a uniform distribution on the
ball , the above ratio reduces to
, which simplifies the computation of the MH acceptance ratio.
The final algorithm is summarized in Algorithm 2.
Note that the sampling of the hyperparameter vector is performed as
for the hybrid Gibbs sampler in Section IVA2.

Initialize with some and . Set and .

Sampling

Compute .

Generate where is defined on (see Appendix A).

Compute .

Generate .

Compute and .

Compute the ratio
and accept the proposed candidates and with probability .


Sampling
For to
Generate .

Simulate as follows

Generate

Compute the ratio
and accept the proposed candidate with the probability .



Set and goto ➁ until convergence.
Experimental estimation results and applications to some image denoising problems of the proposed stochastic sampling techniques are provided in the next section.
V Simulation Results
Va Validation experiments
VA1 Example 1
To show the effectiveness of our algorithm, a first set of experiments was carried out on synthetic images. As a frame representation, we used the union of two 2D separable wavelet bases and using Daubechies and shifted Daubechies filters of length 8 and 4, respectively. The norm was used for in (4) with . To generate a synthetic image, we synthesized wavelet frame coefficients from known prior distributions.
Let and be the sequences of wavelet basis coefficients generated in and , where stand for approximation, horizontal, vertical and diagonal coefficients and the index designates the resolution level. Wavelet frame coefficients have been generated from a GG distribution in accordance with the chosen priors. The coefficients in each subband have been modeled with the same values of the hyperparameters and , which means that each subband forms a group of index . The number of groups (i.e. the number of subbands) is therefore equal to . A uniform prior distribution over has been chosen for parameter whereas a Jeffrey’s prior has been assigned to each parameter .
After generating the hyperparameters from their prior distributions, a set of frame coefficients is randomly generated to synthesize the observed data. The hyperparameters are then supposed unknown, sampled using the proposed algorithm, and estimated by computing the mean of the generated samples according to the MMSE principle. Having reference values, the normalized mean square erors (NMSEs) related to the estimation of each hyperparameter belonging to a given group (here a given subband) have been computed from Monte Carlo runs. The NMSEs computed for the estimators associated with the two samplers of Sections IVA and IVB are reported in Table I.
NMSE  
Sampler 1  Sampler 2  
0.015  0.006  0.012  0.030  
0.022  0.021  0.022  0.026  
0.06  0.016  0.011  0.044  
0.04  0.003  0.021  0.026  
0.020  0.027  0.020  0.019  
0.013  0.016  0.023  0.041  
0.039  0.08  0.039  0.023  
0.015  0.030  0.015  0.025  
0.051  0.07  0.025  0.031  
0.027  0.039  0.029  0.023  
0.040  0.024  0.016  0.034  
0.08  0.019  0.013  0.022  
0.05  0.015  0.011  0.040  
0.010  0.064  0.010  0.028 
Table I shows that the proposed algorithms (using Sampler of Section IVA and Sampler of Section IVB) provide accurate estimates of the hyperparameters. The two samplers perform similarly for this experiment. However, one advantage of Sampler 2 is that it can be applied to different kinds of redundant frames, unlike Sampler 1. Indeed, as reported in Section IVA, the conditional distribution (14) is generally difficult to sample when the frame representation is not the union of orthonormal bases.
Two examples of empirical histograms of known reference wavelet frame coefficients (corresponding to ) and pdfs with estimated hyperparameters are plotted in Fig. 1 to illustrate the good performance of the estimator.
: ,  : , 

VA2 Example 2
In this experiment, another frame representation is considered, namely a tight frame version of the translation invariant wavelet transform [43] with Daubechies filters of length 8. The norm was also used for in (4) with
. Let denote the frame coefficients vector. We used the same process to generate frame coefficients as for Example 1. The coefficients in each subband (i.e. each group) have been modeled with the same values of the hyperparameters and , the number of groups being equal to . The same priors for the hyperparameters and as for Example 1 have been used.
After generating the hyperparameters and frame coefficients, the hyperparameters
are then supposed unknown, sampled using the proposed algorithm, and
estimated using the MMSE estimator.
Table II shows NMSEs based on reference values of each hyperparameter. Note that Sampler 1 is difficult to be implemented in this case because of the used frame properties. Consequently, only NMSE values for Sampler 2 have been reported in Table II.
NMSE  

0.05  0.027  
0.024  0.007  
0.05  0.014  
0.037  0.028  
0.051  0.044  
0.04  0.012  
0.04  0.05 
VB Convergence results
To be able to automatically stop the simulated chain and ensure that the last simulated samples are appropriately distributed according to the posterior distribution of interest, a convergence monitoring technique based on the potential scale reduction factor (PSRF) has been used by simulating several chains in parallel (see [44] for more details). Using the union of two orthonormal bases as a frame representation, Figs. 2 and 3 show examples of convergence profiles corresponding to the hyperparameters and when two chains are sampled in parallel using Sampler 2.
Based on these values of the PSRF, the algorithm was stopped after about iterations (burnin period of iterations), which corresponds to about hours of computational time using Matlab 7.7 on an Intel Core 4 ( GHz) architecture. When comparing the two proposed samplers in terms of convergence speed, it turns out from our simulations that Sampler 1 shows faster convergence than Sampler 2. Indeed, Sampler 1 needs about iterations to converge, which reduces the global computational time to about 3 hours.
Chain  Chain 
  
 
Chain  Chain 
  
 
Comments
There are no comments yet.