A Hierarchical Bayesian Model for Frame Representation

11/15/2009 ∙ by L. Chaâri, et al. ∙ CEA ENSEEIHT 0

In many signal processing problems, it may be fruitful to represent the signal under study in a frame. If a probabilistic approach is adopted, it becomes then necessary to estimate the hyper-parameters characterizing the probability distribution of the frame coefficients. This problem is difficult since in general the frame synthesis operator is not bijective. Consequently, the frame coefficients are not directly observable. This paper introduces a hierarchical Bayesian model for frame representation. The posterior distribution of the frame coefficients and model hyper-parameters is derived. Hybrid Markov Chain Monte Carlo algorithms are subsequently proposed to sample from this posterior distribution. The generated samples are then exploited to estimate the hyper-parameters and the frame coefficients of the target signal. Validation experiments show that the proposed algorithms provide an accurate estimation of the frame coefficients and hyper-parameters. Application to practical problems of image denoising show the impact of the resulting Bayesian estimation on the recovered signal quality.



There are no comments yet.


page 22

page 23

page 24

page 25

page 26

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Data representation is a crucial operation in many signal and image processing applications. These applications include signal and image reconstruction [1, 2] , restoration [3, 4] and compression [5, 6]

. In this respect, many linear transforms have been proposed in order to obtain suitable signal representations in other domains than the original spatial or temporal ones. The traditional Fourier and discrete cosine transforms provide a good frequency localization, but at the expense of a poor spatial or temporal localization. To improve localization both in the spatial/temporal and frequency domains, the wavelet transform (WT) was introduced as a powerful tool in the

’s [7]. Many wavelet-like basis decompositions have been subsequently proposed offering different features. For instance, we can mention the wavelet packets [8] or the grouplet bases [9]. To further improve signal representations, redundant linear decomposition families called frames have become the focus of many works during the last decade. For the sake of clarity, it must be pointed out that the term frame [10] is understood in the sense of Hilbert space theory and not in the sense of some recent works like [11].
The main advantage of frames lies in their flexibility to capture local features of the signal. Hence, they may result in sparser representations as shown in the literature on curvelets [10], bandelets [12] or dual-trees [13] in image processing. However, a major difficulty when using frame representations in a statistical framework is to estimate the parameters of the frame coefficient probability distribution. Actually, since frame synthesis operators are generally not injective, even if the signal is perfectly known, the determination of its frame coefficients is an underdetermined problem.

This paper studies a hierarchical Bayesian approach to estimate the frame coefficients and their hyper-parameters. Although this approach is conceptually able to deal with any desirable distribution for the frame coefficients, we focus in this paper on generalized Gaussian (GG) priors. Note however that we do not restrict our attention to log-concave GG prior probability density functions (pdf), which may be limited for providing accurate models of sparse signals

[14]. In addition, the proposed method can be applied to noisy data when imprecise measurements of the signal are only available. Our work takes advantage of the current developments in Markov Chain Monte Carlo (MCMC) algorithms [15, 16, 17] that have already been investigated for instance in image separation [18], image restoration [19] and brain activity detection in functional MRI [20, 21]. These algorithms have also been investigated for signal/image processing problems with sparsity constraints. These constraints may be imposed in the original space like in [22], where a sparse image reconstruction problem is assessed in the image domain. They may also be imposed on some redundant representation of the signal like in [23], where a time-series sparse coding problem is addressed.

Hybrid MCMC algorithms [24, 25] are designed combining Metropolis-Hastings (MH) [26] and Gibbs [27] moves to sample according to the posterior distribution of interest. MCMC algorithms and WT have been jointly investigated in some works dealing with signal denoising in a Bayesian framework [28, 29, 30, 18]. However, in contrast with the present work where overcomplete frame representations are considered, these works are limited to wavelet bases for which the hyper-parameter estimation problem is much easier to handle.
This paper is organized as follows. Section II presents a brief overview on the concepts of frame and frame representation. The hierarchical Bayesian model proposed for frame representation is introduced in Section III. Two algorithms for sampling the posterior distribution are proposed in Section IV. To illustrate the effectiveness of these algorithms, experiments on both synthetic and real world data are presented in Section V. In this section, applications to image recovery problems are also considered. Finally some conclusions are drawn in Section VI.

Ii Problem Formulation

Ii-a The frame concept

In the following, we will consider real-valued digital signals of length as elements of the Euclidean space endowed with the usual scalar product and norm denoted as and , respectively. Let be an integer greater than or equal to

. A family of vectors

in the finite-dimensional space is a frame when there exists a constant in such that111The classical upper bound condition is always satisfied in finite dimension. In this case, the frame condition is also equivalent to saying that the frame operator has full rank .


If the inequality (1) becomes an equality, is called a tight frame. The bounded linear frame analysis operator and the adjoint synthesis frame operator are defined as


Note that is injective whereas is surjective. When , is an orthonormal basis. A simple example of a redundant frame is the union of orthonormal bases. In this case, the frame is tight with and thus, we have where is the identity operator.

Ii-B Frame representation

An observed signal can be written according to its frame representation (FR) involving coefficients as follows


where is the error between the observed signal and its FR . This error is modeled by imposing that belongs to the closed convex set


where is some error bound and can be any norm on .
In signal/image recovery problems, is nothing but an additive noise that corrupts the measured data. By adopting a probabilistic approach, and are assumed to be realizations of random vectors and . In this context, our goal is to characterize the probability distribution of , by considering some parametric probabilistic model and by estimating the associated hyper-parameters.
A useful example where this characterization may be of great interest is frame-based signal/image denoising in a Bayesian framework. Actually, denoising in the wavelet domain using wavelet frame decompositions has already been investigated since the seminal work [31] as this kind of representation provides sparse description of regular signals. The related hyper-parameters have then to be estimated.

When is bijective and , this estimation can be performed by inverting the transform so as to deduce from and by resorting to standard estimation techniques on . However, as mentioned in Section II-A, for redundant frames, is not bijective, which makes the hyper-parameter estimation problem more difficult since deducing from is no longer unique. This paper presents hierarchical Bayesian algorithms to address this issue.

Iii Hierarchical Bayesian Model

In a Bayesian framework, we first need to define prior distributions for the frame coefficients. For instance, this prior may be chosen so as to promote the sparsity of the representation. In the following, denotes the pdf of the frame coefficients that depends on an unknown hyper-parameter vector and is the a priori pdf of the hyper-parameter vector . In compliance with the observation model (4),

is a uniform distribution on the closed convex set

defined as


where . Denoting by

the random variable associated with the hyper-parameter vector

and using the hierarchical structure between and , the conditional distribution of given can be written as


where means proportional to.

In this work, we assume that frame coefficients are a priori independent with marginal GG distributions. This assumption has been successfully used in many studies [32, 33, 34, 35, 36, 37] and leads to the following frame coefficient prior


where (with ) are the scale and shape parameters associated with , which is the th component of the frame coefficient vector and is the Gamma function. Note that small values of the shape parameters are appropriate for modelling sparse signals. When , , a Laplace prior is obtained, which was shown to play a central role in sparse signal recovery [38] and compressed sensing [39].

By introducing , , the frame prior can be rewritten as222The interest of this new parameterization will be clarified in Section IV.


The distribution of a frame coefficient generally differs from one coefficient to another. However, some frame coefficients can have very similar distributions (that can be defined by the same hyper-parameters and ). As a consequence, we propose to split the frame coefficients into different groups. The th group will be parameterized by a unique hyper-parameter vector denoted as (after the reparameterization mentioned above). In this case, the frame prior can be expressed as


where the summation covers the index set of the elements of the th group containing elements and . Note that in our simulations, each group will correspond to a given wavelet subband. A coarser classification may be made when using multiscale frame representations by considering that all the frame coefficients at a given resolution level belong to a same group.

The hierarchical Bayesian model for the frame decomposition is completed by the following improper hyperprior


where for a set ,


The motivations for using this kind of prior are summarized below:

  • the interval covers all possible values of encountered in practical applications. Moreover, there is no additional information about the parameter .

  • The prior for the parameter is a Jeffrey’s distribution that reflects the absence of knowledge about this parameter [40]. This kind of prior is often used for scale parameters.

The resulting posterior distribution is therefore given by


The Bayesian estimators (e.g., the maximum a posteriori (MAP) or minimum mean square error (MMSE) estimators) associated with the posterior distribution (13) have no simple closed-form expression. The next section studies different sampling strategies for generating samples distributed according to the posterior distribution (13). The generated samples will be used to estimate the unknown model parameter and hyper-parameter vectors and .

Iv Sampling strategies

This section proposes different MCMC methods to generate samples distributed according to the posterior defined in (13).

Iv-a Hybrid Gibbs Sampler

A very standard strategy to sample according to (7) is provided by the Gibbs sampler. The Gibbs sampler iteratively generates samples distributed according to conditional distributions associated with the target distribution. More precisely, the basic Gibbs sampler iteratively generates samples distributed according to and .

Iv-A1 Sampling the frame coefficients

Straightforward calculations yield the following conditional distribution


where is defined in (5). This conditional distribution is a product of GG distributions truncated on . Actually, sampling according to this truncated distribution is not always easy to perform since the adjoint frame operator is usually of large dimension. However, two alternative sampling strategies are detailed in what follows.

Naive sampling

This sampling method proceeds by sampling according to independent GG distributions


and then accepting the proposed candidate only if . This method can be used for any frame decomposition and any norm. However, it can be quite inefficient because of a very low acceptance ratio, especially when takes small values.

Gibbs sampler

This sampling method is designed to sample more efficiently from the conditional distribution in (14) when the considered frame is the union of orthonormal bases and is the Euclidean norm. In this case, the analysis frame operator and the corresponding adjoint can be written as and , respectively, where , is the decomposition operator onto the th orthonormal basis such as .
Every with , can be decomposed as where .

The Gibbs sampler for the generation of frame coefficients draws vectors according to the conditional distribution under the constraint , where is the reduced size vector of dimension built from by removing the th vector . If is the Euclidean norm, we have ,



Having , it is thus easy to compute the vector . To sample each , we propose to use an MH step whose proposal distribution is supported on the ball defined by


Random generation from a pdf defined on which has a simple expression is described in Appendix A. Having a closed form expression of this pdf is important to be able to calculate the acceptance ratio of the MH move. To take into account the value of obtained at the previous iteration , it may however be preferable to choose a proposal distribution supported on a restricted ball of radius containing . This strategy similar to the random walk MH algorithm [15, p. 287] results in a better exploration of regions associated with large values of the conditional distribution .

More precisely, we propose to choose a proposal distribution defined on , where and is the projection onto the ball defined as


This choice of the center of the ball guarantees that . Moreover, any point of can be reached after consecutive draws in . Note that the radius has to be adjusted to ensure a good exploration of . In practice, it may also be interesting to fix a small enough value of so as to improve the acceptance ratio.

Alternatively, a Gibbs sampler can be used to draw successively the elements of under the following constraint

where is the th element of the vector (see [41, p.133] for related strategies). However, this method is very time-consuming since it proceeds sequentially for each component of the high dimensional vector .

Iv-A2 Sampling the hyper-parameter vector

Instead of sampling according to , we propose to iteratively sample according to and . Straightforward calculations allow us to obtain the following results


Consequently, due to the new parameterization introduced in (9),

is the pdf of the inverse gamma distribution

that is easy to sample. Conversely, it is more difficult to sample according to the truncated pdf . This is achieved by using an MH move whose proposal

is a Gaussian distribution truncated on the interval

with standard deviation

[42]. Note that the mode of this distribution is the value of the parameter at the previous iteration .

The resulting method is the hybrid Gibbs sampler summarized in Algorithm .

  • Initialize with some and , and set .

  • Sampling
    For to

    • Compute
      and .

    • Simulate as follows:

      • Generate where is defined on (see Appendix A).

      • Compute the ratio

        and accept the proposed candidate with the probability .

  • Sampling
    For to

    • Generate .

    • Simulate as follows:

      • Generate

      • Compute the ratio

        and accept the proposed candidate with the probability .

  • Set and goto ➁ until convergence.

Proposed Hybrid Gibbs sampler to simulate according to (superscript indicates values computed at iteration number ). Although this algorithm is intuitive and simple to implement, it must be pointed out that it was derived under the restrictive assumption that the considered frame is the union of orthonormal bases. When this assumption does not hold, another algorithm proposed in the next section allows us to sample frame coefficients and the related hyper-parameters by exploiting algebraic properties of frames.

Iv-B Hybrid MH sampler using algebraic properties of frame representations

As a direct generation of samples according to is generally impossible, we propose here an alternative that replaces the Gibbs move by an MH move. This MH move aims at sampling globally a candidate according to a proposal distribution. This candidate is accepted or rejected with the standard MH acceptance ratio. The efficiency of the MH move strongly depends on the choice of the proposal distribution for . We denote as the th accepted sample of the algorithm and the proposal that is used to generate a candidate at iteration . The main difficulty for choosing stems from the fact that it must guarantee that (as mentioned in Section II-B) while yielding a tractable expression of .

For this reason, we propose to exploit the algebraic properties of frame representations. More precisely, any frame coefficient vector can be decomposed as , where and are realizations of random vectors taking their values in and , respectively.333We recall that the range of is and the null space of is . The proposal distribution used in this paper allows us to generate samples and . More precisely, the following separable form of the proposal pdf will be considered


where , and . In other words, independent sampling of and will be performed.

If we consider the decomposition , sampling in is equivalent to sampling , where is the inverse image of under .
Indeed, we can write where and, since , . Sampling in can be easily achieved, e.g., by generating from a distribution supported on the ball and by taking .
To make the sampling of at iteration more efficient, taking into account the sampled value at the previous iteration may be interesting. Similarly to Section IV-A1b), and to random walk generation techniques, we proceed by generating randomly in where and . This allows us to draw a vector such that and . The generation of can then be performed as explained in Appendix A provided that is an norm with .

Once we have simulated (which ensures that is in ), has to be sampled as an element of . Since , there is no information in about . As a consequence, and for simplicity reasons, we propose to sample by drawing according to the Gaussian distribution and by projecting onto , i.e.,


where is the orthogonal projection operator onto .

Note here that using a tight frame makes the computation of both and much easier due to the relation .

Let us now derive the expression of the proposal pdf. It can be noticed that, if , there exists a linear operator from to which is semi-orthogonal (i.e., ) and orthogonal to (i.e., ), such that


and . Standard rules on bijective linear transforms of random vectors lead to


where, due to the bijective linear mapping between and


and is the pdf of the Gaussian distribution with mean . Recall that denotes a distribution on the ball as expressed in Appendix A. Due to the symmetry of the Gaussian distribution, it can be deduced that


This expression remains valid in the degenerate case when (yielding ). Finally, it is important to note that, if can be chosen as a uniform distribution on the ball , the above ratio reduces to , which simplifies the computation of the MH acceptance ratio.
The final algorithm is summarized in Algorithm 2. Note that the sampling of the hyper-parameter vector is performed as for the hybrid Gibbs sampler in Section IV-A2.

  • Initialize with some and . Set and .

  • Sampling

    • Compute .

    • Generate where is defined on (see Appendix A).

    • Compute .

    • Generate .

    • Compute and .

    • Compute the ratio

      and accept the proposed candidates and with probability .

  • Sampling
    For to

    • Generate .

    • Simulate as follows

      • Generate

      • Compute the ratio

        and accept the proposed candidate with the probability .

  • Set and goto ➁ until convergence.

Proposed Hybrid MH sampler using algebraic properties of frame representations to simulate according to .

Experimental estimation results and applications to some image denoising problems of the proposed stochastic sampling techniques are provided in the next section.

V Simulation Results

V-a Validation experiments

V-A1 Example 1

To show the effectiveness of our algorithm, a first set of experiments was carried out on synthetic images. As a frame representation, we used the union of two 2D separable wavelet bases and using Daubechies and shifted Daubechies filters of length 8 and 4, respectively. The norm was used for in (4) with . To generate a synthetic image, we synthesized wavelet frame coefficients from known prior distributions.

Let and be the sequences of wavelet basis coefficients generated in and , where stand for approximation, horizontal, vertical and diagonal coefficients and the index designates the resolution level. Wavelet frame coefficients have been generated from a GG distribution in accordance with the chosen priors. The coefficients in each subband have been modeled with the same values of the hyper-parameters and , which means that each subband forms a group of index . The number of groups (i.e. the number of subbands) is therefore equal to . A uniform prior distribution over has been chosen for parameter whereas a Jeffrey’s prior has been assigned to each parameter .

After generating the hyper-parameters from their prior distributions, a set of frame coefficients is randomly generated to synthesize the observed data. The hyper-parameters are then supposed unknown, sampled using the proposed algorithm, and estimated by computing the mean of the generated samples according to the MMSE principle. Having reference values, the normalized mean square erors (NMSEs) related to the estimation of each hyper-parameter belonging to a given group (here a given subband) have been computed from Monte Carlo runs. The NMSEs computed for the estimators associated with the two samplers of Sections IV-A and IV-B are reported in Table I.

Sampler 1 Sampler 2
0.015 0.006 0.012 0.030
0.022 0.021 0.022 0.026
0.06 0.016 0.011 0.044
0.04 0.003 0.021 0.026
0.020 0.027 0.020 0.019
0.013 0.016 0.023 0.041
0.039 0.08 0.039 0.023
0.015 0.030 0.015 0.025
0.051 0.07 0.025 0.031
0.027 0.039 0.029 0.023
0.040 0.024 0.016 0.034
0.08 0.019 0.013 0.022
0.05 0.015 0.011 0.040
0.010 0.064 0.010 0.028
TABLE I: NMSEs for the estimated hyper-parameters (30 runs).

Table I shows that the proposed algorithms (using Sampler of Section IV-A and Sampler of Section IV-B) provide accurate estimates of the hyper-parameters. The two samplers perform similarly for this experiment. However, one advantage of Sampler 2 is that it can be applied to different kinds of redundant frames, unlike Sampler 1. Indeed, as reported in Section IV-A, the conditional distribution (14) is generally difficult to sample when the frame representation is not the union of orthonormal bases.

Two examples of empirical histograms of known reference wavelet frame coefficients (corresponding to ) and pdfs with estimated hyper-parameters are plotted in Fig. 1 to illustrate the good performance of the estimator.

: , : ,
Fig. 1: Examples of empirical approximation (left) and detail (right) histograms and pdfs of frame coefficients corresponding to a synthetic image.

V-A2 Example 2

In this experiment, another frame representation is considered, namely a tight frame version of the translation invariant wavelet transform [43] with Daubechies filters of length 8. The norm was also used for in (4) with . Let denote the frame coefficients vector. We used the same process to generate frame coefficients as for Example 1. The coefficients in each subband (i.e. each group) have been modeled with the same values of the hyper-parameters and , the number of groups being equal to . The same priors for the hyper-parameters and as for Example 1 have been used.
After generating the hyper-parameters and frame coefficients, the hyper-parameters are then supposed unknown, sampled using the proposed algorithm, and estimated using the MMSE estimator. Table II shows NMSEs based on reference values of each hyper-parameter. Note that Sampler 1 is difficult to be implemented in this case because of the used frame properties. Consequently, only NMSE values for Sampler 2 have been reported in Table II.

0.05 0.027
0.024 0.007
0.05 0.014
0.037 0.028
0.051 0.044
0.04 0.012
0.04 0.05
TABLE II: NMSEs for the estimated hyper-parameters using Sampler 2 (30 runs).

V-B Convergence results

To be able to automatically stop the simulated chain and ensure that the last simulated samples are appropriately distributed according to the posterior distribution of interest, a convergence monitoring technique based on the potential scale reduction factor (PSRF) has been used by simulating several chains in parallel (see [44] for more details). Using the union of two orthonormal bases as a frame representation, Figs. 2 and 3 show examples of convergence profiles corresponding to the hyper-parameters and when two chains are sampled in parallel using Sampler 2.

Based on these values of the PSRF, the algorithm was stopped after about iterations (burn-in period of iterations), which corresponds to about hours of computational time using Matlab 7.7 on an Intel Core 4 ( GHz) architecture. When comparing the two proposed samplers in terms of convergence speed, it turns out from our simulations that Sampler 1 shows faster convergence than Sampler 2. Indeed, Sampler 1 needs about iterations to converge, which reduces the global computational time to about 3 hours.

Chain Chain
Fig. 2: Ground truth values and sample path for the hyper-parameters and related to in .
Chain Chain
Fig. 3: Ground truth values and sample path for the hyper-parameters and related to in