Generalized sparse Bayesian learning and application to image reconstruction

Image reconstruction based on indirect, noisy, or incomplete data remains an important yet challenging task. While methods such as compressive sensing have demonstrated high-resolution image recovery in various settings, there remain issues of robustness due to parameter tuning. Moreover, since the recovery is limited to a point estimate, it is impossible to quantify the uncertainty in the reconstruction, which is often desirable. Due to these inherent limitations, a sparse Bayesian learning approach is sometimes adopted to recover a posterior distribution of the unknown. Sparse Bayesian learning assumes that some linear transformation of the unknown is sparse. However, most of the methods developed are tailored to specific problems, with particular forward models and priors. Here, we present a generalized approach to sparse Bayesian learning. It has the advantage that it can be used for various types of data acquisition and prior information. Some preliminary results on image reconstruction/recovery indicate its potential use for denoising, deblurring, and magnetic resonance imaging.


1 Introduction

Many applications seek to solve the linear inverse problem

(1)  y = F x + e,

where y ∈ R^M is a vector of indirect measurements, x ∈ R^N is the vector of unknowns, F ∈ R^{M×N} is a known linear forward operator, and e ∈ R^M corresponds to a typically unknown noise vector (see [groetsch1993inverse, vogel2002computational, hansen2010discrete] and references therein). In particular, Eq. 1 can be associated with signal or image reconstruction [natterer2001mathematical, hansen2006deblurring, stark2013image]. In this regard it is often reasonable to assume that some linear transformation of the unknown solution x, say R x with R ∈ R^{K×N}, is sparse. A common approach is to consider the ℓ1-regularized inverse problem

(2)  x_λ = argmin_x { ||F x − y||_2^2 + λ ||R x||_1 },

where R is referred to as the regularization operator and λ > 0 as the regularization parameter. The motivation for this approach is that the ℓ1-norm, ||R x||_1, serves as a convex surrogate for the ℓ0-“norm”, ||R x||_0. Thus, Eq. 2 balances data fidelity, noise, and the sparsity assumption on R x, while still enabling efficient computations [donoho2006compressed, eldar2012compressed, foucart2017mathematical]. However, an often encountered difficulty for the ℓ1-regularized inverse problem Eq. 2 is the selection of an appropriate regularization parameter λ, which can critically influence the quality of the regularized reconstruction [golub1979generalized, colton1997simple, hansen1999curve, sanders2020effective, lanza2020residual]. In part for this reason, many statistical approaches have been proposed for regularized inverse problems [kaipio2006statistical, calvetti2007introduction, stark2013image]. Another advantage of statistical approaches is that they may allow for uncertainty quantification in the reconstructed solution. For example, the hierarchical Bayesian formulation of the ℓ1-regularized inverse problem Eq. 2, see [kaipio2006statistical, calvetti2007introduction], is based on extending x, y, and all other involved parameters, which we collectively write as θ, into random variables. Consequently, x, y, and θ are characterized by certain density functions, as are their relationships to each other. In particular, one usually considers the following density functions:

  • The likelihood p(y | x, θ), which is the probability density function for the data y given x and θ.

  • The prior p(x | θ), which is the density function for x given θ.

  • The hyper-prior p(θ), which is the probability density function for the parameters θ.

  • The posterior p(x, θ | y), which is the probability density function for the solution x and the parameters θ given the data y.

One can use Bayes’ theorem to obtain

(3)  p(x, θ | y) ∝ p(y | x, θ) p(x | θ) p(θ),

where “∝” means that the two sides of Eq. 3 are equal to each other up to a multiplicative constant that does not depend on x or θ. Note that the parameters θ are now part of the problem and are no longer selected a priori. Furthermore, using an appropriate method for Bayesian inference allows one to quantify uncertainty in the reconstructed solution x.

There are a variety of sparsity-promoting priors to choose from, including but not limited to Laplace priors [figueiredo2007majorization], TV-priors [kaipio2000statistical, babacan2008parameter], mixture-of-Gaussian priors [fergus2006removing], and hyper-Laplacian distributions based on ℓp-quasinorms with 0 < p < 1 [levin2007image, krishnan2009fast]. In this investigation we consider the well-known class of conditionally Gaussian priors given by

(4)  p(x | β) ∝ exp( −(1/2) x^T R^T B R x ),  B = diag(β_1, …, β_K),

where B is a diagonal inverse covariance matrix. Ideas discussed in [tipping2001sparse, wipf2004sparse, chantas2006bayesian, calvetti2007gaussian, chantas2008variational, bardsley2010hierarchical, babacan2010sparse, calvetti2019hierachical, churchill2019detecting, churchill2020estimation] suggest that conditionally Gaussian priors of the form Eq. 4 are particularly suited to promote sparsity of R x. For example, the model proposed in [tipping2001sparse] is designed to recover sparse representations of kernel approximations, coining the term sparse Bayesian learning (SBL). Promoting sparse solutions, as done in [tipping2001sparse], corresponds to using R = I as the regularization operator in Eq. 4. Further investigations that made use of SBL to promote sparse solutions include [wipf2004sparse, zhang2011sparse, zhang2014spatiotemporal, calvetti2019hierachical, churchill2019detecting]. In many applications, however, it is some linear transformation R x that is desired to be sparse. For example, total variation (TV) regularization is of particular interest in image recovery. Extensions of SBL to this setting have been proposed in [chantas2006bayesian, calvetti2007gaussian, chantas2008variational, bardsley2010hierarchical, babacan2010sparse, churchill2020estimation]. That said, since the TV-regularization operator is singular, the prior Eq. 4 is improper. This prohibits the application of many of the existing SBL approaches. An often encountered idea therefore is to make R invertible by introducing additional rows that are consistent with assumptions about the underlying solution. For example, in [calvetti2007gaussian, bardsley2010hierarchical] the additional rows encode certain boundary conditions. The same technique can be extended to higher-order TV-regularization [churchill2020estimation]. Unfortunately, such additional information might not always be available or may be complicated to incorporate, especially in two or more dimensions. Further, different types of regularization operators must be adapted on a case-by-case basis, and the resulting prior may promote undesired artificial features in the solution when the regularization operator is not carefully modified. The approach in [chantas2006bayesian, chantas2008variational], by contrast, depends on the assumption of a “commuting property” between the forward operator F and the regularization operator R. Requiring such a commuting property is often unrealistic in applications, since the dimensions of F and R are typically not consistent.

Our Contribution

To address these issues, we present a generalized approach to SBL for “almost” general forward and regularization operators F and R. By “almost” general, we mean that the only restriction on F and R is that their common kernel should be trivial, ker(F) ∩ ker(R) = {0}, a standard assumption in regularized inverse problems [kaipio2006statistical]. We propose an efficient numerical method for Bayesian inference that yields a full conditional posterior density for the solution, rather than a simple point estimate, which allows for uncertainty quantification in x. The present work implies that SBL can be applied to a broader class of problems than currently known. In particular, some preliminary results on signal and image reconstruction indicate its potential use for denoising, deblurring, and magnetic resonance imaging. Finally, we relax the usual assumption that e in Eq. 1 corresponds to independent and identically distributed (i. i. d.) normal noise, as done in [tipping2001sparse, wipf2004sparse, chantas2006bayesian, calvetti2007gaussian, chantas2008variational, bardsley2010hierarchical, babacan2010sparse, churchill2019detecting, churchill2020estimation]. Instead, we allow for more degrees of freedom in the noise by only requiring e to be independent but not necessarily identically distributed. In this way our generalized SBL approach is well-suited for data fusion problems.

Outline

The rest of this paper is organized as follows. Section 2 provides some details on the sparsity promoting hierarchical Bayesian model under consideration. In Section 3, we propose an efficient numerical method for Bayesian inference. A series of numerical examples is presented in Section 4 to illustrate the descriptive span of the hierarchical Bayesian model. Finally in Section 5 we provide some concluding thoughts.

2 The hierarchical Bayesian model

 

 

Figure 1: Graphical representation of the hierarchical Bayesian model. Nodes denoted within circles correspond to random variables, while nodes without a circle correspond to parameters. Shaded circles represent observed random variables, while plain circles represent hidden random variables.

We begin by reviewing the generalized hierarchical Bayesian model considered here, which is illustrated in Fig. 1.

2.1 The likelihood

The likelihood function models the connection between the solution x, the noise parameters α, and the indirect measurements y. It is often assumed that e in Eq. 1 is zero-mean i. i. d. normal noise with inverse variance α, that is, e ~ N(0, α^{−1} I). This assumption yields the conditionally Gaussian likelihood function

(5)  p(y | x, α) ∝ α^{M/2} exp( −(α/2) ||F x − y||_2^2 ).

The likelihood function given by Eq. 5 was considered, for instance, in [tipping2001sparse, chantas2006bayesian, babacan2010sparse, bardsley2012mcmc, churchill2020estimation]. By contrast, we only restrict e to be independent but not necessarily identically distributed. This translates into e ~ N(0, A^{−1}) with diagonal positive definite inverse noise covariance matrix

(6)  A = diag(α_1, …, α_M).

The linear data model Eq. 1 then yields the generalized likelihood function

(7)  p(y | x, α) ∝ det(A)^{1/2} exp( −(1/2) (F x − y)^T A (F x − y) ).

Observe that Eq. 7 reduces to Eq. 5 if the inverse variances are all equal to α. Example 1 below motivates the weaker assumption in the context of data fusion [gustafsson2010statistical] and multi-sensor acquisition systems [chun2017compressed] such as parallel magnetic resonance imaging [guerquin2011realistic].
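To make the relationship between Eq. 5 and Eq. 7 concrete, the following minimal Python/NumPy sketch (illustrative only; all names and sizes are made up, and the reference implementation accompanying this paper is written in MATLAB) evaluates both log-likelihoods up to their common additive constant and confirms that they coincide when all inverse variances are equal.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 4
F = rng.standard_normal((M, N))            # forward operator
x = rng.standard_normal(N)                 # candidate solution
y = F @ x + 0.1 * rng.standard_normal(M)   # synthetic data

alpha = 2.5                                # common inverse noise variance
r = F @ x - y                              # residual F x - y

# Eq. 5: log p(y | x, alpha) = (M/2) log(alpha) - (alpha/2) ||F x - y||_2^2 + const
log_iid = 0.5 * M * np.log(alpha) - 0.5 * alpha * (r @ r)

# Eq. 7: log p(y | x, alpha_1, ..., alpha_M) = (1/2) log det(A) - (1/2) r^T A r + const
A_diag = np.full(M, alpha)                 # diagonal of the inverse noise covariance A
log_gen = 0.5 * np.sum(np.log(A_diag)) - 0.5 * (r @ (A_diag * r))

assert np.isclose(log_iid, log_gen)        # identical when alpha_1 = ... = alpha_M
```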

Example 1

Assume we have a collection of measurements y_l, l = 1, …, L, generated from the same source x but by different sensors. The corresponding data models are

(8)  y_l = F_l x + e_l,  l = 1, …, L.

Further, assume that the noise in the measurements from the same sensor is i. i. d., that is, e_l ~ N(0, α_l^{−1} I). However, the noise variance might differ from sensor to sensor, so that the α_l are in general distinct. If we combine the different measurements and consider the joint data model

(9)  y = F x + e,  y = [ y_1 ; … ; y_L ],  F = [ F_1 ; … ; F_L ],  e = [ e_1 ; … ; e_L ],

the stacked noise vector e can no longer be assumed to be i. i. d., which we cannot appropriately model using the likelihood function Eq. 5. However, using the more general likelihood function Eq. 7, we can model Eq. 9 by choosing a diagonal inverse noise covariance matrix of the form

(10)  A = diag( α_1 I_{M_1}, …, α_L I_{M_L} ),

where I_{M_l}, l = 1, …, L, denotes the identity matrix with dimensions matching the number of measurements M_l provided by the l-th sensor.
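As a concrete illustration of Eq. 10, the short NumPy sketch below (sensor counts and precisions are hypothetical) assembles the diagonal of the inverse noise covariance matrix for measurements stacked from several sensors; in practice only this diagonal needs to be stored.

```python
import numpy as np

# Hypothetical setup: three sensors providing M_1, M_2, M_3 measurements each
M_l = [100, 250, 50]          # number of measurements per sensor
alpha_l = [4.0, 0.5, 10.0]    # inverse noise variance per sensor

# Diagonal of A = diag(alpha_1 I_{M_1}, ..., alpha_L I_{M_L}) as in Eq. 10
A_diag = np.concatenate([a * np.ones(m) for a, m in zip(alpha_l, M_l)])

print(A_diag.shape)           # (400,) -- the full matrix would be np.diag(A_diag)
```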

2.2 The prior

The prior function models our prior belief about the unknown solution x. Assume that some linear transformation of x, say R x with R ∈ R^{K×N}, is sparse. The SBL approach promotes this assumed sparsity by using a conditionally Gaussian prior function,

(11)  p(x | β) ∝ exp( −(1/2) x^T R^T B R x ),  B = diag(β_1, …, β_K),

where B is referred to as the inverse prior covariance matrix. See [tipping2001sparse, wipf2004sparse, chantas2006bayesian, calvetti2007gaussian, chantas2008variational, bardsley2010hierarchical, babacan2010sparse, calvetti2019hierachical, churchill2019detecting, churchill2020estimation] and references therein. The conditionally Gaussian prior Eq. 11 can be justified by its asymptotic behavior [calvetti2007gaussian]. If we assume that the inverse variances β_1, …, β_K are all equal, then Eq. 11 favors solutions for which R x is equal or close to zero (for this prior, being close to zero means that R x has a small (unweighted) ℓ2-norm ||R x||_2), since these solutions have a higher probability. For instance, when R x corresponds to the total variation of x, that is, [R x]_k = x_{k+1} − x_k, then Eq. 11 promotes solutions that have no or little variation. However, if one of the inverse variances, say β_k, is significantly smaller than the remaining ones, a jump between x_k and x_{k+1} becomes more likely. In this way, Eq. 11 promotes sparsity of R x.
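To illustrate the conditionally Gaussian prior Eq. 11, the following NumPy sketch (illustrative sizes and values) builds a first-order TV operator R and the prior precision R^T B R for a given vector of inverse variances β. Note that R^T B R is singular here, since R annihilates constant vectors; this is precisely the improper-prior situation addressed in Remark 1 below.

```python
import numpy as np

def tv_operator(n):
    """First-order TV (forward difference) operator R of size (n-1) x n."""
    R = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    R[idx, idx] = -1.0
    R[idx, idx + 1] = 1.0
    return R

n = 8
R = tv_operator(n)
beta = np.ones(n - 1)
beta[3] = 1e-3                    # a small beta_k makes a jump between x_4 and x_5 "cheap"

P = R.T @ (beta[:, None] * R)     # prior precision R^T B R with B = diag(beta)

# R^T B R is singular (constants lie in its kernel), so Eq. 11 alone is improper:
print(np.linalg.matrix_rank(P))   # prints n - 1, not n
```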

Remark 1 (Improper priors)

If R has a nontrivial kernel, then R^T B R is singular and Eq. 11 becomes an improper prior. Most existing SBL algorithms are infeasible in this case, thus motivating us to propose an alternative method in Section 3. In particular, the resulting difficulties for the evidence approach, which was used in the original investigation [tipping2001sparse] and later in [babacan2010sparse], are addressed in Appendix A.

2.3 The hyper-prior

From the discussion above it is evident that the inverse variances β_1, …, β_K must be allowed to have distinctly different values for the conditionally Gaussian prior Eq. 11 to promote sparsity of R x. This can be achieved by treating them as random variables with uninformative density functions. A common choice is the Gamma distribution with probability density function

(12)  p(η | c, d) = ( d^c / Γ(c) ) η^{c−1} exp(−d η),  η > 0,

where c and d are positive shape and rate parameters. Furthermore, Γ on the right-hand side of Eq. 12 denotes the usual Gamma function [artin2015gamma]. Note that a Gamma-distributed random variable η ~ Γ(c, d) respectively has mean c/d and variance c/d^2. In particular, small values of c and d result in a large variance, making Eq. 12 an uninformative prior. We therefore choose the inverse noise and prior variances, α_m and β_k, to be Gamma distributed:

(13)  α_m ~ Γ(c, d), m = 1, …, M,   β_k ~ Γ(c, d), k = 1, …, K.

By setting c and d to small values, α and β are free from the moderating influence of the hyper-prior and allowed to “vary wildly” following the data. In our numerical tests we used one fixed choice of (c, d) for all one-dimensional problems (signals) and another for all two-dimensional problems (images), which is similar to the choices in [tipping2001sparse, babacan2010sparse, bardsley2012mcmc]. Future investigations will elaborate on the influence of these parameters. A few remarks are in order.

Remark 2 (Conjugate hyper-priors)

Choosing the hyper-priors for α and β as Gamma distributions is convenient since the Gamma distribution is a conjugate prior (recall that a prior is conjugate for a likelihood if the resulting posterior is in the same class of densities, in this case corresponding to Gamma distributions, as the prior; see [gelman1995bayesian, fink1997compendium, murphy2007conjugate]) for the conditionally Gaussian distributions Eq. 7 and Eq. 11.

Remark 3 (Informative hyper-priors)

For simplicity we use the same hyper-prior parameters c and d for all components of α and β. If one has a reasonable a priori notion of what α or β should be, the choice of hyper-prior could be modified correspondingly [bardsley2012mcmc, calvetti2019hierachical].

Remark 4 (Generalized Gamma hyper-priors)

The use of generalized Gamma distributions as hyper-priors was recently investigated in [calvetti2020sparse]. Although generalized Gamma hyper-priors have been demonstrated to promote sparsity more strongly in some cases, we limit our discussion to the usual Gamma hyper-priors so as not to exceed the scope of the present work.

3 Bayesian inference

We now propose a Bayesian inference method for the generalized hierarchical Bayesian model from Section 2.

3.1 Preliminary observations

The conditionally Gaussian prior Eq. 11 and the Gamma hyper-prior Eq. 12 were intentionally chosen because of their conjugacy relationship. Some especially important implications include the following (see [gelman1995bayesian]):

(14)  p(x | α, β, y) ∝ N(x; μ, Σ),
(15)  p(α_m | x, y) ∝ Γ( α_m ; c + 1/2, d + [F x − y]_m^2 / 2 ),  m = 1, …, M,
(16)  p(β_k | x) ∝ Γ( β_k ; c + 1/2, d + [R x]_k^2 / 2 ),  k = 1, …, K,

where N(·; μ, Σ) denotes the normal density with mean μ and covariance Σ and Γ(·; c, d) denotes the Gamma density Eq. 12 with shape c and rate d. Here the covariance matrix Σ and the mean μ in Eq. 14 are respectively given by

(17)  Σ = ( F^T A F + R^T B R )^{−1},  μ = Σ F^T A y,

[F x − y]_m denotes the m-th entry of the vector F x − y, and [R x]_k denotes the k-th entry of the vector R x. Note that the two sides of Eq. 14, Eq. 15, and Eq. 16 are equal up to a multiplicative constant that does not depend on x, α_m, and β_k, respectively. Finally, we stress that Eq. 14 only holds if the forward operator F and the regularization operator R satisfy the common kernel condition

(18)  ker(F) ∩ ker(R) = {0},

which is a standard assumption in regularized inverse problems [kaipio2006statistical, tikhonov2013numerical]. Indeed, Eq. 18 can be interpreted as the prior (regularization) introducing a sufficient amount of complementary information to the likelihood (the given measurements) to make the problem well-posed. This indicates that the hierarchical Bayesian model proposed in Section 2 does not require R to be invertible as long as Eq. 18 is satisfied.
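For small problems the conditional posterior of Eq. 14 can be formed explicitly. The sketch below (illustrative NumPy code, not the reference MATLAB implementation) computes Σ and μ from Eq. 17 and checks the common kernel condition Eq. 18 through the rank of the stacked operator.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 12, 10, 9
F = rng.standard_normal((M, N))                    # forward operator
R = np.eye(N, k=1)[:K] - np.eye(N)[:K]             # first-order TV operator (K x N)

alpha = np.full(M, 5.0)                            # inverse noise variances (diagonal of A)
beta = np.full(K, 1.0)                             # inverse prior variances (diagonal of B)
y = rng.standard_normal(M)

# Common kernel condition (Eq. 18): ker(F) and ker(R) intersect trivially <=> rank([F; R]) = N
assert np.linalg.matrix_rank(np.vstack([F, R])) == N

# Eq. 17: Sigma = (F^T A F + R^T B R)^{-1},  mu = Sigma F^T A y
G = F.T @ (alpha[:, None] * F) + R.T @ (beta[:, None] * R)
Sigma = np.linalg.inv(G)
mu = Sigma @ (F.T @ (alpha * y))
```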

3.2 Proposed method: Bayesian coordinate descent

We are now in a position to formulate a Bayesian inference method for the generalized hierarchical Bayesian model from Section 2. This method is motivated by coordinate descent approaches [friedman2010regularization, wright2015coordinate] and solves for a descriptive quantity (mode, mean, variance, etc.) of the posterior density function by alternatingly updating this quantity for x, α, and β. Henceforth we refer to this method as the Bayesian coordinate descent (BCD) algorithm.

Assume that we are interested in the expected value (mean) of the posterior. The BCD algorithm for this case is described in Algorithm 1.

1:  Initialize α^(0), β^(0), and set j = 0
2:  repeat
3:     Update x^(j+1) = E[ x | y, α^(j), β^(j) ]
4:     Update α^(j+1) = E[ α | y, x^(j+1) ]
5:     Update β^(j+1) = E[ β | x^(j+1) ]
6:     Increase j → j + 1
7:  until convergence or maximum number of iterations is reached
Algorithm 1 BCD algorithm for the mean

In Algorithm 1 and henceforth, all variables with superscripts are treated as fixed parameters. That is, the expected values in Algorithm 1 are respectively computed w. r. t. p(x | y, α^(j), β^(j)), p(α | y, x^(j+1)), and p(β | x^(j+1)). Algorithm 1 is simple to implement because of the particular decomposition of the posterior density function provided by Bayes’ theorem (see Eq. 3):

(19)  p(x, α, β | y) ∝ p(y | x, α) p(x | β) p(α) p(β).

By Eqs. 14, 15, and 16, we therefore have

(20)  E[ x | y, α^(j), β^(j) ] = μ^(j),
(21)  E[ α_m | y, x^(j+1) ] = ( c + 1/2 ) / ( d + [F x^(j+1) − y]_m^2 / 2 ),  m = 1, …, M,
(22)  E[ β_k | x^(j+1) ] = ( c + 1/2 ) / ( d + [R x^(j+1)]_k^2 / 2 ),  k = 1, …, K,

where the covariance matrix Σ^(j) and the mean μ^(j) are given as in Eq. 17 with A = diag(α^(j)) and B = diag(β^(j)). Thus, the update step for x in Algorithm 1 reduces to solving the linear system

(23)  ( F^T A^(j) F + R^T B^(j) R ) x^(j+1) = F^T A^(j) y

for the mean x^(j+1) = μ^(j), and the subsequent update steps for α and β yield

(24)  α_m^(j+1) = ( c + 1/2 ) / ( d + [F x^(j+1) − y]_m^2 / 2 ),  m = 1, …, M,
(25)  β_k^(j+1) = ( c + 1/2 ) / ( d + [R x^(j+1)]_k^2 / 2 ),  k = 1, …, K,

respectively. Hence, Algorithm 1 consists of alternating between Eqs. 23, 24, and 25.
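To make the alternating updates concrete, the following minimal Python/NumPy sketch implements Algorithm 1 by cycling through Eqs. 23, 24, and 25. It is an illustration under the notation used above (dense linear algebra, illustrative default hyper-parameters), not the MATLAB implementation accompanying this paper.

```python
import numpy as np

def bcd_mean(F, R, y, c=1.0, d=1e-4, n_iter=50):
    """Bayesian coordinate descent (Algorithm 1); c and d are illustrative defaults."""
    M, _ = F.shape
    K = R.shape[0]
    alpha = np.ones(M)                       # initial inverse noise variances
    beta = np.ones(K)                        # initial inverse prior variances
    for _ in range(n_iter):
        # Eq. 23: (F^T A F + R^T B R) x = F^T A y
        G = F.T @ (alpha[:, None] * F) + R.T @ (beta[:, None] * R)
        x = np.linalg.solve(G, F.T @ (alpha * y))
        # Eq. 24: alpha_m = (c + 1/2) / (d + [F x - y]_m^2 / 2)
        alpha = (c + 0.5) / (d + 0.5 * (F @ x - y) ** 2)
        # Eq. 25: beta_k = (c + 1/2) / (d + [R x]_k^2 / 2)
        beta = (c + 0.5) / (d + 0.5 * (R @ x) ** 2)
    return x, alpha, beta
```

In practice one would stop once the relative change of x falls below a prescribed tolerance and, for large problems, replace the dense solve by the matrix-free gradient descent described below.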

Remark 5

For i. i. d. noise, that is, when the likelihood function is Eq. 5 rather than Eq. 7, the linear system Eq. 23 simplifies to

(26)  ( α^(j) F^T F + R^T B^(j) R ) x^(j+1) = α^(j) F^T y,

and the update step Eq. 24 correspondingly reduces to

(27)  α^(j+1) = ( M/2 + c ) / ( ||F x^(j+1) − y||_2^2 / 2 + d ).

If the common kernel condition Eq. 18 is satisfied, then the coefficient matrix on the left-hand side of Eq. 23 is symmetric and positive definite (SPD). For sufficiently small problems, Eq. 23 can therefore be solved efficiently using a preconditioned conjugate gradient (PCG) method [saad2003iterative]. However, the coefficient matrix may become prohibitively large in some cases. To avoid any potential storage and computational issues, we implemented our method using gradient descent for the imaging problems described in Section 4. Further details regarding this approach are provided below.

Let G = F^T A F + R^T B R and b = F^T A y be the SPD coefficient matrix and the right-hand side of the linear system Eq. 23, respectively. The solution of Eq. 23 then corresponds to the unique minimizer of the quadratic functional

(28)  J(x) = (1/2) x^T G x − b^T x.

For this functional, line search minimization can be performed analytically to find the locally optimal step size in every iteration. This allows us to use the classical gradient descent method described in Algorithm 2 to approximate the solution of Eq. 23.

1:  Set x^(0), compute the gradient g^(0) = G x^(0) − b, and set n = 0
2:  repeat
3:     Compute w^(n) = G g^(n) according to Eq. 34
4:     Compute the step size: η_n = ( (g^(n))^T g^(n) ) / ( (g^(n))^T w^(n) )
5:     Update the solution: x^(n+1) = x^(n) − η_n g^(n)
6:     Update the difference: g^(n+1) = g^(n) − η_n w^(n)
7:  until convergence or maximum number of iterations is reached
Algorithm 2 Gradient descent method
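A possible realization of Algorithm 2 is sketched below in Python/NumPy. It performs steepest descent with exact line search for the quadratic functional Eq. 28 and only requires a routine apply_G that applies the coefficient matrix G to a vector (for instance, matrix-free as in Eq. 34); the function and parameter names are illustrative.

```python
import numpy as np

def quad_gradient_descent(apply_G, b, x0, tol=1e-8, max_iter=500):
    """Minimize J(x) = 0.5 x^T G x - b^T x by gradient descent with exact line search."""
    x = x0.copy()
    g = apply_G(x) - b                        # gradient of J at x
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol * np.linalg.norm(b):
            break
        w = apply_G(g)                        # one application of G per iteration
        eta = (g @ g) / (g @ w)               # analytically optimal step size
        x -= eta * g                          # update the solution
        g -= eta * w                          # update the gradient without re-applying G to x
    return x

# Small check with an explicit SPD matrix standing in for Eq. 23:
G = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = quad_gradient_descent(lambda v: G @ v, b, np.zeros(2))
assert np.allclose(G @ x, b, atol=1e-6)
```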

It is important to note that the gradient in Eq. 28 can be computed efficiently and without having to store the whole coefficient matrix G, which might be prohibitively large. To show this, assume that the unknown solution x = vec(X) corresponds to a vectorized matrix X ∈ R^{n×n} and that the forward operator corresponds to applying the same one-dimensional forward operator f ∈ R^{m×n} to the matrix X in the x- and y-direction:

(29)  F = f ⊗ f,  so that  F vec(X) = vec( f X f^T ),

where vec(X) ∈ R^{n^2} stacks the columns of X and ⊗ denotes the Kronecker product. We furthermore assume that the regularization operator is defined by

(30)  R = [ I_n ⊗ r ; r ⊗ I_n ],

where r ∈ R^{k×n} is a one-dimensional regularization operator and I_n is the n×n identity matrix, which corresponds to anisotropic regularization. Using some basic properties of the Kronecker product ⊗ and the element-wise Hadamard product ∘, it can be shown that

(31)  F^T A F vec(X) = vec( f^T [ Ω_A ∘ ( f X f^T ) ] f ),
(32)  (I_n ⊗ r)^T B_1 (I_n ⊗ r) vec(X) = vec( r^T [ Ω_{B,1} ∘ ( r X ) ] ),
(33)  (r ⊗ I_n)^T B_2 (r ⊗ I_n) vec(X) = vec( [ Ω_{B,2} ∘ ( X r^T ) ] r ),

where Ω_A ∈ R^{m×m}, Ω_{B,1} ∈ R^{k×n}, and Ω_{B,2} ∈ R^{n×k} are such that A = diag(vec(Ω_A)), B_1 = diag(vec(Ω_{B,1})), and B_2 = diag(vec(Ω_{B,2})), with B = diag(B_1, B_2). Combining Eqs. 31, 32, and 33 yields

(34)  G vec(X) = vec( f^T [ Ω_A ∘ ( f X f^T ) ] f + r^T [ Ω_{B,1} ∘ ( r X ) ] + [ Ω_{B,2} ∘ ( X r^T ) ] r ),

and therefore

(35)  ∇J( vec(X) ) = G vec(X) − vec( f^T [ Ω_A ∘ Y ] f ),

where Y ∈ R^{m×m} is such that y = vec(Y). Observe that all of the matrices in Eqs. 34 and 35 are significantly smaller than F, R, and G.
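The identities in Eqs. 31 to 35 are easy to verify numerically. The sketch below (small illustrative sizes) compares the matrix-free evaluation of G vec(X) from Eq. 34 against the explicitly assembled Kronecker matrices; only the small matrices f, r, Ω_A, Ω_{B,1}, and Ω_{B,2} are needed for the matrix-free version.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 5
f = rng.standard_normal((m, n))                        # 1D forward operator
r = np.eye(n, k=1)[: n - 1] - np.eye(n)[: n - 1]       # 1D first-order TV operator, k = n - 1
k = r.shape[0]

X = rng.standard_normal((n, n))                        # unknown (as a matrix)
Om_A = rng.uniform(0.5, 2.0, (m, m))                   # noise precisions arranged as a matrix
Om_B1 = rng.uniform(0.5, 2.0, (k, n))                  # prior precisions for the (I kron r) block
Om_B2 = rng.uniform(0.5, 2.0, (n, k))                  # prior precisions for the (r kron I) block

def vec(M):
    """Column-stacking vectorization."""
    return M.reshape(-1, order="F")

# Matrix-free evaluation of G vec(X) as in Eq. 34
Gx_free = vec(f.T @ (Om_A * (f @ X @ f.T)) @ f
              + r.T @ (Om_B1 * (r @ X))
              + (Om_B2 * (X @ r.T)) @ r)

# Explicit assembly of G = F^T A F + R^T B R for comparison
F = np.kron(f, f)
R = np.vstack([np.kron(np.eye(n), r), np.kron(r, np.eye(n))])
A = np.diag(vec(Om_A))
B = np.diag(np.concatenate([vec(Om_B1), vec(Om_B2)]))
G = F.T @ A @ F + R.T @ B @ R

assert np.allclose(Gx_free, G @ vec(X))
```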

3.3 Uncertainty quantification

The proposed BCD algorithm has the advantage of allowing for uncertainty quantification in the reconstructed solution x. For fixed α and β, Bayes’ theorem and the conjugacy relationship Eq. 14 yield

(36)  x | y, α, β ~ N( μ, Σ ),

where the mean μ and the covariance matrix Σ are again given by Eq. 17. We can then sample from the normal distribution N(μ, Σ) to obtain, for instance, credible intervals for every component of the solution x. At the same time, we stress that this only allows for uncertainty quantification in x for given hyper-parameters α and β. The above approach does not include uncertainty in α and β when these are treated as random variables themselves. This might be achieved by employing a computationally more expensive sampling approach [bardsley2012mcmc], which we will investigate in future work.
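A minimal sketch of this sampling-based uncertainty quantification is given below (illustrative NumPy code; the credible level and number of samples are arbitrary choices). For large problems one would draw samples using a Cholesky factor of the precision matrix instead of explicitly inverting it.

```python
import numpy as np

def credible_intervals(F, R, alpha, beta, y, level=0.95, n_samples=2000, rng=None):
    """Sample x | y, alpha, beta ~ N(mu, Sigma) (Eq. 36) and return componentwise intervals."""
    rng = np.random.default_rng() if rng is None else rng
    G = F.T @ (alpha[:, None] * F) + R.T @ (beta[:, None] * R)   # inverse of Sigma (Eq. 17)
    Sigma = np.linalg.inv(G)
    mu = Sigma @ (F.T @ (alpha * y))
    samples = rng.multivariate_normal(mu, Sigma, size=n_samples)
    lower = np.quantile(samples, (1 - level) / 2, axis=0)
    upper = np.quantile(samples, 1 - (1 - level) / 2, axis=0)
    return mu, lower, upper
```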

3.4 Relationship to current methodology

We now address the connection between the proposed BCD algorithm and some existing methods.

Iterative alternating sequential algorithm

There are both notable similarities and key distinctions between the proposed BCD algorithm and the iterative alternating sequential (IAS) algorithm, developed in [calvetti2007introduction, calvetti2007gaussian] and further investigated in [calvetti2015hierarchical, calvetti2019hierachical]. Both algorithms estimate the unknown x and the other involved parameters by alternatingly updating them. However, in contrast to the BCD method, the IAS algorithm assumes that the noise covariance matrix is known, which then allows the restriction to white Gaussian noise; see [calvetti2019hierachical, Section 2]. Moreover, the IAS algorithm builds upon a conditionally Gaussian prior for which the elements of the diagonal covariance matrix are Gamma distributed, rather than the elements of the diagonal inverse covariance matrix as done here, which does not result in a conjugate hyper-prior. This makes the update steps for the prior parameters and the hyper-parameters of the prior more complicated. Finally, the IAS algorithm solves for the MAP estimate of the posterior, which does not provide uncertainty quantification in the reconstructed solution. By contrast, the proposed BCD method grants access to the solution posterior for fixed hyper-parameters.

Iteratively reweighted least squares

The update steps Eqs. 23, 24, and 25 resulting from Algorithm 1 can be interpreted as an iteratively reweighted least squares (IRLS) algorithm [daubechies2010iteratively]. The idea behind IRLS algorithms is to recover, for instance, a sparse solution by penalizing the components of R x individually with weights that are updated iteratively. Indeed, the update steps Eqs. 23, 24, and 25 resemble reweighted Tikhonov-regularization strategies. In this regard, the BCD method provides a solid Bayesian interpretation for commonly used reweighting choices and might be used to tailor these weights to specific statistical assumptions on the underlying problem.

4 Numerical results

The MATLAB code used to generate the numerical tests presented here is open access and can be found on GitHub at https://github.com/jglaubitz/generalizedSBL.

4.1 Denoising a sparse signal

Consider the sparse nodal values x of a signal at equidistant points. All of the values in x are zero except at four randomly selected locations, where the values were set to nonzero constants. We are given noisy observations y, which result from adding i. i. d. zero-mean normal noise to the exact values x; the noise variance determines the signal-to-noise ratio (SNR) of the data.

(a) Signal and noisy observations
(b) Reconstructions by different methods
Figure 2: The sparse signal and noisy observations at equidistant points, and reconstructions by different methods

Fig. 2(a) illustrates the exact values of x and the noisy observations y. The corresponding data model and regularization operator are

(37)  y = x + e,  R = I,

that is, F = I in Eq. 1. This simple test case allows us to compare the proposed BCD algorithm with some existing methods, some of which assume x itself to be sparse (corresponding to R = I). Fig. 2(b) provides a comparison of the BCD algorithm with (1) SBL using the evidence approach [tipping2001sparse], (2) the IAS method [calvetti2007introduction, calvetti2007gaussian] solving for the MAP estimate of the posterior, and (3) the alternating direction method of multipliers (ADMM) [boyd2011distributed] solving the deterministic ℓ1-regularized problem Eq. 2. The free parameters of the IAS algorithm were fine-tuned by hand; see [calvetti2019hierachical] for more details on these parameters. The regularization parameter λ in Eq. 2 was also fine-tuned by hand. Finally, for the proposed BCD algorithm and the evidence approach, we assumed the noise variance to be unknown, so that it had to be estimated by the method as well. We can see in Fig. 2(b) that for this example all of the SBL-based methods perform similarly. The ADMM, on the other hand, yields a more regularized reconstruction, which might be explained by the uniform nature of the ℓ1-regularization term in Eq. 2. This is in contrast to the hierarchical Bayesian model, which allows for spatially varying regularization. In this regard we note that there are weighted ℓ1-regularization methods [candes2008enhancing, chartrand2008iteratively, adcock2019joint] that incorporate spatially varying regularization parameters. While such techniques can improve the resolution near the non-zero values in sparse signals, as well as near the internal edges in images, they still yield point estimates and thus do not provide additional uncertainty information. Hence in the current investigation we simply employ the standard ADMM with a fine-tuned non-varying regularization parameter as a reasonable comparison.

4.2 Deconvolution of a piecewise constant signal

We next consider deconvolution of the piecewise constant signal illustrated in Fig. 3. The corresponding data model and regularization operator are respectively given by

(38)  y = F x + e,  [R x]_k = x_{k+1} − x_k,  k = 1, …, N−1,

where e is i. i. d. zero-mean normal noise and the forward operator F is obtained by applying the midpoint quadrature rule to the convolution equation

(39)  y(s) = ∫ φ(s − t) x(t) dt.

We assume a Gaussian convolution kernel of the form

(40)  φ(t) = ( 1 / √(2π γ^2) ) exp( −t^2 / (2 γ^2) )

with blurring parameter γ. The forward operator thus is

(41)  [F]_{ij} = h φ(s_i − t_j),

where h is the distance between consecutive grid points. Note that F has full rank but quickly becomes ill-conditioned.
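For concreteness, the sketch below assembles the discretized convolution operator of Eq. 41 on a uniform grid of the unit interval using the midpoint rule; the grid size and blurring parameter are illustrative values, not those used in the experiments reported here.

```python
import numpy as np

def gaussian_blur_operator(n, gamma=0.03):
    """Midpoint-rule discretization of the Gaussian convolution in Eqs. 39-41."""
    h = 1.0 / n                                    # grid spacing on [0, 1]
    t = (np.arange(n) + 0.5) * h                   # midpoints t_j (= evaluation points s_i)
    diff = t[:, None] - t[None, :]                 # s_i - t_j
    kernel = np.exp(-diff**2 / (2 * gamma**2)) / np.sqrt(2 * np.pi * gamma**2)
    return h * kernel                              # [F]_{ij} = h * phi(s_i - t_j)

F = gaussian_blur_operator(200)
print(np.linalg.cond(F))                           # full rank, but severely ill-conditioned
```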

(a) Signal and noisy blurred data
(b) Different reconstructions
(c) Normalized covariance parameter
(d) Mean and credible intervals
Figure 3: Deconvolution of a piecewise constant signal from noisy blurred data with i. i. d. zero-mean normal noise

Fig. 3(a) illustrates the true signal as well as the given noisy blurred data at equidistant points. Fig. 3(b) provides the reconstructions using the SBL-based BCD algorithm and the ADMM applied to the ℓ1-regularized inverse problem Eq. 2. The regularization parameter λ in Eq. 2 was again fine-tuned by hand. We do not include any of the existing SBL algorithms considered before (the evidence approach and the IAS algorithm) since they cannot be applied to the non-square regularization operator in Eq. 38 without modifying this operator first. Fig. 3(c) illustrates the normalized prior covariance parameters 1/β_k, which are estimated as part of the BCD algorithm. Observe that the values are significantly larger at the locations of the jump discontinuities. This allows the reconstruction to “jump” and highlights the nonuniform character of regularization in the hierarchical Bayesian model suggested in Section 2. Finally, Fig. 3(d) demonstrates the possibility of quantifying uncertainty when using the BCD algorithm by providing the credible intervals of the solution posterior for the final estimates of α and β. Note that these credible intervals, especially their width, indicate the amount of uncertainty in the reconstruction.

(a) Signal and noisy blurred data
(b) Different reconstructions
(c) Normalized covariance parameter
(d) Mean and credible intervals
Figure 4: Deconvolution of a piecewise constant signal from noisy blurred data with i. i. d. zero-mean normal noise of larger variance than in Fig. 3

The results displayed in Fig. 4 are for the same model with an increased noise variance. The BCD algorithm now yields a less accurate reconstruction in parts of the domain. This is also reflected in the corresponding normalized prior covariance parameters 1/β_k, which can be found in Fig. 4(c). Observe that the second peak is underestimated and therefore causes the block to its left to be drawn towards the subsequent block. The increased uncertainty of the reconstruction is indicated by the credible intervals in Fig. 4(d). In particular, we note the increased width of the credible intervals in this region.

4.3 Combining different regularization operators

We next demonstrate that generalized SBL allows us to consider combinations of different regularization operators. Consider the signal illustrated in Fig. 5(a), which is piecewise constant on one part of the domain and piecewise linear on the rest. The corresponding data model is the same as before, with a Gaussian convolution kernel and added i. i. d. zero-mean normal noise. Fig. 5(b) illustrates the reconstructions obtained by the BCD algorithm using the first- and second-order TV-regularization operators

(42)  [R_1 x]_k = x_{k+1} − x_k,   [R_2 x]_k = x_k − 2 x_{k+1} + x_{k+2},

which promote piecewise constant and piecewise linear solutions, respectively. Observe that neither R_1 nor R_2 is even square, meaning that both would have to be modified by introducing additional rows to apply a standard SBL approach, which can become increasingly complicated for higher orders and multiple dimensions.

(a) Signal and noisy observations
(b) Reconstructions using different regularizations
Figure 5: Signal and noisy observations at equidistant points, and reconstructions by different methods

It is evident from Fig. 5(b) that using first-order TV-regularization yields a less accurate reconstruction where the signal is piecewise linear (this well-known artifact of first-order TV-regularization is often called the “staircasing” effect and motivates the use of higher-order TV-regularization [archibald2016image, stefan2010improved]), while using second-order TV-regularization yields a less accurate reconstruction where the signal is piecewise constant. However, generalized SBL and the proposed BCD algorithm allow us to consider the combined regularization operator

(43)  R = [ R_1 ; R_2 ].

Assuming R_1 ∈ R^{(N−1)×N} and R_2 ∈ R^{(N−2)×N}, the first N−1 rows of R correspond to first-order TV-regularization while the last N−2 rows correspond to second-order TV-regularization. The advantage of using this nonstandard regularization operator in the BCD algorithm is demonstrated by the red stars in Fig. 5(b).
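Building the combined operator of Eq. 43 amounts to stacking the two difference matrices. The following sketch (illustrative grid size) constructs R_1, R_2, and R:

```python
import numpy as np

def tv_operators(n):
    """First- and second-order TV (difference) operators of sizes (n-1) x n and (n-2) x n."""
    I = np.eye(n)
    R1 = I[1:] - I[:-1]                   # [R1 x]_k = x_{k+1} - x_k
    R2 = I[2:] - 2 * I[1:-1] + I[:-2]     # [R2 x]_k = x_k - 2 x_{k+1} + x_{k+2}
    return R1, R2

n = 100
R1, R2 = tv_operators(n)
R = np.vstack([R1, R2])                   # combined operator of Eq. 43, size (2n-3) x n
print(R.shape)                            # (197, 100): not square, yet directly usable in BCD
```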

4.4 Image deconvolution

(a) Reference image
(b) Noisy blurred image
(c) ℓ1-regularization by ADMM
(d) SBL by the BCD algorithm
Figure 6: The reference image, the corresponding noisy blurred image, and reconstructions using the ADMM and the BCD algorithm (Algorithm 1)

We next consider the reference image in Fig. 6(a) and its noisy blurred version in Fig. 6(b). The blurred image results from applying the discrete one-dimensional convolution operator Eq. 41 to the reference image X in the two canonical coordinate directions and then adding i. i. d. zero-mean normal noise. The corresponding forward model is Y = F X F^T + E or, equivalently,

(44)  y = ( F ⊗ F ) vec(X) + e

after vectorization. Here, vec(X) denotes the column vector obtained by stacking the columns of the matrix X on top of one another, y = vec(Y), and e = vec(E). Further, the blurring parameter and noise variance were chosen to make the test case comparable to the one in [bardsley2012mcmc, Section 4.2].

Figs. 6(c) and 6(d) show the reconstructions obtained by the ADMM applied to Eq. 2 and by the SBL-based BCD algorithm with the anisotropic second-order TV operator

(45)  R = [ I ⊗ R_2 ; R_2 ⊗ I ],

where R_2 is the one-dimensional second-order TV operator from Eq. 42. The regularization parameter λ in Eq. 2 was again fine-tuned by hand. The BCD algorithm provides a sharper reconstruction (see Fig. 6) than the ADMM applied to the ℓ1-regularized inverse problem Eq. 2. Further parameter tuning might increase the accuracy of the reconstruction by the ADMM. By contrast, it is important to stress that the BCD algorithm requires no such exhaustive parameter tuning.

4.5 Noisy and incomplete Fourier data

We next address the reconstruction of images based on noisy and incomplete Fourier data, which is common in applications such as magnetic resonance imaging (MRI) and synthetic aperture radar (SAR). The popular prototype Shepp–Logan phantom test image is displayed in Fig. 7(a).

(a) Reference image
(b) ML/LS estimate
(c) ℓ1-regularization by ADMM
(d) SBL by the BCD algorithm
Figure 7: (a) The Shepp–Logan phantom test image; (b) the ML/LS estimate, and reconstructions using (c) the ADMM applied to Eq. 2; and (d) the SBL-based BCD algorithm

The indirect data y is given by applying the two-dimensional discrete Fourier transform to the reference image, removing certain frequencies, and adding noise. Since in this investigation we are assuming the image to be real-valued, we consider the data model

(46)  y = [ Re(F̂) ; Im(F̂) ] x + e,

with Re(F̂) and Im(F̂) respectively denoting the real and imaginary part of F̂. (Our technique is not limited to real-valued solutions, and we will consider complex-valued solutions, such as those occurring in SAR, in future work.) Further, e corresponds to i. i. d. zero-mean normal noise and F̂ = f̂ ⊗ f̂, where f̂ denotes the one-dimensional discrete Fourier transform with missing frequencies, which we impose to mimic the situation where the system is under-determined and some data must for some reason be discarded. The removed frequencies were determined by sampling logarithmically spaced integer indices. Finally, because the image is piecewise constant, we used first-order TV-regularization.
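One possible way to set up the real-valued data model Eq. 46 is sketched below: a one-dimensional unitary DFT matrix with a set of removed frequencies is formed, extended to two dimensions by a Kronecker product, and split into its real and imaginary parts. All sizes and the (here random) sampling pattern are illustrative and do not reproduce the exact setup of this experiment.

```python
import numpy as np

def masked_dft(n, keep):
    """Rows of the unitary 1D DFT matrix corresponding to the kept frequencies."""
    j = np.arange(n)
    W = np.exp(-2j * np.pi * np.outer(j, j) / n) / np.sqrt(n)
    return W[keep]

n = 32
rng = np.random.default_rng(3)
keep = np.sort(rng.choice(n, size=24, replace=False))   # frequencies that are not discarded

f_hat = masked_dft(n, keep)                 # 1D DFT with missing frequencies
F_hat = np.kron(f_hat, f_hat)               # 2D transform acting on vec(X)

# Real-valued data model of Eq. 46: y = [Re(F_hat); Im(F_hat)] x + e
F_real = np.vstack([F_hat.real, F_hat.imag])

X = rng.standard_normal((n, n))             # stand-in for the (real-valued) image
x = X.reshape(-1, order="F")
y = F_real @ x + 0.01 * rng.standard_normal(F_real.shape[0])
```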

Fig. 7(b) shows the maximum likelihood (ML) estimate of the image, which is obtained by maximizing the likelihood function. In this case, the ML estimate is the same as the least squares (LS) solution of the linear system Eq. 46. Figs. 7(c) and 7(d) illustrate the reconstructions obtained by applying the ADMM to the ℓ1-regularized inverse problem Eq. 2 and by the SBL-based BCD algorithm. The regularization parameter λ in Eq. 2 was again fine-tuned by hand. While the reconstructions in Figs. 7(c) and 7(d) are comparable, it is important to point out that we did not use any prior knowledge about the noise variance or perform any parameter tuning for the BCD algorithm.

4.6 Data fusion

As a final example we consider a data fusion problem to demonstrate the possible advantage of using the generalized noise model discussed in Section 2.1. Recall the piecewise constant signal discussed in Section 4.2, and assume we want to reconstruct its values at equidistant grid points, denoted by x. We are given two sets of data: y_1 corresponds to direct observations taken at randomly selected locations with added i. i. d. zero-mean normal noise of relatively large variance, and y_2 corresponds to blurred observations at fewer randomly selected locations with added i. i. d. zero-mean normal noise of smaller variance. The blurring is again modeled using Eq. 41 with a Gaussian convolution kernel. Further, a first-order TV-regularization operator is employed to promote a piecewise constant reconstruction.

(a) Noisy data
(b) Noisy blurred data
(c) Combined, i. i. d. assumption
(d) Combined, generalized data model
Figure 8: Data fusion example with incomplete noisy and incomplete noisy blurred data. Top row: Separate reconstructions using the SBL-based BCD algorithm. Bottom row: Combined reconstructions using the SBL-based BCD algorithm with i. i. d. assumption and using a generalized data model.

The separate reconstructions by the SBL-based BCD algorithm can be found in Figs. 8(a) and 8(b). Both reconstructions are of poor quality, which is due to the high noise variance in the case of y_1 and to the missing information in the case of y_2. In fact, the reconstruction illustrated in Fig. 8(b) is of reasonable quality except for one region, where a void of observations causes the reconstruction to miss two of the jumps.

Following Example 1, we now fuse the two data sets by considering the joint data model

(47)  [ y_1 ; y_2 ] = [ F_1 ; F_2 ] x + [ e_1 ; e_2 ],

where F_1 and F_2 are the forward operators describing how x is mapped to y_1 and y_2, respectively. Employing the usual likelihood function Eq. 5 would correspond to assuming that all the components of the stacked noise vector e = [ e_1 ; e_2 ] are i. i. d., which is not true for this example. The resulting reconstruction by the BCD algorithm can be found in Fig. 8(c). In contrast, utilizing the generalized likelihood function Eq. 7 with

(48)  A = diag( α_1 I_{M_1}, α_2 I_{M_2} ),

we can appropriately model that e_1 and e_2 have different variances. The corresponding reconstruction by the BCD algorithm using this generalized data model is provided in Fig. 8(d). Observe that the reconstruction using the generalized noise model (Fig. 8(d)) is clearly more accurate than the one based on the i. i. d. assumption (Fig. 8(c)). This can be explained by noting that the first data set is larger than the second one, while at the same time its data is less accurate, since its noise variance is higher. Hence, when using the usual i. i. d. assumption, the first data set, which is larger but less accurate, influences the reconstruction more strongly than the second data set, which is smaller but more accurate. Using the generalized data model, on the other hand, the BCD algorithm is able to more appropriately balance the influence of the different data sets.

5 Concluding remarks

This paper introduced a generalized approach for SBL and an efficient realization of it by the newly proposed BCD algorithm. In contrast to existing SBL methods, our new approach has two major advantages. First, the assumptions usually made regarding the unknown noise are relaxed. Specifically, we only require the noise to be independent but not necessarily identically distributed; no other information about the noise covariance matrix is needed. Second, we are able to use any regularization operator as long as the common kernel condition Eq. 18 is satisfied, a usual assumption in regularized inverse problems. Further, the BCD algorithm provides us with the full solution posterior for fixed hyper-parameters rather than just a point estimate, while being easy to implement and highly efficient. Future work will elaborate on sampling-based methods for Bayesian inference [bardsley2012mcmc], which might be computationally more expensive but would also allow sampling from the full joint posterior of the solution and the hyper-parameters. Other research directions might include data-informed choices for the parameters c and d in Eq. 13 and data fusion applications.

Appendix A Evidence approach

In the evidence approach [tipping2001sparse, babacan2010sparse], the posterior is decomposed as

(49)  p(x, α, β | y) = p(x | y, α, β) p(α, β | y).

The variables x, α, and β are then alternatingly updated, with the hyper-parameters α and β calculated as the mode (most probable value) of the hyper-parameter posterior p(α, β | y). By Bayes’ law, one has

(50)  p(α, β | y) ∝ p(y | α, β) p(α) p(β),

where the evidence p(y | α, β) can be determined by marginalizing out the unknown solution x, which yields

(51)  p(y | α, β) = ∫ p(y | x, α) p(x | β) dx.

Some basic but lengthy computations are then used to obtain

(52)  p(y | α, β) ∝ det(C)^{−1/2} exp( −(1/2) y^T C^{−1} y ),

where C = A^{−1} + F ( R^T B R )^{−1} F^T. Also see [babacan2010sparse, Section 3]. However, this assumes that R^T B R is invertible, which is not the case whenever R does not have full column rank.

References