## 1 Introduction

The standard and popular image deconvolution techniques generally assume that the space-invariant instrument response, i.e., the point spread function (PSF), is perfectly known. However, in many practical situations, the true PSF is either unknown or, at best, partially known. For example, in an optical system a perfectly known PSF does not exist because of light diffraction, apparatus/lense aberration, out-of-focus, or image motion (Ward1987, ; Kundur1996, ). Such imperfections are common in general imaging systems including MRFM, where there exist additional model PSF errors in the sensitive magnetic resonance condition (Mamin2003, ). In such circumstances, the PSF required in the reconstruction process is mismatched with the true PSF. The quality of standard image reconstruction techniques may suffer from this disparity. To deal with this mismatch, deconvolution methods have been proposed to estimate the unknown image and the PSF jointly. When prior knowledge of the PSF is available, these methods are usually referred to as semi-blind deconvolution (Makni2005, ; Pillonetto2007, ) or myopic deconvolution (Sarri1998, ; Chenegros2007, ; Park2011, ).

In this paper, we formulate the semi-blind deconvolution task as an estimation problem in a Bayesian setting. Bayesian estimation has the great advantage of offering a flexible framework to solve complex model-based problems. Prior information available on the parameters to be estimated can be efficiently included within the model, leading to an implicit regularization of our ill-posed problem. In addition, the Bayes framework produces posterior estimates of uncertainty, via posterior variance and posterior confidence intervals. Extending our previous work, we propose a variational estimator for the parameters as contrasted to the Monte Carlo approach in

(Park2012, ). This extension is non-trivial. Our variational Bayes algorithm iterates on a hidden variable domain associated with the mixture coefficients. This algorithm is faster, more scalable for equivalent image reconstruction qualities in (Park2012, ).Like in (Park2012, ), the PSF uncertainty is modeled as the deviation of the a priori known PSF
from the true PSF.
Applying an eigendecomposition to the PSF covariance,
the deviation is represented as a linear
combination of orthogonal PSF bases with unknown coefficients that need to be
estimated.
Furthermore, we assume the desired image is sparse,
corresponding to the natural sparsity of the molecular
image. The image prior is a weighted sum of a sparsity
inducing part and a continuous distribution; a positive truncated *Laplacian and atom at zero* (LAZE)
prior^{1}^{1}1A Laplace distribution as a prior distribution acts as a sparse regularization using norm. This can be seen by taking negative logarithm on the distribution. (Ting2009, ). Similar priors have been applied to estimating mixtures of densities
(Bishop2006, ; Nasios2006, ; Corduneanu2001, ) and sparse, nonnegative hyperspectral unmixing (Themelis2012, ). Here we introduce a hidden label variable for the contribution of the discrete mass (empty pixel) and a
continuous density function (non-empty pixel).

Similar to our ‘hybrid’ mixture model, inhomogeneous gamma-Gaussian mixture models have been proposed in

(Makni2008, ).Bayesian inference of parameters from the posterior distribution generally requires challenging computations, such as functional optimization and numerical integration. One widely advocated strategy relies on approximations to the minimum mean square error (MMSE) or maximum a posteriori (MAP) estimators using samples drawn from the posterior distribution. Generation of these samples can be accomplished using Markov chain Monte Carlo methods (MCMC) (Robert2004, ). MCMC has been successfully adopted in numerous imaging problems such as image segmentation, denoising, and deblurring (Gilks1999, ; Robert2004, ). Recently, to solve blind deconvolution, two promising semi-blind MCMC methods have been suggested (Park2012, ; Orieux2010, ). However, these sampling methods have the disadvantage that convergence may be slow.

An alternative to Monte Carlo integration is a variational approximation to the posterior distribution, and this approach is adopted in this paper. These approximations have been extensively exploited to conduct inference in graphical models (Attias00, ). If properly designed, they can produce an analytical posterior distribution from which Bayesian estimators can be efficiently computed. Compared to MCMC, variational methods are of lower computational complexity, since they avoid stochastic simulation. However, variational Bayes (VB) approaches have intrinsic limits; the convergence to the true distribution is not guaranteed, even though the posterior distribution will be asymptotically normal with mean equal to the maximum likelihood estimator under suitable conditions (Walker1969, ). In addition, variational Bayes approximations can be easily implemented for only a limited number of statistical models. For example, this method is difficult to apply when latent variables have distributions that do not belong to the exponential family (e.g. a discrete distribution (Park2012, )). For mixture distributions, variational estimators in Gaussian mixtures and in exponential family converge locally to maximum likelihood estimator (Wang2004a, ; Wang2006, ). The theoretical convergence properties for sparse mixture models, such as our proposed model, are as yet unknown. This has not hindered the application of VB to sparse models to problems in our sparse image mixture model.

Another possible intrinsic limit of the variational Bayes approach, particularly in (semi)-blind deconvolution, is that the posterior covariance structure cannot be effectively estimated nor recovered, unless the true joint distributions have independent individual distributions. This is primarily because VB algorithms are based on minimizing the KL-divergence between the true distribution and the VB approximating distribution, which is assumed to be factorized with respect to the individual parameters.

However, despite these limits, VB approaches have been widely applied with success to many different engineering problems (Bishop2000, ; Ghahramani2000, ; Beal2003, ; Winn2005, ).
A principal contribution of this paper is the development and implementation of a VB algorithm for mixture distributions in a hierarchical Bayesian model.
Similarly, the framework permits a Gaussian prior (Babacan2009, ) or a Student’s-t prior (Tzikas2009, ) for the PSF.
We present comparisons of our variational solution to other blind deconvolution methods.
These include the total variation (TV) prior for the PSF (Amizic2010, ) and
natural sharp edge priors for images with PSF regularization (Almeida2010, ). We also compare to basis kernels (Tzikas2009, ), the mixture model algorithm of Fergus *et al.* (Fergus2006, ),
and the related method of Shan *et al.* (Shan2008, ) under a motion blur model.

To implement variational Bayesian inference, prior distributions and the instrument-dependent likelihood function are specified. Then the posterior distributions are estimated by minimizing the Kullback-Leibler (KL) distance between the model and the empirical distribution. Simulations conducted on synthetic images show that the resulting myopic deconvolution algorithm outperforms previous mismatched non-blind algorithms and competes with the previous MCMC-based semi-blind method (Park2012, ) with lower computational complexity.

We illustrate the proposed method on real data from magnetic resonance force microscopy (MRFM) experiments. MRFM is an emerging molecular imaging modality that has the potential for achieving D atomic scale resolution (Sidles1991, ; Sidles1992, ; Sidles1995, ). Recently, MRFM has successfully demonstrated imaging (Rugar1992, ; Zuger1996, ) of a tobacco mosaic virus (Degen2009, ). The D image reconstruction problem for MRFM experiments was investigated with Wiener filters (Zuger1993, ; Zuger1994, ; Zuger1996, ), iterative least square reconstruction approaches (Chao2004, ; Degen2009, ; Degen2009compl, ), and recently the Bayesian estimation framework (Ting2009, ; Dobigeon2009a, ; Park2011, ; Park2012, ). The drawback of these approaches is that they require prior knowledge on the PSF. However, in many practical situations of MRFM imaging, the exact PSF, i.e., the response of the MRFM tip, is only partially known (Mamin2003, ). The proposed semi-blind reconstruction method accounts for this partial knowledge.

The rest of this paper is organized as follows. Section 2 formulates the imaging deconvolution problem in a hierarchical Bayesian framework. Section 3 covers the variational methodology and our proposed solutions. Section 4 reports simulation results and an application to the real MRFM data. Section 5 discusses our findings and concludes.

## 2 Formulation

### 2.1 Image Model

As in (Park2012, ; Dobigeon2009a, ), the image model is defined as:

(1) |

where is a vectorized measurement, is an vectorized sparse
image to be recovered, is a convolution
operator with the PSF ,
is an equivalent system matrix, and is the measurement
noise vector. In this work, the noise vector is assumed
to be Gaussian^{2}^{2}2

denotes a Gaussian random variable with mean

and covariance matrix ., . The PSF is assumed to be unknown but a nominal PSF estimate is available. The semi-blind deconvolution problem addressed in this paper consists of the joint estimation of and from the noisy measurements and nominal PSF .### 2.2 PSF Basis Expansion

The nominal PSF is assumed to be generated with known parameters (gathered in the vector ) tuned during imaging experiments. However, due to model mismatch and experimental errors, the true PSF may deviate from the nominal PSF . If the generation model for PSFs is complex, direct estimation of a parameter deviation, , is difficult.

We model the PSF (resp. ) as a perturbation about a nominal PSF (resp. ) with basis vectors , , that span a subspace representing possible perturbations . We empirically determined this basis using the following PSF variational eigendecomposition approach. A number of PSFs are generated following the PSF generation model with parameters

randomly drawn according to Gaussian distribution

^{3}

^{3}3

The variances of the Gaussian distributions are carefully tuned so that their standard deviations produce a minimal volume ellipsoid that contains the set of valid PSFs.

centered at the nominal values. Then a standard principal component analysis (PCA) of the residuals

is used to identify principal axes that are associated with the basis vectors . The necessary number of basis vectors,, is determined empirically by detecting a knee at the scree plot. The first few eigenfunctions, corresponding to the first few largest eigenvalues, explain major portion of the observed perturbations.

If there is no PSF generation model, then we can decompose the support region of the true (suspected) PSF to produce an orthonormal basis. The necessary number of the bases is again chosen to explain most support areas that have major portion/energy of the desired PSF. This approach is presented in our experiment with Gaussian PSFs.We use a basis expansion to present as the following linear approximation to ,

(2) |

where the determine the PSF relative to this bases. With this parameterization, the objective of semi-blind deconvolution is to estimate the unknown image, , and the linear expansion coefficients .

### 2.3 Determination of Priors

The priors on the PSF, the image, and the noise are constructed as latent variables in a hierarchical Bayesian model.

#### 2.3.1 Likelihood function

Under the hypothesis that the noise in (1) is white Gaussian, the likelihood function takes the form

(3) |

where denotes the norm .

#### 2.3.2 Image and label priors

To induce sparsity and positivity of the image, we use an image prior consisting of “a mixture of a point mass at zero and a single-sided exponential distribution”

(Ting2009, ; Dobigeon2009a, ; Park2012, ). This prior is a convex combination of an atom at zero and an exponential distribution:(4) |

In (4), is the Dirac delta function,

is the prior probability of a non-zero pixel and

is a single-sided exponential distribution where is a set of positive real numbers and denotes the indicator function on the set :(5) |

A distinctive property of the image prior (4) is that it can be expressed as a latent variable model

(6) |

where the binary variables

are independent and identically distributed and indicate if the pixel is active(7) |

and have the Bernoulli probabilities: .

The prior distribution of pixel value in (4) can be rewritten conditionally upon latent variable

which can be summarized in the following factorized form

(8) |

By assuming each component to be conditionally independent given and , the following conditional prior distribution is obtained for :

(9) |

where .

This factorized form will turn out to be crucial for simplifying the variational Bayes reconstruction algorithm in Section 3.

#### 2.3.3 PSF parameter prior

We assume that the PSF parameters are independent and

is uniformly distributed over intervals

(10) |

These intervals are specified a priori and are associated with error tolerances of the imaging instrument. The joint prior distribution of is therefore:

(11) |

#### 2.3.4 Noise variance prior

A conjugate inverse-Gamma distribution with parameters

and is assumed as the prior distribution for the noise variance (See A.1 for the details of this distribution):(12) |

The parameters and

will be fixed to a number small enough to obtain a vague hyperprior, unless we have good prior knowledge.

### 2.4 Hyperparameter Priors

As reported in (Ting2009, ; Dobigeon2009a, ), the values of
the hyperparameters greatly impact the quality
of the deconvolution. Following the approach in
(Park2012, ), we propose to include them within the
Bayesian model, leading to a second level of hierarchy in the
Bayesian paradigm. This hierarchical Bayesian model requires the
definition of prior distributions for these hyperparameters, also
referred to as *hyperpriors* which are defined below.

#### 2.4.1 Hyperparameter

A conjugate inverse-Gamma distribution is assumed for the Laplacian scale parameter :

(13) |

with . The parameters and will be fixed to a number small enough to obtain a vague hyperprior, unless we have good prior knowledge.

#### 2.4.2 Hyperparameter

We assume a Beta random variable with parameters , which are iteratively updated in accordance with data fidelity. The parameter values will reflect the degree of prior knowledge and we set to obtain a non-informative prior. (See A.2 for the details of this distribution)

(14) |

### 2.5 Posterior Distribution

The conditional relationships between variables is illustrated in Fig. 1. The resulting posterior of hidden variables given the observation is

(15) |

Since it is too complex to derive exact Bayesian estimators from this posterior, a variational approximation of this distribution is proposed in the next section.

## 3 Variational Approximation

### 3.1 Basics of Variational Inference

In this section, we show how to approximate the posterior densities within a variational Bayes framework. Denote by the set of all hidden parameter variables including the image variable in the model, denoted by . The hierarchical model implies the Markov representation . Our objective is to compute the posterior , where is a set of variables in except . Let be any arbitrary distribution of . Then

(16) |

with

(17) | ||||

(18) |

We observe that maximizing the lower bound is equivalent to minimizing the Kullback-Leibler (KL) divergence . Consequently, instead of directly evaluating given , we will specify a distribution that approximates the posterior . The best approximation maximizes . We present Algorithm 1 that iteratively increases the value of by updating posterior surrogate densities. To obtain a tractable approximating distribution , we will assume a factorized form as where has been partitioned into disjoint groups . Subject to this factorization constraint, the optimal distribution is given by

(19) |

where denotes the expectation^{4}^{4}4In the
sequel, we use both and
to denote the expectation. To make our expressions more compact, we use subscripts to denote expectation with respect to the random variables
in the subscripts.
These notations with the subscripts of ‘’ denote expectation
with respect to all random variables except for the variable . e.g.
with respect to all factors
except .
We will call the posterior surrogate for .

### 3.2 Suggested Factorization

Based on our assumptions on the image and hidden parameters, the random vector is with and . We propose the following factorized approximating distribution

(20) |

Ignoring constants^{5}^{5}5In the sequel, constant terms with respect to the variables of interest can be omitted in equations., (19) leads to

(21) |

which induces the factorization

(22) |

Similarly, the factorized distribution for , and is

(23) |

leading to the fully factorized distribution

(24) |

### 3.3 Approximating Distribution

In this section, we specify the marginal distributions in the approximated posterior distribution required in (24). More details are described in B. The parameters for the posterior distributions are evaluated iteratively due to the mutual dependence of the parameters in the distributions for the hidden variables, as illustrated in Algorithm 1.

#### 3.3.1 Posterior surrogate for

(25) |

with .

#### 3.3.2 Posterior surrogate for

(26) |

with .

#### 3.3.3 Posterior surrogate for

#### 3.3.4 Posterior surrogate for

We first note that

(28) |

The conditional density of given is , where . Therefore, the conditional posterior surrogate for is

(29) | ||||

(30) |

where is a positively truncated-Gaussian density function with the hidden mean and variance , , , , is except for the th entry replaced with 0, and is the th column of . Therefore,

(31) |

which is a Bernoulli truncated-Gaussian density.

#### 3.3.5 Posterior surrogate for

For ,

(32) |

with . is the digamma function and .

#### 3.3.6 Posterior surrogate for

For ,

(33) |

where

is the probability density function for the normal distribution with the mean

and variance , , and .The final iterative algorithm is presented in
Algorithm 1, where required *shaping*
parameters under distributional assumptions and related statistics
are iteratively updated.

## 4 Simulation Results

We first present numerical results obtained for Gaussian and typical MRFM PSFs, shown in Fig. 2 and Fig. 6, respectively. Then the proposed variational algorithm is applied to a tobacco virus MRFM data set. There are many possible approaches to selecting hyperparameters, including the non-informative approach of (Park2012, )

and the expectation-maximization approach of

(Nasios2006, ). In our experiments, hyper-parameters , , , and for the densities are chosen based on the framework advocated in (Park2012, ). This leads to the vague priors corresponding to selecting small values . For , the noninformative initialization is made by setting , which gives flexibility to the surrogate posterior density for. The resulting prior Beta distribution for

is a uniform distribution on for the mean proportion of non-zero pixels.(34) |

The initial image used to initialize the algorithm is obtained from one Landweber iteration (Landweber1951, ).

### 4.1 Simulation with Gaussian PSF

The true image used to generate the data, observation , the true PSF, and the initial, mismatched PSF are shown in Fig. 2. Some quantities of interest, computed from the outputs of the variational algorithm are depicted as functions of the iteration number in Fig. 3. These plots indicate that convergence to the steady state is achieved after few iterations. In Fig. 3, and get close to the true level but shows a deviation from the true values. This large deviation implies that our estimation of noise level is conservative; the estimated noise level is larger than the true level. This relates to the large deviation in projection error from noise level (Fig. 3(a)). The drastic changes in the initial steps seen in the curves of are due to the imperfect prior knowledge (initialization). The final estimated PSF and reconstructed image are depicted in Fig. 4

, along with the reconstructed variances and posterior probability of

. We decomposed the support region of the true PSF to produce orthonormal bases shown in Fig. 5. We extracted 4 bases because these four PSF bases clearly explain the significant part of the true Gaussian PSF. In other words, little energy resides outside of this basis set in PSF space.The reconstructed PSF clearly matches the true one, as seen in Fig. 2 and Fig. 4. Note that the restored image is slightly attenuated while the restored PSF is amplified because of intrinsic scale ambiguity.

### 4.2 Simulation with MRFM type PSFs

The true image used to generate the data, observation , the true PSF, and the initial, mismatched PSF are shown in Fig. 6. The PSF models the PSF of the MRFM instrument, derived by Mamin et al. (Mamin2003, ). The convergence of the algorithm is achieved after the 10th iteration. The reconstructed image can be compared to the true image in Fig. 7, where the pixel-wise variances and posterior probability of are rendered. The PSF bases are obtained by the procedure proposed in Section 2.2 with the simplified MRFM PSF model and the nominal parameter values (Ting2009, ). Specifically, by detecting a knee at the scree plot, explaining more than 98.69% of the observed perturbations (Fig. 3 in (Park2012, )), we use the first four eigenfunctions, corresponding to the first four largest eigenvalues. The resulting principal basis vectors are depicted in Fig. 8. The reconstructed PSF with the bases clearly matches the true one, as seen in Fig. 6 and Fig. 7.

### 4.3 Comparison with PSF-mismatched reconstruction

The results from the variational deconvolution algorithm with a mismatched Gaussian PSF and a MRFM type PSF are presented in Fig. 9 and Fig. 10, respectively; the relevant PSFs and observations are presented in Fig. 2 in Section 4.1 and in Fig. 6 in Section 4.2, respectively. Compared with the results of our VB semi-blind algorithm (Algorithm 1), shown in Fig. 4 and Fig. 7, the reconstructed images from the mismatched non-blind VB algorithm in Fig. 9 and Fig. 10, respectively, inaccurately estimate signal locations and blur most of the non-zero values.

Additional experiments (not shown here) establish that the PSF estimator is very accurate when the algorithm is initialized with the true image.

### 4.4 Comparison with other algorithms

To quantify the comparison, we performed experiments with the same set of four sparse images and the MRFM type PSFs as used in (Park2012, ). By generating 100 different noise realizations for 100 independent trials with each true image, we measured errors according to various criteria. We tested four sparse images with sparsity levels .

Under these
criteria^{6}^{6}6
Note that the norm has been normalized.
The true image has value 1; is used for
MCMC method; for variational method
since this method does not produce zero pixels but .

Note also that, for our simulated data, the (normalized) true noise
levels are for , respectively.,
Fig. 11 visualizes the reconstruction error performance
for several measures of error. From these figures we conclude that
the VB semi-blind algorithm performs at least as well as the previous MCMC semi-blind algorithm.
In addition, the VB method outperforms AM (Herrity2008, ) and the mismatched non-blind MCMC (Dobigeon2009a, ) methods.
In
terms of PSF estimation, for very sparse images the VB semi-blind method seems to outperform the MCMC method.
Also, the proposed VB semi-blind method
converges more quickly and requires fewer iterations.
For example, the VB semi-blind algorithm converges in approximately 9.6
seconds after 12 iterations, but the previous MCMC algorithm takes
more than 19.2 seconds after 40 iterations to achieve convergence^{7}^{7}7
The convergence here is defined as the state where the change in estimation curves over time is negligible..

In addition, we made comparisons
between our sparse image reconstruction method and other state-of-the-art blind deconvolution methods
(Babacan2009, ; Amizic2010, ; Almeida2010, ; Tzikas2009, ; Fergus2006, ; Shan2008, ), as shown in our previous work (Park2012, ). These algorithms were
initialized with the nominal, mismatched PSF and were applied
to the same sparse image as our experiment above.
For a fair comparison, we made a sparse prior modification in the image model of other algorithms, as needed. Most of these methods do not assume or fit into the sparse model in our experiments, thus leading to poor performance in terms of image and PSF estimation errors.
Among these tested algorithms, two of them, proposed by Tzikas *et al.* (Tzikas2009, ) and Almeida *et al.* (Almeida2010, ), produced non-trivial and convergent solutions and the corresponding results are compared to ours in Fig. 11.
By using basis kernels the method proposed by Tzikas *et al.* (Tzikas2009, ) uses a similar PSF model to ours. Because a sparse image prior is not assumed in their algorithm (Tzikas2009, ), we applied their suggested PSF model along with our sparse image prior for a fair comparison.
The method proposed by Almeida *et al.* (Almeida2010, ) exploits the sharp edge property in natural images and uses initial, high regularization for effective PSF estimation.
Both of these perform worse than our VB method as seen in Fig. 11. The remaining algorithms (Babacan2009, ; Amizic2010, ; Fergus2006, ; Shan2008, ), which focus on photo image reconstruction or motion blur, either produce a trivial solution () or are a special case of Tzikas’s model (Tzikas2009, ).

To show lower bound our myopic reconstruction
algorithm, we used the Iterative Shrinkage/Thresholding (IST) algorithm with a true PSF. This algorithm effectively restores sparse images with a sparsity constraint (Daubechies2004, ).
We demonstrate comparisons of the computation time^{8}^{8}8Matlab is used under Windows 7 Enterprise and HP-Z200 (Quad 2.66 GHz)
platform. of our proposed reconstruction algorithm to that of others in Table 1.

Our method | 9.58 |
---|---|

semi-blind MC (Park2012, ) | 19.20 |

Bayesian nonblind (Dobigeon2009a, ) | 3.61 |

AM (Herrity2008, ) | 0.40 |

Almeida’s method (Almeida2010, ) | 5.63 |

Amizic’s method (Amizic2010, ) | 5.69 |

Tzikas’s method (Tzikas2009, ) | 20.31 |

(oracle) IST (Daubechies2004, ) | 0.09 |

### 4.5 Application to tobacco mosaic virus (TMV) data

We applied the proposed variational semi-blind sparse deconvolution algorithm to the tobacco mosaic virus data, made available by our IBM collaborators (Degen2009, ), shown in the first row in Fig. 12. Our algorithm is easily modifiable to these 3D raw image data and 3D PSF with an additional dimension in dealing with basis functions to evaluate each voxel value . The noise is assumed Gaussian (Rugar1992, ; Degen2009, ) and the four PSF bases are obtained by the procedure proposed in 2.2 with the physical MRFM PSF model and the nominal parameter values (Mamin2003, ). The reconstruction of the 6th layer is shown in Fig. 12(b), and is consistent with the results obtained by other methods. (see (Park2012, ; Dobigeon2009a, ).) The estimated deviation in PSF is small, as predicted in (Park2012, ).

While they now exhibit similar smoothness, the VB and MCMC images are still somewhat different since each algorithm follows different iterative trajectory in the high dimensional space of 3D images, thus converging possibly to slightly different stopping points near the maximum of the surrogate distribution. We conclude that the two images from VB and MCMC are comparable in that both represent the 2D SEM image well, but VB is significantly faster.

### 4.6 Discussion

In blind deconvolution, joint identifiability is a common issue. For example,
because of scale ambiguity, the unicity cannot be guaranteed in a general setting.
It is not proven in our solution either. However, the shift/time ambiguity issue noticed in (Ge2011, ) is implicitly addressed in our method using a nominal and basis PSFs. Moreover, our constraint on the PSF space using a basis approach effectively excludes a delta function as a PSF solution, thus avoiding the trivial solution. Secondly, the PSF solution is restricted
to this linear spanning space, starting form the initial, mismatched PSF.
We can, therefore, reasonably expect that the solution provided
by the algorithm is close to the true PSF, away from the trivial solution or the initial PSF.

To resolve scale ambiguity in a MCMC Bayesian framework, stochastic samplers are proposed in (Ge2011, ) by imposing a fixed variance on a certain distribution^{9}^{9}9We note that this MCMC method designed for 1D signal deconvolution
is not efficient for analyzing 2D and 3D images, since the grouped and marginalized samplers are usually slow to converge requiring hundreds of iterations (Ge2011, ). .
Another approach to resolve the scale ambiguity is to assume a hidden scale variable that is multiplied to the PSF and dividing the image (or vice versa.), where the scale is drawn along each iteration of the Gibbs sampler (Vincent2010, ).

## 5 Conclusion

We suggested a novel variational solution to a semi-blind sparse deconvolution problem. Our method uses Bayesian inference for image and PSF restoration with a sparsity-inducing image prior via the variational Bayes approximation. Its power in automatically producing all required parameter values from the data merits further attention for the extraction of image properties and retrieval of necessary features.

From the simulation results, we conclude that the performance of the VB method competes with MCMC methods in sparse image estimation, while requiring fewer computations. Compared to a non-blind algorithm whose mismatched PSF leads to imprecise and blurred signal locations in the restored image, the VB semi-blind algorithm correctly produces sparse image estimates. The benefits of this solution compared to the previous solution (Park2012, ) are faster convergence and stability of the method.

## Appendix A Useful Distributions

### a.1 Inverse Gamma Distribution

The density of an inverse Gamma random variable is , for . and .

### a.2 Beta Distribution

The density of a Beta random variable is , for , with . The mean of is and , where is a digamma function.

### a.3 Positively Truncated Gaussian Distribution

The density of a truncated Gaussian random variable is denoted by , and its statistics used in the paper are

where

is a cumulative distribution function for the standard normal distribution.

## Appendix B Derivations of

In this section, we derive the posterior densities defined by variational Bayes framework in Section 3.

### b.1 Derivation of

We denote the expected value of the squared residual term by . For ,

where is the convolution matrix corresponding to the convolution with . For and , , since . Here, is approximated as a diagonal matrix . This is reasonable, especially when the expected recovered signal exhibits high sparsity. Likewise, and .

Then, we factorize , with , .

If we set the prior, , to be a uniform distribution over a wide range of the real line that covers error tolerances, we obtain a normally distributed variational density with its mean and variance defined above, because . By the independence assumption, , so can be easily evaluated.

### b.2 Derivation of

We evaluate ignoring edge effects; . is a kernel energy in sense and the variance terms add uncertainty, due to the uncertainty in , to the estimation of density. Applying (19), (ignoring constants)

( denotes expectation with respect to all variables except .)

### b.3 Derivation of

For , with , (th column of ). Ignoring constants, .

Using the orthogonality of the kernel bases and uncorrelatedness of ’s, we derive the following terms (necessary to evaluate ): and,