1 Introduction
Image reconstruction involves constructing interpretable images of objects of interest from the data recorded by an imaging device [16]. It is usually cast as an inverse problem, as one wants to determine the input to a system from its output. In most practical image reconstruction problems, the measurement and recording process is inevitably corrupted by noise, which renders the obtained data random. The statistical properties of the data have a significant impact on the reconstruction results. In this work we focus on a special type of medical image reconstruction problem in which the recorded data follow a Poisson distribution. Poisson data usually arise in imaging problems where the unknown quantity of interest is an object that interacts with a known incident beam of photons or electrons [21]. A very important example of such problems is positron emission tomography (PET) [27, 3], a nuclear medicine imaging technique that is widely used in the early detection and treatment follow-up of many diseases, including cancer. In PET, the detection of signals is essentially a photon-counting process, and as a result the data are well modeled by a Poisson distribution [3, 21]. The problem has attracted considerable research interest, and a number of methods have been developed to recover the image, e.g., [32, 35, 15], to name just a few.
On the other hand, the stochastic nature of the data also introduces uncertainty into the image reconstruction process, and as a result the obtained image is unavoidably subject to uncertainty. In practice, many important decisions, such as diagnoses, have to be made based on the obtained images. It is thus highly desirable to have methods that can not only compute the image but also quantify the uncertainty in it. To this end, Bayesian inference has become a popular tool for image reconstruction [23], largely thanks to its ability to quantify uncertainty in the obtained image. The Bayesian formulation has long been used to solve image reconstruction problems with Poisson data, e.g., [20, 25, 19]. We note, however, that most of the early works focus on computing a point estimate, usually the maximum a posteriori (MAP) estimate in the Bayesian setting, because of the limited computational power available at the time. More recently, mounting interest has been directed to the computation of the complete posterior distribution of the image, rather than a point estimate, because it provides important uncertainty information about the reconstruction results. For example, a Markov chain Monte Carlo (MCMC) algorithm is developed to sample the posterior distribution of the image in
[5], and a variational Gaussian approximation of the posterior is proposed in [2]. A serious challenge in the numerical implementation of Bayesian image reconstruction is that in certain circumstances the inference results diverge under resolution/discretization refinement, a behavior known as being discretization variant or dimension dependent. To address this issue, Stuart [33] proposes an infinite dimensional framework, formulating the Bayesian inference problem in function spaces. Building on several existing works, we aim to provide in this work a complete framework for performing infinite dimensional Bayesian inference and uncertainty quantification for medical image reconstruction with Poisson data, while treating several issues surrounding the problem. Specifically, we summarize the key ingredients of our Bayesian framework as follows. First, in the usual setup, the function of interest can take both positive and negative values. However, in the Poisson problem, a negative-valued function may cause the Poisson likelihood function to be undefined (see Section 2.3 for more details), which renders the posterior distribution ill-posed in the infinite dimensional setting. To tackle this issue, we introduce a reparametrization of the unknown image which ensures that the function of interest is always positive valued. Moreover, medical images often contain sharp jumps, and here we use the TV-Gaussian (TG) hybrid prior distribution proposed in [36] to model the jumps in the function/image. Using the positivity-preserving reparametrization and the TG prior, we are able to show that the resulting posterior distribution is well-posed in the infinite dimensional setting, which, to the best of our knowledge, has not yet been done for the Poisson data model. Second, we consider the numerical implementation of the Bayesian inference.
A main difficulty here is that many standard MCMC algorithms, such as the well-known Metropolis-Hastings (MH) algorithm [31, 8], degenerate under resolution refinement. In [13] the authors introduce an MCMC algorithm termed the preconditioned Crank-Nicolson (pCN) method, whose performance is independent of the discretization dimensionality. The authors also provide a Langevin variant of the pCN algorithm in [13], which accelerates the sampling procedure by incorporating the local gradient information of the likelihood function. In our problem, the pCN-Langevin (pCNL) algorithm cannot be used directly because the prior used here contains the total variation (TV) term, which can be non-differentiable. To overcome this difficulty we modify the pCNL method by replacing the gradient direction with one computed by a primal-dual algorithm. We note that a similar problem is considered in [29, 14], where a proximal method is used to approximate the gradient direction. Aside from the directions being computed with different approaches, another main difference between the aforementioned works and the present one is that we use the pCN framework, so our algorithm is dimension independent, while the works [29, 14]
concern finite dimensional problems where discretization refinement is not an issue. Third, an important issue in the TG hybrid prior is to determine the value of the regularization parameter of the TV term. In the Bayesian framework, such parameters are often determined with the hierarchical Bayes or the empirical Bayes method
[18]. As discussed in Section 4, these methods, however, are computationally intractable in our problem, as we do not know the normalization constant of the TG prior. Thus, in this work we provide a method to determine the value of the TV regularization parameter based on the realized discrepancy model fit assessment approach developed in [17]. Finally, we provide an application of the uncertainty information obtained in the Bayesian framework, where we use the posterior distribution to detect possible artifacts in any reconstructed image. The rest of the paper is organized as follows. In Section 2, we present the infinite dimensional Bayesian formulation of the image reconstruction problem with Poisson data, and we prove that under the reparametrization the resulting posterior is well-posed in the function space. In Section 3, we describe the primal-dual pCN algorithm to sample the posterior distribution of the present problem. Section 4 provides a method to determine the value of the regularization parameter in the TG prior. Section 5 discusses how to use the posterior distribution to detect artifacts in a reconstructed image. Finally, numerical experiments with the proposed Bayesian framework are presented in Section 6.
2 Infinite dimensional Bayesian image reconstruction with Poisson data
In this section, we formulate the image reconstruction with Poisson data in an infinite dimensional Bayesian framework.
2.1 The Bayesian inference formulation for functions
We start by presenting a generic Bayesian inference problem for functions. Let be a separable Hilbert space of functions with inner product . Our goal is to infer from data , which is related to via the likelihood function , i.e., the distribution of conditional on the value of . In the Bayesian setting, we first assume a prior distribution of the unknown , which represents one's prior knowledge of the unknown. In principle can be any probability measure defined on the space . The posterior measure of conditional on data is given by the Radon-Nikodym (RN) derivative:
(1) 
which can be interpreted as Bayes' rule in the infinite dimensional setting. The posterior distribution can thus be computed from Eq. (1) with, for example, an MCMC simulation.
2.2 Poisson data model and the positivitypreserving reparametrization
To perform the Bayesian inference, we first need to specify the likelihood function, which can be derived from the underlying mathematical model relating the data to the unknown image. We assume that the image is first projected to the noise-free observable via a mapping ,
(2) 
While noting that the proposed framework is rather general, for simplicity we restrict ourselves here to the case where
is a bounded linear transform. For example in the PET imaging problems, the mapping
is the Radon transform, where each component is computed by integrating along a line :

(3)

for , where is a positive constant describing the noise level. The Radon transform is a bounded linear transform [26]. Poisson noise is then applied to the projected observable , yielding the likelihood function , where is the dimensional Poisson distribution:
(4) 
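As an illustration of this data model, the sketch below simulates Poisson counts from a small toy forward map; the matrix A, the dimensions, and the image values are purely hypothetical stand-ins for the projection operator described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the bounded linear forward map (e.g. a discretized
# Radon transform): a nonnegative matrix A acting on a positive image u.
A = np.abs(rng.standard_normal((20, 50)))  # hypothetical projection matrix
u = 1.0 + np.abs(rng.standard_normal(50))  # hypothetical positive image

g = A @ u              # noise-free observable
y = rng.poisson(g)     # Poisson data: each count y_i ~ Poisson(g_i)

assert y.shape == g.shape and np.all(y >= 0)
```

Note that the counts y are integer-valued and nonnegative, which is why the projected observable g must stay positive for the likelihood to be well defined.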
In the PET problem, there is an additional restriction: the unknown function must be positive. The reason is twofold: first, from the physical point of view, the unknown represents the density of the medium, which is positive; second, from a technical point of view, if is not constrained to be positive, it may yield negative components of the predicted data , which renders the Poisson likelihood undefined. We therefore need to introduce a transformation that preserves the positivity of the unknown . To impose the positivity constraint, we reparameterize the unknown as:
where and are two constants satisfying and , and is the error function defined as:
With the new parametrization, it is easy to see that for any , we have
(5) 
Now we can infer the new unknown and once is known can be computed accordingly. In this setup, the likelihood function for becomes
where is the dimensional Poisson distribution given by Eq. (4). Following [33], we can write the likelihood function in the form
(6a)  
where  
(6b) 
For simplicity we can rewrite Eq. (6b) as,
(7) 
where is a
dimensional vector whose components are all one,
is the predicted observable, and denotes the Euclidean inner product. In what follows we often omit the argument and simply write , when this causes no ambiguity. This notation will be used often later.

2.3 Bayesian framework with the hybrid prior for PET imaging
We now describe how the infinite dimensional Bayesian inference framework is applied to the PET problem. First we assume that the unknown function is a function defined on , a bounded open subset of . In particular, we set the state space to be the Sobolev space :
The associated norm is
Choosing a good prior distribution is one of the most important issues in Bayesian inference. Conventionally one often takes the prior on to be a Gaussian measure defined on with mean and covariance operator , i.e. . Note that is symmetric, positive, and of trace class. The Gaussian prior has many theoretical and practical advantages, but a major limitation is that it cannot model functions with sharp jumps well.
To address this issue, here we use the TV-Gaussian prior proposed in [36]:
(8) 
where is the TV seminorm,
(9) 
and is a positive constant. It follows immediately that the RN derivative of with respect to is
(10) 
which recovers the conventional formulation of inference with a Gaussian prior. Thus all the methods developed for inference problems with Gaussian priors can be directly applied to our formulation.
Next we show that the formulated Bayesian inference problem is well defined in the infinite dimensional setting. We first show that given by Eq. (7) satisfies certain important conditions, as stated in Proposition 1.
Proposition 1
The functional given in Eq. (7) has the following properties:

For every , there are constants and such that, for all , and with ,

For every there is a constant such that, for all with
, 
There exists a constant such that for any , we have
A detailed proof of the proposition is provided in Appendix A. Following Proposition 1, we can conclude that the hybrid prior (8) and the log-likelihood function Eq. (7) yield a well-behaved posterior measure given by Eq. (10) in the infinite dimensional setting, as summarized in the following theorem:
Theorem 1
For given by Eq. (7) and prior measure given by Eq. (8), we have the following results:

given by Eq. (10) is Lipschitz in the data , with respect to the Hellinger distance: if and are two measures corresponding to data and respectively, then there exists such that, for all with ,

Let
(11)

where is a dimensional approximation of and is a dimensional approximation of . Assume that satisfies the three properties of Proposition 1 with constants uniform in , and that satisfies Assumptions A.2 (i) and (ii) in [36] with constants uniform in . Assume also that for , there exist two positive sequences and converging to zero, such that for , where
Then we have
Theorem 1 is a direct consequence of Proposition 1; its proof can be found in [36] and is omitted here.
Finally, we note that, in our numerical implementation, we use the truncated Karhunen-Loève (KL) expansion [24] to represent the unknown . Namely, we write as
(12) 
where are the eigenvalue-eigenfunction pairs of the covariance operator , and the coefficients are independent, each following a standard normal distribution. In the KL representation, the number of KL modes (eigenfunctions) corresponds to the discretization dimensionality.

3 The primal-dual preconditioned Crank-Nicolson MCMC algorithm
In most practical image reconstruction problems, the posterior distribution cannot be calculated analytically. Instead, one usually represents the posterior by samples drawn from it using MCMC algorithms. It is demonstrated in [13] that standard MCMC algorithms may become problematic in the infinite dimensional setting: their acceptance probability degenerates to zero as the discretization dimensionality increases. Here we adopt the pCN MCMC algorithm developed particularly for infinite dimensional problems [13]. An important feature of the pCN algorithm is that its sampling efficiency is independent of the discretization dimensionality, up to the numerical errors in the evaluation of the functionals and , which makes it particularly useful for sampling posterior distributions defined in function spaces. We start with a brief introduction to the pCN algorithm, following the presentation of [13]. We denote of Eq. (10) as
. Simply speaking, the algorithm is derived by applying the Crank-Nicolson (CN) discretization to a stochastic partial differential equation whose invariant distribution is the posterior. We omit the derivation details, referring interested readers to [13], and jump directly to the pCN proposal:

(13)
where and are the present and proposed positions respectively, and is the parameter controlling the step size of the algorithm. The proposed sample is then accepted or rejected according to the acceptance probability:
(14) 
which is independent of discretization dimensionality up to numerical errors.
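A minimal sketch of a pCN sampler follows, assuming a standard Gaussian reference measure (identity covariance); the misfit functional phi, the dimensions, and the step size are toy choices for illustration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

def pcn(phi, u0, beta=0.2, n_steps=2000):
    """Minimal pCN sketch with a standard Gaussian reference measure.

    phi : negative log-likelihood functional Phi(u)
    u0  : initial state (1-D array)
    """
    u = np.asarray(u0, dtype=float)
    samples = []
    for _ in range(n_steps):
        xi = rng.standard_normal(u.shape)           # draw from the prior N(0, I)
        v = np.sqrt(1.0 - beta**2) * u + beta * xi  # pCN proposal
        # accept with probability min(1, exp(Phi(u) - Phi(v)))
        if np.log(rng.uniform()) < phi(u) - phi(v):
            u = v
        samples.append(u.copy())
    return np.array(samples)

# Toy data misfit Phi(u) = ||u - 1||^2 / 2 on a 3-dimensional state.
chain = pcn(lambda u: 0.5 * np.sum((u - 1.0) ** 2), np.zeros(3))
assert chain.shape == (2000, 3)
```

Note that the acceptance ratio involves only the likelihood functional, not the prior density; this is what makes the acceptance probability dimension independent.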
The pCN proposal in Eq. (13) can be improved by incorporating the data information in the proposal, and following the idea of Langevin MCMC for the finite dimensional problems, one can derive the preconditioned CrankNicolson Langevin (pCNL) proposal:
(15) 
where , and is the gradient operator. If we define as following:
(16) 
then the acceptance probability is given by:
(17) 
The pCNL algorithm is usually more efficient than the standard pCN algorithm as it takes advantage of the gradient information of . However, the pCNL algorithm cannot be used directly in our problem, as includes the TV term, which is not differentiable. It is important to note that is an offset term affecting only the mean of the proposal, and we can replace with another direction , yielding the proposal:
(18) 
where and . It can be verified that detailed balance is maintained if we change Eq. (16) to
(19) 
while the acceptance probability is given by Eq. (17). We now need to find a good direction . In [29] the authors use a Moreau approximation of the TV term in the Langevin MCMC algorithm. Here we provide an alternative approach, determining the offset direction in the MCMC iteration using the primal-dual algorithm. Primal-dual algorithms are known to be very effective for optimization problems involving TV regularization [11, 12, 10], and we hereby give a brief description of the primal-dual method applied to our problem. Suppose that we want to solve
(20) 
Introducing a new variable with (we denote this as ), we can rewrite the optimization problem (20) as
(21)  
where . The augmented Lagrangian for Eq. (21) is
(22) 
where is the dual variable or Lagrange multiplier, and is a constant called the penalty parameter. The resulting problem is then solved with the Alternating Direction Method of Multipliers (ADMM) [7]:
(23a)  
(23b)  
(23c) 
The algorithm consists of a minimization step (23a), a second minimization step (23b), and a dual ascent step (23c). When applied to the MCMC algorithm, the iterations proceed as described in Algorithm 1. It should be noted that Step 3 of the algorithm requires solving a minimization problem. For efficiency's sake, however, we choose not to solve this optimization problem accurately. In the numerical implementations in this work, we perform only one gradient descent iteration, so the computational cost of each MCMC iteration does not increase much compared to standard pCN.
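The ADMM iterations (23a)-(23c) can be sketched on a 1-D toy problem, minimizing 0.5||u - b||^2 + lam ||D u||_1 with D a forward-difference operator; this is an illustrative stand-in under simplified assumptions, not the paper's actual operators or data misfit:

```python
import numpy as np

def admm_tv1d(b, lam=1.0, rho=1.0, n_iter=200):
    """ADMM sketch for min_u 0.5*||u - b||^2 + lam*||D u||_1 in 1-D,
    where D is the forward-difference operator (a toy stand-in for TV)."""
    n = len(b)
    D = np.diff(np.eye(n), axis=0)             # (n-1) x n difference matrix
    u = b.copy()
    z = D @ u
    w = np.zeros_like(z)                        # scaled dual variable
    M = np.eye(n) + rho * D.T @ D               # u-step system matrix
    for _ in range(n_iter):
        # u-minimization: solve (I + rho*D^T D) u = b + rho*D^T (z - w)
        u = np.linalg.solve(M, b + rho * D.T @ (z - w))
        Du = D @ u
        # z-minimization: soft thresholding at lam/rho
        z = np.sign(Du + w) * np.maximum(np.abs(Du + w) - lam / rho, 0.0)
        # dual ascent
        w = w + Du - z
    return u

b = np.array([0., 0., 0., 5., 5., 5.])
u = admm_tv1d(b, lam=0.5)
assert u.shape == b.shape
```

On this step-shaped input the iterates converge to a piecewise-constant solution, illustrating why TV-type penalties preserve sharp jumps.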
4 Determining the TV regularization parameter
As in deterministic inverse problems, determining the TV regularization parameter in the hybrid prior is an important issue. In the Bayesian setting, standard methods such as the empirical Bayes (EB) approach [18] can be too computationally intensive for the present problem, especially since we have no knowledge of the normalization constant of the hybrid prior. Namely, recall that the EB method seeks to maximize
where is not known in advance and needs to be evaluated with another Monte Carlo integration. Here we provide a statistical approach to determine the value of , derived from the realized discrepancy method for model assessment proposed in [17].
The basic idea of the method is to choose a function that measures the discrepancy between the measured data and the projected observable , and for the present problem we use the discrepancy,
(24) 
Now knowing that , we can use this discrepancy to assess how well a specific choice of fits the data. The classical value based on the discrepancy is
(25) 
where is the simulated data from model (4). In particular, for the discrepancy function given in Eq. (24), the value is simply,
(26) 
where
is the cumulative distribution function of the
distribution with the degree of freedom
. The classical value computed this way provides an assessment of how well a single estimate of fits the data . The method can be extended to the Bayesian setting to assess the fit of the posterior distribution to the data. First recall that our prior distribution given by Eq. (8) is specified by the parameter , and as a result the posterior also depends on ; here we write the posterior as to emphasize this dependence. In the Bayesian setting, one can compute the posterior predictive value:

(27)
which is essentially the classical value averaged over the posterior distribution. The posterior predictive value assesses the fit of the posterior distribution to the data: intuitively, a larger value of indicates a better fit of the posterior to the data . However, one cannot simply choose the value of that yields the largest value of , or, equivalently, the best fit to the data, as that may cause overfitting. In other words, if the posterior fits the data "too well", it often implies that the effect of the prior distribution is so weak that the posterior is dominated by the data. In the image reconstruction problem, this situation is highly undesirable, as the problem is severely ill-posed and we need a significant contribution from the prior distribution to obtain good estimates of the unknown . In this respect, we should choose so that the effects of the prior and the data are well balanced, as indicated by an appropriate value of . The ideal value of is certainly problem dependent, and for the present problem we suggest choosing so that the resulting value of lies in the range . It is worth mentioning that methods using the data discrepancy to determine regularization parameters for the Poisson data model have also been developed in the deterministic setting; we refer to [4, 6] for further details. Finally, we also note that, in addition to
, the Gaussian distribution may also be subject to hyperparameters, and in principle these hyperparameters can be determined along with
using the proposed approach. However, here we choose not to do so, for two reasons: first, determining multiple parameters may significantly increase the computational cost; second, as the Gaussian distribution is merely used as a reference measure in our hybrid prior, the posterior distribution is not sensitive to it, and it usually suffices to choose these hyperparameters based on prior information (for example, historical data).

5 Artifact detection using the posterior distribution
In practical image reconstruction problems, due to imperfections of methods or devices, a reconstructed image may contain artifacts that are not present in the original imaged object. In this section we describe an application of the posterior distribution to detecting artifacts in a reconstructed image.
Specifically, we consider the posterior distribution of the image at any given point , which is denoted as . Consequently we can write the posterior distribution of as . Next we consider the highest posterior density interval (HPDI), which is essentially the narrowest interval corresponding to a given confidence level. More precisely, for an , the HPDI is defined as (cite)
where is the largest constant satisfying . Now suppose that we have a reconstructed image, and we write its value at as . Next we estimate how large the credible level must be for the associated HPDI to contain . That is, we compute the smallest value of such that,
Intuitively, the larger the computed credible level is, the more likely the considered is an artifact. We thus use the credible level to measure how likely a point is to be an artifact, and we can perform this test for any point . Alternatively, the problem can also be formulated as a Bayesian hypothesis test with a fixed (e.g. ) [28]: that is, is regarded as an artifact if it is not contained in the HPDI for the prescribed value of . However, it has been pointed out in [34] that performing hypothesis tests with the HPDI may cause certain theoretical issues, so here we choose not to use the hypothesis test formulation.
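A sample-based sketch of the HPDI computation is given below: among all intervals containing a (1 - alpha) fraction of the posterior samples, it returns the narrowest one. The standard-normal test case is purely illustrative:

```python
import numpy as np

def hpdi(samples, alpha=0.05):
    """Narrowest interval containing a (1 - alpha) fraction of the samples:
    a sample-based approximation of the highest posterior density interval."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil((1.0 - alpha) * n))       # points the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]       # widths of all candidate intervals
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

rng = np.random.default_rng(3)
s = rng.standard_normal(10000)
lo, hi = hpdi(s, alpha=0.05)
# For a standard normal the 95% HPDI is roughly (-1.96, 1.96).
assert -2.3 < lo < -1.6 and 1.6 < hi < 2.3
```

Applied pointwise to MCMC samples of the image, this yields the credible intervals used in the artifact test above.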
It should be noted that, in [30, 14], the authors utilize the highest posterior density (HPD) region to test whether a candidate image is likely to be a solution of the reconstruction problem. The purpose of the present work differs from the aforementioned ones in that we want to identify regions or pixels which are unlikely to be present in the original image, rather than assess the entire image.
6 Numerical results
In this section we demonstrate the performance of the proposed Bayesian framework by applying it to a PET image reconstruction problem with synthetic data. In particular, the ground truth image (Fig. 1, left) is chosen from the Harvard whole brain atlas [1]. We let and set the image size to be . In the Radon transform we use 60 projections equally spaced from 0 to . The test data, shown in the right panel of Fig. 1, are randomly simulated by plugging the true image into the Radon transform and the Poisson distribution (4), where the noise level is taken to be 1. In the Bayesian inference, we use the hybrid prior distribution where the Gaussian part has zero mean and covariance:
(28) 
with and . The regularization parameter is chosen to be (it is determined using the proposed realized discrepancy method; details are discussed later).
6.1 Convergence with respect to discretization dimensionality
In the numerical implementation we represent the unknown using the truncated KL expansion with KL modes. First we demonstrate that the posterior distribution converges with respect to the discretization dimensionality . To do so, we perform the proposed PD-pCN MCMC simulation and compute the posterior means for six different values of : for . We note here that, in all the MCMC simulations performed in this section, we fix the number of samples to be , with additional samples used in the burn-in step; also, in all the simulations the step size has been chosen so that the resulting acceptance probability is around . We then compute the norm of the difference between the posterior mean with and that with for each :
(29) 
where is the posterior mean of computed with KL modes. We plot the difference against the discretization dimensionality in Fig. 2 (left). One can see from the figure that the difference decreases as increases and becomes approximately zero for and , indicating the convergence of the posterior mean with respect to . For each posterior mean , we also compute its peak signal-to-noise ratio (PSNR) [22], a commonly used metric of the quality of a reconstructed image. We show the PSNR results in Fig. 2 (right); the figure shows that the PSNR increases as increases from 1000 to 4000, and remains approximately constant from 4000 to 6000, suggesting that increasing the discretization dimensionality improves the inference accuracy until the posterior converges, and so it is important to use a sufficiently large discretization dimensionality in such problems. In the rest of the work, we fix .
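For reference, the PSNR used above can be computed as in the following sketch; taking the maximum of the reference image as the peak value is an assumption here, since conventions vary:

```python
import numpy as np

def psnr(x, ref):
    """PSNR sketch: 20*log10(peak / RMSE), with the peak taken as the
    maximum value of the reference image (one common convention)."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    mse = np.mean((x - ref) ** 2)
    return 20.0 * np.log10(ref.max() / np.sqrt(mse))

# Toy check: a constant offset of 0.1 on a unit-valued image gives 20 dB.
ref = np.ones((8, 8))
noisy = ref + 0.1
assert abs(psnr(noisy, ref) - 20.0) < 1e-6
```

Higher PSNR indicates a reconstruction closer to the reference image, which is how it is used to compare posterior means below.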
6.2 Sampling efficiency of the PDpCN algorithm
Next, we compare the performance of the proposed PD-pCN algorithm and the standard pCN. We draw samples from the posterior distribution using both the standard pCN and the proposed PD-pCN algorithms. We reiterate that in both algorithms the step size has been chosen so that the resulting acceptance probability is around . We compute the autocorrelation function (ACF) of the samples generated by the two methods at two points: one at the center of the image and the other near its boundary . We plot the ACF at the two points in Fig. 3; one can see from the figure that at both points the ACF of the proposed PD-pCN method decays much faster than that of the standard pCN. To further compare the performance, we compute the effective sample size (ESS):
where is the integrated autocorrelation time and is the total sample size. In Fig. 4, we compare the ESS at three chosen rows of the image, namely rows 1, 64 and 128. As with the ACF, the results show that the PD-pCN algorithm achieves a much higher ESS than the standard pCN. We have also examined the ESS of other rows; the results are qualitatively similar to the three shown in Fig. 4, so we omit them.
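The ESS computation can be sketched as follows, estimating the integrated autocorrelation time from the sample autocorrelations and truncating the sum at the first negative estimate; the AR(1) test chain is a toy stand-in for a correlated MCMC trace:

```python
import numpy as np

def ess(chain, max_lag=200):
    """Effective sample size sketch: N / (1 + 2 * sum of autocorrelations),
    truncating at the first negative autocorrelation estimate."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    tau = 1.0                                   # integrated autocorrelation time
    for k in range(1, min(max_lag, n - 1)):
        rho = np.dot(x[:-k], x[k:]) / (n * var) # lag-k autocorrelation estimate
        if rho < 0:
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(4)
iid = rng.standard_normal(5000)
# A strongly correlated AR(1) chain has a much lower ESS than an iid chain.
ar = np.empty(5000)
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()
assert ess(ar) < ess(iid)
```

A faster-decaying ACF means a smaller integrated autocorrelation time, hence a larger ESS, which is why the ACF and ESS comparisons above tell a consistent story.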
6.3 Determining parameter
As discussed at the beginning of this section, the prior parameter is determined with the realized discrepancy method of Section 4. We here provide some details. Specifically, we test several different values of : , and using the method discussed in Section 4 we compute the corresponding posterior predictive value for each value of , shown in Table 1. We can see from the table that, as increases, the resulting value decays. These results agree well with our expectation that, as becomes larger, the prior distribution becomes stronger, and as a result the value, which assesses the fit of the posterior to the data, becomes smaller. We also compute the PSNR of the posterior mean obtained with each of these values; the results are also given in the table. We can see that, for both very large and very small values, the associated posterior means are of rather poor quality in terms of PSNR. That is, when the value is too large, the posterior distribution overfits the data, and when it is too small, the posterior underfits the data; both cases lead to poor inference performance, so we must choose a value that balances the prior and the data well. Based on our numerical tests, we suggest that for the present problem it is suitable to use values for which the associated value is in , and so here we choose the value for which the resulting value is 0.28.
Table 1:

parameter   0       0.5     1       2       3        4
p value     0.99    0.82    0.28    0.04    0.0034   0.0005
PSNR        18.21   21.90   22.0    20.66   19.76    19.27
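The posterior predictive p value of Eq. (27) can be estimated by Monte Carlo: for each posterior draw of the predicted observable, replicate Poisson data and compare the discrepancies of the replicated and observed data. The posterior draws g_post below are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

def posterior_predictive_p(y, g_samples, n_rep=200):
    """Monte Carlo sketch of the posterior predictive p value.

    y         : observed counts, shape (m,)
    g_samples : posterior draws of the predicted observable, shape (S, m)
    For each posterior draw g, replicate y_rep ~ Poisson(g) and compare
    the chi-square discrepancies D(y_rep, g) and D(y, g).
    """
    def disc(d, g):
        return np.sum((d - g) ** 2 / g)

    exceed, total = 0, 0
    for g in g_samples:
        for _ in range(n_rep):
            y_rep = rng.poisson(g)
            exceed += int(disc(y_rep, g) >= disc(y, g))
            total += 1
    return exceed / total

g_post = np.full((10, 5), 20.0)      # hypothetical posterior draws of g
y_obs = rng.poisson(g_post[0])       # toy data consistent with the model
p = posterior_predictive_p(y_obs, g_post)
assert 0.0 <= p <= 1.0
```

Data well explained by the posterior yield moderate p values, while p values near 0 or 1 indicate underfitting or overfitting, matching the trend in Table 1.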
6.4 The inference results
To illustrate the inference results, we compute the posterior mean under the TG prior, which is regarded as a point estimate of the image. As mentioned earlier, a main advantage of the Bayesian method is its ability to quantify the uncertainty in the reconstruction; to this end, we use the width of the (pointwise) 95% HPDI as a metric of the posterior uncertainty (intuitively, the wider the HPDI, the more uncertainty there is). We plot these posterior results in Fig. 5 (Fig. 4(a) and Fig. 4(b)). As a comparison, we also compute the posterior mean and the 95% HPDI width for the Gaussian prior, corresponding to setting in the TG prior; the results are also shown in Fig. 5 (Fig. 4(c) and Fig. 4(d)). The figures show that the posterior mean obtained with the TG prior is clearly of better quality than that of the Gaussian prior, suggesting that including the edge-preserving TV term significantly improves the performance of the prior. It is worth noting that the Gaussian prior used here is not optimized for the best performance, and the performance could potentially be improved by using carefully designed Gaussian priors, for example, [9].
6.5 The impact of Poisson likelihood function
In practice, the observation noise is often assumed to be Gaussian, because this usually leads to great computational convenience. In what follows we demonstrate that, if the data are truly generated from a Poisson distribution, and if the noise is strong, the use of a Gaussian noise model may yield very poor reconstruction results. Recall that the data used in this example are drawn from the Poisson model; here we perform the Bayesian inference using a Gaussian likelihood:
(30) 
where is a diagonal matrix with each diagonal component equal to for . By design the Gaussian likelihood (30) has the same mean and covariance as the original Poisson likelihood. We compute the posterior distribution using the Gaussian likelihood and the same prior as before, and we plot the posterior mean in Fig. 6. It can be seen clearly from the figure that the posterior mean computed with the Gaussian likelihood is of much poorer quality than that of the Poisson likelihood (shown in Fig. 4(a)). The results can also be compared quantitatively: the PSNR of the posterior mean with the Gaussian likelihood is 12.49, while that with the Poisson likelihood is 21.99. The significant difference between the two results suggests that the use of a Gaussian likelihood function may lead to poor reconstruction performance in problems with Poisson data.
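The moment-matched Gaussian surrogate of Eq. (30) and the exact Poisson log-likelihood can be compared directly, as in this sketch; the counts and predicted observables are hypothetical:

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, g):
    """Exact Poisson log-likelihood: sum_i [y_i*log(g_i) - g_i - log(y_i!)]."""
    return float(np.sum(y * np.log(g) - g) - sum(lgamma(int(v) + 1) for v in y))

def gaussian_loglik(y, g):
    """Moment-matched Gaussian surrogate: mean g_i and variance g_i,
    mirroring the diagonal covariance construction of Eq. (30)."""
    return float(np.sum(-0.5 * (y - g) ** 2 / g - 0.5 * np.log(2 * np.pi * g)))

g = np.array([2.0, 5.0, 10.0])   # hypothetical predicted observable
y = np.array([1.0, 6.0, 9.0])    # hypothetical observed counts

# At low counts the two models assign noticeably different likelihoods,
# even though their means and variances agree by construction.
lp, lg = poisson_loglik(y, g), gaussian_loglik(y, g)
assert np.isfinite(lp) and np.isfinite(lg)
```

The discrepancy between the two log-likelihoods grows at low counts, which is consistent with the degraded reconstruction observed under the Gaussian model when the noise is strong.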
6.6 Identifying artifacts using HPDI
As discussed in Section 5, an important application of the proposed Bayesian framework is that the resulting posterior distribution can be used to detect artifacts in a reconstructed image. We now demonstrate this application with three surrogate test images generated by modifying the ground truth. Specifically, the first image, shown in Fig. 6(a), is generated by adding random noise to the ground truth without any structural changes; the second, shown in Fig. 6(c), is generated by adding artificial components to the ground truth; and the third, shown in Fig. 6(e), is the result of removing some components from the ground truth. Thus the last two test images both have structural changes relative to the ground truth, and in both figures the regions in which components are altered are highlighted with red boxes. We compute the credible level for all three images, and show the results in Figs. 6(b) (test image 1), 6(d) (test image 2) and 6(f) (test image 3). It can be seen that, although the first test image is visibly perturbed by random noise, it has no structural difference from the ground truth, and accordingly the credible-level result in Fig. 6(b) does not suggest that any region is likely to contain artifacts. On the other hand, in the other two test images, at the locations where the original image is altered (i.e., artifacts are introduced), the resulting credible level is significantly higher than in other regions, suggesting that these locations may contain artifacts. The results demonstrate that the proposed method can rather effectively detect artifacts in a test image.
7 Conclusions
In this work, we have presented a complete treatment of Bayesian inference and uncertainty quantification for image reconstruction problems with Poisson data. First, we formulate the problem in an infinite-dimensional setting and prove that the resulting posterior distribution is well-posed in this setting. Second, to sample the unknown function/image, we provide a modified pCNL MCMC algorithm whose efficiency is independent of the discretization dimensionality. Specifically, the modified algorithm calculates the offset direction in the original pCNL algorithm by a primal-dual method, avoiding the computation of the gradient of the TV term in our formulation. Third, we give a method to determine the TV regularization parameter, which is critical for the prior distribution; the method is based on the realized discrepancy approach for assessing model fitness. Finally, we provide an application of the uncertainty information obtained by the Bayesian framework, using the posterior distribution to identify possible artifacts in a reconstructed image. We believe the proposed Bayesian framework can be used to reconstruct images and to evaluate the uncertainty or credibility of the reconstructed images in many practical imaging problems with Poisson data. In the future, we plan to further investigate the application of the methods developed in this work to such real-world problems, especially PET image reconstruction.
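For reference, the dimension-robust pCN sampler of [13] that underlies the modified pCNL algorithm can be sketched as follows. This minimal version keeps the plain pCN proposal and omits the likelihood-informed (Langevin) offset that the paper computes with a primal-dual step; all names and arguments here are illustrative.

```python
import numpy as np

def pcn_sampler(neg_log_lik, sample_prior, u0, beta=0.2, n_iter=1000, rng=None):
    """Standard pCN MCMC [13] for posteriors with a Gaussian reference
    prior; its acceptance rate does not degenerate under mesh refinement.
    neg_log_lik: the negative log-likelihood Phi(u);
    sample_prior: draws one sample from the prior N(0, C)."""
    rng = np.random.default_rng() if rng is None else rng
    u, phi_u = u0, neg_log_lik(u0)
    chain = [u]
    for _ in range(n_iter):
        xi = sample_prior(rng)                       # prior draw
        v = np.sqrt(1.0 - beta ** 2) * u + beta * xi # pCN proposal
        phi_v = neg_log_lik(v)
        # accept with probability min(1, exp(Phi(u) - Phi(v)))
        if np.log(rng.uniform()) < phi_u - phi_v:
            u, phi_u = v, phi_v
        chain.append(u)
    return np.array(chain)
```

The pCNL variant used in this work would additionally shift the proposal mean toward regions of high posterior probability, with the offset direction supplied by the primal-dual computation described above.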
Appendix A Proof of Proposition 1
This appendix provides a proof of Proposition 1.
Proof
(1) Since is bounded, from Eq. (5), we obtain directly that
for two constant vectors and . It follows directly that for a positive constant .
For every , we have . By the Cauchy-Schwarz inequality, we obtain the lower bound,
For the upper bound, we again apply the Cauchy-Schwarz inequality to the functional , obtaining
(2) In this proof, we use for positive constants. Let and be any two elements in , and we have,
(31) 
Since the mapping is a bounded linear transform from the to , we have,
(32) 
which completes the proof.
(3) For any , it is easy to show,
References
 [1] K A Johnson and J Becker. www.med.harvard.edu/aanlib/. Accessed April 4, 2010.
 [2] Simon R Arridge, Kazufumi Ito, Bangti Jin, and Chen Zhang. Variational Gaussian approximation for Poisson data. Inverse Problems, 34(2):025005, 2018.
 [3] Dale L Bailey, Michael N Maisey, David W Townsend, and Peter E Valk. Positron emission tomography. Springer, 2005.
 [4] Johnathan M Bardsley and John Goldes. Regularization parameter selection methods for ill-posed Poisson maximum likelihood estimation. Inverse Problems, 25(9):095005, 2009.
 [5] Johnathan M Bardsley and Aaron Luttman. A Metropolis-Hastings method for linear inverse problems with Poisson likelihood and Gaussian prior. International Journal for Uncertainty Quantification, 6(1), 2016.
 [6] Mario Bertero, Patrizia Boccacci, Giorgio Talenti, Riccardo Zanella, and Luca Zanni. A discrepancy principle for Poisson data. Inverse Problems, 26(10):105004, 2010.

 [7] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
 [8] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. Handbook of Markov Chain Monte Carlo. CRC Press, 2011.
 [9] Daniela Calvetti and Erkki Somersalo. A Gaussian hypermodel to recover blocky objects. Inverse Problems, 23(2):733, 2007.
 [10] Antonin Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1-2):89–97, 2004.
 [11] Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
 [12] Tony F Chan, Gene H Golub, and Pep Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing, 20(6):1964–1977, 1999.
 [13] Simon L Cotter, Gareth O Roberts, A M Stuart, and David White. MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science, 28(3):424–446, 2013.
 [14] Alain Durmus, Eric Moulines, and Marcelo Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM Journal on Imaging Sciences, 11(1):473–506, 2018.
 [15] Jeffrey A Fessler. Penalized weighted least-squares image reconstruction for positron emission tomography. IEEE Transactions on Medical Imaging, 13(2):290–300, 1994.
 [16] Jeffrey A Fessler. Medical image reconstruction: a brief overview of past milestones and future directions. arXiv preprint arXiv:1707.05927, 2017.
 [17] A Gelman. Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica, 6(4):733–760, 1996.
 [18] Andrew Gelman, Hal S Stern, John B Carlin, David B Dunson, Aki Vehtari, and Donald B Rubin. Bayesian data analysis. Chapman and Hall/CRC, 2013.
 [19] Peter J Green. Bayesian reconstructions from emission tomography data using a modified EM algorithm. IEEE Transactions on Medical Imaging, 9(1):84–93, 1990.
 [20] Tom Hebert and Richard Leahy. A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors. IEEE Transactions on Medical Imaging, 8(2):194–202, 1989.
 [21] Thorsten Hohage and Frank Werner. Inverse problems with Poisson data: statistical regularization theory, applications and algorithms. Inverse Problems, 32(9):093001, 2016.

 [22] Alain Hore and Djemel Ziou. Image quality metrics: PSNR vs. SSIM. In 2010 20th International Conference on Pattern Recognition, pages 2366–2369. IEEE, 2010.
 [23] Jari Kaipio and Erkki Somersalo. Statistical and Computational Inverse Problems, volume 160. Springer, 2005.
 [24] Jinglai Li. A note on the Karhunen–Loève expansions for infinite-dimensional Bayesian inverse problems. Statistics & Probability Letters, 106:1–4, 2015.
 [25] Erkan Ü Mumcuoglu, Richard M Leahy, and Simon R Cherry. Bayesian reconstruction of PET images: methodology and performance analysis. Physics in Medicine & Biology, 41(9):1777, 1996.
 [26] Frank Natterer. The mathematics of computerized tomography. SIAM, 2001.
 [27] John M Ollinger and Jeffrey A Fessler. Positron-emission tomography. IEEE Signal Processing Magazine, 14(1):43–55, 1997.
 [28] Carlos A de B Pereira, Julio Michael Stern, and Sergio Wechsler. Can a significance test be genuinely Bayesian? Bayesian Analysis, 3(1):79–100, 2008.
 [29] Marcelo Pereyra. Proximal Markov chain Monte Carlo algorithms. Statistics and Computing, 26(4):745–760, 2016.
 [30] Marcelo Pereyra. Maximum-a-posteriori estimation with Bayesian confidence regions. SIAM Journal on Imaging Sciences, 10(1):285–302, 2017.
 [31] Christian Robert and George Casella. Monte Carlo statistical methods. Springer Science & Business Media, 2013.
 [32] Lawrence A Shepp and Yehuda Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging, 1(2):113–122, 1982.
 [33] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta Numerica, 19:451–559, 2010.
 [34] Måns Thulin. Decision-theoretic justifications for Bayesian hypothesis testing using credible sets. Journal of Statistical Planning and Inference, 146:133–138, 2014.
 [35] Yehuda Vardi, L A Shepp, and Linda Kaufman. A statistical model for positron emission tomography. Journal of the American Statistical Association, 80(389):8–20, 1985.
 [36] Zhewei Yao, Zixi Hu, and Jinglai Li. A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementations. Inverse Problems, 32(7):075006, 2016.