Introduction
Uncertainty quantification and multimodal solution characterization are essential for analyzing the results of underdetermined imaging systems. In computational imaging, reconstruction methods solve an inverse problem to recover a hidden image from measured data. When this inverse problem is ill-posed, infinitely many image solutions fit the observed data. Occasionally these varied images lead to different scientific interpretations; thus it is important to be able to characterize the distribution of possibilities. In a Bayesian framework, this problem could ideally be addressed by accurately modeling the measurement noise, formulating an estimation problem, and computing the posterior distribution of the hidden image. However, this analytical approach is only tractable in simple cases. When the inverse problem is non-convex or the measurement noise is complicated (e.g., non-Gaussian), the posterior distribution can quickly become intractable to compute analytically.
In this paper, we propose Deep Probabilistic Imaging (DPI), a new variational inference approach that employs a deep generative model to learn a reconstructed image's posterior distribution. More specifically, we represent the posterior probability distribution using an invertible flow-based generative model. By training with a traditional maximum a posteriori (MAP) loss, along with a loss that encourages distribution entropy, the network converges to a generative model that approximately samples from the image posterior; the model returns a posterior sample image as output for a random Gaussian sample input. Our proposed approach enables uncertainty quantification and multimodal solution characterization in non-convex inverse imaging problems. We demonstrate our method on both synthetic and real astronomical radio interferometric imaging data. High-resolution radio interferometric imaging often requires a highly non-convex forward model, occasionally leading to multimodal solutions.
Related Work
Computational Imaging
The goal of a computational image reconstruction method is to recover a hidden image from measured data. Imaging systems are often represented by a deterministic forward model, $y = f(x)$, where $y$ are the observed measurements of a hidden image, $x$. A regularized maximum likelihood (RML) optimization can be solved to reconstruct the image:

(1) $\hat{x} = \arg\min_{x} \mathcal{L}(y, f(x)) + \lambda \mathcal{R}(x)$

where $\hat{x}$ is the reconstructed image, $\mathcal{L}(y, f(x))$ is the data fitting loss between the measured data and the forward model prediction, $\mathcal{R}(x)$ is a regularization function on the image, and $\lambda$ is the hyperparameter balancing the data fitting loss and the regularization term. The regularization function is necessary for obtaining a unique solution in underconstrained systems. Commonly used regularization functions include the L1 norm, total variation (TV) or total squared variation (TSV) [5, 19], maximum entropy (MEM) [25], a multivariate Gaussian [29], and sparsity in the wavelet transform domain [9].
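As a concrete illustration of the RML optimization in Eq. 1, the sketch below solves a small toy problem with an L2 data fitting loss and an L1 regularizer via proximal gradient descent (ISTA). All sizes and hyperparameters are illustrative choices, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy underdetermined system: m noisy measurements of an n-pixel "image".
n, m = 64, 48
A = rng.standard_normal((m, n)) / np.sqrt(m)   # forward model f(x) = A @ x
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = 1.0  # sparse ground-truth image
y = A @ x_true + 0.01 * rng.standard_normal(m)

lam = 0.02   # regularization weight (lambda in Eq. 1)
eta = 0.1    # gradient step size
x = np.zeros(n)
for _ in range(3000):
    x = x - eta * A.T @ (A @ x - y)                        # data-fit gradient step
    x = np.sign(x) * np.maximum(np.abs(x) - eta * lam, 0)  # L1 proximal step

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

Soft-thresholding is the proximal operator of the L1 norm; swapping in a TV or MEM regularizer changes only this proximal (or gradient) step.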
Assuming the forward model and measurement noise statistics are known, one can derive the probability of the measurements, $y$, conditioned on the hidden image, $x$, as $p(y|x)$. In a Bayesian estimation framework, the regularized inverse problem can be interpreted as a maximum a posteriori (MAP) estimation problem:

(2) $\hat{x} = \arg\max_{x} \log p(y|x) + \log p(x)$

where the log-likelihood of the measurements, $\log p(y|x)$, corresponds to the negative data fitting loss in Eq. 1, and the prior distribution of the image, $p(x)$, defines a regularization function (also referred to as an image prior).
Deep Image Prior
In lieu of constraining the image via an explicit regularization function, $\mathcal{R}(x)$, Deep Image Prior approaches [27] parameterize the image as a deep neural network and implicitly force the image to respect low-level image statistics via the structure of a neural network:

(3) $\hat{\theta} = \arg\min_{\theta} \mathcal{L}(y, f(g_{\theta}(z))), \quad \hat{x} = g_{\hat{\theta}}(z)$

In this approach, $g_{\theta}$ is a deep generator, $\theta$ are the neural network weights, $\hat{\theta}$ are the optimized weights, and $z$ is a randomly chosen, but fixed, hidden state. The network weights are randomly initialized and optimized using gradient descent. This approach has been demonstrated in many applications, including inpainting, denoising, and phase retrieval [27, 4]. Note that since the hidden state, $z$, is fixed, only a single reconstructed image is derived after optimization.
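The structure of Eq. 3 — a fixed random hidden state $z$ and optimized weights $\theta$ — can be sketched with a toy fully connected generator and manual gradients. The actual Deep Image Prior uses a convolutional architecture, so this is only a structural illustration with made-up sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 32, 24                                  # image pixels, measurements
A = rng.standard_normal((m, n)) / np.sqrt(n)   # linear forward model f
y = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

z = rng.standard_normal(16)                    # randomly chosen but FIXED hidden state
W1 = 0.1 * rng.standard_normal((32, 16))       # generator weights theta = (W1, W2)
W2 = 0.1 * rng.standard_normal((n, 32))

# optimize the weights (not the image) by gradient descent on the data-fit loss
lr, losses = 0.05, []
for _ in range(2000):
    h = np.tanh(W1 @ z)                        # hidden activations
    x = W2 @ h                                 # x = g_theta(z)
    e = A @ x - y
    losses.append(0.5 * e @ e)
    dL_dx = A.T @ e                            # backprop through f
    g2 = np.outer(dL_dx, h)                    # gradient w.r.t. W2
    g1 = np.outer((W2.T @ dL_dx) * (1 - h**2), z)  # gradient w.r.t. W1
    W2 -= lr * g2
    W1 -= lr * g1
```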
Uncertainty Quantification
Uncertainty quantification is important for understanding the confidence of recovered solutions in inverse problems.
MCMC Sampling
One widely used approach for quantifying the uncertainty in image reconstruction is the Markov Chain Monte Carlo (MCMC) method. MCMC provides a way to approximate the posterior distribution of a hidden image via sampling
[2, 7]. However, the MCMC approach can be prohibitively slow for high dimensional inverse problems [8].
Variational Bayesian Methods
Variational inference is widely used to approximate intractable posterior distributions [3, 1]. Instead of directly computing the exact posterior distribution, variational Bayesian methods posit a simple family of density functions and solve an optimization problem to find a member of this family closest to the target posterior distribution. Variational Bayesian methods are much faster than sampling methods (e.g., MCMC), and typically achieve comparable performance [15]. The performance of variational methods depends on the modeling capacity of the density function family and the complexity of the target distribution. A commonly used technique for simplifying the density function is the mean-field approximation [17], where the distribution of each hidden variable is assumed independent of the others. The density function can also be parameterized using neural networks, such as flow-based generative models [23].
Bayesian Neural Networks
Deep learning has become a powerful tool for computational image reconstruction [22]. Most current deep learning approaches only focus on a point estimate of the hidden image; however, Bayesian neural networks [21, 13] have been developed to quantify the uncertainty of deep learning image reconstructions [28]. In particular, the weights of a Bayesian network are modeled as probability distributions, so that a different prediction is obtained every time the network is executed. Although these approaches can achieve impressive performance, they rely on supervised learning and only describe the reconstruction uncertainty conditioned on a training set.
Empirical Sampling
An alternative empirical approach for obtaining multiple candidate reconstructions is to solve a regularized inverse problem (Eq. 1) multiple times with different choices of regularizer hyperparameters (e.g., $\lambda$ in Eq. 1) and image initializations. This approach was used in [26] to characterize the uncertainty of the reconstructed image of the black hole M87*. Although the mean and standard deviation of these images provide a measure of uncertainty, there is no expectation that these samples satisfy a posterior distribution defined by the measurement data. In fact, this method only quantifies the reconstruction uncertainty due to choices made in the reconstruction method, such as regularizer hyperparameters, instead of the uncertainty due to measurement noise and sparsity.
Flow-based Generative Models
Flow-based generative models are a class of generative models used in machine learning and computer vision for probability density estimation. These models approximate an arbitrary probability distribution by learning an invertible transformation of a generic distribution (e.g., a normal distribution). Mathematically, a flow-based generative model is represented as

(4) $x = g_{\theta}(z), \quad z \sim \pi(z) = \mathcal{N}(0, I)$

where $g_{\theta}$ is an invertible deep neural network parameterized by $\theta$ that links a sample from the target distribution, $x$, with a hidden state, $z$. The application of invertible transformations enables both efficient sampling, as well as exact log-likelihood computation. The log-likelihood of a sample can be analytically computed based on the "change of variables theorem":

(5) $\log q_{\theta}(x) = \log \pi(z) - \log \left| \det \frac{\partial g_{\theta}(z)}{\partial z} \right|$

where $\det \frac{\partial g_{\theta}(z)}{\partial z}$ is the determinant of the generative model's Jacobian matrix.
To keep the computation of the second term tractable, the neural network function, $g_{\theta}$, is restricted to forms such as NICE [11], RealNVP [12], and Glow [18]. In these network architectures, the Jacobian matrix is a product of lower triangular or quasi-diagonal matrices, which leads to efficient computation of the determinant.
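A single affine coupling layer of the kind used in these architectures can be sketched as follows. The scale and shift networks are reduced to single matrices for brevity; the sketch verifies invertibility and checks the triangular-Jacobian log-determinant of Eq. 5 against a finite-difference Jacobian.

```python
import numpy as np

def coupling_forward(z, W_s, W_t):
    """One RealNVP-style affine coupling layer (toy, fully connected).

    z is split into halves (z1, z2); z1 passes through unchanged and
    parameterizes an affine transform of z2.  The Jacobian is block
    lower-triangular, so its log-determinant is simply sum(s).
    """
    d = z.shape[0] // 2
    z1, z2 = z[:d], z[d:]
    s = np.tanh(W_s @ z1)          # log-scale (tanh keeps it bounded)
    t = W_t @ z1                   # shift
    x = np.concatenate([z1, z2 * np.exp(s) + t])
    return x, np.sum(s)

def coupling_inverse(x, W_s, W_t):
    d = x.shape[0] // 2
    x1, x2 = x[:d], x[d:]
    s = np.tanh(W_s @ x1)
    t = W_t @ x1
    return np.concatenate([x1, (x2 - t) * np.exp(-s)])

rng = np.random.default_rng(0)
d = 2
W_s, W_t = rng.standard_normal((d, d)), rng.standard_normal((d, d))
z = rng.standard_normal(2 * d)
x, log_det = coupling_forward(z, W_s, W_t)
recon = coupling_inverse(x, W_s, W_t)          # invertibility check

# finite-difference Jacobian for comparison with the analytic log-det
eps = 1e-6
J = np.zeros((2 * d, 2 * d))
for i in range(2 * d):
    dz = np.zeros(2 * d); dz[i] = eps
    J[:, i] = (coupling_forward(z + dz, W_s, W_t)[0] - x) / eps
num_log_det = np.linalg.slogdet(J)[1]
```

Stacking many such layers, with the roles of the two halves alternating, yields the deep invertible networks used in the paper.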
Method
In this paper, we propose a variational Bayesian method to learn an approximate posterior distribution for the purpose of efficiently characterizing uncertainty in underdetermined imaging systems. We parameterize the latent image distribution using a flow-based generative model,

(6) $x = g_{\theta}(z), \quad z \sim \mathcal{N}(0, I)$

and learn the model's weights by minimizing the Kullback–Leibler (KL) divergence between the generative model distribution, $q_{\theta}(x)$, and the image posterior distribution, $p(x|y)$:

(7) $\hat{\theta} = \arg\min_{\theta} \mathbb{E}_{x \sim q_{\theta}(x)}\left[ \log q_{\theta}(x) - \log p(x|y) \right] = \arg\min_{\theta} \mathbb{E}_{x \sim q_{\theta}(x)}\left[ -\log p(y|x) - \log p(x) + \log q_{\theta}(x) \right]$

Note that this loss can be interpreted as an expectation over the maximum a posteriori (MAP) loss from Eq. 2 with an added term to encourage entropy on the image distribution. Minimizing the negative entropy term, $\mathbb{E}_{x \sim q_{\theta}(x)}[\log q_{\theta}(x)]$, prevents the generative model from collapsing to a deterministic solution.
For most deep generative models the sample likelihood, $q_{\theta}(x)$, cannot be evaluated exactly. However, since a sample's likelihood can be computed according to Eq. 5 for flow-based models, this stochastic optimization problem can be rewritten as

(8) $\hat{\theta} = \arg\min_{\theta} \mathbb{E}_{z \sim \mathcal{N}(0, I)}\left[ -\log p(y|g_{\theta}(z)) - \log p(g_{\theta}(z)) + \log \pi(z) - \log \left| \det \frac{\partial g_{\theta}(z)}{\partial z} \right| \right]$

Approximating the expectation using a Monte Carlo method, and replacing the data likelihood and prior terms with the data fitting loss and regularization functions from Eq. 1, we obtain the optimization problem

(9) $\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{k=1}^{N} \left[ \mathcal{L}(y, f(g_{\theta}(z_k))) + \lambda \mathcal{R}(g_{\theta}(z_k)) - \log \left| \det \frac{\partial g_{\theta}(z_k)}{\partial z_k} \right| \right]$

where $z_k \sim \mathcal{N}(0, I)$, $N$ is the number of Monte Carlo samples, and the $\log \pi(z_k)$ term is omitted since its expectation is constant. The expectation of the data fitting loss and image regularization loss are optimized by sampling images from the generative model $x = g_{\theta}(z)$. Note that when $\mathcal{L}(y, f(x))$ does not define the true data likelihood, or $\mathcal{R}(x)$ does not define an image prior, the learned network only models an approximate image posterior instead of the true posterior distribution.
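For a one-dimensional linear-Gaussian toy problem, the Monte Carlo objective of Eq. 9 can be checked end-to-end: the flow reduces to an invertible affine map $x = az + b$ (so the log-determinant term is just $\log|a|$), and the learned parameters should match the analytic Gaussian posterior. All values below are illustrative.

```python
import numpy as np

# 1-D toy problem: measurement y = x + noise, Gaussian prior on x.
sigma, tau, mu0 = 0.5, 2.0, 0.0    # noise std, prior std, prior mean
y = 1.3                            # a single observed measurement

# analytic posterior N(m, s2) for this linear-Gaussian problem
s2 = 1.0 / (1.0 / sigma**2 + 1.0 / tau**2)
m = s2 * (y / sigma**2 + mu0 / tau**2)

rng = np.random.default_rng(0)
a, b = 1.0, 0.0                    # flow parameters theta = (a, b)
lr, batch = 0.02, 256
for _ in range(3000):
    z = rng.standard_normal(batch)
    x = a * z + b                  # samples from the current flow
    # gradient of the per-sample loss: data fit + prior - log|a|  (Eq. 9)
    dL_dx = (x - y) / sigma**2 + (x - mu0) / tau**2
    grad_a = np.mean(dL_dx * z) - 1.0 / a
    grad_b = np.mean(dL_dx)
    a -= lr * grad_a
    b -= lr * grad_b
# after training, b should approach m and |a| should approach sqrt(s2)
```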
The data fitting loss and the regularization function are often empirically defined and may not match reality perfectly. Recalling that the third term is related to the entropy of the learned distribution, similar to β-VAE [16], we introduce another tuning parameter $\beta$ to control the diversity of the generative model samples,

(10) $\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{k=1}^{N} \left[ \mathcal{L}(y, f(g_{\theta}(z_k))) + \lambda \mathcal{R}(g_{\theta}(z_k)) - \beta \log \left| \det \frac{\partial g_{\theta}(z_k)}{\partial z_k} \right| \right]$

When the uncertainty of the reconstructed images seems smaller than expected, we can increase $\beta$ to encourage higher entropy of the generative distribution; otherwise, we can reduce $\beta$ to reduce the diversity of reconstructions. A larger $\beta$ also encourages more exploration during training, which can be used to accelerate convergence.
Toy Examples
We first study our method using two-dimensional toy examples. Assuming $x$ is two-dimensional, and the distribution is given exactly by a potential function, $p(x) \propto \exp(-U(x))$, Eq. 10 can be simplified to

(11) $\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{k=1}^{N} \left[ U(g_{\theta}(z_k)) - \beta \log \left| \det \frac{\partial g_{\theta}(z_k)}{\partial z_k} \right| \right]$
For the following toy tests the generative model is designed using a RealNVP architecture with 32 affine coupling layers [12].
We test the approach on three joint distribution functions: (a) a Gaussian mixture potential, (b) a bowtie potential, and (c) a sinusoidal potential. Fig. 1 shows the true and learned probability density function in these three cases for $\beta = 1$. Qualitatively, the learned generative model distributions match the true distributions well.
As derived in Eq. 7, the posterior should be best learned with entropy weight $\beta = 1$. To test this claim, we adjust $\beta$ to see how the entropy term influences the learned generative model distribution. According to the graph of KL divergence versus $\beta$ in Fig. 1, the learned distributions match the true distributions best when the entropy weight equals $\beta = 1$. This verifies the theoretical expectation presented in the Method section.
Since the generative model is a transformation of a continuous multivariate Gaussian distribution, the learned distribution is also continuous. This leads to a common issue in flow-based generative models: there are often a few samples located in high loss regions when the modes are not connected (see Fig. 1, distributions (a) and (b)). Some approaches have recently been proposed to solve this problem [14]; however, in this paper we neglect this issue and leave it for future work.
Interferometric Imaging Case Study
We demonstrate our proposed approach on the problem of radio interferometric astronomical imaging [24]. In interferometric imaging, radio telescopes are linked to obtain sparse spatial frequency measurements of an astronomical target. These Fourier measurements are then used to recover the underlying astronomical image. Since the measurements are often very sparse and noisy, there can be multiple image modes that fit the observed data well. The Event Horizon Telescope (EHT) used this technique to take the first picture of a black hole, by linking telescopes across the globe [26]. Fig. 2 shows the spatial frequency (Fourier) samples that can be measured by a 9-telescope EHT array when observing the black hole Sagittarius A* (Sgr A*).¹

¹ The largest sampled spatial frequency determines the interferometer's nominal resolution, roughly 25 μas for the EHT. In this paper, we neglect the evolution of Sgr A* and assume it is static over the course of a night.
Each Fourier measurement, or so-called "visibility", is obtained by correlating the data from a pair of telescopes. The measurement equation for each visibility is given by

(12) $V_{a,b}^{t} = g_{a}^{t} g_{b}^{t} e^{-i(\phi_{a}^{t} - \phi_{b}^{t})} \tilde{I}_{a,b}^{t}(x) + \eta_{a,b}^{t}$

where $a$ and $b$ index the telescopes, $t$ represents time, and $\tilde{I}_{a,b}^{t}(x)$ extracts the Fourier component from image $x$ corresponding to the baseline between telescopes $a$ and $b$ at time $t$. The measurement noise comes from three sources: (1) time-dependent telescope-based gain error, $g_{a}^{t}$ and $g_{b}^{t}$, (2) time-dependent telescope-based phase error, $\phi_{a}^{t}$ and $\phi_{b}^{t}$, and (3) baseline-based Gaussian thermal noise, $\eta_{a,b}^{t}$. The standard deviation of the thermal noise depends on each telescope's System Equivalent Flux Density (SEFD):

(13) $\sigma_{a,b}^{t} = \frac{1}{0.88} \sqrt{\frac{\mathrm{SEFD}_{a} \, \mathrm{SEFD}_{b}}{2 \, \Delta\nu \, \Delta t}}$

where $\Delta\nu$ is the observing bandwidth and $\Delta t$ is the integration time.
When the gain and phase errors are reasonably small, the interferometric imaging problem is approximately a convex inverse problem. However, when the gain errors and the phase errors (caused by atmospheric turbulence and instrument miscalibration) are large, the noisy visibilities can be combined into robust data products that are invariant to telescope-based errors, termed the closure phase, $\psi$, and closure amplitude, $A$ [10]:

(14) $\psi_{a,b,c}^{t} = \angle V_{a,b}^{t} + \angle V_{b,c}^{t} + \angle V_{c,a}^{t}, \qquad A_{a,b,c,d}^{t} = \frac{|V_{a,b}^{t}| \, |V_{c,d}^{t}|}{|V_{a,c}^{t}| \, |V_{b,d}^{t}|}$
These nonlinear "closure quantities" can be used to constrain non-convex image reconstruction:

(15) $\mathcal{L}(y, x) = \sum_{t,(a,b,c)} \frac{\left| \psi_{a,b,c}^{t} - \hat{\psi}_{a,b,c}^{t}(x) \right|^{2}}{(\sigma_{\psi}^{t})^{2}} + \sum_{t,(a,b,c,d)} \frac{\left| \log A_{a,b,c,d}^{t} - \log \hat{A}_{a,b,c,d}^{t}(x) \right|^{2}}{(\sigma_{\log A}^{t})^{2}}$

where $\sigma_{\psi}^{t}$ and $\sigma_{\log A}^{t}$ are the standard deviations of the corresponding closure terms computed based on SEFDs. Note the closure quantities are not independent and the corresponding standard deviations are derived from linearization, so Eq. 15 only approximates the true data log-likelihood.
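The invariance of the closure quantities in Eq. 14 to telescope-based errors can be verified numerically. The sketch below corrupts toy visibilities with random gains and phases (Eq. 12, without thermal noise) and confirms that closure phases and closure amplitudes are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tel = 4
# "true" visibilities on every baseline of a toy 4-telescope array
V_true = {(a, b): rng.standard_normal() + 1j * rng.standard_normal()
          for a in range(n_tel) for b in range(n_tel) if a < b}

# corrupt with telescope-based gain and phase errors (Eq. 12, no thermal noise)
g = 1.0 + 0.5 * rng.random(n_tel)
phi = 2 * np.pi * rng.random(n_tel)
V = {(a, b): g[a] * g[b] * np.exp(-1j * (phi[a] - phi[b])) * V_true[(a, b)]
     for (a, b) in V_true}

def vis(V, a, b):
    """Visibility on baseline (a, b); reversing a baseline conjugates it."""
    return V[(a, b)] if a < b else np.conj(V[(b, a)])

def closure_phase(V, a, b, c):
    # telescope phases cancel around the (a, b, c) triangle
    return np.angle(vis(V, a, b) * vis(V, b, c) * vis(V, c, a))

def closure_amplitude(V, a, b, c, d):
    # telescope gains cancel in this ratio over the (a, b, c, d) quadrangle
    return (abs(vis(V, a, b)) * abs(vis(V, c, d)) /
            (abs(vis(V, a, c)) * abs(vis(V, b, d))))
```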
In the following sections, we demonstrate our Deep Probabilistic Imaging (DPI) approach on both convex reconstruction with complex visibilities and non-convex reconstruction with closure quantities, using both synthetic and real datasets. With this new approach, we successfully quantify the uncertainty of interferometric images, as well as detect multiple modes in some datasets.
Implementation
For all interferometric imaging results we use a DPI flow-based RealNVP [12] network architecture with 48 affine coupling layers. The scale-shift function of each layer is modeled as a U-Net style fully connected network with five hidden layers and skip connections. We train the model using Adam with a batch size of 32 for 20,000 epochs. We note that a limitation of our general approach is the modeling capacity of the flow-based generative model used. We find the proposed architecture is satisfactory for characterizing uncertainty in images of size 32×32 pixels.
Convex Imaging with Visibilities
In this section, we demonstrate DPI on convex interferometric imaging. In particular, the gain and phase errors in Eq. 12 are assumed to be zero (i.e., $g_{a}^{t} = 1$ and $\phi_{a}^{t} = 0$) so that complex visibilities are a linear function of the latent image. Since the thermal noise on $y$ is independent and Gaussian, we write the conditional likelihood as

(16) $p(y|x) \propto \exp\left[ -\frac{1}{2} (y - Ax)^{H} \Sigma^{-1} (y - Ax) \right]$

where $x$ is a column-vectorized image with $n$ pixels, $y$ is a column vector of $m$ complex visibility measurements, $A$ is an undersampled Fourier transform matrix of size $m \times n$, and $\Sigma$ is a measurement covariance matrix derived according to the telescopes' SEFDs.

In order to verify that the flow-based generative model can learn the posterior distribution of reconstructed images, we employ a multivariate Gaussian image prior,

(17) $p(x) \propto \exp\left[ -\frac{1}{2} (x - \mu)^{T} \Lambda^{-1} (x - \mu) \right]$

where $\mu$ is a mean image, and $\Lambda$ is the covariance matrix defined by the empirical power spectrum of an image [6]. Since both the measurement and image prior follow Gaussian distributions, the reconstructed image's posterior distribution is also a Gaussian distribution and can be analytically derived as

(18) $x | y \sim \mathcal{N}(\mu_{\mathrm{post}}, \Sigma_{\mathrm{post}}), \quad \Sigma_{\mathrm{post}} = \left( A^{H} \Sigma^{-1} A + \Lambda^{-1} \right)^{-1}, \quad \mu_{\mathrm{post}} = \Sigma_{\mathrm{post}} \left( A^{H} \Sigma^{-1} y + \Lambda^{-1} \mu \right)$
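The analytic posterior of Eq. 18 can be sketched for a small real-valued system (for complex visibilities the transpose becomes a conjugate transpose); the matrix sizes and covariances below are illustrative, with a generic random matrix standing in for the undersampled Fourier transform.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 10
A = rng.standard_normal((m, n))     # stand-in for the undersampled Fourier matrix
Sigma = 0.1 * np.eye(m)             # measurement noise covariance
Lam = np.eye(n)                     # prior covariance
mu = np.zeros(n)                    # prior mean
x_true = rng.standard_normal(n)
y = A @ x_true + rng.multivariate_normal(np.zeros(m), Sigma)

# Eq. 18: analytic Gaussian posterior (real-valued case, so .T not .H)
Si, Li = np.linalg.inv(Sigma), np.linalg.inv(Lam)
Sigma_post = np.linalg.inv(A.T @ Si @ A + Li)
mu_post = Sigma_post @ (A.T @ Si @ y + Li @ mu)

def log_posterior(x):
    """Unnormalized log posterior: data fit plus Gaussian prior."""
    r = y - A @ x
    return -0.5 * r @ Si @ r - 0.5 * (x - mu) @ Li @ (x - mu)
```

Since the log posterior is a concave quadratic, `mu_post` is also its unique maximizer (the MAP estimate of Eq. 2).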
Using the specified data likelihood and prior, we train a DPI flow-based network to produce image samples of size 32×32 pixels with a field of view of 160 microarcseconds (μas). Fig. 3 demonstrates DPI on a synthetic interferometric imaging example, and compares the learned generative model distribution with the analytical posterior distribution. Visibility measurements are derived from the synthetic black hole image shown at the top of Fig. 3 (with a total flux of 2 Janskys). The second row of Fig. 3 shows the analytic posterior's mean, standard deviation, and full covariance. The third and fourth rows of the figure show the mean, standard deviation, and full covariance empirically derived from DPI samples under two slightly different architectures: (A) the third row uses a model with 48 affine coupling layers, and (B) the fourth row adds an additional Softplus layer to the model to enforce the physical constraint of non-negativity in the image distribution. Without a non-negativity constraint, Architecture A's learned distribution is very similar to the analytical posterior distribution, since it is solving the same Bayesian estimation problem as defined in Eqs. 16 and 17. However, this Bayesian estimation problem does not constrain the image to be positive; as a result, the central depression in the image has an inflated standard deviation. Architecture B's non-negative model results in a more constraining uncertainty map while achieving a slightly higher resolution reconstruction. This example also demonstrates how DPI can introduce implicit regularization through the neural network architecture.
Image distributions with different levels of sample diversity can be learned by adjusting the entropy loss weight, $\beta$. As expected, both generative models reach the lowest KL divergence with the analytic distribution when $\beta = 1$.
Nonconvex Imaging with Closure Quantities
In this section, we demonstrate DPI on non-convex interferometric imaging, where we reconstruct images using the closure quantities defined in Eq. 14. With this non-convex forward model, the posterior of reconstructed images cannot be analytically computed, but it can be estimated using DPI. In all DPI reconstructions, the resulting images are 32×32 pixels and result from the non-negative RealNVP model discussed above (Architecture B).
Multimodal Posterior Distributions
A serious challenge for non-convex image reconstruction is the potential for multimodal posterior distributions: visually different solutions fit the measurement data reasonably well. In some cases, multiple modes can be identified by optimizing a regularized maximum likelihood (RML) loss with different image initializations; for example, Fig. 5 (top) shows ten RML reconstructed images obtained using the closure quantities (Eq. 15) from the target shown in Fig. 3 and the multivariate Gaussian regularizer defined in Eq. 17. From these results, two potential image modes, which appear to be roughly 180 degree rotations of one another, clearly stand out as fitting the data well. Fig. 5 (bottom) shows ten images sampled from a DPI flow-based generative model learned with multivariate Gaussian regularization. Note that the single generative model has captured the two modes identified by multiple runs of RML.
Fig. 4 analyzes 1024 generative samples from a DPI model learned with multivariate Gaussian regularization. The dimensionality-reduction t-SNE plot [20] indicates a clustering of samples into two modes. Fig. 4 (middle) shows the pixel-wise mean, standard deviation, and fractional standard deviation of samples for each mode. The distributions of data fitting loss (reduced $\chi^2$) for images in each mode are shown for both closure phase and log closure amplitude constraints; a reduced $\chi^2$ value of 1 is optimal. Although it can be difficult to tell which image is correct by inspecting the statistics of a single image, by analyzing the histogram of statistics for each mode it becomes clearer which mode corresponds with the true image. In the supplemental material² we show how the resulting posterior changes under different imposed regularization.

² http://users.cms.caltech.edu/~hesun/DPIsupplement.pdf
Real Interferometric Data
In Fig. 7 we demonstrate the performance of DPI using the publicly available EHT 2017 data, which was used to produce the first image of the black hole in M87. In accordance with [26], we use a data fitting loss with not only closure quantities (Eq. 15) but also roughly-calibrated visibility amplitudes. We pair this data likelihood with a combined maximum entropy (MEM) and total squared variation (TSV) regularizer (see the supplementary material for details²). Fig. 7 shows the DPI reconstruction mean and standard deviation of M87 on different observation days. Although ground truth images are unavailable, the ring size and evolution recovered using DPI match those seen in the original EHT imaging results. The DPI results also quantify larger uncertainty in "knot" regions along the lower portion of the ring.
Visualizing Uncertainty
DPI sampling provides a means of visualizing uncertainty, especially in cases where closed-form approximations are insufficient. By embedding samples from our DPI model in a two-dimensional space, we are able to visualize the posterior's approximate structure. Fig. 6 plots the embedding of DPI samples obtained using t-SNE, and shows the images corresponding to some of these samples. By plotting the blurred truth image in the same embedding, we see that the truth often lies close to the posterior samples in the embedded space, though this is not guaranteed. The posterior is sensitive to the choice of image prior, and is most informative when the true image is a sample from the imposed prior.
The mean and standard deviation of DPI samples, as well as the average absolute error with respect to the blurred truth, are shown for each learned distribution in Fig. 6. Since closure quantities do not constrain the absolute flux or position of the reconstructed source, we first normalize and align samples from the closure-constrained DPI model to account for scaled and shifted copies of similar images. Note that the pixel-wise standard deviation aligns well with areas of high error in the generative model's samples.
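The normalization and alignment step can be sketched as flux normalization followed by integer-pixel centroid registration; the paper does not specify its exact registration procedure, so this is one simple possibility.

```python
import numpy as np

def normalize_and_align(img):
    """Normalize total flux to 1 and shift the centroid to the image center.

    Closure quantities constrain neither absolute flux nor position, so
    samples must be registered before computing pixel-wise statistics.
    Integer-pixel shifts via np.roll keep the sketch simple.
    """
    img = img / img.sum()
    ys, xs = np.indices(img.shape)
    cy, cx = (img * ys).sum(), (img * xs).sum()   # flux-weighted centroid
    dy = round(img.shape[0] / 2 - cy)
    dx = round(img.shape[1] / 2 - cx)
    return np.roll(img, (dy, dx), axis=(0, 1))

# two shifted, rescaled copies of the same source collapse after registration
base = np.zeros((16, 16)); base[6:9, 6:9] = 1.0
a = normalize_and_align(3.0 * np.roll(base, (2, -1), axis=(0, 1)))
b = normalize_and_align(0.5 * np.roll(base, (-3, 2), axis=(0, 1)))
```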
Conclusion
In this paper, we present Deep Probabilistic Imaging (DPI): a new framework for uncertainty quantification and multimodal solution characterization in underdetermined image reconstruction problems. The method parameterizes the posterior distribution of the reconstructed image as an untrained flow-based generative model, and learns the neural network weights using a loss that incorporates the conditional data likelihood, the prior of image samples, and the model's distribution entropy.
We demonstrate the proposed method on toy examples as well as both synthetic and real interferometric imaging problems. Experiments show the proposed method can approximately learn the image posterior distribution in both convex and non-convex inverse problems, enabling efficient quantification of the uncertainty of reconstructed images and detection of multimodal solutions.
References
 [1] (201907) Unified radio interferometric calibration and imaging with joint uncertainty quantification. 627, pp. A134. External Links: Document, 1903.11169 Cited by: Variational Bayesian Methods.
 [2] (2012) MCMC-based image reconstruction with uncertainty quantification. SIAM Journal on Scientific Computing 34 (3), pp. A1316–A1332. Cited by: MCMC Sampling.
 [3] (2017) Variational inference: a review for statisticians. Journal of the American statistical Association 112 (518), pp. 859–877. Cited by: Variational Bayesian Methods.
 [4] (2020) Deep phase decoder: selfcalibrating phase microscopy with an untrained deep neural network. Optica 7 (6), pp. 559–562. Cited by: Deep Image Prior.
 [5] (1993) A generalized gaussian image model for edgepreserving map estimation. IEEE Transactions on image processing 2 (3), pp. 296–310. Cited by: Computational Imaging.
 [6] (2018) Reconstructing video of time-varying sources from radio interferometric measurements. IEEE Transactions on Computational Imaging 4 (4), pp. 512–527. Cited by: Convex Imaging with Visibilities.
 [7] (2020) Hybrid very long baseline interferometry imaging and modeling with themis. The Astrophysical Journal 898 (1), pp. 9. Cited by: MCMC Sampling.
 [8] (2018) Uncertainty quantification for radio interferometric imaging–i. proximal mcmc methods. Monthly Notices of the Royal Astronomical Society 480 (3), pp. 4154–4169. Cited by: MCMC Sampling.
 [9] (2007) Sparsity and incoherence in compressive sampling. Inverse problems 23 (3), pp. 969. Cited by: Computational Imaging.
 [10] (2018) Interferometric imaging directly with closure phases and closure amplitudes. The Astrophysical Journal 857 (1), pp. 23. Cited by: Interferometric Imaging Case Study.
 [11] (2014) NICE: non-linear independent components estimation. arXiv preprint arXiv:1410.8516. Cited by: Flow-based Generative Models.
 [12] (2016) Density estimation using Real NVP. arXiv preprint arXiv:1605.08803. Cited by: Flow-based Generative Models, Toy Examples, Implementation.
 [13] (2016) Uncertainty in deep learning. University of Cambridge 1 (3). Cited by: Bayesian Neural Networks.

 [14] (2020) Flow contrastive estimation of energy-based models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7518–7528. Cited by: Toy Examples.
 [15] (2012) Nonparametric variational inference. arXiv preprint arXiv:1206.4665. Cited by: Variational Bayesian Methods.
 [16] (2016) beta-VAE: learning basic visual concepts with a constrained variational framework. Cited by: Method.
 [17] (1999) An introduction to variational methods for graphical models. Machine learning 37 (2), pp. 183–233. Cited by: Variational Bayesian Methods.
 [18] (2018) Glow: generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224. Cited by: Flow-based Generative Models.
 [19] (2018) Superresolution interferometric imaging with sparse modeling using total squared variation: application to imaging the black hole shadow. The Astrophysical Journal 858 (1), pp. 56. Cited by: Computational Imaging.
 [20] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605. Cited by: Multimodal Posterior Distributions.
 [21] (1995) Bayesian neural networks and density networks. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 354 (1), pp. 73–80. Cited by: Bayesian Neural Networks.
 [22] (2020) Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory. Cited by: Bayesian Neural Networks.
 [23] (2015) Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770. Cited by: Variational Bayesian Methods.
 [24] (2017) Interferometry and synthesis in radio astronomy. Springer Nature. Cited by: Interferometric Imaging Case Study.
 [25] (1984) Maximum entropy image reconstruction: general algorithm. Monthly Notices of the Royal Astronomical Society 211, pp. 111. Cited by: Computational Imaging.
 [26] (2019) First M87 Event Horizon Telescope results. IV. Imaging the central supermassive black hole. The Astrophysical Journal Letters 875 (1), pp. L4. Cited by: Empirical Sampling, Figure 7, Real Interferometric Data, Interferometric Imaging Case Study.
 [27] (2018) Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454. Cited by: Deep Image Prior.
 [28] (2019) Reliable deeplearningbased phase imaging with uncertainty quantification. Optica 6 (5), pp. 618–629. Cited by: Bayesian Neural Networks.
 [29] (2011) From learning models of natural image patches to whole image restoration. In 2011 International Conference on Computer Vision, pp. 479–486. Cited by: Computational Imaging.