In compressed sensing MRI reconstruction, the commonly used analytical regularization such as regularization can ensure the convergence of the iterative algorithm and improve MR image quality . The conventional iterative reconstruction algorithm with analytical regularization has an explicit mathematical deduction in gradient descent, which ensures the convergence of the algorithm to a local or global optimal and the generalizability, depending on the convexity of the regularization function. Besides, the dictionary learning is an extension of analytical regularization, providing an improvement over the regularization in specific applications . The study of analytically regularized reconstruction mainly focused on choosing the appropriate regularization function and parameters for minimizing the reconstruction error. As an extension of analytical regularization, the deep learning reconstruction was employed as an unrolled iterative algorithm for solving the regularized optimization or used as a substitute for analytic regularization [20, 18, 12, 1].
With the advances of deep learning methodology, research started shifting the paradigm to structured feature representation of MRI, such as cascade, deep residual, and generative deep neural networks[20, 18, 12, 1]. Especially, the method proposed in 
recast the compressed sensing reconstruction into a specially designed neural network that still partly imitated the analytical data fidelity and regularization terms. In that study, the analytical regularization term was replaced with convolutional layers and a specially designed activation function. These deep learning methods may show improved performance in some predetermined acquisition settings or pre-trained imaging tasks. However, they also lack flexibility when used with changes in MRI under-sampling scheme, the number of radio-frequency coils, and matrix size or spatial resolution. Such restriction is caused by the mixing of k-space data fidelity and the regularization terms in neural network implementations. Therefore, it was preferable to separate the k-space data fidelity and neural network-based regularization for improving the flexibility in changing MRI acquisition configurations.
This study applied Bayesian inference to model the MRI reconstruction problem, and the statistical representation of an MRI database was used as a prior model. In Bayesian inference, the prior model is required to be computationally scalable and tractable [16, 2]. The scalability of the prior model indicates the likelihood of it being used as a means for measuring the image quality [16, 2]. The tractability of the prior model means the gradient that facilitates the maximization of posterior distribution can be calculated by stochastic backpropagation for the model [16, 2]. In such Bayesian inference, the image to be reconstructed was referred to as the parameters of the Bayesian model, which was conditioned on the measured k-space data (as the posterior). Bayes’s theorem expressed the posterior as a function of the k-space data likelihood and the image prior. For the image prior, Refs [17, 13]
proposed a generative deep learning model, providing a tractable and scalable likelihood. In those studies, the image prior model was written as the multiplication of the conditional probabilities those indicated pixel-wise dependencies of the input image. The k-space data likelihood described how the measured k-space data was computed from a given MR image. The relationship between k-space data and MR image can be described, using the well-known MRI encoding matrix in an equality constraint[10, 19]
. With such computationally scalable and tractable prior model, the maximum a posterior can serve as an effective estimator for the high dimensional image reconstruction problem tackled in this study. To summarize, the Bayesian inference for MRI reconstruction had two separate models: the k-space likelihood model that was used to encourage data consistency and the image prior model that was used to exploit knowledge learned from an MRI database.
This paper presented a generic and interpretable deep learning-based reconstruction framework, using Bayesian inference. It employed a generative network as the MR image prior model. The proposed framework was capable of exploiting the MR image database with the prior model, regardless of the changes in MR imaging acquisition settings. Also, the reconstruction was achieved by a series of inferences those employed the maximum likelihood of posterior with the image prior, i.e., applying the Bayesian inference repeatedly. The reconstruction iterated over the data fidelity enforcement in k-space and the image refinement, using the Bayesian inference. During the iteration, the projected sub-gradient algorithm was used to maximize the posterior. The method is theoretically described, which was adapted from the methodology proposed by others , and then demonstrated in different MRI acquisition scenarios, including parallel imaging, compressed sensing, and non-Cartesian reconstructions. The robustness and the reproducibility of the algorithm were also experimentally validated.
The proposed method applied a generative neural network, as a data-driven MRI prior, to an MRI reconstruction method. This section contained an MRI reconstruction method using Bayes
theorem and a generative neural network-based MRI prior model, a pixel-wise joint probability distribution for images, using the PixelCNN++.
MRI reconstruction using Bayes theorem
With Bayes theorem, one could write the posterior as a product of likelihood and prior:
where was probability of the measured k-space data for a given image , and was the prior model of the image. The image reconstruction was achieved by exploring the posterior with an appropriate estimator. The maximum a posterior estimation (MAP) could provide the reconstructed image that was given by:
Following the PixelCNN++ , a deep neural network model, which was trained with MR image database, was used to approximate the prior .
Prior model for MR images
In this study, a deep autoregressive network 
was used as the prior model. This deep neural network served as a generative autoencoder that provided a hierarchic representation of the input image. The prior model predicted a mixture distribution of the input image. For MRI reconstruction, we adopted the prior model in the PixelCNN++ , except that t he number of image channels was changed from three (i.e., RGB channels for color image) to two (i.e., real and imaginary parts for MR image). For each image pixel, the variable had a continuous distribution that gave representation to real or imaginary signal intensity. Like in the VAE and pixelCNN++ [17, 9], the distribution of was a mixture of the logistic distribution, given by
Here, was the mixture indicator, and were the mean and scale of logistic distribution, respectively. Then the probability on each observed pixel was computed as 
was the logistic sigmoid function. Furthermore, in[17, 13], each pixel was dependent on all previous pixels up and to the left in an image, as shown in Figure 1a. The conditional distribution of the subsequent pixel at position was given by 
where the denoted the context information which was comprised of the mixture indicator and the previous pixels as showed in Figure 1a, was the coefficient related to mixture indicator and previous pixels.
was also a joint distribution for both real and imaginary channels. The real part of the first pixel, i.e.,in Figure 1a, was predicted by a mixture of logistics as described in Eq. 3. This definition assumed that the mean of mixture components of the imaginary channel was linearly dependent on the real channel. In this study, the number of mixture components was 10. In this model, mixture indicator was shared between two channels. The
image could be considered as an vectorized imageby stacking pixels from left to right and up to bottom of one another, i.e., and . The joint distribution of the image vector could be expressed as following :
were the parameters of mixture distribution for each pixel intensity. The generative network PixelCNN++ was expected to predict the joint probability distribution of all pixels in the input image , as illustrated in Figure 1b. Therefore, the network was trained by maximizing the likelihood in Eq. 7, as the training loss was given by
where was the trainable parameter within the network. After training, the network could be used as the image prior. Here, we defined the prior model as
Image reconstruction by MAP
The measured k-space data was given by
where was the encoding matrix, was MR image, and was the noise. The matrix consisted of Fourier matrix, sampling trajectory, and coil sensitivity map. Substituting Eq. 9 into the log-likelihood for Eq. 2 yielded
From the data model, the log-likelihood term for had less uncertainty, considering the MR imaging principles, for a given image , the probability for k-space, , i.e., when , was close to a constant with the uncertainty from noise that was irrelevant to . Hence, Eq. 11 could be rewritten as
The equality constraint for data consistency was the result of eliminating the first log-likelihood term in Eq. 11. The projected subgradient method was used to solve the equality constrained problem [7, 3]. In 
, authors proposed a stochastic backpropagation method for computing gradients through random variables for deep generative models. In PixelCNN++, the stochastic backpropagation provided the subgradient, where , for minimizing the log-likelihood in Eq. 12. We empirically found that the dropout (which applied to ) was necessary, when using the gradient to update in Eq. 12 . To summarize, the MAP-based MRI reconstruction had the following iterative steps:
Get the descent direction
Pick up a step size
The projection of onto was given by
Therefore, the generative network as a prior model was incorporated into the reconstruction of through the Bayesian inference based on MAP.
MRI data and pre-processing
Both knee and brain MRI data were used to test the reconstruction performance of the proposed method. The knee MRI data (multi-channel k-space data, 973 scans) were downloaded from fastMRI reconstruction database 
. As such, NYU fastMRI investigators provided data but did not participate in analysis or writing of this report. A listing of NYU fastMRI investigators, subject to updates, can be found at: fastmri.med.nyu.edu. The primary goal of fastMRI is to test whether machine learning can aid in the reconstruction of medical images. The knee data had two contrast weightings: proton-density with and without fat suppression (PDFS and PD). Scan parameters included 15-channel knee coil and 2D multi-slice turbo spin-echo (TSE) acquisition, and other settings which could be found in Ref..
For brain MRI, we collected 2D multi-slice T1 weighted, T2 weighted, T2 weighted FLAIR, and T2 weighted brain images from 16 healthy volunteers examined with clinical standard-of-care protocols. All brain data were acquired using our 3T MRI (Philips, Achieva), and an eight-channel brain RF coil. T1 weighted, T2 weighted, and T2 weighted FIAIR images were all acquired with TSE readout. Meanwhile, T2-weighted images were obtained using a gradient-echo sequence. Brain MRI parameters for four contrast weightings were listed in Table 1.
Training images were reconstructed from multi-channel k-space data without undersampling. Then, these image datasets after coil combination were scaled to a magnitude range of and resized to an image size of 256256. The training of PixelCNN++ model required a considerable computational capacity when a large image size was used. In this study, the was the largest size that our 4-GPU server could handle. Hence, the original images were resized into low-resolution images by cropping in k-space for knee MRI. For brain MRI, we split each raw image into four image patches, before fed into the network for training. Real and imaginary parts of all 2D images were separated into two channels when inputted into the neural network. For knee MRI, 15541 images were used as the training dataset, and 100 images were used for testing. For brain MRI, 1300 images were used as the training dataset, and 100 images were used for testing.
Deep neural network
The PixelCNN++ was modified from the code in https://github.com/openai/pixel-cnn. We implemented the reconstruction algorithm using Python, as explained in Eq. 13 and Appendix. With the trained prior model, we implemented the iterative reconstruction algorithm for maximizing the posterior while enforcing the k-space data fidelity (as explained in Appendix and Fig. 1
)c. Only two deep learning models were trained and utilized, one for knee MRI with two contrast weightings, and another for brain MRI with four contrast weightings. These two models can support all experiments performed in this study with variable undersampling patterns, coil sensitivity maps, channel numbers, image sizes, and trajectory types. Our networks were trained in Tensorflow software, and on four NVIDIA RTX-2080Ti graphic cards. Other hyperparameters were 500 epochs, batch size = 4, and Adam optimizer. It took about five days to train the network for knee dataset and two days for brain dataset under the above-mentioned configuration.
Parallel imaging and or regularization driven reconstruction
The GRAPPA reconstruction was performed with a block size of 4 and 20 central k-space lines as the auto-calibration area . We simulated GRAPPA accelerations with undersampling factors from 2 to 4. The representative undersampling masks were shown in Supplementary Figure 1. We chose -ESPIRIT and MODL  as baseline methods for comparison. They were analytical regularizations. The -ESPIRIT exploited the sparsity of image, and the MODL was a deep learning method for compressed sensing reconstruction, trained via minimizing reconstruction error. In the -ESPIRiT reconstruction, we set the regularization parameter to be 0.01, using BART software. One reason for choosing MODL was that it supported the coil sensitivity map for applying paralleling imaging. We followed settings in Ref  when training MODL to reconstruct the undersampled knee data. The only difference was the k-space mask in Ref  was 2D undersampled, while in the current study, the 1D undersampling was applied. The central 20 k-space lines were sampled which account for 7% of the full k-space of one image. The others in the outer region were picked randomly with certain undersampling rate.
For the proposed method, MR images with matrix size were reconstructed, using the prior model in Eq. 9 that was trained by images or image patches. During inference, the image was split into four patches for applying the prior model, as shown in Figure 1c. After updating , four patches for one image were concatenated to form an image with the original size of , before it was projected onto in Eq. 13. The detailed algorithm was presented in the Appendix.
Non-Cartesian k-space acquisition
In this experiment, spiral sampled k-space from the acquired T2-weighted k-space data was simulated. The method proposed in Ref 
was used to design the spiral trajectory. The full k-space coverage required 24 spiral interleaves for the spatial resolution used in this study. The spiral trajectory was shown in Supplementary Figure 1. Besides, the implementation of non-uniform fast Fourier transform was based on the method in Ref. For comparison, we used the iterative SENSE, i.e., conjugate gradient SENSE (CG SENSE), proposed in Ref , as a baseline method.
Figures 2 and 3 show the comparison of knee and brain MRI reconstructed using GRAPPA and the proposed method. The proposed method had an improved performance in recovering brain and knee image details and reducing the aliasing artifacts, compared with GRAPPA. As expected, parallel imaging amplified the noise in the low coil sensitivity regions and along the undersampled dimension. On the other hand, error maps demonstrated in Figure 2 and 3 showed that the proposed method effectively eliminated the noise amplification and the aliasing artifacts. Table 2 presents the comparison of GRAPPA reconstruction and the proposed method for knee (N = 100) and brain (N = 100) MRI testing images. With the increase of the undersampling factor, the peak signal to noise ratio (PSNR) of the proposed method decreased less, compared with that of GRAPPA. In addition, with acceleration factor R = 2 in brain MRI, the proposed method showed 8 dB more improvement in the PSNR than GRAPPA.
Compressed sensing reconstruction
In Figures 4 and 5, the -ESPIRiT had caused apparent blurring in the reconstructed images for both knee and brain MRI data. Both the -ESPIRiT and MODL methods caused residual aliasing artifacts. Meanwhile, the proposed reconstruction recovered most anatomical structures and sharp boundaries in knee and brain MR images, compared with those from -ESPIRiT and MODL reconstructions, as shown on error maps in Figure 4 and 5. Tables 3 summarized reconstruction results using regularization, MODL, and the proposed method. The proposed method generally showed more than 5 dB PSNR improvement compared with -ESPIRiT and MODL.
Preliminary result in non-Cartesian MRI reconstruction and quantitative susceptibility mapping (QSM)
In this study, we used a T2 weighted gradient-echo images to simulate the spiral k-space data with 4-fold acceleration. The reconstructed images from the CG SENSE and the proposed method were compared. The proposed method showed apparent improvement regarding the aliasing artifact reduction and the preservation of T2 contrast between gray matter and white matter. The proposed method also showed a slight denoising effect on the reconstructed image compared with the ground truth. Noted that the same deep learning model used in the previous Cartesian k-space reconstruction experiments in Figures 3 and 5 was applied to spiral reconstruction, without the need of re-training the deep learning model. Figure 7 shows the preliminary result from the proposed accelerated reconstruction in QSM with 4-fold acceleration. Noted that the same deep learning model used in the previous brain experiments was applied to this experiment, with phase information preserved in all reconstructed images. The proposed deep learning method also showed an apparent de-noising effect on QSM maps, while still preserved the major phase contrast even with high acceleration.
The proposed method can reliably and consistently recover the nearly aliased-free images with relatively high acceleration factors. Meanwhile, as expected, the increase of image smoothing with high acceleration factors was noticed, reflecting the loss of intrinsic resolution. The estimated image from the maximum of the posterior can not guarantee the full recovery of the image details, i.e., PSNR 40 dB for a full recovery. However, at modest acceleration, the reconstruction from a maximum of posterior showed the successful reconstruction of the detailed anatomical structures, such as vessels, cartilage, and membranes in-between muscle bundles.
In this study, the results demonstrated the successful reconstruction of high-resolution image (i.e., 256 256 matrix) with low-resolution prior (i.e., trained with 128 128 matrix), confirming the feasibility of reconstructing images of different sizes without the need for retraining the prior model. The prior model was trained by 128 x 128 images; it was still valid and applicable for the reconstruction of a high-resolution image. The proposed methods provided more than 8 dB improvement over the conventional GRAPPA reconstruction at the 4-fold acceleration in knee MRI. Besides, in contrast to other deep learning-based methods, which focused on the loss, the likelihood that was conditioned by pixel-wise dependencies of the whole image showed an improved representation capacity, leading to a higher reconstruction accuracy. The applicability of the proposed method in the patch-based reconstruction also suggested its high representation capacity and flexibility. Even when the inputs were image patches, the prior model could still recover the whole image.
The projected subgradient approach to solving Eq. 12 was computationally inexpensive but converged slowly, as shown in Figure 8. For a random initialization, the algorithm needed about 500 iterations to converge with a fixed step size. Meanwhile, we noticed that if the zero-filled-reconstructed image was used for initialization, the number of iterations required could be reduced to 100. Besides, the decay of residual norm stopped earlier than that of the log-likelihood, i.e., when the residual norm stopped decaying, the likelihood can still penalize the error. This evidence indicated that using the residual norm as the fidelity alone was sub-optimal, and the deep learning-based statistical regularization can lead to a better reconstruction result compared with the fidelity. Deep learning-based statistical regularization in the proposed method outperformed other conventional regularizations trained by image-level loss. loss did not give an explicit description of the relationship amid all pixels in the image, while the likelihood used in conjunction with the proposed image prior model was conditioned by the pixel-wise relationship and demonstrated superior performance compared with the conventional methods, under the current experimental setting.
Furthermore, the demonstrated image prior can be extended to a more elaborated form with clinical information, such as organs and contrast types, as the model inputs. For example, one could input the image prior with labels such as brain or knee. Then hypothetically, the image prior can be designed as a conditional probability for the given image label. In other words, the posterior would be dependent on both the k-space data and image labels. Moreover, the MR pulse sequence parameters could serve as image labels for the prior, such as echo time and repetition time. In short, the prior model can be used to describe clinical information or acquisition parameters. This setting opens up a future direction on a more elaborated image prior, incorporating clinical information and MR sequence parameters, for more intelligent image representation and pattern detection.
In this study, the generative network solely served as an image prior model, in contrast to how neural network was used in other deep learning-based reconstructions [20, 18, 12, 1]. Specifically, in previous studies [20, 18, 12, 1], embedding k-space fidelity term into the network made the algorithm inflexible because image prior and undersampling artifacts were mixed during the training. The proposed method used the standard analytical term for fidelity enforcement; therefore, its flexibility was comparable to the traditional optimization algorithm, such as regularization. Due to unavoidable changes of the encoding scheme, e.g., the image size and the RF coils during MRI experiment in practice, it was essentially needed to separate the learned component (the image prior) from the encoding matrix used in the fidelity term in reconstruction. Besides, the proposed method showed the feasibility of incorporating the coil sensitivity information in the fidelity term, which enabled the changeable encoding scheme without the need of retraining the model [14, 5]. In summary, the separation of the image prior and the encoding matrix embedded in the fidelity term made the proposed method more flexibility and generalizable compared with conventional deep learning approaches.
In summary, this study presented the application of Bayesian inference in MR imaging reconstruction with the deep learning-based prior model. We demonstrated that the deep MRI prior model was a computationally tractable and effective tool for MR image reconstruction. The Bayesian inference significantly improved the reconstruction performance over that of conventional sparsity prior in compressed sensing. More importantly, the proposed reconstruction framework was generalizable for most reconstruction scenarios.
We thank Dr. Hongjiang Wei for providing the QSM processing code.
|T1||25625624||0.9 0.9 4||7||2000/20||800|
|T2||25625624||0.9 0.9 4||13||3000/80||-|
|FLAIR||25625624||0.9 0.9 4||31||8000/135||135|
|T2||25625628||0.9 0.9 4||-||770/16||-|
|15% + 7%||knee||29.332.82||27.633.41||35.343.53|
|20% + 7%||knee||31.513.60||29.293.76||37.453.81|
|15% + 7%||brain||32.863.46||30.602.78||39.782.83|
|20% + 7%||brain||34.723.89||32.462.95||41.242.81|
|PSNR (dB) 34.55||34.16||40.92|
|CG SENSE||Ours||Ground truth|
-  (2018) Modl: model-based deep learning architecture for inverse problems. IEEE transactions on medical imaging 38 (2), pp. 394–405. Cited by: §1, §1, Parallel imaging and or regularization driven reconstruction, Discussion.
-  (2019) Solving inverse problems using data-driven models. Acta Numerica 28, pp. 1–174. Cited by: §1.
-  (2003) Subgradient methods. lecture notes of EE392o, Stanford University, Autumn Quarter 2004, pp. 2004–2005. Cited by: Image reconstruction by MAP.
Nonuniform fast fourier transforms using min-max interpolation. IEEE transactions on signal processing 51 (2), pp. 560–574. Cited by: Non-Cartesian k-space acquisition.
-  (2010) Model-based image reconstruction for mri. IEEE Signal Processing Magazine 27 (4), pp. 81–89. Cited by: Discussion.
-  (2016-20–22 Jun) Dropout as a bayesian approximation: representing model uncertainty in deep learning. In Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger (Eds.), Proceedings of Machine Learning Research, Vol. 48, New York, New York, USA, pp. 1050–1059. External Links: Cited by: Image reconstruction by MAP.
-  (2013) Deep autoregressive networks. arXiv preprint arXiv:1310.8499. Cited by: Prior model for MR images, Image reconstruction by MAP.
-  (2002) Generalized autocalibrating partially parallel acquisitions (grappa). Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 47 (6), pp. 1202–1210. Cited by: Parallel imaging and or regularization driven reconstruction.
-  Improving variational inference with inverse autoregressive flow.(nips), 2016. URL http://arxiv. org/abs/1606.04934. Cited by: Prior model for MR images.
-  (2007) Sparse mri: the application of compressed sensing for rapid mr imaging. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 58 (6), pp. 1182–1195. Cited by: §1, §1.
-  (2008) A fast method for designing time-optimal gradient waveforms for arbitrary k-space trajectories. IEEE transactions on medical imaging 27 (6), pp. 866–873. Cited by: Non-Cartesian k-space acquisition.
-  (2018) Deep generative adversarial neural networks for compressive sensing mri. IEEE transactions on medical imaging 38 (1), pp. 167–179. Cited by: §1, §1, Discussion.
-  (2016) . arXiv preprint arXiv:1601.06759. Cited by: §1, Prior model for MR images, Figure 1.
-  (2001) Advances in sensitivity encoding with arbitrary k-space trajectories. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 46 (4), pp. 638–651. Cited by: Non-Cartesian k-space acquisition, Discussion.
-  (2010) MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE transactions on medical imaging 30 (5), pp. 1028–1041. Cited by: §1.
-  (2014) Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082. Cited by: §1.
-  (2017) Pixelcnn++: improving the pixelcnn with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517. Cited by: §1, §1, MRI reconstruction using Bayes theorem, Prior model for MR images, Prior model for MR images, Theory, Figure 1.
A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE transactions on Medical Imaging 37 (2), pp. 491–503. Cited by: §1, §1, Discussion.
ESPIRiT—an eigenvalue approach to autocalibrating parallel mri: where sense meets grappa. Magnetic resonance in medicine 71 (3), pp. 990–1001. Cited by: §1.
-  (2016) Deep admm-net for compressive sensing mri. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 10–18. External Links: Cited by: §1, §1, Discussion.
-  (2018) FastMRI: an open dataset and benchmarks for accelerated MRI. CoRR abs/1811.08839. External Links: Cited by: MRI data and pre-processing.