1 Introduction
Acquisition time in magnetic resonance (MR) imaging is directly related to the number of samples acquired in k-space. High-quality images require a large number of samples and therefore long acquisition times. Reducing acquisition time in a reliable manner is an important problem in MR imaging, and many methods exploiting different properties of k-space [1] and hardware design [2] have found wide use in clinical practice.
On the other hand, a substantial amount of effort has gone into the investigation of reconstruction methods for randomly or regularly undersampled k-space acquisitions. The random undersampling approach was primarily motivated by the compressed sensing framework, where the incoherence between the k-space sampling and some sparsifying transform is exploited to achieve theoretically exact reconstruction [3, 4, 5, 6, 7]. Similarly, regular undersampling schemes and corresponding reconstruction algorithms have been extensively investigated [8, 9, 10, 11]. The common aspect of these two approaches is that both invert an underdetermined system of equations, relying on good regularizers to overcome the ill-posedness. A key observation is that the use of regularizers in both approaches can essentially be viewed as the introduction of prior information into the reconstruction problem.
Recently, researchers have started to employ deep neural networks (DNNs) [12] for MR reconstruction, inspired by their success in computer vision and medical image analysis [13]. In the related literature there are two main approaches for applying DNNs to the reconstruction problem. The first is to train a network to map undersampled images to their fully sampled versions [14]. After training, an undersampled image is fed into the network to obtain its reconstruction at the output. This approach has no explicit data-consistency term and therefore cannot guarantee consistency with the measured data at test time. To overcome this limitation, the authors of [15] introduce a deep cascaded network, in which a block of convolutional neural network (CNN) layers is followed by a data-consistency step and this structure is repeated multiple times. This guarantees consistency with the measured data at test time while the CNNs perform dealiasing.
The second approach is to use the efficient trainability of the networks to improve existing algorithms. In [16] the authors show that the iterations of the alternating direction method of multipliers (ADMM) algorithm solving a dictionary-based $\ell_1$-regularized reconstruction problem can be expressed as a multilayer network. Kernels and nonlinear functions are parameterized and learned by the network, improving on the fixed versions in the original method. Both deep learning approaches suffer from specificity of training: learned mappings, kernels and nonlinearities are specific to a certain undersampling pattern and possibly field of view (FOV). Hence, the networks need to be retrained for each different configuration of FOV, undersampling factor and pattern, which limits the wide use of these methods.
In computer vision, DNNs have also been used for approximating high-dimensional probability distributions from samples through unsupervised learning. Such approximations allow estimating the likelihood of a previously unseen data point. One such approach is the variational autoencoder (VAE) algorithm [17, 18]. Using VAEs, it is possible to approximate the distribution of patches of MR images and compute the likelihood of a previously unseen image. Furthermore, the likelihood function is differentiable since it is modelled as a neural network.

In this work we develop a probabilistic reconstruction model that uses priors learned via VAEs and deploys them as regularizers, which we term deep density prior (DDP) based reconstruction. The approach can be used to learn the prior distribution from fully sampled examples once and then reconstruct for different sampling parameters without retraining. To achieve this, we first formulate a Bayesian model of the imaging process and express the reconstruction problem as a maximum a posteriori (MAP) estimation problem. We show how some of the previous methods fit into this formulation and then present the proposed model using a VAE-learned prior. In the experiments, we first demonstrate that VAEs can approximate the distribution of patches of MR images by showing examples of patches sampled from the prior. We then show reconstruction results from the proposed model and compare with both more conventional approaches as well as recent DNN-based methods.
2 Methods
In the first two parts of this section, we provide a brief background on Bayesian formulation of the MR reconstruction problem and the VAE algorithm. Then, starting from Section 2.3, we present our main technical contribution, that is learning a prior for MR image patches and integrating it in the reconstruction problem.
2.1 Bayesian formulation of the MR reconstruction problem
The MR image is denoted as $m \in \mathbb{C}^N$, where $N$ is the number of voxels. (In this work we focus on 2D imaging; however, the same techniques can be applied to 3D imaging, and this extension will be part of our future work.) The imaging operation is given by the undersampling encoding operation $E = UFS$, where $S$ is the sensitivity encoding operator. Here, $C$ is the number of coils, $F$ is the Fourier operator and $U$ is the undersampling operator, which retains only $M < N$ k-space samples.
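For the single-coil, uniform-sensitivity setting used later in the experiments, the encoding operator reduces to an undersampled Fourier transform. The sketch below (an illustration with assumed shapes and a random binary mask, not the paper's code) shows this operator and its adjoint, which yields the zero-filled reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(m, mask):
    """Forward operator E for single-coil Cartesian MRI:
    orthonormal 2D FFT followed by a binary k-space mask U."""
    return mask * np.fft.fft2(m, norm="ortho")

def encode_adj(y, mask):
    """Adjoint E^H: mask the data, then inverse FFT (zero-filling)."""
    return np.fft.ifft2(mask * y, norm="ortho")

m = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
mask = (rng.random((8, 8)) > 0.5).astype(float)
y = encode(m, mask)            # simulated undersampled acquisition
m_zf = encode_adj(y, mask)     # zero-filled reconstruction
```

With the orthonormal FFT convention, $EE^H$ acts as the identity on already-masked data, a property the data-consistency projection later relies on.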
Assuming complex-valued zero-mean normally distributed noise, denoted as $\eta \sim \mathcal{N}(0, \sigma^2 I)$, the acquired data can be written as $y = Em + \eta$. Under this noise model the data likelihood becomes

$$p(y|m) = \frac{1}{(\pi\sigma^2)^M} \exp\left(-\frac{(Em - y)^H (Em - y)}{\sigma^2}\right), \qquad (1)$$

where $^H$ denotes the Hermitian transpose and $\sigma$ is the standard deviation of the noise. In reconstruction, we are interested in the posterior distribution $p(m|y)$, i.e. the probability of the image being $m$ given the k-space measurements $y$, which can be written using Bayes' theorem as

$$p(m|y) = \frac{p(y|m)\,p(m)}{p(y)}. \qquad (2)$$
The common approach to model the reconstruction problem, which we will also use, is MAP estimation

$$m_{\mathrm{MAP}} = \arg\max_m \; p(m|y) = \arg\max_m \; p(y|m)\,p(m), \qquad (3)$$

where the equality is due to $p(y)$ not depending on $m$. $p(m)$ is called the prior term and represents the information one has about the fully sampled image before the data acquisition. Taking the log of both sides and plugging in the definition of the likelihood term yields

$$m_{\mathrm{MAP}} = \arg\max_m \; \log p(y|m) + \log p(m) \qquad (4)$$
$$\phantom{m_{\mathrm{MAP}}} = \arg\max_m \; \log\left[\frac{1}{(\pi\sigma^2)^M} \exp\left(-\frac{(Em - y)^H (Em - y)}{\sigma^2}\right)\right] + \log p(m). \qquad (5)$$

Rewriting the likelihood and ignoring terms that do not vary with $m$ yields

$$m_{\mathrm{MAP}} = \arg\max_m \; -\frac{1}{\sigma^2}\|Em - y\|_2^2 + \log p(m), \qquad (6)$$

where we define $\|x\|_2^2 = x^H x$. Taking the maximum, or equivalently the minimum of the negative of the expression, and multiplying by $\sigma^2$ recovers the conventional formulation of a reconstruction problem with a data term and a regularization term:

$$m_{\mathrm{MAP}} = \arg\min_m \; \|Em - y\|_2^2 - \sigma^2 \log p(m). \qquad (7)$$
From the point of view of Equation 7, regularization-based reconstruction algorithms differ in the log-prior term they use. A generic prior expression can be written as $-\log p(m) \propto \|\Phi(m - \mu)\|_p^p$, where $\Phi$ is a sparsifying operator, $\mu$ is the expected image and $\|\cdot\|_p$ is the $p$-norm.
For instance, forcing the image to be sparse in some domain can be expressed by assuming the image is i.i.d. according to the Laplace distribution in the $\Phi$ domain [19, 20]. Mathematically, this is

$$p(m) = |\det \Phi| \prod_i \frac{1}{2b} \exp\left(-\frac{|\Phi m|_i}{b}\right), \qquad (8)$$

where we set $\mu = 0$, denote the matrix determinant with $\det$ (we used the change of variables rule to relate the distributions [21]), use $|\cdot|$ as the magnitude, and $i$ indexes voxels in the image. Now, taking the log of the expression and ignoring the terms that do not vary with $m$ we get

$$\log p(m) = -\frac{1}{b}\|\Phi m\|_1 + \mathrm{const}. \qquad (9)$$

Plugging this into Equation 7 yields the familiar $\ell_1$-regularized reconstruction problem. Furthermore, setting $\Phi$ as the gradient operator with first-order finite differences, denoted by $\nabla$, corresponds to enforcing sparsity in the gradient domain, recovering the basic total variation (TV) compressed sensing reconstruction.
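As an illustration of the TV prior just described, the sketch below (a minimal NumPy version with a hypothetical helper name) evaluates the negative log-prior $\frac{1}{b}\|\nabla m\|_1$ with first-order finite differences; for a piecewise-constant image its value counts the unit jumps across edges:

```python
import numpy as np

def tv_neg_log_prior(m, lam=1.0):
    """Negative log of an i.i.d. Laplace prior in the gradient domain,
    lam * ||grad m||_1, using first-order finite differences (basic TV)."""
    dx = np.diff(m, axis=0)   # vertical differences
    dy = np.diff(m, axis=1)   # horizontal differences
    return lam * (np.abs(dx).sum() + np.abs(dy).sum())

# A piecewise-constant test image: an 8x8 unit square on a zero background.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
# 8 columns x 2 vertical jumps + 8 rows x 2 horizontal jumps = 32
tv_val = tv_neg_log_prior(img)
```

This is exactly the quantity penalized by the TV regularizer in Equation 7 (up to the weight $\sigma^2/b$).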
Alternatively, both the mean $\mu$ and the covariance structure can be estimated from data. One example of this approach is to assume a normal prior on $m$ and estimate the mean and the covariance from a low-resolution image of the same object [10, 22]. The posterior is then also a normal distribution due to conjugacy relations, and the MAP estimate is given by the posterior mean [23]. A similar idea has been applied in the k-t SENSE and k-t PCA methods in the dynamic imaging setting to exploit spatiotemporal correlations [22, 10].
In this work, we follow the same approach of estimating the prior term from example images and propose to approximate the prior with a neural network model. We train a VAE on patches of fully sampled MR images to capture the distribution of patches and use this as the prior for reconstruction.
2.2 Learning the data distribution with VAEs
The VAE is an unsupervised learning algorithm used to approximate high-dimensional data distributions [17, 18]. We introduce VAEs very briefly and refer the reader to [17] for further discussion.

The main goal of the VAE algorithm is to approximate the data distribution $p(x)$ using a latent model given as

$$p(x) = \int p(x|z)\,p(z)\,dz, \qquad (10)$$

where $z$ denotes the latent variables, $x$ the data and $p(z)$ the prior over the $z$'s. Direct computation of $p(x)$ or of the posterior $p(z|x)$ requires integrating over $z$, which is not feasible even for a relatively small latent dimension. The variational approximation addresses this problem by introducing an approximate distribution $q(z|x)$ for the posterior. Using the approximation, $\log p(x)$ can be decomposed into two terms [24]:

$$\log p(x) = \underbrace{\mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - \mathrm{KL}\left(q(z|x)\,\|\,p(z)\right)}_{\mathrm{ELBO}(x)} + \mathrm{KL}\left(q(z|x)\,\|\,p(z|x)\right). \qquad (11)$$

The first term is referred to as the evidence lower bound (ELBO) and the second term is the Kullback-Leibler divergence (KLD) between the approximate and true posteriors.

The VAE parameterizes the data likelihood and the approximate posterior using networks, i.e. $p_\theta(x|z)$ and $q_\phi(z|x)$, and during training maximizes the ELBO for a given set of data samples with respect to the network parameters $\theta$ and $\phi$. Maximizing the ELBO will minimize the KLD and maximize $\log p(x)$. The optimization during training is written as

$$\max_{\theta,\phi} \sum_n \mathbb{E}_{q_\phi(z|x_n)}\left[\log p_\theta(x_n|z)\right] - \mathrm{KL}\left(q_\phi(z|x_n)\,\|\,p(z)\right), \qquad (12)$$

where $x_n$ is the $n$-th training sample.
The networks $q_\phi(z|x)$ and $p_\theta(x|z)$ are called the encoder and the decoder, respectively. The former takes a data sample $x$ and encodes it into a posterior distribution in the latent space with network parameters $\phi$. If the posterior distribution is modelled as a Gaussian, the encoder outputs a mean and a covariance matrix for $z$ depending on $x$. The decoder network, on the other hand, takes a latent vector $z$ and maps it to a conditional distribution of the data given $z$. In this work, we use the vanilla VAE design [17] except for the data likelihood, where we use a multimodal Gaussian, similar to [25].
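To make the ELBO of Equation 11 concrete, the sketch below (pure NumPy with toy dimensions and hypothetical function names, not the paper's TensorFlow implementation) computes a Monte Carlo ELBO for Gaussian encoder and decoder distributions, using the closed-form KL term against the unit-Gaussian prior and the reparameterization trick:

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_gaussian(x, enc_mu, enc_logvar, decode, n_samples=8):
    """Monte Carlo ELBO for a VAE with q(z|x) = N(enc_mu, diag(exp(enc_logvar))),
    unit-Gaussian prior p(z), and Gaussian likelihood; decode(z) -> (mu_x, logvar_x)."""
    # KL(q(z|x) || N(0, I)) in closed form
    kl = 0.5 * np.sum(np.exp(enc_logvar) + enc_mu**2 - 1.0 - enc_logvar)
    # reparameterized Monte Carlo estimate of E_q[log p(x|z)]
    rec = 0.0
    for _ in range(n_samples):
        z = enc_mu + np.exp(0.5 * enc_logvar) * rng.standard_normal(enc_mu.shape)
        mu_x, logvar_x = decode(z)
        rec += -0.5 * np.sum(logvar_x + (x - mu_x) ** 2 / np.exp(logvar_x)
                             + np.log(2.0 * np.pi))
    return rec / n_samples - kl

# toy check: a decoder whose likelihood is centered exactly at x, with unit variance
x = np.ones(4)
perfect_decoder = lambda z: (x, np.zeros_like(x))
val = elbo_gaussian(x, np.zeros(2), np.zeros(2), perfect_decoder)
```

In a real VAE the `decode` function (and the encoder outputs `enc_mu`, `enc_logvar`) would be neural networks trained with Equation 12.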
In our model, we use the VAE to approximate the prior distribution of the magnitude of 2D patches of fully sampled MR images. In other words, we approximate $p(|x|)$, where $x$ is a patch extracted from a complex-valued MR image. We provide the exact parameters of the network design in Section 2.5.
2.3 Deep density prior reconstruction model
Once the network is trained and the optimal parameter values $\theta^*$ and $\phi^*$ are found, we can integrate the prior within a Bayesian formulation of the reconstruction problem. We make two key observations to achieve this. First, an approximate log-likelihood of an image patch $x$ can be obtained by evaluating the ELBO with the optimal parameters:

$$\log p(x) \approx \mathrm{ELBO}(|x|) = \mathbb{E}_{q_{\phi^*}(z\,|\,|x|)}\left[\log p_{\theta^*}(|x|\,|\,z)\right] - \mathrm{KL}\left(q_{\phi^*}(z\,|\,|x|)\,\|\,p(z)\right), \qquad (13)$$

where we use the magnitude of the patch within the ELBO. This allows us to formulate the proposed model as the following MAP estimation problem:

$$m_{\mathrm{MAP}} = \arg\max_m \; \log p(y|m) + \sum_{x \in \Omega(m)} \mathrm{ELBO}(|x|), \qquad (14)$$

where $\Omega(m)$ denotes a set of (overlapping) patches covering the image $m$ and $|x|$ is the magnitude of the image patch. Note that this approach assumes independence between different patches and therefore ignores statistical dependencies between them. However, it also makes the computations much easier.

Since an exact computation of the ELBO term requires evaluating the expectation with respect to $q_{\phi^*}(z\,|\,|x|)$, which is computationally not feasible, we use a Monte Carlo sampling approach to calculate the ELBO as follows:

$$\mathrm{ELBO}(|x|) \approx \frac{1}{L}\sum_{l=1}^{L} \log p_{\theta^*}(|x|\,|\,z_l) - \mathrm{KL}\left(q_{\phi^*}(z\,|\,|x|)\,\|\,p(z)\right), \quad z_l \sim q_{\phi^*}(z\,|\,|x|). \qquad (15)$$

Here $L$ represents the number of Monte Carlo samples.

Our second key observation is that the approximation in Equation 15 is differentiable, since each term is defined through networks that are themselves differentiable. This is the critical aspect that allows integrating the trained VAE as a prior into an optimization-based reconstruction algorithm. Plugging the ELBO approximation into Equation 14, we obtain the formulation of the DDP reconstruction problem:

$$m_{\mathrm{DDP}} = \arg\min_m \; \|Em - y\|_2^2 - \sigma^2 \sum_{x \in \Omega(m)} \left[\frac{1}{L}\sum_{l=1}^{L} \log p_{\theta^*}(|x|\,|\,z_l) - \mathrm{KL}\left(q_{\phi^*}(z\,|\,|x|)\,\|\,p(z)\right)\right], \qquad (16)$$
where the first term is the usual data term and the second term within the summation is the regularization term that arises from the learned prior.
We can compute the total derivative of the prior term with respect to each image patch using the chain rule:

$$\frac{d\,\mathrm{ELBO}(|x|)}{dx} = \left(\frac{\partial\,\mathrm{ELBO}(|x|)}{\partial |x|} + \sum_{l=1}^{L}\frac{\partial\,\mathrm{ELBO}(|x|)}{\partial z_l}\frac{\partial z_l}{\partial |x|}\right)\frac{\partial |x|}{\partial x}, \qquad (17\text{-}19)$$

where we write the right-hand side in condensed form for notational simplicity. The second term in the parentheses is due to the dependency of the samples $z_l$ on $x$. The term $\partial |x| / \partial x$ is due to taking the derivative of the magnitude with respect to the complex-valued image patch.
2.4 Optimization using projection onto convex sets
We solve the DDP optimization problem given in Equation 16 using the projection onto convex sets (POCS) algorithm [26], specifically using the formulation in [27]. POCS is an iterative minimization procedure in which the solution variable is projected sequentially onto different convex sets, each defined by one of the constraints in the problem.

The projection for the data consistency term is given simply by replacing the k-space values at the sampled locations with the measured data [26]. Since we do not have a projection operator for the prior term, we approximate it by several gradient ascent steps with a small step size and use the final image at the end of the ascent steps as the projected image. We define the prior projection operation as follows: i) we create a set of patches from the image at iteration t, ii) we obtain the derivatives for each of these patches using Equation 19, which have the same size as the patches themselves, iii) we create a derivative image from these derivative patches by averaging the values where the patches overlap, iv) we update the image using the derivative image, and v) we repeat this K times. To reduce edge effects resulting from patchwise projections, we use four sets of overlapping patches.
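Steps i)-iv) of the prior projection can be sketched as follows (a simplified single-offset version with hypothetical helper names; the paper uses four sets of overlapping patches and the actual ELBO gradients):

```python
import numpy as np

def extract_patches(img, p, stride):
    """Step i): collect overlapping p x p patches with their top-left corners."""
    H, W = img.shape
    return [((i, j), img[i:i + p, j:j + p])
            for i in range(0, H - p + 1, stride)
            for j in range(0, W - p + 1, stride)]

def average_patch_grads(shape, p, grads):
    """Step iii): reassemble per-patch derivatives into one derivative image,
    averaging values where patches overlap."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for (i, j), g in grads:
        acc[i:i + p, j:j + p] += g
        cnt[i:i + p, j:j + p] += 1
    return acc / np.maximum(cnt, 1)

img = np.zeros((8, 8))
patches = extract_patches(img, p=4, stride=4)
# dummy per-patch derivatives standing in for the ELBO patch gradients (step ii)
grads = [(pos, np.ones((4, 4))) for pos, _ in patches]
deriv_img = average_patch_grads(img.shape, 4, grads)
# step iv) would be: img = img + step_size * deriv_img, repeated K times
```

Averaging in overlap regions keeps the update magnitude comparable across pixels covered by different numbers of patches.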
As the likelihood term is trained only on magnitude patches and cannot convey any information regarding the phase, we also use a phase projection following the prior projection, in which we set the phase of the image to zero, i.e. $m \leftarrow |m|$. We do this because we expect the images to have approximately zero phase in this specific application. It is, however, possible to replace this with any other constraint on the phase of the image, such as a low-resolution estimate of the phase [3] or a zero-divergence constraint for phase-contrast flow imaging reconstruction [28].
With the data consistency, prior and phase projections defined as above, one step of reconstruction within the POCS framework becomes

$$m_{t+1} = P_{\mathrm{phase}}\left(P_{\mathrm{prior}}\left(P_{\mathrm{DC}}(m_t)\right)\right). \qquad (20)$$

We apply T POCS steps to complete the reconstruction. Algorithm 1 provides a summary of the reconstruction procedure.
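A minimal sketch of the POCS iteration, assuming the single-coil Cartesian setting (`project_dc` and `project_phase` are hypothetical names, and the learned prior projection is left as a placeholder):

```python
import numpy as np

def project_dc(m, y, mask):
    """Data-consistency projection: overwrite sampled k-space
    locations with the measured values y."""
    k = np.fft.fft2(m, norm="ortho")
    return np.fft.ifft2(mask * y + (1 - mask) * k, norm="ortho")

def project_phase(m):
    """Phase projection used here: keep the magnitude, set the phase to zero."""
    return np.abs(m)

rng = np.random.default_rng(2)
truth = np.abs(rng.standard_normal((16, 16)))
mask = (rng.random((16, 16)) > 0.5).astype(float)
y = mask * np.fft.fft2(truth, norm="ortho")

m = np.fft.ifft2(y, norm="ortho")      # zero-filled starting point
for _ in range(5):
    m = project_dc(m, y, mask)         # data consistency
    # the learned prior projection (K gradient ascent steps on the ELBO) goes here
    m = project_phase(m)               # zero-phase constraint
```

After each `project_dc` call the sampled k-space locations match the measurements exactly, which is the property the data-consistency set encodes.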
2.5 Details on VAE network architecture and training
We use batches of 50 patches of size 28x28 and a 60-dimensional latent space. Both networks are depicted in Figure 1. For the encoding network, the input is an image patch and the output is the mean and the covariance of the corresponding posterior. The decoding network takes a vector $z$ as input and outputs the mean and the covariance of the corresponding likelihood for the patch. Both networks are mostly convolutional, with a single fully connected layer each. All convolutional layers in both networks use 3x3 kernels, and all layers have additive bias terms.
Due to stability issues we use the log of the variance values throughout the network. We initialize the network weights with a truncated normal initializer with standard deviation 0.05. We use Adam [29] for optimization (learning rate of 5e-4, default momentum values in TensorFlow).
3 Experimental setup
3.1 MR image data
We evaluated our method using structural images from the Human Connectome Project (HCP) data set [30]. (Further details of the HCP data set can be found in the HCP S500+MEG2 Release Reference Manual and the HCP S1200 Release Appendix I, obtainable from https://www.humanconnectome.org/study/hcp-young-adult/document/500-subjects-data-release.) The high quality and large number of HCP images make them ideal for learning priors with the VAE model.
We took 2D slices from the T1-weighted 3D MPRAGE images, acquired with TR/TE/TI of 2400 ms / 2.14 ms / 1000 ms, a flip angle of 8 degrees, a standard field of view for all subjects of 224x224x224 mm with 0.7 mm isotropic resolution, and a bandwidth of 210 Hz/voxel. Images were acquired with a 3T Siemens device, with fat suppression applied during acquisition. In our experiments, we used the minimally preprocessed images in order to also have the corresponding FreeSurfer [31] segmentations, which we use in our evaluation. Preprocessing steps were resampling to 1 mm isotropic resolution with FOV matrix 260x311x260 and rigid alignment. Despite the rigid alignment, there were substantial local orientation differences between the images from different subjects. We used five central slices, skipping four slices between each. We cropped the images to a size of 252x308, removing only background, to reduce the computational load. We normalized the images by mapping their 99th percentile to 1.
We used images from 158 subjects (790 images in total) to train the prior VAE model, training for 200k iterations. As test data, we took central slices from images of 17 separate subjects.
To test whether the proposed reconstruction method can be used on a domain different from the one the prior is trained on, we experimented with two slices from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). (The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org.) The images were selected from subjects with Alzheimer's disease who have visible white matter lesions, to also test whether the proposed method can faithfully reconstruct lesions. The acquisition parameters of the ADNI images were different from those of HCP: TR, TE and TI were 7.34 ms, 3.03 ms and 400 ms, the flip angle was 11 degrees, and the FOV matrix was 196x256x256 with resolution 1.2 mm x 1 mm x 1 mm. The images were acquired with a 3T GE scanner with no fat suppression and were bias-field corrected with the N3 algorithm [32]. We extracted the central slices that showed the largest lesions from these images and further cropped the FOV to 168x224 to remove the empty regions in the images and accelerate computations. We normalized the image intensities in the same way as for the HCP data.
3.2 Setup and evaluation
In our assessment, we artificially undersampled the test images in k-space, reconstructed them and compared the results with the original images. We experimented with varying undersampling ratios, denoted by R when presenting results.
We used Cartesian undersampling in one dimension while fully sampling the other, corresponding to the phase-encoding and readout directions, respectively. We present examples of undersampling patterns in Figures 3 and 4. We generated the patterns by randomly sampling a 1D Gaussian distribution along the phase-encoding dimension: we drew many sampling patterns from the Gaussian distribution and selected the ones with the best peak-to-side ratio. In addition, we added the central 15 profiles to these selected patterns to fully sample the low-frequency components. We used net undersampling ratios (including the fully sampled center) of 2, 3, 4 and 5. We assumed single-coil imaging with uniform sensitivity throughout the experiments as a proof of concept; the proposed method works independently of the number of coils.
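A sketch of the mask generation described above (hypothetical function name and parameters; the peak-to-side-ratio selection among candidate masks is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

def cartesian_mask(n_pe, n_ro, accel, n_center=15, sigma_frac=0.25):
    """Variable-density Cartesian mask: phase-encode lines drawn from a 1D
    Gaussian centered on k-space, plus n_center fully sampled central profiles."""
    n_lines = max(n_pe // accel, n_center)
    # oversample candidate line indices from the Gaussian, then deduplicate
    idx = rng.normal(n_pe / 2, sigma_frac * n_pe, size=10 * n_lines)
    idx = np.clip(np.round(idx), 0, n_pe - 1).astype(int)
    chosen = rng.permutation(np.unique(idx))[:n_lines]
    mask1d = np.zeros(n_pe)
    mask1d[chosen] = 1.0
    c = n_pe // 2
    mask1d[c - n_center // 2: c + n_center // 2 + 1] = 1.0  # fully sampled center
    # each selected line is fully sampled along the readout direction
    return np.repeat(mask1d[:, None], n_ro, axis=1)

mask = cartesian_mask(128, 96, accel=4)
```

Because the central block is always added, the net undersampling ratio is slightly lower than `accel`; the paper's ratios are quoted including this fully sampled center.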
We tried different configurations of latent space size and patch size to see how sensitive the method is to changes in these parameters. We used patch sizes of 12x12, 20x20, 28x28 and 36x36, and latent space dimensions of 40, 60 and 80, retraining the prior model for each configuration. We also analyzed the proposed method's robustness to noise. To this end, we added synthetic noise to the fully sampled images before undersampling. The original images had an SNR between 50 and 70, depending on the subject. We added four different levels of noise with SNR values of 40, 30, 20 and 10. Note that the actual noise on the images is higher (SNR nearly halved) due to the base noise in the original images. We studied the different configurations and robustness to noise using 5 of the test images.
While assessing the DDP method, we generated a new undersampling pattern for each test to make sure the reconstruction results covered the variability of the undersampling patterns. We reconstructed the test images using 30 POCS iterations (T=30) for R=2 and 3 and 60 iterations for R=4 and 5, with 10 iterations for the prior projection (K=10) and a step size of 1e-4.
We compared the reconstructions with the fully sampled "ground truth" images using the root mean squared error (RMSE), the contrast-to-noise ratio (CNR) and the contrast difference (CN), computed at the gray matter/white matter boundary in the brain. To obtain white matter boundaries, we applied binary erosion (with a square structuring element of size 7x7) to the white matter segmentations, as computed by FreeSurfer, and took the difference between the original and eroded segmentations.
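The boundary extraction can be sketched as follows (a pure-NumPy erosion with a 7x7 square structuring element; the CNR definition used here, contrast over the pooled standard deviation, is an assumption since the text does not spell it out):

```python
import numpy as np

def binary_erode(seg, k=7):
    """Binary erosion with a k x k square structuring element (pure NumPy)."""
    H, W = seg.shape
    r = k // 2
    padded = np.pad(seg.astype(bool), r, constant_values=False)
    out = np.ones((H, W), dtype=bool)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            out &= padded[r + di: r + di + H, r + dj: r + dj + W]
    return out

def boundary_cnr(img, wm_seg, gm_seg, k=7):
    """CNR at the GM/WM boundary; boundary = WM segmentation minus its erosion."""
    wm_boundary = wm_seg & ~binary_erode(wm_seg, k)
    wm_vals = img[wm_boundary]
    gm_vals = img[gm_seg]
    contrast = np.abs(wm_vals.mean() - gm_vals.mean())
    noise = np.sqrt(wm_vals.var() + gm_vals.var())
    return contrast / noise if noise > 0 else np.inf

# toy example: a bright "white matter" block on a darker background
rng = np.random.default_rng(3)
seg = np.zeros((20, 20), dtype=bool)
seg[5:15, 5:15] = True
gm_seg = np.zeros_like(seg)
gm_seg[0:3, :] = True
img = np.where(seg, 1.0, 0.3) + 0.01 * rng.standard_normal(seg.shape)
cnr = boundary_cnr(img, seg, gm_seg)
```

With the 7x7 element, a 10x10 block erodes to a 4x4 core, so the extracted boundary is the 3-voxel-wide rim, matching the described erosion-difference procedure.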
3.3 Compared methods
As a basic benchmark, we evaluated zero-filling reconstructions. As the first baseline, we used total variation (TV) reconstruction as described in [3], with the publicly available BART toolbox implementation [33]. We used the "pics" tool with TV regularization (regularization strength 0.075) and the ADMM parameter set to 1, with 20 conjugate gradient steps and 4500 total iterations ("bart pics -R T:3:0:0.0075 -u1 -C20 -i4500"). We ran a grid search for the best parameter setting in the RMSE sense, which we chose as above.
As the second method, we used reconstruction based on dictionary learning (DL) [34] (code available at http://www.ifp.illinois.edu/~yoram/DLMRILab/Documentation.html). We used 200 iterations, a patch size of 36 voxels and 36 dictionary atoms. Furthermore, we set the number of signals used for training to 7200 and the overlap stride to 2. We used K-SVD learning with both a sparsity and an error threshold. The sparsity level was set to 7. The error threshold was set to 0.046 for the first four iterations and to 0.0322 for the remaining iterations, with 15 K-SVD iterations. We chose the parameters as suggested by the authors in the code, but increased the number of iterations.
We also employed the recently proposed ADMM-Net [16] (https://github.com/yangyan92/Deep-ADMM-Net). We modified the code very slightly to work with non-square images and Cartesian undersampling patterns. To train the model, we used the same 790 images used to train the VAE model. To train and test the ADMM-Net, we fixed the undersampling pattern for all undersampling ratios, as required by the method. We used 15 stages with 8 filters (filter size 3x3) and padding of 1, and did not use weight decay during training. We set the maximum number of iterations for the L-BFGS algorithm to 25. Training ran to the maximum number of iterations for R=2 (45 hours), converged after 13 iterations for R=3 (42 hours), ran out of time (120 hours) after 18 iterations for R=4, and ran to the maximum number of iterations for R=5, on a GPU (GeForce GTX TITAN X). The normalized mean squared errors before and after training were 0.078 and 0.035 for R=2, 0.11 and 0.071 for R=3, 0.15 and 0.11 for R=4, and 0.19 and 0.13 for R=5, respectively. We took the parameter setting for which the best results were reported in the paper.
We also compared with shift-invariant discrete wavelet reconstruction (SIDWT) [35], fast multiclass dictionary learning (FDLCP) [36] and patch-based directional wavelets (PBDW) [37]. For these methods we cropped the images to 256x256 and generated new undersampling patterns for this size, since the authors' implementations worked only with that size (implementations can be found at http://csrc.xmu.edu.cn/csg_publications_en.html). We did not modify the code for these methods and took the parameters as set by the authors in the code. For comparison, we ran our proposed method on these images as well.
Finally, we compared with an algorithm based on BM3D denoising, namely BM3D-MRI [38] (code available at http://web.itu.edu.tr/eksioglue/pubs/BM3D_MRI.htm), using the parameters as set by the author in the code.
4 Results
We start by showing patches sampled from the prior model trained with a patch size of 28x28 and a latent dimension of 60 in Figure 2. These patches were generated by simply feeding 16 random $z$ vectors drawn from the unit Gaussian to the decoder network, which outputs the mean images and the corresponding variance images. We note that we have not cherry-picked these images. The sampled mean patches, shown on the left, look realistic: gray matter, white matter and gyri/sulci structures are clearly defined and properly positioned. These generated samples suggest that the VAE algorithm is able to approximate the underlying distribution of MR patches. The variance images on the right show that the VAE places high variance in the CSF areas as well as at boundaries between structures. We observe that isolated gray matter or CSF islands receive high variance.
Next, we show visual reconstruction results. Figure 3 shows results for one of the test images from the HCP data for R=2; the sampling pattern is also shown in the figure. All the methods achieve good reconstruction quality in terms of reproducing structural and texture information, except for the TV images, which are cartoon-like. The error images show higher error in regions where gray matter and white matter structures are intertwined.
Aliasing artifacts become more prominent at R=3, as seen in Figure 4. SIDWT cannot complete the dealiasing, and TV and PBDW have structural and texture problems. ADMM-Net cannot reconstruct the small GM island in the zoomed image. DDP, DLMRI and FDLCP perform well.
We show four more randomly selected images from the test set in supplementary Figures S4 and S5. We also show results for higher undersampling ratios in the supplementary figures.
We present the quantitative results for reconstruction accuracy in Table 1. In the first part, i.e. for the full-FOV reconstructions, BM3D-MRI performs best for R=2 and 3, followed by DDP; from R=4 onward, the DDP method works best. In the cropped-FOV part, the FDLCP reconstruction outperforms all the other methods in terms of RMSE in all cases, with the proposed method second. In terms of CNR, the proposed method performs better than all the others.
A plot showing the convergence of the POCS algorithm can be seen in Figure S1. Results for different patch sizes and latent dimensions can be seen in supplementary Figure S2.
Lastly, we show DDP reconstructions of the ADNI images for R=2 in Figure 5. We used the VAE model trained on HCP images, which comprise healthy subjects, to reconstruct these images. The reconstructed images recover gray matter/white matter structures and edges faithfully. The white matter lesions are also well reconstructed, and the error maps do not indicate a specific increase of error in the lesion regions.
Figure 3: Reconstruction results for R=2. The first row shows the fully sampled image (FS), the undersampled image (US) and the results of the different reconstruction methods. The second row shows the undersampling pattern used and the error maps (intensities clipped to (-0.3, 0.3)). The third row shows a zoomed-in part of the images. The US, DDP, TV, DLMRI, ADMM-Net and BM3D-MRI images are produced with the undersampling pattern that was used to train the ADMM-Net, for comparability. In the second part, DDP, SIDWT, FDLCP and PBDW are reconstructed for a smaller FOV.
Figure 4: Similar display as in Figure 3, with R=3.
5 Discussion
The MR patches sampled from the learned distribution, shown in Figure 2, support our hypothesis that the VAE model can learn to approximate the distribution of MR patches. The variance maps show that the sulci-like generated structures filled with CSF have higher variance, which is in accordance with the literature [39].
The reconstruction examples in Figures 3 and 4 show that our proposed VAE-based deep density prior reconstruction method produces promising results, suggesting that the learned distribution of MR patches can be used as a prior for reconstruction. The reconstructions at R=3 show the limitations of the methods.
In the full-FOV case, the proposed method performs better than most other methods in terms of quality metrics, the exception being BM3D-MRI, and performs better or equally well in visual quality assessment. The BM3D-based method performs worse than DDP at higher R, since denoising becomes more difficult.
For the cropped FOV, FDLCP, which is a highly optimized method, performs consistently better. The results we report here are a proof of concept and are open to improvement. Compared to most other methods, learning the data distribution and using it as a prior works better than enforcing a specific distribution in a specific domain, such as the Laplace distribution in the gradient domain as in TV, or more generalized distributions in a kernel domain as in DL. These methods are inherently limited by their formulations compared to the more general formulation of the DDP approach presented here; hence the DDP approach can be expected to improve over these methods further.
Two of the limitations of our method are the variational assumption and the unit-Gaussian prior for the latent space. A substantial body of work has investigated different latent space prior structures, such as Gaussian mixtures or graphical models, which could in principle improve the encoding and representative capacity of the latent variables and address the latter limitation [40, 41]. Similarly, much research is going into other density estimation methods, for instance using generative adversarial networks [42]. These have the potential to improve the approximation of the true likelihood and hence to improve reconstruction quality as well.
Another limitation of the proposed method is the independence assumption over patches. This assumption can be removed by taking into account the statistical dependence between different patches in an image. However, this is likely to increase the computational complexity of the reconstruction method.
Experiments with different configurations show that the DDP reconstruction is not sensitive to the patch size and latent space dimension within a reasonable range. Furthermore, as expected, the performance degrades with decreasing SNR.
We emphasize the generality of our method in contrast to most DNN-based methods in the literature. Most methods require a specific mapping to be learned for every undersampling pattern and/or lack an explicit data-consistency term, whereas our method does not suffer from these limitations. Our method learns the prior directly on images and can be used to reconstruct from any sampling scheme, faithful to the measured data, without retraining, as long as the images are from the same domain.
The ADNI reconstruction results are also encouraging in two respects. First, they show that the learned model does not blur out the lesion regions during the prior projection, which could have been expected since the training images do not contain any examples of lesions. Nevertheless, for a proper treatment of images with lesions, we believe the training data set should include images with lesions.
Secondly, the proposed method performs reasonably well despite the domain difference between the training and test sets. Although the two data sets were acquired at the same field strength, they differ in acquisition protocol and imaging parameters. Furthermore, our method is invariant to FOV changes but not to scale changes; the HCP and ADNI voxel sizes are different (1 mm x 1 mm vs. 1.2 mm x 1 mm), yet the reconstructions still look plausible. Note that HCP uses fat suppression, which reduces the signal from the skull significantly, whereas the ADNI images do not, making the dealiasing more challenging. The results indicate that the learned model can generalize to different scales and imaging protocols without retraining. More challenging problems, which might be attacked by domain adaptation methods, such as training and testing on different imaging modalities, still require further research.
Method     | R=2                                  | R=3                                  | R=4                                  | R=5
           | RMSE         CNR         CN          | RMSE         CNR         CN          | RMSE         CNR         CN          | RMSE         CNR         CN
-----------+--------------------------------------+--------------------------------------+--------------------------------------+-------------------------------------
FS         | --           0.48(0.10)  0.12(0.02)  | --           0.48(0.10)  0.12(0.02)  | --           0.48(0.10)  0.12(0.02)  | --           0.48(0.10)  0.12(0.02)
Zero-fill  | 13.03(1.13)  0.40(0.09)  0.12(0.02)  | 21.15(1.36)  0.33(0.07)  0.09(0.02)  | 24.92(1.91)  0.31(0.06)  0.08(0.02)  | 27.36(1.79)  0.30(0.06)  0.08(0.02)
DDP        | 2.64(0.39)   0.48(0.11)  0.12(0.02)  | 4.14(0.50)   0.48(0.11)  0.12(0.02)  | 6.36(1.69)   0.47(0.11)  0.11(0.02)  | 10.00(2.39)  0.42(0.11)  0.10(0.02)
TV         | 3.87(0.47)   0.46(0.11)  0.12(0.02)  | 7.56(0.83)   0.40(0.10)  0.09(0.02)  | 11.40(1.39)  0.35(0.09)  0.08(0.02)  | 14.56(1.23)  0.31(0.08)  0.07(0.02)
DLMRI      | 4.48(0.52)   0.46(0.11)  0.12(0.02)  | 7.25(0.83)   0.40(0.10)  0.10(0.02)  | 10.72(1.31)  0.33(0.09)  0.08(0.02)  | 13.87(1.25)  0.30(0.08)  0.07(0.02)
ADMM-Net   | 3.65(0.39)   0.48(0.11)  0.12(0.02)  | 7.41(0.46)   0.44(0.11)  0.11(0.02)  | 11.80(0.69)  0.36(0.09)  0.09(0.02)  | 13.54(0.70)  0.32(0.08)  0.08(0.02)
BM3D-MRI   | 1.92(0.36)   0.48(0.10)  0.12(0.02)  | 4.23(1.05)   0.46(0.10)  0.11(0.02)  | 8.08(2.48)   0.43(0.10)  0.10(0.02)  | 11.70(2.76)  0.38(0.09)  0.09(0.02)
-----------+--------------------------------------+--------------------------------------+--------------------------------------+-------------------------------------
FS         | --           0.48(0.10)  0.12(0.02)  | --           0.48(0.10)  0.12(0.02)  | --           0.48(0.10)  0.12(0.02)  | --           0.48(0.10)  0.12(0.02)
DDP        | 2.73(0.38)   0.48(0.10)  0.12(0.02)  | 4.67(0.92)   0.48(0.10)  0.12(0.02)  | 7.28(1.27)   0.46(0.10)  0.11(0.02)  | 12.72(2.66)  0.39(0.08)  0.10(0.02)
SIDWT      | 4.49(0.98)   0.45(0.11)  0.12(0.02)  | 9.42(1.62)   0.39(0.09)  0.12(0.02)  | 14.57(1.96)  0.33(0.08)  0.10(0.02)  | 18.76(2.80)  0.32(0.07)  0.10(0.02)
FDLCP      | 2.63(0.35)   0.48(0.10)  0.12(0.02)  | 4.35(0.87)   0.45(0.10)  0.14(0.03)  | 6.72(0.89)   0.41(0.10)  0.13(0.03)  | 9.62(1.48)   0.35(0.08)  0.11(0.02)
PBDW       | 3.24(0.38)   0.47(0.11)  0.12(0.02)  | 5.59(0.94)   0.44(0.10)  0.13(0.02)  | 8.51(0.98)   0.38(0.09)  0.12(0.02)  | 11.38(1.39)  0.34(0.08)  0.10(0.02)

R: net undersampling ratio; RMSE: root mean squared error in percent; CNR: contrast-to-noise ratio; CN: contrast; FS: fully sampled.
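As a point of reference for the metrics in the table, the two error measures could be computed along the following lines. This is only an illustrative sketch: the exact ROI definitions and normalization conventions used for the reported numbers are not specified here, so we assume RMSE is normalized by the norm of the fully sampled reference and CNR is the absolute difference of two ROI means divided by the standard deviation in a noise region.

```python
import numpy as np

def rmse_percent(recon, reference):
    """RMSE as a percentage of the reference signal norm (assumed convention)."""
    err = np.linalg.norm(recon - reference)
    return 100.0 * err / np.linalg.norm(reference)

def cnr(image, roi_a, roi_b, noise_mask):
    """Contrast-to-noise ratio between two regions of interest.

    roi_a, roi_b and noise_mask are boolean masks (e.g. white matter,
    grey matter and background, respectively)."""
    contrast = abs(image[roi_a].mean() - image[roi_b].mean())
    return contrast / image[noise_mask].std()
```

For example, a reconstruction whose every voxel is 10% above the reference gives `rmse_percent` of 10.0 regardless of image size.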
6 Conclusion
In this paper we proposed a novel method, termed deep density prior (DDP), for MR reconstruction from undersampled k-space acquisitions. The method uses the VAE algorithm to learn the distribution of MR patches from a training set composed of fully sampled images, and then uses this learned distribution as a probabilistic prior in a Bayesian reconstruction framework. We have shown that the VAE can approximate the distribution and generate realistic-looking MR patches. Furthermore, reconstruction with the DDP approach yielded promising results on the HCP and ADNI data sets in terms of both visual quality and quantitative measures.
Acknowledgements
The authors would like to acknowledge Prof. Klaas Pruessmann and Prof. Sebastian Kozerke for valuable discussions. We thank NVIDIA for their GPU donation. The presented work is funded by Swiss National Science Foundation grant number 205321_173016. ADNI data usage statement: Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
References
 [1] D. A. Feinberg, J. D. Hale, J. C. Watts, L. Kaufman, and A. Mark, “Halving MR imaging time by conjugation: demonstration at 3.5 kG,” Radiology, vol. 161, no. 2, pp. 527–531, 1986. PMID: 3763926.
 [2] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, “SENSE: Sensitivity encoding for fast MRI,” Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 952–962, 1999.
 [3] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
 [4] M. Guerquin-Kern, M. Häberlin, K. P. Pruessmann, and M. Unser, “A fast wavelet-based reconstruction method for magnetic resonance imaging,” IEEE Transactions on Medical Imaging, pp. 1649–1660, 2011.
 [5] S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: k-t SLR,” IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1042–1054, 2011.
 [6] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, pp. 1289–1306, Apr. 2006.
 [7] E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
 [8] L. Wissmann, A. Gotschy, C. Santelli, K. C. Tezcan, S. Hamada, R. Manka, and S. Kozerke, “Analysis of spatiotemporal fidelity in quantitative 3D first-pass perfusion cardiovascular magnetic resonance,” Journal of Cardiovascular Magnetic Resonance, vol. 19, p. 11, Jan 2017.
 [9] K. P. Pruessmann, M. Weiger, P. Börnert, and P. Boesiger, “Advances in sensitivity encoding with arbitrary k-space trajectories,” Magnetic Resonance in Medicine, vol. 46, no. 4, pp. 638–651, 2001.

 [10] H. Pedersen, S. Kozerke, S. Ringgaard, K. Nehrke, and W. Y. Kim, “k-t PCA: Temporally constrained k-t BLAST reconstruction using principal component analysis,” Magnetic Resonance in Medicine, vol. 62, no. 3, pp. 706–716, 2009.
 [11] J. F. M. Schmidt, L. Wissmann, R. Manka, and S. Kozerke, “Iterative k-t principal component analysis with non-rigid motion correction for dynamic three-dimensional cardiac perfusion imaging,” Magnetic Resonance in Medicine, vol. 72, no. 1, pp. 68–79, 2014.
 [12] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
 [13] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical Image Analysis, vol. 42, no. Supplement C, pp. 60 – 88, 2017.
 [14] B. Zhu, J. Z. Liu, B. R. Rosen, and M. S. Rosen, “Neural network MR image reconstruction with AUTOMAP: Automated transform by manifold approximation,” in Proceedings of the 25th Annual Meeting of ISMRM, 2017.
 [15] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction, pp. 647–658. Cham: Springer International Publishing, 2017.
 [16] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), pp. 10–18, Curran Associates, Inc., 2016.
 [17] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” CoRR, vol. abs/1312.6114, 2013.

 [18] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in Proceedings of the 31st International Conference on Machine Learning (E. P. Xing and T. Jebara, eds.), vol. 32 of Proceedings of Machine Learning Research, (Beijing, China), pp. 1278–1286, PMLR, 22–24 Jun 2014.
 [19] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using Laplace priors,” IEEE Transactions on Image Processing, vol. 19, pp. 53–63, Jan. 2010.
 [20] D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Transactions on Signal Processing, vol. 52, pp. 2153–2164, Aug 2004.
 [21] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
 [22] J. Tsao, P. Boesiger, and K. P. Pruessmann, “k-t BLAST and k-t SENSE: Dynamic MRI with high frame rate exploiting spatiotemporal correlations,” Magnetic Resonance in Medicine, vol. 50, no. 5, pp. 1031–1042, 2003.
 [23] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, Second Edition (Chapman & Hall/CRC Texts in Statistical Science). Chapman and Hall/CRC, 2 ed., July 2003.
 [24] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
 [25] S. Yeung, A. Kannan, Y. Dauphin, and L. Fei-Fei, “Tackling over-pruning in variational autoencoders,” CoRR, vol. abs/1706.03643, 2017.
 [26] A. A. Samsonov, E. G. Kholmovski, D. L. Parker, and C. R. Johnson, “POCSENSE: POCS-based reconstruction for sensitivity encoded magnetic resonance imaging,” Magnetic Resonance in Medicine, vol. 52, no. 6, pp. 1397–1406, 2004.
 [27] A. R. De Pierro and E. S. Helou Neto, “From convex feasibility to convex constrained optimization using block action projection methods and under-relaxation,” International Transactions in Operational Research, vol. 16, no. 4, pp. 495–504, 2009.
 [28] C. Santelli, M. Loecher, J. Busch, O. Wieben, T. Schaeffter, and S. Kozerke, “Accelerating 4D flow MRI by exploiting vector field divergence regularization,” Magnetic Resonance in Medicine, vol. 75, no. 1, pp. 115–125, 2016.
 [29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
 [30] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, D. C. V. Essen, and M. Jenkinson, “The minimal preprocessing pipelines for the human connectome project,” NeuroImage, vol. 80, no. Supplement C, pp. 105 – 124, 2013. Mapping the Connectome.
 [31] M. Reuter, N. J. Schmansky, H. D. Rosas, and B. Fischl, “Withinsubject template estimation for unbiased longitudinal image analysis,” NeuroImage, vol. 61, no. 4, pp. 1402 – 1418, 2012.
 [32] J. Sled, A. Zijdenbos, and A. Evans, “A nonparametric method for automatic correction of intensity nonuniformity in MRI data,” IEEE Transactions on Medical Imaging, vol. 17, pp. 87–97, Feb. 1998.
 [33] M. Uecker, J. I. Tamir, F. Ong, S. Iyer, J. Y. Cheng, and M. Lustig, “BART: version 0.4.00,” Dec. 2016.
 [34] S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
 [35] B. Ning, X. Qu, D. Guo, C. Hu, and Z. Chen, “Magnetic resonance image reconstruction using trained geometric directions in 2D redundant wavelets domain and non-convex optimization,” Magnetic Resonance Imaging, vol. 31, no. 9, pp. 1611–1622, 2013.
 [36] Z. Zhan, J. F. Cai, D. Guo, Y. Liu, Z. Chen, and X. Qu, “Fast multiclass dictionaries learning with geometrical directions in MRI reconstruction,” IEEE Transactions on Biomedical Engineering, vol. 63, pp. 1850–1861, Sept 2016.
 [37] X. Qu, D. Guo, B. Ning, Y. Hou, Y. Lin, S. Cai, and Z. Chen, “Undersampled MRI reconstruction with patch-based directional wavelets,” Magnetic Resonance Imaging, vol. 30, no. 7, pp. 964–977, 2012.
 [38] E. M. Eksioglu, “Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI,” Journal of Mathematical Imaging and Vision, vol. 56, pp. 430–440, Nov 2016.

 [39] R. Tanno, D. E. Worrall, A. Ghosh, E. Kaden, S. N. Sotiropoulos, A. Criminisi, and D. Alexander, “Bayesian image quality transfer with CNNs: Exploring uncertainty in dMRI super-resolution,” in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Springer, September 2017.
 [40] N. Dilokthanakul, P. A. M. Mediano, M. Garnelo, M. C. H. Lee, H. Salimbeni, K. Arulkumaran, and M. Shanahan, “Deep unsupervised clustering with Gaussian mixture variational autoencoders,” CoRR, vol. abs/1611.02648, 2016.
 [41] M. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, “Composing graphical models with neural networks for structured representations and fast inference,” in Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), pp. 2946–2954, Curran Associates, Inc., 2016.
 [42] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, pp. 214–223, 2017.