MR image reconstruction using the learned data distribution as prior

by   Kerem C. Tezcan, et al.
ETH Zurich

MR image reconstruction from undersampled data exploits priors to compensate for missing k-space data. This has previously been achieved by using regularization methods, such as TV and wavelets, or data adaptive methods, such as dictionary learning. We propose to explicitly learn the probability distribution of MR image patches and to constrain patches to have a high probability according to this distribution in reconstruction, effectively employing it as the prior. We use variational autoencoders (VAE) to learn the distribution of MR image patches. This high dimensional distribution is modelled by a latent parameter model of lower dimensions in a non-linear fashion. We develop a reconstruction algorithm that uses the learned prior in a Maximum-A-Posteriori estimation formulation. We evaluate the proposed method with T1 weighted images and compare it to existing alternatives. We also apply our method on images with white matter lesions. Visual evaluation of the samples drawn from the learned model showed that the VAE algorithm was able to approximate the distribution of MR image patches. Furthermore, the reconstruction algorithm using the approximate distribution produced qualitatively better results. The proposed technique achieved RMSE, CNR and CN values of 2.77 ratios of 2 and 3, respectively. It outperformed other evaluated methods in terms of used metrics. In the experiments on images with white matter lesions, the method faithfully reconstructed the lesions. We introduced a novel method for MR reconstruction, which takes a new perspective on regularization by learning priors. Results suggest the method compares favorably against TV and dictionary based methods as well as the neural-network based ADMM-Net in terms of the RMSE, CNR and CN and perceptual image quality and can reconstruct lesions as well.


page 12

page 14

page 15

page 17

page 23

page 24


Dependent Nonparametric Bayesian Group Dictionary Learning for online reconstruction of Dynamic MR images

In this paper, we introduce a dictionary learning based approach applied...

A Multimodal Deep Network for the Reconstruction of T2W MR Images

Multiple sclerosis is one of the most common chronic neurological diseas...

Data-Consistent Non-Cartesian Deep Subspace Learning for Efficient Dynamic MR Image Reconstruction

Non-Cartesian sampling with subspace-constrained image reconstruction is...

Tomographic Image Reconstruction using Training images

We describe and examine an algorithm for tomographic image reconstructio...

Σ-net: Systematic Evaluation of Iterative Deep Neural Networks for Fast Parallel MR Image Reconstruction

Purpose: To systematically investigate the influence of various data con...

Region-specific Dictionary Learning-based Low-dose Thoracic CT Reconstruction

This paper presents a dictionary learning-based method with region-speci...

Benchmarking learned non-Cartesian k-space trajectories and reconstruction networks

We benchmark the current existing methods to jointly learn non-Cartesian...

1 Introduction

Acquisition time in magnetic resonance (MR) imaging is directly related to the number of samples acquired in k-space. For high quality images, a large number of samples, and therefore long acquisition times are necessary. Reducing acquisition time in a reliable manner is an important question in MR imaging and many methods exploiting different properties of the k-space [1] and hardware design [2] have found wide use in clinical practice.

On the other hand, a substantial amount of effort went into the investigation of reconstruction methods from randomly or regularly undersampled k-space acquisitions. The random undersampling approach was primarily motivated by the compressed sensing framework where the incoherence between k-space sampling and some sparsifying transform was exploited to achieve theoretically exact reconstruction [3, 4, 5, 6, 7]. Similarly, regular undersampling schemes and corresponding reconstruction algorithms were extensively investigated [8, 9, 10, 11]. The common aspect of these two approaches is that they both are interested in inverting an underdetermined system of equations by finding good regularizers to overcome the ill-posedness. A key observation is that the use of regularizers in both approaches can essentially be viewed as introduction of prior information to the reconstruction problem.

Recently, researchers started to employ deep neural networks (DNN) [12]

for MR reconstruction inspired by its success in computer vision and medical image analysis 

[13]. In the related literature there are two main approaches for applying DNNs to the reconstruction problem. The first is to teach the networks a mapping going from the undersampled images to their fully sampled versions [14]. After training, an undersampled image is fed into the network to obtain its reconstruction at the output. This approach does not have any explicit data consistency term and cannot guarantee this property at test time. To overcome this limitation, in [15]

the authors introduce a deep cascaded network. Here a block of convolutional neural network (CNN) is followed by a data consistency term and this structure is repeated multiple times. This guarantees consistency with measured data also at test time while the CNNs perform de-aliasing.

The second approach is to use the efficient trainability of the networks to improve existing algorithms. In [16] the authors show that the iterations of the alternating direction method of multipliers (ADMM) algorithm solving a dictionary based -1 regularized reconstruction problem can be expressed as a multi-layer network. Kernels and non-linear functions are parameterized and learned by the network which improve on the fixed version in the original method. Both deep learning approaches suffer from specificity of training. Learned mappings, kernels and non-linearities are specific to a certain undersampling pattern and possibly field-of-view (FOV). Hence, the networks need to be retrained for each different configuration of FOV, undersampling factor and pattern, which limits wide use of these methods.

In computer vision, DNNs have also been used for approximating high dimensional probability distributions from samples through unsupervised learning. Such approximations allow to estimate the likelihood of a previously unseen data point. One such approach is the variational auto encoder (VAE) algorithm 

[17, 18]. Using VAEs, it is possible to approximate the distribution of patches of MR images and compute likelihood of a previously unseen image. Furthermore, the likelihood function is differentiable since it is modelled as a neural network.

In this work we develop a probabilistic reconstruction model that uses priors learned via VAEs, and deploys them as regularizers, which we term deep density prior (DDP) based reconstruction. The approach can be used to learn the prior distribution using fully sampled examples once and reconstruct for different sampling parameters without retraining. To achieve this we first formulate a Bayesian model of the imaging process and express the reconstruction problem as a Maximum-A-Posteriori (MAP) estimation problem. We show how some of the previous methods fit into this formulation and then present the proposed model using a VAE-learned prior. In the experiments, we first demonstrate that VAEs can approximate distribution of patches of MR images by showing examples of patches sampled from the prior. We then show reconstruction results from the proposed model and compare with both more conventional approaches as well as recent DNN based methods.

2 Methods

In the first two parts of this section, we provide a brief background on Bayesian formulation of the MR reconstruction problem and the VAE algorithm. Then, starting from Section 2.3, we present our main technical contribution, that is learning a prior for MR image patches and integrating it in the reconstruction problem.

2.1 Bayesian formulation of the MR reconstruction problem

The MR image is denoted as , where N is number of voxels222In this work we focus on 2D imaging, however, the same techniques can be applied to 3D imaging and this extension will be a part of our future work.. The imaging operation is given by the undersampling encoding operation , where is the sensitivity encoding operator. Here, is number of coils, is the Fourier operator and is the undersampling operator, with .

Assuming complex valued zero mean normal distributed noise, denoted as

, the acquired data can be written as . Under this noise model the data likelihood becomes


where H denotes the Hermitian transpose and

is the standard deviation of the noise. In reconstruction, we are interested in the posterior distribution

, i.e. the probability of the image being m

given the k-space measurements, which can be written using the Bayes’ theorem as


The common approach to model the reconstruction problem, which we will also use, is to use the MAP estimation


where the equality is due to not depending on m. is called the prior term and represents the information one has about the fully sampled image before the data acquisition. Taking the of both sides and plugging in the definition of the likelihood term yields


Rewriting the likelihood and ignoring terms that do not vary with m yields


where we define . Taking the maximum, or equivalently taking the minimum of the negative of the expression and multiplying all sides with recovers the conventional formulation of a reconstruction problem with a data and a regularization term


From the point of view of Equation 7, regularization-based reconstruction algorithms differ in the log prior term they use. A generic prior expression can be written as , where is a sparsifying operator, is the expected image and is the -p norm.

For instance, forcing the image to be sparse in some domain can be expressed as assuming the data is i.i.d. according to the Laplace distribution in the domain [19, 20]. Mathematically, this is


where we set , denote the matrix determinant with 333Where we used the change of variables rule to relate the distributions [21]., use as the magnitude and i indexes voxels in the image. Now, taking of the expression and ignoring the terms that do not vary with m we get


Setting yields the prior term in Equation 7. Furthermore, setting as the gradient operator with first order finite differences, denoted by , corresponds to enforcing sparsity in the gradient domain recovering the basic total variation compressed sensing reconstruction

Alternatively, both and can be estimated from data. One example for this approach is to assume a Normal prior on m, and estimate the mean and the covariance from a low-resolution image of the same object [10, 22]. The posterior is then also a Normal distribution due to conjugacy relations and the MAP estimate is given with the posterior mean [23]

A similar idea has been applied in k-t SENSE and k-t PCA methods in the dynamic imaging setting to exploit spatio-temporal correlations [22, 10].

In this work, we follow the same approach of estimating the prior term from example images and propose to approximate with a neural network model. We train a VAE on patches of fully sampled MR images to capture the distribution of patches and use this prior for reconstruction.

2.2 Learning the data distribution with VAEs

VAE is an unsupervised learning algorithm used to approximate high-dimensional data distributions 

[17, 18]. We introduce VAEs very briefly and refer the reader to [17] for further discussion.

The main goal of the VAE algorithm is to approximate the data distribution using a latent model given as


where denote the latent variables, the data and the prior over the z’s. Direct computation of , or requires integrating over z, which is not feasible even with relatively small . Variational approximation uses an approximate distribution for the posterior to address this problem. Using the approximation, can be decomposed into two terms [24]


The first term is referred to as the evidence lower bound (ELBO) and the second term is the Kullback-Leibler divergence (KLD) between the approximate and true posteriors.

VAE parameterizes the data likelihood and the approximate posterior using networks, i.e. and , and during training maximizes ELBO for a given set of data samples with respect to the parameters of the networks and . Maximizing ELBO will minimize the KLD and maximize . The optimization during training is written as


where is the training sample.

The networks and are called the encoder and the decoder, respectively. The former takes the data sample x and encodes it into a posterior distribution in the latent space with network parameters . If the posterior distribution is modelled as a Gaussian, then the encoder outputs a mean and a covariance matrix for z depending on x

. The decoder network on the other hand, takes a latent vector

z and maps it to a conditional distribution of the data given z.

In this work, we use the vanilla VAE design [17] except for the data likelihood, where we use a multi-modal Gaussian , similar to [25].

In our model, we use the VAE model to approximate the prior distribution of magnitude of 2D MR patches of fully sampled images. In other words, we approximate , where x is a patch extracted from a complex valued MR image of size x. We provide the exact parameters of the network design in Section 2.5.

2.3 Deep density prior reconstruction model

Once the network is trained and optimal values and are found, we can integrate the prior within a Bayesian formulation of the reconstruction problem. We make two key observations to achieve this. First, an approximate log likelihood of an image patch x can be obtained by evaluating with the optimal parameters.


where we use the magnitude of the patch within the ELBO. This allows us to formulate the proposed model as the following MAP estimation problem


where denotes a set of (overlapping) patches covering the image m and is the magnitude of the image patch. Note that this approach assumes independence between different patches and therefore, ignores statistical dependencies between them. However, it also makes the computations much easier.

Since an exact computation of the ELBO term requires evaluating the expectation with respect to , which is computationally not feasible, we use a Monte Carlo sampling approach to calculate the ELBO as follows


Here represents the number of Monte-Carlo samples.

Our second key observation is that the approximation in Equation 15 is differentiable since each term is defined through networks that are themselves differentiable. This is the critical aspect that allows integrating the trained VAE as a prior into an optimization based reconstruction algorithm. Plugging the ELBO approximation into Equation 14, we obtain the formulation of the DDP reconstruction problem


where the first term is the usual data term and the second term within the summation is the regularization term that arises from the learned prior.

We can compute the total derivative of the prior term with respect to each image patch as follows


where we defined for notational simplicity. The second term in the last line is due to the dependency of the samples to x. The term is due to taking the derivative of the magnitude with respect to the image patch.

1:: undersampled k-space data
2:: undersampling encoding operator
3:VAE: the trained VAE
4:procedure DDPrecon(, , VAE)
5:      initialize with the zero-filled image
6:     for  do main loop: POCS iterations
8:         for  do inner loop: iterations for the prior projection
9:               creates a set of patches covering the image
10:              for  no of patches do loop over all the patches in
11:                   calculate the derivative acc. to Eq. 19
12:              end for
15:         end for
18:     end for
19:     return Resulting reconstruction
20:end procedure
Algorithm 1 Deep density prior (DDP) reconstruction using POCS. See text for a more detailed explanation.

2.4 Optimization using projection onto convex sets

We solve the DDP optimization problem given in Equation 16 using the projection onto convex sets (POCS) algorithm [26], specifically using the formulation in [27]. POCS is an iterative minimization process, where the solution variable is projected sequentially onto different convex sets, each defined by one of the constraints in the problem.

The projection for the data consistency term is given simply as  [26]. Since we do not have a projection operator for the prior term, we approximate it by several gradient ascent steps with a small step size. We use the final image at the end of the ascent steps as the projected image patch. We define the operation as follows: i) we create a set of patches from the image at iteration t, ii) obtain the derivatives for each of these patches using Equation 19, which have the same size as the patches themselves, iii) create a derivative image from these derivative patches by averaging the values where the patches overlap, iv) update the image using the derivative image, v) repeat this K times. To reduce edge effects resulting from patchwise projections, we use four sets of overlapping patches.

As the likelihood term is only trained on magnitude patches and cannot convey any information regarding the phase, we also use a phase projection following the prior projection, where we set the phase of the image to zero, given as . Notice we do this because we expect the images to have zero phase in this specific application. It is however possible to change this to any other constraint on the phase of the image, such as a low resolution estimation of the phase [3] or zero-divergence constraint for phase contrast flow imaging reconstruction [28].

With the data consistency, prior and phase projections defined as above, one step of reconstruction within the POCS framework becomes


We apply POCS steps to complete the reconstruction. Algorithm 1 provides a summary of the reconstruction procedure.

2.5 Details on VAE network architecture and training

We use batches of 50 patches of size 28x28 and a 60 dimensional latent space. Both networks are depicted in Figure 1. For the encoding network, the input is an image patch and the output is the mean and the covariance of the corresponding posterior. The decoding network takes a vector z as input and outputs the mean and the covariance of the corresponding likelihood for the patch. Both networks are mostly convolutional with single fully connected layers. All convolutional layers for both networks use 3x3 kernels and all layers also have additive bias terms throughout the network.

Figure 1: Architecture of the encoding (top) and decoding (bottom) networks of our VAE. Arrows indicated with

are convolutional layers followed by ReLU non-linear activation except

of decoding network, which is not followed by non-linear activation. Arrows indicated by FC are fully connected layers. FC of the decoding network is followed by a ReLU activation but not of the encoding network.

Due to stability issues we use the log of the variance values throughout the network. We initialize the network weights with a truncated normal initializer with standard deviation 0.05. We use Adam 


for optimization (learning rate of 5e-4, default momentum values in Tensorflow).

3 Experimental setup

3.1 MR image data

We evaluated our method using structural images from the Human Connectome Project (HCP)444Further details of the HCP data set can be found in HCP-S500+MEG2-Release-Reference-Manual and HCP-S1200-Release-Appendix-I (obtainable from data set [30]. High quality and large number of HCP images are ideal for learning priors with the VAE model.

We took 2D slices from the T1 weighted 3D MPRAGE images acquired with 2400 ms, 2.14 ms and 1000 ms for TR, TE and TI, respectively, flip angle of 8 degrees, a standard field of view for all subjects of 224x224x224 mm with 0.7 mm isotropic resolution using a bandwidth of 210 Hz/voxel. Images were acquired with a 3T Siemens device. Fat suppression was used during acquisition. In our experiments, we used the minimally preprocessed images in order to also have the corresponding FreeSurfer [31] segmentations, which we use in our evaluation. Preprocessing steps were resampling to 1mm isotropic resolution with FOV matrix 260x311x260 and rigid alignment. Despite the rigid alignment, there were substantial local orientation differences between the images from different subjects. We used five central slices with skipping four slices between each. We cropped the images to a size of 252x308, removing only background, to reduce computational load. We normalized the images by mapping their 99 percentile to 1.

We used images from 158 subjects (790 images in total) to train the prior VAE model. The VAE model trained for 200k iterations. As test data, we took central slices from images of 17 separate subjects.

To test if the proposed reconstruction method can be used on a domain that is different from the one the prior is trained on, we experimented with two slices from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set555Two of the images used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database ( The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see The images were selected from subjects with Alzheimer’s disease and who have visible white matter lesions to also test whether the proposed method will be able to faithfully reconstruct lesions. The acquisition parameters of the ADNI images were different than HCP: TR, TE, TI values were 7.34 ms, 3.03 ms, 400 ms, flip angle was 11 degrees, FOV matrix was 196x256x256 with resolution 1.2 mm x 1 mm x 1 mm. The images were acquired with a 3T GE scanner with no fat suppression and were bias-field corrected with the N3 algorithm [32]. We extracted the central slices that showed the largest lesions from these images and further cropped the FOV to 168x224 to get rid of the empty regions in the images to accelerate computations. We normalized the image intensities the same way as the HCP data.

3.2 Setup and evaluation

In our assessment, we artificially undersampled the test images in k-space, reconstructed them back and compared the results with the original images. We experimented with varying undersampling ratios, which we denote with R when presenting results.

We used Cartesian undersampling in one dimension while fully sampling in the other dimension, corresponding to phase-encoding and readout directions, respectively. We present some examples of undersampling patterns in Figures 4 and 4

. We generated the patterns by randomly sampling a 1D Gaussian distribution along the phase-encoding dimension. We randomly drew many sampling patterns from the Gaussian distribution and selected the ones with the best peak-to-side ratio. In addition, we added the central 15 profiles to these selected patterns to fully sample the low-frequency components. We used 2, 3, 4 and 5 for net undersampling ratios (including the fully sampled center). We assumed a single coil imaging with uniform sensitivity throughout the experiments for proof of concept, as the proposed method works independent of the number of coils.

We tried different configurations of latent space size and patch size to see how sensitive the method is to changes in these parameters. We used patch sizes of 12x12, 20x20, 28x28, and 36x36 and latent space dimensions of 40, 60 and 80. For each configuration we retrained the prior model. We also analyzed the proposed method’s robustness to noise. To this end, we added synthetic noise to the fully sampled images before undersampling. The original images had a SNR between 50 to 70, depending on the subject. We added 4 different levels of noise with SNR values 40, 30, 20 and 10. Notice the actual noise on the images is higher (SNR nearly half) due to the base noise on the original images. We studied the different configurations and robustness to noise using 5 of the test images.

While assessing the DDP method, we generated a new undersampling pattern for each test to make sure the reconstruction results covered the variability of the undersampling patterns. We reconstructed the test images using 30 POCS iterations (T=30) for R=2 and 3 and 60 for R=4 and 5, 10 iterations for the prior projection (K=10) and the step size was set to =1e-4.

We compared the reconstruction with the fully sampled ”ground truth” images using Root-Mean-Squared-Error (RMSE), Contrast-to-Noise-Ratio (CNR) and Contrast difference (CN) computed at the gray and white matter boundary in the brain. In order to obtain white matter boundaries we applied binary erosion (with a square structuring element of size 7x7) to the white matter segmentations, as computed by FreeSurfer, and took the difference of the original and eroded segmentations.

3.3 Compared methods

As a basic benchmark, we evaluated zero-filling reconstructions. As the first baseline, we used total variation (TV) reconstruction as described in [3]. We used the BART toolbox implementation that is publicly available [33]. We used the ”pics” tool with TV regularization (regularization strength 0.075) and the ADMM parameter as 1. We used 20 conjugate gradient steps and 4500 total iterations (”bart pics -R T:3:0:0.0075 -u1 -C20 -i4500”). We ran a grid search for the best parameter setting in the RMSE sense, which we chose as above.

As the second method, we used reconstruction using dictionary learning (DL) [34]666Code available at yoram/DLMRI-Lab/Documentation.html

. We used 200 iterations, a patch size of 36 voxels and 36 dictionary atoms. Furthermore, we set number of signals used for training to 7200 and the overlap stride to 2. We used K-SVD learning with both sparsity and error threshold. The sparsity level was set to 7. The error threshold was set to 0.046 for the first four iterations, then to 0.0322 for the rest of the iterations. We used 15 K-SVD iterations. We choe the parameters as sugested by the authors in code, but increased the number of iterations.

We employed the recently proposed ADMM-Net [16] 777

. We modified the code very slightly to work with non-square images and Cartesian undersampling patterns. To train the model, we used the same 790 images, which we used to train the VAE model. To train and test the ADMM-Net, we fixed the undersampling pattern for all undersampling ratios, as required by the method. We used 15 stages with 8 filters (filter size 3x3), padding as 1. We did not use weight decay during training. We set the maximum iteration number to 25 for the L-BFGS algorithm. It trained to maximum iterations for R=2 (45 hours), 13 iterations before convergence for R=3 (42 hours), 18 iterations before running out of time (120 hours) for R=4 and maximum iterations for R=5 on a GPU (GeForce GTX TITAN X). The normalized mean squared errors were 0.078 and 0.035 for R=2, 0.11 and 0.071 for R=3, 0.15 and 0.11 for R=4, and 0.19 and 0.13 for R=5, before and after training, respectively. We took the parameter setting for which the best results were reported in the paper.

We also compared with the methods shift-invariant discrete wavelet reconstruction (SIDWT) [35], fast multiclass dictionary learning (FDLCP) [36] and patch based directional wavelets (PBDW) [37]. For these methods we cropped the images to 256x256 and generated new undersampling patterns for this size since the authors implementations worked only with that size888Implementations can be found on We did not modify the code for these methods and took the parameters them as set by the authors in the code. For comparison, we ran our proposed method on these images as well.

Finally we compared to an algorithm based on BM3D denoising, namely BM3D-MRI [38]999Code available at and used the parameters as set by the author in the code.

4 Results

Figure 2: 16 image patches sampled from the prior learned by the VAE algorithm. Images on the left are the mean predictions and the images on the right are the corresponding variance maps. The patches are 28x28.

We start by showing patches sampled from the prior model trained for patch-size of 28x28 and latent dimension of 60 in Figure 2. These patches were generated by simply feeding 16 random z vectors drawn from unit Gaussian to the decoder network. The decoder network outputs the mean images, i.e. , and the corresponding variance images, i.e. . We would like to note that we have not cherry-picked these images. The sampled mean patches, shown on the left, look realistic where gray matter, white matter and gyri/sulci structures are clearly defined and properly positioned. These generated samples suggest that the VAE algorithm is able to approximate the underlying distribution of MR patches. Variance images images on the right show that VAE places high variance in the CSF areas as well as boundaries between structures. We observe that isolated gray matter or CSF islands receive high variance.

Next, we show visual reconstruction results. Figure 4 shows results for one of the test images from the HCP data for R=2. The sampling pattern is also shown in the figure. All the methods achieve good reconstruction quality in terms of reproducing structural and texture information, except the TV images which are cartoon-like. The error images show higher error in regions where gray matter and white matter structures are intertwined.

Aliasing artifacts become more prominent at R=3 as seen in Figure 4. SIDWT cannot complete dealiasing, TV and PBDW are have structural and texture problems. ADMM-Net cannot reconstruct the small GM island in the zoomed image. DDP, DLMRI and FDLCP perform well.

We show four more randomly selected images from the test set in supplementary Figures S4 and S5. We also show results for higher undersampling ratios in supplementary Figures MR image reconstruction using deep density priors and MR image reconstruction using deep density priors.

We present the quantitative results for reconstruction accuracy in Table 1. In the first part, i.e. for the full FOV reconstructions BM3D-MRI performs the best for R=2 and 3, followed by the DDP. After R=4 the DDP method works best. For the cropped FOV part, The FDLCP reconstruction outperforms all the other methods in all cases in terms of RMSE with the proposed method following as the 2nd. In terms of CNR, the proposed method performs better than all the others.

A plot showing convergence of the POCS algorithm can be seen in Figure S1. Results for patch size and latent dimension can be seen in supplementary Figure S2.

Lastly, we show DDP reconstructions in Figure 5 for the ADNI images for R=2. We used the VAE model was trained on HCP images, which is composed of healthy subjects, to reconstruct the images here. The reconstructed images recover gray matter-white matter structures and edges faithfully. The white matter lesions are also well reconstructed. The error maps do not indicate a specific increase of error in the lesion regions.

Figure 3: Reconstruction results for R=2. First row shows the fully sampled image (FS), the undersampled image (US) and the results of different reconstruction methods. Second row shows the used undersampling pattern and the error maps (intensities clipped to (-0.3, 0.3)). Third row shows a zoomed in part of the images. US, DDP, TV, DLMRI, ADMM-Net and BM3D-MRI images are produced with the undersampling pattern that was used to train the ADMM-Net for comparability. In the second part DDP, SIDWT, FDLCP and PBDW are reconstructed for a smaller FOV.

Figure 4: Similar display as in Figure 4 with R=3.

5 Discussion

MR patches sampled from the learned distribution shown in Figure 2 support our hypothesis that the VAE model can learn to approximate the distribution of MR patches. The variance maps show that the sulci-like generated structures filled with CSF have higher variance, which is in accordance with the literature [39].

Figure 5: DDP reconstruction results for two images with white matter lesions due to Alzheimer’s disease from the ADNI data set for R=2. Images show the original (FS), undersampled (US), reconstructed (DDP) and the error maps from left to right. Lesions are clearly visible in the reconstructed images as well. Error map values are clipped to (-0.3,0.3)

The reconstruction examples in Figure 4 and Figure 4 show that our proposed VAE based deep density prior reconstruction method produces promising results. This suggests that the learned distribution of MR patches can be used as prior for reconstruction. Reconstructions with R=3 shows limitations of the methods.

In the full FOV case, the proposed method performs better than most other methods in terms of quality metrics except for the BM3D-MRI and better or equally in visual quality assessment. The BM3D based method performs worse than the DDP in higher R’s since denoising becomes more difficult.

For the cropped FOV FDLCP performs consistently better, which is a highly optimized method. The results we report here are for proof of concept and are open to improvement. Compared to most other methods learning the data distribution and using this as a prior works better than enforcing a specific distribution in a specific domain, such as the Laplace distribution in the gradient domain as in TV or more generalized distributions in a kernel domain as in DL. These methods are inherently limited by their formulations compared to the more general formulation of the DDP approach presented here. Hence the DDP approach can be expected to improve over these methods further.

Two of the limitations of our method are the variational assumption and the unit Gaussian prior for the latent space. A lot of work went into investigating different latent space prior structures such as Gaussian mixtures or graphical models which could in principal improve the encoding and representative capacity of the latent variables to address the latter limitation [40, 41]. Similarly a lot of research is going into different density estimation methods, for instance using generative adversarial networks [42]. These have the potential to improve on the approximation of the true likelihood and hence to improve reconstruction quality as well.

Another limitation of the proposed method is the independence assumption over patches. This assumption can be removed by taking into account the statistical dependence between different patches in an image. However, this is likely to increase the computational complexity of the reconstruction method.

Experiments with different configurations show that the DDP reconstruction is not sensitive to patch size and latent space dimensions for a reasonable range. Furthermore, as expected the performance degrades with decreasing SNR.

We emphasize the generality of our method in contrast to most DNN based methods in the literature. Most methods require a specific mapping to be learned for every undersampling pattern and/or lack an explicit data consistency term, whereas our method does not suffer from these limitations. Our method learns the prior directly on images and can be used to reconstruct any sampling scheme faithful to the measured data without need of retraining as long as the images are from the same domain.

The ADNI reconstruction results are also encouraging in two terms. First, they show that the learned model does not blur out the lesion regions during the prior projection, which could be expected since the training images do not have any examples of lesions. However, for a proper treatment of images with lesions, we believe the training data set should include images with lesions.

Secondly, the proposed method performs reasonably despite the domain difference between the training and test sets. Although the two data sets are acquired at the same field strength, they still differ in the acquisition protocol and imaging parameters. Furthermore, our method is invariant to FOV changes but not to scale changes. HCP and ADNI voxel sizes are different (1mm x 1mm vs. 1.2 mm x 1 mm), however the reconstructions still look plausible. Notice that the HCP has fat suppression which reduces the signal from the skull significantly, however ADNI images do not have the suppression, making the dealiasing more challenging. The results indicate that the learned model can generalize to different scales and imaging protocols without retraining. More challenging problems which might be attacked by domain adaptation methods such as training and testing on different imaging modalities still require further research.

R=2 R=3 R=4 R=5
FS - 0.48(0.10) 0.12(0.02) - 0.48(0.10) 0.12(0.02) - 0.48(0.10) 0.12(0.02) - 0.48(0.10) 0.12(0.02)
Zero-fill 13.03(1.13) 0.40(0.09) 0.12(0.02) 21.15(1.36) 0.33(0.07) 0.09(0.02) 24.92(1.91) 0.31(0.06) 0.08(0.02) 27.36(1.79) 0.30(0.06) 0.08(0.02)
DDP 2.64(0.39) 0.48(0.11) 0.12(0.02) 4.14(0.50) 0.48(0.11) 0.12(0.02) 6.36(1.69) 0.47(0.11) 0.11(0.02) 10.00(2.39) 0.42(0.11) 0.10(0.02)
TV 3.87(0.47) 0.46(0.11) 0.12(0.02) 7.56(0.83) 0.40(0.10) 0.09(0.02) 11.40(1.39) 0.35(0.09) 0.08(0.02) 14.56(1.23) 0.31(0.08) 0.07(0.02)
DLMRI 4.48(0.52) 0.46(0.11) 0.12(0.02) 7.25(0.83) 0.40(0.10) 0.10(0.02) 10.72(1.31) 0.33(0.09) 0.08(0.02) 13.87(1.25) 0.30(0.08) 0.07(0.02)
ADMMNet 3.65(0.39) 0.48(0.11) 0.12(0.02) 7.41(0.46) 0.44(0.11) 0.11(0.02) 11.80(0.69) 0.36(0.09) 0.09(0.02) 13.54(0.70) 0.32(0.08) 0.08(0.02)
BM3D-MRI 1.92(0.36) 0.48(0.10) 0.12(0.02) 4.23(1.05) 0.46(0.10) 0.11(0.02) 8.08(2.48) 0.43(0.10) 0.10(0.02) 11.70(2.76) 0.38(0.09) 0.09(0.02)
FS - 0.48(0.10) 0.12(0.02) - 0.48(0.10) 0.12(0.02) - 0.48(0.10) 0.12(0.02) - 0.48(0.10) 0.12(0.02)
DDP 2.73(0.38) 0.48(0.10) 0.12(0.02) 4.67(0.92) 0.48(0.10) 0.12(0.02) 7.28(1.27) 0.46(0.10) 0.11(0.02) 12.72(2.66) 0.39(0.08) 0.10(0.02)
SIDWT 4.49(0.98) 0.45(0.11) 0.12(0.02) 9.42(1.62) 0.39(0.09) 0.12(0.02) 14.57(1.96) 0.33(0.08) 0.10(0.02) 18.76(2.80) 0.32(0.07) 0.10(0.02)
FDLCP 2.63(0.35) 0.48(0.10) 0.12(0.02) 4.35(0.87) 0.45(0.10) 0.14(0.03) 6.72(0.89) 0.41(0.10) 0.13(0.03) 9.62(1.48) 0.35(0.08) 0.11(0.02)
PBDW 3.24(0.38) 0.47(0.11) 0.12(0.02) 5.59(0.94) 0.44(0.10) 0.13(0.02) 8.51(0.98) 0.38(0.09) 0.12(0.02) 11.38(1.39) 0.34(0.08) 0.10(0.02)

R - net undersampling ratio, RMSE - root mean squared error in percentage, CNR - contrast to noise ratio, CN - contrast.

Table 1: Table summarizing results for different reconstruction quality metrics. Numbers indicate the mean (and standard deviation) of the error metric for N=17 test images. Second part of the table reports results for the cropped FOV.

6 Conclusion

In this paper we proposed a novel method, termed deep density prior for MR reconstruction from undersampled k-space acquisitions. The method uses the VAE algorithm to learn the distribution of MR patches from a training set composed of fully sampled images. The model then uses this learned distribution as a probabilistic prior in a Bayesian reconstruction framework. We have shown that the VAE can approximate the distribution and generate realistic looking MR patches. Furthermore, reconstruction with the DDP approach yielded promising results for HCP and ADNI data sets in terms of visual quality and quantitative measures.


Authors would like to acknowledge Prof. Klaas Pruessmann and Prof. Sebastian Kozerke for valuable discussions. We thank NVIDIA for their GPU donation. The presented work is funded by Swiss National Science Foundation grant number: 205321_173016. For the ADNI data usage statement see the footnote101010Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health ( The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California..


  • [1] D. A. Feinberg, J. D. Hale, J. C. Watts, L. Kaufman, and A. Mark, “Halving mr imaging time by conjugation: demonstration at 3.5 kg.,” Radiology, vol. 161, no. 2, pp. 527–531, 1986. PMID: 3763926.
  • [2] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, “Sense: Sensitivity encoding for fast mri,” Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 952–962, 1999.
  • [3] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
  • [4] M. Guerquin-kern, M. Häberlin, K. P. Pruessmann, and M. Unser, “A fast wavelet-based reconstruction method for magnetic resonance imaging,” IEEE Transactions on Medical Imaging, pp. 1649–1660, 2011.
  • [5] S. G. Lingala, Y. Hu, E. Dibella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: K-t SLR,” IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1042–1054, 2011.
  • [6] D. L. Donoho, “Compressed sensing,” IEEE Transactions in Information Theory, vol. 52, pp. 1289–1306, Apr. 2006.
  • [7] E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
  • [8] L. Wissmann, A. Gotschy, C. Santelli, K. C. Tezcan, S. Hamada, R. Manka, and S. Kozerke, “Analysis of spatiotemporal fidelity in quantitative 3d first-pass perfusion cardiovascular magnetic resonance,” Journal of Cardiovascular Magnetic Resonance, vol. 19, p. 11, Jan 2017.
  • [9] K. P. Pruessmann, M. Weiger, P. Börnert, and P. Boesiger, “Advances in sensitivity encoding with arbitrary k-space trajectories,” Magnetic Resonance in Medicine, vol. 46, no. 4, pp. 638–651, 2001.
  • [10]

    H. Pedersen, S. Kozerke, S. Ringgaard, K. Nehrke, and Y. K. Won, “K-t PCA: Temporally constrained k-t BLAST reconstruction using principal component analysis,”

    Magnetic Resonance in Medicine, vol. 62, no. 3, pp. 706–716, 2009.
  • [11] J. F. M. Schmidt, L. Wissmann, R. Manka, and S. Kozerke, “Iterative k-t principal component analysis with nonrigid motion correction for dynamic three-dimensional cardiac perfusion imaging,” Magnetic Resonance in Medicine, vol. 72, no. 1, pp. 68–79, 2014.
  • [12] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning.,” Nature, vol. 521, pp. 436–444, May 2015.
  • [13] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical Image Analysis, vol. 42, no. Supplement C, pp. 60 – 88, 2017.
  • [14] B. Zhu, J. Z. Liu, B. R. Rosen, and M. S. Rosen, “Neural network mr image reconstruction with automap: Automated transform by manifold approximation,” in Proceeding of the 25th Annual Meeting of ISMRM, ISMRM, 2017.
  • [15] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction, pp. 647–658. Cham: Springer International Publishing, 2017.
  • [16] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep admm-net for compressive sensing mri,” in Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), pp. 10–18, Curran Associates, Inc., 2016.
  • [17] D. P. Kingma and M. Welling, “Auto-encoding variational bayes.,” CoRR, vol. abs/1312.6114, 2013.
  • [18]

    D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in

    Proceedings of the 31st International Conference on Machine Learning (E. P. Xing and T. Jebara, eds.), vol. 32 of Proceedings of Machine Learning Research, (Bejing, China), pp. 1278–1286, PMLR, 22–24 Jun 2014.
  • [19] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using laplace priors,” Transactions in Imgage Processing, vol. 19, pp. 53–63, Jan. 2010.
  • [20] D. P. Wipf and B. D. Rao, “Sparse bayesian learning for basis selection,” IEEE Transactions on Signal Processing, vol. 52, pp. 2153–2164, Aug 2004.
  • [21] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
  • [22] J. Tsao, P. Boesiger, and K. P. Pruessmann, “k-t blast and k-t sense: Dynamic mri with high frame rate exploiting spatiotemporal correlations,” Magnetic Resonance in Medicine, vol. 50, no. 5, pp. 1031–1042, 2003.
  • [23] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, Second Edition (Chapman & Hall/CRC Texts in Statistical Science). Chapman and Hall/CRC, 2 ed., July 2003.
  • [24] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
  • [25] S. Yeung, A. Kannan, Y. Dauphin, and L. Fei-Fei, “Tackling over-pruning in variational autoencoders,” CoRR, vol. abs/1706.03643, 2017.
  • [26] A. A. Samsonov, E. G. Kholmovski, D. L. Parker, and C. R. Johnson, “Pocsense: Pocs-based reconstruction for sensitivity encoded magnetic resonance imaging,” Magnetic Resonance in Medicine, vol. 52, no. 6, pp. 1397–1406, 2004.
  • [27] A. R. De Pierro and E. S. Helou Neto, “From convex feasibility to convex constrained optimization using block action projection methods and underrelaxation,” International Transactions in Operational Research, vol. 16, no. 4, pp. 495–504, 2009.
  • [28] C. Santelli, M. Loecher, J. Busch, O. Wieben, T. Schaeffter, and S. Kozerke, “Accelerating 4d flow mri by exploiting vector field divergence regularization,” Magnetic Resonance in Medicine, vol. 75, no. 1, pp. 115–125, 2016.
  • [29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
  • [30] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, D. C. V. Essen, and M. Jenkinson, “The minimal preprocessing pipelines for the human connectome project,” NeuroImage, vol. 80, no. Supplement C, pp. 105 – 124, 2013. Mapping the Connectome.
  • [31] M. Reuter, N. J. Schmansky, H. D. Rosas, and B. Fischl, “Within-subject template estimation for unbiased longitudinal image analysis,” NeuroImage, vol. 61, no. 4, pp. 1402 – 1418, 2012.
  • [32] J. Sled, A. Zijdenbos, and A. Evans, “A nonparametric method for automatic correction of intensity nonuniformity in mri data,” IEEE Transactions in Medical Imaging, vol. 17, pp. 87–97, 01 2002.
  • [33] M. Uecker, J. I. Tamir, F. Ong, S. Iyer, J. Y. Cheng, and M. Lustig, “Bart: version 0.4.00,” Dec. 2016.
  • [34] S. Ravishankar and Y. Bresler, “Mr image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE transactions on medical imaging, vol. 30, pp. 1028–41, 11 2010.
  • [35] B. Ning, X. Qu, D. Guo, C. Hu, and Z. Chen, “Magnetic resonance image reconstruction using trained geometric directions in 2d redundant wavelets domain and non-convex optimization,” Magnetic Resonance Imaging, vol. 31, no. 9, pp. 1611 – 1622, 2013.
  • [36] Z. Zhan, J. F. Cai, D. Guo, Y. Liu, Z. Chen, and X. Qu, “Fast multiclass dictionaries learning with geometrical directions in mri reconstruction,” IEEE Transactions on Biomedical Engineering, vol. 63, pp. 1850–1861, Sept 2016.
  • [37] X. Qu, D. Guo, B. Ning, Y. Hou, Y. Lin, S. Cai, and Z. Chen, “Undersampled mri reconstruction with patch-based directional wavelets,” Magnetic Resonance Imaging, vol. 30, no. 7, pp. 964 – 977, 2012.
  • [38] E. M. Eksioglu, “Decoupled algorithm for mri reconstruction using nonlocal block matching model: Bm3d-mri,” Journal of Mathematical Imaging and Vision, vol. 56, pp. 430–440, Nov 2016.
  • [39]

    R. Tanno, D. E. Worrall, A. Gosh, E. Kaden, S. N. Sotiropoulos, A. Criminisi, and D. Alexander, “Bayesian image quality transfer with cnns: Exploring uncertainty in dmri super-resolution,” in

    Intl Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), Springer, September 2017.
  • [40] N. Dilokthanakul, P. A. M. Mediano, M. Garnelo, M. C. H. Lee, H. Salimbeni, K. Arulkumaran, and M. Shanahan, “Deep unsupervised clustering with gaussian mixture variational autoencoders,” CoRR, vol. abs/1611.02648, 2016.
  • [41] M. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, “Composing graphical models with neural networks for structured representations and fast inference,” in Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), pp. 2946–2954, Curran Associates, Inc., 2016.
  • [42] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 214–223, 2017.