Robust Compressive Sensing MRI Reconstruction using Generative Adversarial Networks

10/14/2019 ∙ Puneesh Deora et al. ∙ IIT Roorkee

Compressive sensing magnetic resonance imaging (CS-MRI) accelerates the acquisition of MR images by sampling below the Nyquist limit. In this work, a novel generative adversarial network (GAN) based framework for CS-MRI reconstruction is proposed. Leveraging a combination of a patchGAN discriminator and a structural similarity index based loss, our model focuses on preserving high frequency content as well as fine textural details in the reconstructed image. Dense and residual connections are incorporated in a U-net based generator architecture to allow easier transfer of information as well as variable network length. We show that our algorithm outperforms state-of-the-art methods in terms of reconstruction quality and robustness to noise. Moreover, the reconstruction time, of the order of milliseconds, makes it highly suitable for real-time clinical use.






1 Introduction

Magnetic resonance imaging (MRI) is a commonly used non-invasive medical imaging modality that provides soft tissue contrast of excellent quality as well as high resolution structural information. The most significant drawback of MRI is its long acquisition time, as the raw data is acquired sequentially in k-space, which contains the spatial-frequency information. This slow imaging speed can cause patient discomfort and can introduce artefacts due to patient movement.

Compressive sensing (CS) [4] can be used to accelerate the MRI acquisition process by undersampling the k-space data. Reconstruction of CS-MRI is an ill-posed inverse problem [7]. Conventional CS-MRI frameworks assume prior information on the structure of MR images by using predefined sparsifying transforms, such as the discrete wavelet transform and the discrete cosine transform, to render the reconstruction problem well-posed [12]. Instead of using predefined transforms, the sparse representation can also be learnt from the data itself, as in dictionary learning (DLMRI) [17]. These frameworks, however, suffer from the long computation time of iterative optimization [16], as well as from the sparsity assumption [12], which may not fully capture fine details [11].

Bora et al. [2] have shown that, instead of using the sparsity model, the CS signal can be recovered using pretrained generative models, where an iterative optimization is used to obtain the reconstructed signal. Another deep learning based approach was introduced by Yang et al. [22], where the alternating direction method of multipliers [3] is used to train a network (DeepADMM) for CS-MRI reconstruction.

Recent works [21, 13] demonstrate the application of generative adversarial networks (GANs) [6] to reconstruct CS-MRI. In these works, the use of a large set of CS-MR images and their fully sampled counterparts for training the GAN model can facilitate the extraction of prior information required to solve the reconstruction problem [20]. The trained model is then used to obtain the reconstructed output for a new CS-MR image in a very short time.

However, these state-of-the-art methods often lose structural information in the reconstructed output. Super resolution (SR) is another well-known inverse problem, which aims to recover both low and high frequency components from a low resolution image. We borrow some ideas from the SR literature and incorporate them into the CS-MRI reconstruction problem to achieve robust generation that preserves subtle structural information. The main contributions of this work are:

  • We propose a novel generator architecture by incorporating residual in residual dense blocks (RRDBs) [19] in a U-net [18] based architecture.

  • We propose the use of patchGAN discriminator [8] for better reconstruction of high frequency content in the MR images.

  • Inspired by previous works on SR [24], a perception based loss in the form of structural similarity (SSIM) index [23] is incorporated to achieve better reconstruction by preserving structural information.

  • To stabilize the training of GAN, we make use of the Wasserstein loss [1].

  • In order to make the reconstruction robust to noise, we propose the use of noisy images for data augmentation to train our GAN model.

2 Methodology

The acquisition model for the CS-MRI reconstruction problem in discrete domain can be described as:



y = F_u x + η

where x ∈ C^N is a vector formed by the pixel values of the desired image, y denotes the observation vector, and η is the noise vector. C denotes the set of complex numbers. F_u is an operator which describes the process of random undersampling in the k-space. Given an observation vector y, the reconstruction problem is to find the corresponding x, considering η to be a non-zero vector. We choose to solve this reconstruction problem using a GAN model.
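As a minimal illustration of the acquisition model, the NumPy sketch below simulates random k-space undersampling of an image and applies the Hermitian of the sampling operator to obtain the aliased zero-filled image used later as the generator input. The 20% sampling density, image size, and noise level here are illustrative assumptions, not the paper's settings.

```python
# Sketch of the discrete acquisition model y = F_u x + eta, using NumPy.
import numpy as np

rng = np.random.default_rng(0)
N = 128                                   # image is N x N (illustrative size)
x = rng.random((N, N))                    # stand-in for the desired image

# Random undersampling operator F_u: 2-D FFT followed by a binary k-space mask.
mask = rng.random((N, N)) < 0.2           # keep ~20% of k-space samples
k_full = np.fft.fft2(x)
eta = 0.01 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
y = mask * (k_full + eta)                 # observed (undersampled, noisy) k-space

# Zero-filled reconstruction: apply the Hermitian of F_u, i.e. x_u = F_u^H y.
x_u = np.fft.ifft2(y)                     # complex valued, aliased image
print(x_u.shape, np.iscomplexobj(x_u))
```

The aliasing artefacts in `x_u` are exactly what the generator is trained to remove.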

A GAN model comprises a generator G and a discriminator D, where the generator tries to fool the discriminator by mapping an input vector to the distribution of the true data. The discriminator, in turn, attempts to distinguish samples of the true data distribution from generated samples.

We incorporate the conditional GAN (cGAN) framework [14] in our study. The model is conditioned on the aliased zero-filled reconstruction (ZFR) x_u = F_u^H y, where (·)^H denotes the Hermitian operator. Instead of a binary cross-entropy based adversarial loss for training the cGAN model, we use the Wasserstein loss. This helps stabilize the training process of standard GANs, which suffer from saturation resulting in vanishing gradients. Mathematically, the cGAN model with the Wasserstein loss solves the following optimization problem:


min_G max_D  E_x[ D(x | x_u) ] − E_{x_u}[ D(G(x_u) | x_u) ]

where E[·] denotes the expectation over a batch of images, and D is constrained to be Lipschitz continuous.

Figure 1: Generator architecture.

Fig. 1 shows the generator architecture of the proposed model, which is inspired by [21]. The architecture is based on a U-net [18], which consists of several encoders and corresponding decoders. Each encoder is a convolutional layer that decreases the spatial size and increases the number of feature maps. Each decoder consists of a transposed convolutional layer that increases the size of the feature maps. Skip connections transfer features of a particular size from each encoder to the corresponding decoder. Instead of obtaining feature maps of even smaller size by using more encoders (and decoders), the proposed architecture places RRDBs at the bottom of the U-net. Each RRDB consists of dense blocks, as well as residual connections at two levels: across each dense block, and across all the dense blocks in one RRDB, as shown in Fig. 1. The output of each dense block is scaled by a factor β before it is added to the identity mapping. Residual connections make the effective network length variable, thereby making identity mappings easier to learn and mitigating vanishing gradients in the shallower layers. Dense connections allow feature maps to be passed to deeper layers, increasing the variety of accessible information. Throughout this network, batch normalization (BN) and leaky rectified linear unit (ReLU) activation are applied after each convolutional layer. At the output, a hyperbolic tangent activation is used.
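The RRDB wiring described above can be sketched in Keras (the framework named in the training settings). The layer counts, channel sizes, and input shape below are illustrative assumptions; only the two-level residual connections, dense concatenations, and the residual scaling β = 0.2 follow the text.

```python
# Illustrative residual-in-residual dense block (RRDB), TensorFlow/Keras.
import tensorflow as tf
from tensorflow.keras import layers

BETA = 0.2  # residual scaling applied before each identity addition

def dense_block(x, growth=32, n_layers=4):
    feats = [x]
    for _ in range(n_layers):
        # Each layer sees the concatenation of all earlier feature maps.
        inp = feats[0] if len(feats) == 1 else layers.Concatenate()(feats)
        h = layers.Conv2D(growth, 3, padding="same")(inp)
        h = layers.BatchNormalization()(h)          # BN after each convolution
        h = layers.LeakyReLU()(h)
        feats.append(h)
    # Project back to the input channel count so the residual add is valid.
    out = layers.Conv2D(x.shape[-1], 3, padding="same")(layers.Concatenate()(feats))
    return layers.Add()([x, layers.Lambda(lambda t: BETA * t)(out)])

def rrdb(x, n_dense=3):
    h = x
    for _ in range(n_dense):
        h = dense_block(h)
    # Outer residual connection across all dense blocks in the RRDB.
    return layers.Add()([x, layers.Lambda(lambda t: BETA * t)(h)])

inp = tf.keras.Input((32, 32, 64))
model = tf.keras.Model(inp, rrdb(inp))
print(model.output_shape)
```

Because every residual add preserves shape, several such RRDBs can be stacked at the bottom of the U-net without changing the feature-map size.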

The discriminator is an 11-layer convolutional neural network, with each convolutional layer followed by BN and leaky ReLU activation. In order to improve the reconstruction of high frequency details, a patchGAN framework is incorporated, which scores each patch of the image separately and outputs the average score as the final result.
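The patch-scoring idea can be sketched as follows: strided convolutions shrink the input to a grid in which each cell scores one receptive-field patch, and the cells are averaged into a single output. The depth, filter counts, and input shape are assumptions; this does not reproduce the paper's exact 11-layer network.

```python
# Illustrative patchGAN-style critic in TensorFlow/Keras.
import tensorflow as tf
from tensorflow.keras import layers

def patch_critic(input_shape=(128, 128, 2)):
    inp = tf.keras.Input(input_shape)
    h = inp
    for filters in (64, 128, 256):
        h = layers.Conv2D(filters, 4, strides=2, padding="same")(h)
        h = layers.BatchNormalization()(h)
        h = layers.LeakyReLU(0.2)(h)
    patch_scores = layers.Conv2D(1, 3, padding="same")(h)  # one score per patch
    score = layers.GlobalAveragePooling2D()(patch_scores)  # average -> final output
    return tf.keras.Model(inp, score)

critic = patch_critic()
print(critic.output_shape)
```

Scoring patches rather than the whole image forces the critic to judge local texture, which is why this design helps with high frequency content.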

Figure 2: Results of the proposed method. (a) GT. For 20% undersampling: (b) ZFR, reconstruction results for (c) noise-free image, image with (d) 10% noise, and (e) 20% noise. For 30% undersampling: (f) ZFR, reconstruction results for (g) noise-free image, image with (h) 10% noise, and (i) 20% noise. The top right inset indicates the zoomed in region of interest (ROI) corresponding to the red box. The bottom right inset indicates the absolute difference between the ROI and the corresponding GT. The images are normalized between 0 and 1.

In order to reduce the pixel-wise difference between the generated image and the corresponding ground truth (GT) image, a mean absolute error (MAE) based loss is incorporated while training the generator. It is given by:


L_MAE = ‖ x − x̂ ‖_1

where ‖·‖_1 denotes the ℓ1 norm and x̂ = G(x_u) is the generated image. Since the human visual system is sensitive to structural distortions in images, it is important to preserve the structural information in MR images, which is crucial for clinical analysis. In order to improve the reconstruction of fine textural details in the images, a mean SSIM (MSSIM) [23] based loss is also incorporated, as follows:


L_MSSIM = 1 − (1/M) Σ_{j=1}^{M} SSIM(x_j, x̂_j)

where M is the number of patches in the image, x_j and x̂_j denote the j-th patches of the ground truth and the generated image, respectively, and SSIM is calculated as follows:

SSIM(p_1, p_2) = [ (2 μ_{p1} μ_{p2} + C_1)(2 σ_{p1 p2} + C_2) ] / [ (μ_{p1}² + μ_{p2}² + C_1)(σ_{p1}² + σ_{p2}² + C_2) ]

where p_1 and p_2 represent two patches, μ and σ² denote the mean and variance, respectively, and σ_{p1 p2} denotes the covariance between the two patches. C_1 and C_2 are small constants to avoid division by zero.

The overall loss for training the generator is given by:


L_G = λ_1 L_adv + λ_2 L_MAE + λ_3 L_MSSIM

where λ_1, λ_2, and λ_3 are the weighting factors for the adversarial, MAE, and MSSIM loss terms, respectively.
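Assuming a TensorFlow implementation (the framework named in the training settings), the combined generator loss can be sketched as below. The weight values `lam1`/`lam2`/`lam3` are placeholders, since the paper's exact weighting factors are not reproduced here.

```python
# Sketch of the combined generator loss: Wasserstein adversarial term plus
# MAE and (1 - MSSIM) terms, with placeholder weights.
import tensorflow as tf

def generator_loss(gt, pred, critic_fake_scores, lam1=0.01, lam2=1.0, lam3=1.0):
    l_adv = -tf.reduce_mean(critic_fake_scores)   # Wasserstein generator term
    l_mae = tf.reduce_mean(tf.abs(gt - pred))     # pixel-wise L1 difference
    mssim = tf.reduce_mean(tf.image.ssim(gt, pred, max_val=1.0))
    l_mssim = 1.0 - mssim                         # structural (MSSIM) term
    return lam1 * l_adv + lam2 * l_mae + lam3 * l_mssim

gt = tf.ones((2, 64, 64, 1))
loss = generator_loss(gt, gt, tf.zeros((2, 1)))   # identical images, zero scores
print(float(loss))
```

With identical images and zero critic scores every term vanishes, which is a quick sanity check that the loss is minimized at a perfect reconstruction.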

3 Results and Discussion

3.1 Training settings

In this work, a 1-D Gaussian mask is used for undersampling the k-space. Since the ZFR is complex valued, its real and imaginary components are concatenated and passed to the generator as a two-channel real-valued input. The batch size is set to 32. The discriminator is updated three times before every generator update. The threshold for weight clipping is 0.05. The growth rate of the dense blocks is 32, the residual scaling factor β is 0.2, and 12 RRDBs are used. The number of filters in the last layer of each RRDB is 512. The Adam optimizer [9] is used for training, with separate learning rates for the generator and the discriminator. The model is implemented in the Keras framework with a TensorFlow backend. For training, 4 NVIDIA GeForce GTX 1080 Ti GPUs are used, each with 11 GB of RAM.
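The training schedule above (three critic updates per generator update, weight clipping at 0.05) can be sketched as a single WGAN training step. The tiny stand-in models, learning rates, and tensor shapes are placeholders so the loop is runnable; they are not the paper's architecture.

```python
# Schematic WGAN training step with weight clipping, TensorFlow/Keras.
import tensorflow as tf

generator = tf.keras.Sequential([tf.keras.layers.Conv2D(1, 3, padding="same")])
critic = tf.keras.Sequential([tf.keras.layers.Flatten(), tf.keras.layers.Dense(1)])
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)
CLIP = 0.05  # weight clipping threshold from the text

def train_step(zfr, gt):
    for _ in range(3):                          # 3 critic updates per generator update
        with tf.GradientTape() as tape:
            fake = generator(zfr)
            d_loss = tf.reduce_mean(critic(fake)) - tf.reduce_mean(critic(gt))
        grads = tape.gradient(d_loss, critic.trainable_variables)
        d_opt.apply_gradients(zip(grads, critic.trainable_variables))
        for w in critic.trainable_variables:    # enforce the Lipschitz constraint
            w.assign(tf.clip_by_value(w, -CLIP, CLIP))
    with tf.GradientTape() as tape:
        g_loss = -tf.reduce_mean(critic(generator(zfr)))
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return d_loss, g_loss

zfr = tf.random.normal((2, 16, 16, 2))          # two-channel real-valued ZFR input
gt = tf.random.normal((2, 16, 16, 1))
d_l, g_l = train_step(zfr, gt)
print(d_l.shape, g_l.shape)
```

The clipping step is what makes the Wasserstein objective well defined; gradient-penalty variants exist, but the text specifies clipping.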

3.2 Data details

For the purpose of training and testing, T1-weighted MR brain images from the MICCAI 2013 grand challenge dataset [10] are used. In order to make the reconstructed output robust to noise, data augmentation is carried out using images with 10% and 20% additive Gaussian noise. To build the training set, 19 797 images are randomly taken from the training split of the aforementioned dataset. Of these, noise is added to 6335 images, while the remaining 13 462 images are used without any noise. In addition, 990 images are chosen from the 13 462 noise-free images and noisy copies of them are included as well, giving a total of 20 787 training images. Among the noisy images, the numbers of images with 10% and 20% noise are equal. Thus, the set contains 64.76% noise-free images, 30.48% noisy images whose noise-free counterparts are not present in the training set, and 4.76% noisy images whose noise-free counterparts are present. For testing, 2000 images are chosen randomly from the test split of the dataset. The tests are conducted in three stages: using noise-free images, using images with 10% added noise, and using images with 20% added noise.
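The noise augmentation above can be sketched as follows. We read "10%/20% noise" as additive Gaussian noise with a standard deviation of 10% or 20% of the image's intensity range; this interpretation, and the clipping back to [0, 1], are assumptions rather than the paper's stated definition.

```python
# Sketch of noise-based data augmentation for robustness training.
import numpy as np

def add_gaussian_noise(img, level, rng):
    """level = 0.10 or 0.20; images assumed normalized to [0, 1]."""
    noisy = img + rng.standard_normal(img.shape) * level * (img.max() - img.min())
    return np.clip(noisy, 0.0, 1.0)   # keep the augmented image in valid range

rng = np.random.default_rng(0)
clean = rng.random((128, 128))        # stand-in for a normalized MR slice
noisy10 = add_gaussian_noise(clean, 0.10, rng)
noisy20 = add_gaussian_noise(clean, 0.20, rng)
print(noisy10.shape, noisy20.shape)
```

In the paper's split, such noisy copies mostly replace (rather than duplicate) their clean counterparts in the training set, which prevents the model from simply memorizing clean/noisy pairs.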

3.3 Results

Table 1 summarizes the quantitative results of an ablation study of the various components of the model. These results are reported for images in which 20% of the raw k-space samples are retained. For all four cases, the generator is trained with the same overall loss. In the first case, the GAN model comprises a U-net generator and a patchGAN discriminator, with BN present throughout the network. In the subsequent cases, the use of RRDBs and the addition of BN to the RRDBs result in significant improvements in peak signal to noise ratio (PSNR). The use of data augmentation with noisy images, in the fourth case, results in significantly better quantitative results for the reconstruction of noisy images compared to the previous three cases.

Images           (1) U-net + patchGAN   (2) + RRDBs       (3) + BN in RRDBs   (4) + data aug.
Noise-free       40.45 / 0.9865         41.39 / 0.9810    41.88 / 0.9829      42.31 / 0.9841
10% noise added  38.25 / 0.9641         38.03 / 0.9624    38.03 / 0.9620      39.80 / 0.9751
20% noise added  33.98 / 0.9217         34.01 / 0.9210    33.78 / 0.9180      37.56 / 0.9619
Table 1: Ablation study of the model (PSNR (dB) / MSSIM); columns correspond to the four cases described above.

The qualitative results of the proposed method are shown in Fig. 2(b-e) for 20% undersampling and Fig. 2(f-i) for 30% undersampling. It can be seen that the proposed method is able to reconstruct the structural content in the image, including many fine details, successfully. This is also indicated by the quantitative results shown in Table 1. Also, the contrast of the reconstructed image looks very similar to that of the GT. The reconstruction results for noisy inputs, as well as their differences with the corresponding GT, indicate the robustness of the model.

Method          Noise-free (PSNR dB / MSSIM)   10% noise added (PSNR dB)   20% noise added (PSNR dB)
ZFR*            35.61 / 0.7735                 24.42                       18.56
DLMRI [17]*     38.24 / 0.8020                 24.52                       18.85
Noiselet [15]*  39.01 / 0.8588                 25.29                       19.42
BM3D [5]*       39.93 / 0.9125                 24.71                       18.65
DeepADMM [22]*  37.36 / 0.8534                 25.19                       19.33
DAGAN [21]*     40.20 / 0.9681                 37.40                       34.23
Proposed        46.88 / 0.9943                 42.34                       39.49

*PSNR and MSSIM values are taken from [21].

Table 2: Comparison with previous methods

Table 2 compares the proposed method with ZFR and several state-of-the-art methods: DLMRI [17], Noiselet [15], BM3D [5], DeepADMM [22], and DAGAN [21]. These results are reported for images in which 30% of the k-space data is retained. Both the PSNR and MSSIM of the proposed method are significantly better than those of the previous methods. The comparison of the PSNRs for noisy images shows that the proposed method is highly robust to noise. Moreover, the reconstruction time of the proposed method is 9.06 ms per image on a GPU, which can facilitate real-time reconstruction of MR images.

Figure 3: Results of zero-shot inference. (a,d) GT, (b,e) ZFR, (c,f) reconstruction results for noise-free image. The top right inset indicates the zoomed in ROI corresponding to the red box. The bottom right inset indicates the absolute difference between the ROI and the corresponding GT. The images are normalized between 0 and 1.

We also tested the model trained on MR images of the brain to reconstruct MR images of canine legs from the MICCAI 2013 challenge. Fig. 3 shows the results of this zero-shot inference for images in which 20% of the k-space data is retained. Though no images of canine legs were used for training, the model is able to faithfully reconstruct most of the structural content, and is able to achieve average PSNR and MSSIM values of 41.28 dB and 0.9788, respectively, for 2000 test images.

4 Conclusion

In this paper, a novel GAN based framework has been presented for CS-MRI reconstruction. The use of RRDBs in a U-net based generator architecture increases the amount of information available to deeper layers. In order to preserve the high frequency content as well as the structural details in the reconstructed output, a patchGAN discriminator and an SSIM based loss have been incorporated. The use of noisy images during training makes the reconstruction results highly robust to noise. The proposed method outperforms state-of-the-art methods while maintaining the feasibility of real-time reconstruction. In the future, we plan to analyze the performance of the proposed model for different k-space sampling patterns. To further reduce the reconstruction time, we plan to work on lightweight architectures. Further work may also be carried out on devising regularization terms that help preserve the finest details in the reconstructed output.


  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, Sydney, Australia, pp. 214–223. External Links: Link Cited by: 4th item.
  • [2] A. Bora, A. Jalal, E. Price, and A. G. Dimakis (2017) Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 537–546. External Links: Link Cited by: §1.
  • [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 (1), pp. 1–122. External Links: Link, Document, ISSN 1935-8237 Cited by: §1.
  • [4] D. L. Donoho (2006-04) Compressed sensing. IEEE Transactions on Information Theory 52 (4), pp. 1289–1306. External Links: Document, ISSN Cited by: §1.
  • [5] E. M. Eksioglu (2016-11-01) Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI. Journal of Mathematical Imaging and Vision 56 (3), pp. 430–440. External Links: ISSN 1573-7683, Document, Link Cited by: §3.3, Table 2.
  • [6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680. External Links: Link Cited by: §1.
  • [7] E. Herrholz and G. Teschke (2010-11) Compressive sensing principles and iterative sparse recovery for inverse and ill-posed problems. Inverse Problems 26 (12), pp. 125012. External Links: Document, Link Cited by: §1.
  • [8] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017-07) Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976. External Links: Document Cited by: 2nd item.
  • [9] D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. See DBLP:conf/iclr/2015, External Links: Link Cited by: §3.1.
  • [10] B. Landman and S. W. (Eds.) (2013) Diencephalon standard challenge, MICCAI 2013 grand challenge. Cited by: §3.2.
  • [11] Y. Liu, J. F. Cai, Z. Zhan, D. Guo, J. Ye, Z. Chen, and X. Qu (2015) Balanced sparse model for tight frames in compressed sensing magnetic resonance imaging. PLOS ONE 10 (4), pp. 1–19. External Links: Link, Document Cited by: §1.
  • [12] M. Lustig, D. Donoho, and J. M. Pauly (2007) Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58 (6), pp. 1182–1195. External Links: Document, Link, Cited by: §1.
  • [13] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and J. M. Pauly (2019-01) Deep generative adversarial neural networks for compressive sensing MRI. IEEE Transactions on Medical Imaging 38 (1), pp. 167–179. External Links: Document, ISSN Cited by: §1.
  • [14] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. ArXiv abs/1411.1784. Cited by: §2.
  • [15] K. Pawar, G. Egan, and J. Zhang (2015) Multichannel compressive sensing MRI using noiselet encoding. PLOS ONE 10 (5), pp. 1–27. External Links: Link, Document Cited by: §3.3, Table 2.
  • [16] X. Qu, W. Zhang, D. Guo, C. Cai, S. Cai, and Z. Chen (2010) Iterative thresholding compressed sensing mri based on contourlet transform. Inverse Problems in Science and Engineering 18 (6), pp. 737–758. External Links: Document, Link, Cited by: §1.
  • [17] S. Ravishankar and Y. Bresler (2011-05) MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging 30 (5), pp. 1028–1041. External Links: Document, ISSN Cited by: §1, §3.3, Table 2.
  • [18] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: Convolutional networks for biomedical image segmentation. ArXiv abs/1505.04597. Cited by: 1st item, §2.
  • [19] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy (2018-09) ESRGAN: enhanced super-resolution generative adversarial networks. In The European Conference on Computer Vision (ECCV) Workshops, pp. 63–79. Cited by: 1st item.
  • [20] S. Xu, S. Zeng, and J. Romberg (2019-05) Fast compressive sensing recovery using generative models with structured latent variables. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. 2967–2971. External Links: Document, ISSN Cited by: §1.
  • [21] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo, and D. Firmin (2018-06) DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. IEEE Transactions on Medical Imaging 37 (6), pp. 1310–1321. External Links: Document, ISSN Cited by: §1, §2, §3.3, Table 2.
  • [22] Y. Yang, J. Sun, H. Li, and Z. Xu (2016) Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems 29, pp. 10–18. External Links: Link Cited by: §1, §3.3, Table 2.
  • [23] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004-04) Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. External Links: Document, ISSN Cited by: 3rd item, §2.
  • [24] H. Zhao, O. Gallo, I. Frosio, and J. Kautz (2017-03) Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging 3 (1), pp. 47–57. External Links: Document, ISSN Cited by: 3rd item.