Conditional WGANs with Adaptive Gradient Balancing for Sparse MRI Reconstruction

05/02/2019 ∙ by Itzik Malkiel, et al. ∙ 14

Recent sparse MRI reconstruction models have used Deep Neural Networks (DNNs) to reconstruct relatively high-quality images from highly undersampled k-space data, enabling much faster MRI scanning. However, these techniques sometimes struggle to reconstruct sharp images that preserve fine detail while maintaining a natural appearance. In this work, we enhance the image quality by using a Conditional Wasserstein Generative Adversarial Network combined with a novel Adaptive Gradient Balancing technique that stabilizes the training and minimizes the degree of artifacts, while maintaining a high-quality reconstruction that produces sharper images than other techniques.




1 Introduction

MRI data acquisition is inherently slow, and can often exceed 30 minutes. One way to accelerate MR scanning is to undersample k-space, i.e., to reduce the number of k-space traversals by a factor R, accelerating the scan proportionately. Reconstruction is then performed using parallel imaging (PI) or compressed sensing (CS) techniques.

More recently, Deep Neural Networks (DNNs) have been used to push R values even higher [3, 13, 17]. Among the most promising Deep Learning (DL) techniques, unrolled iterative networks (also called cascading networks) have emerged as a leading method [3, 13]. Inspired by CS, this technique uses a DNN composed of a sequence of iterations that include data-consistency and convolutional units. The data-consistency units utilize the acquired k-space lines as a prior that keeps the network from drifting away from the acquired data, and the convolutional layers are trained to regularize the reconstruction.

As with other image generation problems, using a naive pixel-wise distance for training DL-based sparse MRI reconstruction models can result in image blurring and unrealistic appearance. In a clinical setting, avoidance of blurring can be crucial for proper diagnosis. Recently, Generative Adversarial Networks (GANs) have been used to promote the naturalness of MRI reconstructions [4, 12, 16]. In our work, we harness the power of conditional Wasserstein GANs (cWGANs) to further improve image quality.

The main contributions of this paper are as follows: (1) We propose a cWGAN method for sparse MRI reconstruction, in which both the generator and discriminator are conditioned using the acquired undersampled data. (2) We introduce a novel training algorithm called Adaptive Gradient Balancing (AGB) which balances the losses in multi-term adversarial objectives. (3) We provide an extensive comparison between different models and training techniques. In particular, we report results of four different techniques - an unrolled iterative network, a WGAN based network, a cWGAN network and a cWGAN network trained with our AGB. (4) We propose and evaluate a novel Densely Connected Iterative Network (DCI-Net) for sparse MRI reconstruction, which is inspired by Dense-Nets [6]. (5) We are the first to adopt the Fréchet Inception Distance as a score metric for sparse MRI reconstruction.

Related work   DL-based sparse MRI reconstruction has attracted considerable attention recently. Schlemper et al. [13] used a cascade of CNNs optimized to minimize a pixel-wise distance. Hammernik et al. proposed Variational Networks (VN) for sparse MRI reconstruction: first, a VN that minimizes a pixel-wise loss [3], then a GAN-based VN [4] to address blurring artifacts. Mardani et al. [12] proposed a GAN-based model that uses a deep residual network as a generator. Yang et al. [16] introduced a GAN-based model trained to optimize a mixture of a pixel-wise loss, a perceptual loss and a GAN loss that conditions only the generator input. Yang et al. reported that a GAN-based model without the perceptual loss generates unrealistic jagged artifacts.

2 Problem Formulation

Let $y$ be the k-space signal acquired by an MRI scanner. For a single-coil receiver, an image $x$ can be estimated by performing an inverse Fourier transform, $x = \mathcal{F}^{-1}(y)$. In multi-coil MRI, an array of $M$ coils acquires different 2D k-space measurements of the same object $x$. Each coil $i$, positioned at a different location, is typically highly sensitive in one region of space. This position-dependent sensitivity can be represented by a complex-valued coil sensitivity map in real space, $S_i$.

During reconstruction, the images from each coil are combined into a fully-sampled image $x = \mathcal{R}(y_1, \dots, y_M) = \sum_{i=1}^{M} S_i^{*}\, \mathcal{F}^{-1}(y_i)$, where $\mathcal{R}$ is a reconstruction function and $S_i^{*}$ is the complex conjugate of the sensitivity map of coil $i$. To accelerate imaging, a binary sampling pattern $U$ is used to undersample each coil’s k-space signal for each slice. The undersampled k-space signal, denoted by $y_u$, can be written as $y_u = U \circ y$. The undersampled zero-filled image can be calculated by: $x_u = \mathcal{R}(y_u)$. The learning task is to find a reconstruction function $G$ that minimizes an expected loss function $L$ (Sec. 3) over a population of scans: $\min_G \mathbb{E}_{(x, y_u)}\left[L\big(x, G(y_u)\big)\right]$. For a given $y_u$ and $U$, we will denote by $\hat{x}$ the generated image $G(y_u)$.
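The coil combination and the zero-filled image described above can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the function names, the toy array sizes, the uncentred FFT convention, and the normalisation of the sensitivity maps (so that the sum over coils of |S_i|^2 equals 1) are choices made here, not details taken from the paper.

```python
import numpy as np

def combine_coils(kspace, sens):
    """Sum over coils of conj(S_i) * IFFT2(y_i); both inputs have shape (coils, H, W)."""
    coil_images = np.fft.ifft2(kspace, axes=(-2, -1))
    return np.sum(np.conj(sens) * coil_images, axis=0)

def zero_filled(kspace, sens, mask):
    """Zero the unsampled k-space entries (mask is 1 where sampled), then combine."""
    return combine_coils(kspace * mask, sens)

# Toy data (hypothetical): 2 coils, an 8x8 complex image, normalised sensitivity maps.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
sens = rng.standard_normal((2, 8, 8)) + 1j * rng.standard_normal((2, 8, 8))
sens /= np.sqrt(np.sum(np.abs(sens) ** 2, axis=0, keepdims=True))

# Simulated fully sampled per-coil acquisitions.
kspace = np.fft.fft2(sens * x, axes=(-2, -1))

# Binary sampling pattern that keeps every 4th phase-encode line.
mask = np.zeros((8, 8))
mask[::4, :] = 1

x_full = combine_coils(kspace, sens)     # recovers x when k-space is fully sampled
x_zf = zero_filled(kspace, sens, mask)   # aliased zero-filled image
```

With fully sampled data the combination returns the original image, whereas the zero-filled image differs from it; closing that gap is the task given to the learned reconstruction function.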

3 Method

Our method learns a DL-based sparse MRI reconstruction model from training samples, each of which is a pair of fully sampled and matched undersampled k-space data. We propose a conditional GAN architecture, which conditions the reconstruction on the zero-filled image. Specifically, our model is composed of a generator network and a discriminator network. The generator reconstructs an image from an undersampled k-space dataset. The discriminator receives a pair of input images: (i) a ground truth image or a generated (“fake”) image reconstructed from undersampled k-space and (ii) a zero-filled image (see Fig. 1).

Figure 1: The generator receives undersampled k-space data as input and generates a matched estimated fully-sampled image. The discriminator learns to estimate the Wasserstein Distance between “fake” pairs and “real” pairs.

While it is possible to use a non-conditional GAN architecture, in that case the discriminator can only enforce general style properties learned from the distribution of the fully sampled images, and for a given undersampled k-space signal $y_u$, it is not guaranteed that the generator would learn to reconstruct a realistic image that perceptually matches its corresponding fully sampled image $x$.

Objective   Following the success of the Wasserstein GAN (WGAN) [1] and the framework proposed by Isola et al. [8], we adopt a conditional WGAN objective in which $G$ and $D$ are the generator and discriminator networks, respectively, $\tilde{y}_u$ is a random undersampled k-space dataset, $x$ is a random fully sampled image, and their corresponding undersampled zero-filled images are $\tilde{x}_u$ and $x_u$, respectively. In addition to the adversarial loss, we also add a pixel-wise Mean Square Error (MSE) loss $L_{MSE} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} |x_{i,j} - \hat{x}_{i,j}|^2$, where $\hat{x}$ is the generated image and W and H are the width and height of the image $x$. The final generator loss combines the adversarial term with this pixel-wise term.
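Written out, an objective of this kind could take the following form. This is a reconstruction from the definitions in this section and from the standard WGAN formulation [1], not a verbatim copy of the authors' equations. The symbols $\tilde{y}_u$, $\tilde{x}_u$ and $x_u$ are labels chosen here for the random undersampled k-space data and the two zero-filled images, the weight $\lambda$ on the adversarial term is a hypothetical symbol, and $\mathcal{L}_{\mathrm{MSE}}$ stands for the pixel-wise MSE term described above.

```latex
\begin{aligned}
\min_{G}\,\max_{D}\;\; \mathcal{L}_{\mathrm{cWGAN}}(G, D)
  &= \mathbb{E}_{x}\big[D(x, x_u)\big]
   - \mathbb{E}_{\tilde{y}_u}\big[D\big(G(\tilde{y}_u), \tilde{x}_u\big)\big], \\
\mathcal{L}_{G}
  &= \mathcal{L}_{\mathrm{MSE}}
   - \lambda\, \mathbb{E}_{\tilde{y}_u}\big[D\big(G(\tilde{y}_u), \tilde{x}_u\big)\big].
\end{aligned}
```

Here the maximisation over $D$ is restricted to approximately Lipschitz discriminators, which the weight clipping in Alg. 1 is meant to enforce. Both arguments of $D$ are images, so the discriminator always scores a candidate reconstruction together with the zero-filled image it should be consistent with.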


Adaptive Gradient Balancing   In WGAN training, the discriminator network is used as a learned loss function, which dynamically changes during training, and thus may generate gradients with variable norm. To stabilize the WGAN training and to avoid drifting away from the ground-truth spatial information, we introduce the Adaptive Gradient Balancing (AGB) algorithm for continually balancing the gradients of the pixel-wise and the WGAN loss functions.

In order to keep the gradients of both terms at the same level, and since the WGAN gradients tend to vary, we choose to adaptively upper-bound the WGAN gradients. Specifically, we define $\beta$ to be an adaptive weight that will be used to bound the WGAN loss gradients. We calculate two moving-average variables, one corresponding to the WGAN loss and one to the pixel-wise loss. These moving averages capture the standard deviation (STD) of the gradients calculated at every backward step on the generated image, with respect to each one of the losses separately. At every training step, if the ratio of the WGAN moving average to the pixel-wise moving average exceeds a predefined threshold, we update $\beta$, together with the WGAN moving average, using a predefined decay rate. During training, we divide the WGAN loss by $\beta$ to carefully decay the WGAN loss gradients to roughly the same order of magnitude as those of the pixel-wise loss. Moreover, in order to keep a reasonable ratio between the generator’s WGAN loss gradients and the discriminator loss gradients, we also decay the discriminator loss by the same factor (see Alg. 1).

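   Initialize β and the two moving averages of the gradient STD
   for number of training iterations do
       for each discriminator iteration do
           Sample a minibatch
           Update the discriminator weights with Adam on the discriminator loss divided by β
           Clip the discriminator weights to a fixed symmetric range
       end for
       Sample a minibatch
       Update the generator weights with Adam on the pixel-wise loss plus the WGAN loss divided by β
       if the WGAN gradient STD exceeds its bound relative to the pixel-wise gradient STD then
           Update β and the WGAN moving average using the decay rate
       end if
   end for
Algorithm 1 WGAN-AGB training of WGANs.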

Our AGB algorithm extends WGAN training and ensures one invariant during the entire training - the STD of the WGAN loss gradients is upper-bounded by a factor of the STD of the pixel-wise loss gradients. This invariant maintains the effectiveness of both loss terms, over the entire course of training.
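The balancing rule can be made concrete with a short sketch. Several details below are assumptions made for illustration rather than the authors' exact procedure: the exponential form of the moving averages, the names `sigma_w`, `sigma_p`, `tau`, `decay` and `momentum`, and the update that divides β by the decay rate while rescaling the WGAN average. The sketch only tracks β from two gradient arrays; the optimizer steps of Alg. 1 are left out.

```python
import numpy as np

class AdaptiveGradientBalancer:
    """Grows beta until the scaled WGAN gradient STD is bounded by tau * pixel STD.

    Assumed for illustration (not from the paper): exponential moving averages
    with `momentum`, and the update beta <- beta / decay, sigma_w <- sigma_w * decay.
    """

    def __init__(self, beta=10.0, tau=1.0, decay=0.5, momentum=0.9):
        self.beta = beta
        self.tau = tau
        self.decay = decay
        self.momentum = momentum
        self.sigma_w = None  # moving STD of the beta-scaled WGAN-loss gradients
        self.sigma_p = None  # moving STD of the pixel-wise-loss gradients

    def _ema(self, old, new):
        # The first observation initialises the average.
        if old is None:
            return new
        return self.momentum * old + (1.0 - self.momentum) * new

    def step(self, grad_wgan, grad_pixel):
        """Gradients are w.r.t. the generated image; grad_wgan is the unscaled one."""
        # Dividing the loss by beta divides its gradient by beta as well.
        self.sigma_w = self._ema(self.sigma_w, np.std(grad_wgan) / self.beta)
        self.sigma_p = self._ema(self.sigma_p, np.std(grad_pixel))
        if self.sigma_w > self.tau * self.sigma_p:
            self.beta /= self.decay      # larger beta gives smaller WGAN gradients
            self.sigma_w *= self.decay   # keep the average in step with the new scale
        return self.beta


balancer = AdaptiveGradientBalancer()
g_wgan = np.array([-100.0, 100.0])   # toy WGAN gradients with STD 100
g_pixel = np.array([-1.0, 1.0])      # toy pixel-wise gradients with STD 1
for _ in range(20):
    beta = balancer.step(g_wgan, g_pixel)
```

In this toy run β rises in steps from its initial value and then stays fixed once the scaled WGAN gradient STD drops below the pixel-wise STD, which is the invariant stated above.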

Network architectures   We propose a new generator architecture (Fig. 2), called Densely Connected Iterative Network (DCI-Net), which is based on the iterative convolutional network [3, 13]. The key new developments are the use of (1) dense connections [6] across all iterations, which strengthens feature propagation, making the network more robust, and (2) a relatively deep architecture of over 60 convolutional layers, bringing increased capacity. Our generator receives M coils of undersampled k-space data, and uses N = 20 iterations, each of which includes a data-consistency unit and a convolutional unit for regularization (Fig. 2B). Dense skip-layer connections between the output of each iteration and the following G iterations (typically G = 5) are represented as curved lines in Fig. 2A. This results in an input to each block composed of skip and direct connections concatenated to form a G+1 channel complex image. For our discriminator architecture we use a convolutional “PatchGAN” [10]. More information can be found in Appendix 0.A.

Figure 2: DCI-Net (A) consists of N unrolled iterative blocks, each with dense skip-layer connections (curved arrows) to subsequent blocks. Each iterative block (B) consists of data-consistency (DC) and convolutional (C) units. The convolutional unit operates on all G+1 connections, while the DC unit operates only on the direct connection.

4 Results

Dataset   Fully sampled brain MRI datasets (T1, T2, T1-FLAIR and T2-FLAIR in axial, coronal and sagittal orientations) were acquired with various k-space data sizes and various numbers of coils, along with sensitivity maps estimated from separate calibration scans. In total, 2267 slices were acquired, of which 1901 were used to train the networks, 151 for validation and 215 for testing. In addition, during training, we applied random horizontal flips and rotations (bounded to 20 degrees) to augment the training set. The data were retrospectively down-sampled using 12 central lines of k-space and a 1D variable-density sampling pattern outside the central region, resulting in a net under-sampling factor R = 4. As evaluation metrics, we compute both the normalized mean square error (NMSE) and the Fréchet Inception Distance (FID) [5], which is a similarity measure between two datasets that correlates well with human judgment of visual quality and is most often used to evaluate the quality of images generated by GANs. The Adam optimizer is used with a learning rate of 5x for both generator and discriminator networks. For the traditional GAN training, $\beta$ is initialized to 100, after a hyperparameter search conducted on the values 10, 100, 1000. All models were trained for 600 epochs over 2 weeks of training, and the inference run time is 100 ms per slice on a single GPU.
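As a rough illustration of the two metrics, the sketch below computes an NMSE and the Fréchet distance between Gaussians fitted to two feature sets. The NMSE normalisation (error energy divided by reference energy) is a common definition assumed here, since the text does not spell it out. The FID would additionally require features from a pretrained Inception network, which is not included; `feats_a` and `feats_b` are hypothetical stand-ins for such features.

```python
import numpy as np

def nmse(reference, estimate):
    """Squared error normalised by the energy of the reference image."""
    return np.sum(np.abs(reference - estimate) ** 2) / np.sum(np.abs(reference) ** 2)

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (samples, dims) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt

# Toy data (hypothetical): random "features" and a copy shifted by 3 in each of 4 dims.
rng = np.random.default_rng(0)
feats_a = rng.standard_normal((50, 4))
feats_b = feats_a + 3.0
image = rng.standard_normal((8, 8))

d_same = frechet_distance(feats_a, feats_a)
d_shift = frechet_distance(feats_a, feats_b)   # covariances match, means differ
err_zero = nmse(image, image)
err_one = nmse(image, np.zeros_like(image))
```

Shifting every feature by a constant leaves the covariances unchanged, so the distance reduces to the squared norm of the mean shift, which is a convenient sanity check for an implementation.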

Table 1: Comparison of our method with zero-filled images (ZF), and reconstruction using wavelets or total variation (TV) [11] and ARC [2]. NMSE is w.r.t. the fully sampled image.
Method                            NMSE    FID
ZF                                115     173.0
Wavelets                          18.7    138.4
TV                                14.1    117.0
ARC                               18.9    109.0
cWGAN-AGB                         3.39    18.7

Table 2: Evaluation on a holdout test set. The WGAN variants all employ a generator with 20 iterations (20I), a growth rate of 5 (5G) and 40 kernels for each convolution (40K).
Experiment                        NMSE    FID
DCI-Net (5I-5G-160K)              3.67    20.2
DCI-Net (20I-1G-40K, no dense)    3.46    19.3
DCI-Net (20I-5G-40K)              3.24    19.4
WGAN                              3.71    19.7
cWGAN                             3.61    19.9
cWGAN-AGB (proposed)              3.39    18.7

Table 3: Mean of sharpness, SNR, contrast, artifacts and overall IQ scored for our proposed cWGAN-AGB, a baseline DCI-Net and the fully-sampled images. Scores 1 to 5 indicate poor to excellent.
Images                  Sharpness   SNR   Contrast   Artifacts   Overall IQ
Fully sampled           5.0         3.3   4.0        4.0         4.5
Baseline (DCI-Net)      2.3         4.5   4.0        3.8         2.3
cWGAN-AGB (proposed)    3.8         3.8   4.0        3.8         3.5

Comparison with baseline methods   On the test set, we compare our cWGAN-AGB to compressed sensing methods using wavelets or Total Variation (TV) [11] and to Autocalibrating Reconstruction for Cartesian imaging (ARC) [2]. As can be seen in Tab. 1, our proposed model produces significantly more accurate reconstructions than the other methods.

Comparing GAN convergence   To show the effectiveness of our method, we compared the convergence of our cWGAN-AGB model to those of cWGAN and WGAN, trained without AGB. During the training phase, FID and NMSE were evaluated on a hold-out validation set, for each epoch. As can be seen in Fig. 3, our proposed model converges better, with both scores decreasing significantly faster compared to the other techniques.

Figure 3: FID and NMSE (lower is better) during training, as evaluated on the validation set. Results are shown for WGAN, a vanilla cWGAN, and our adaptive cWGAN-AGB.

Ablation analysis   We compare, in Tab. 2, our cWGAN-AGB with 3 other models: 1) cWGAN, 2) WGAN, and 3) a baseline DCI-Net for sparse MRI reconstruction without any GAN technique. All models were evaluated with NMSE and FID on the test set. We found that (a) cWGAN and cWGAN-AGB have better SNR and fewer artifacts than WGAN, (b) cWGAN-AGB converges much faster than cWGAN and WGAN (see Fig. 3) and performs better in both FID and NMSE measures (Tab. 2) and (c) although cWGAN-AGB has higher NMSE than the baseline model, it performs better in FID and yields sharper images with more fine details while maintaining a natural image texture (see Fig. 4).

Figure 4: A representative example with regions of interest showing the reconstruction of all models side-by-side: a) cWGAN-AGB; b) ground-truth (fully-sampled) image; c) zero-filled image; d) baseline generator network; e) WGAN; f) cWGAN. cWGAN and cWGAN-AGB have better SNR and fewer artifacts than WGAN. cWGAN-AGB yields sharper images with more fine details while maintaining a more natural appearance. The baseline model sometimes exhibits some blurring.

In Tab. 2, we also compare to baseline architectures, demonstrating the effectiveness of our key new architecture developments: (1) dense connections across all iterations, which strengthen feature propagation, making the network more robust, and (2) a relatively deep architecture of 20 iterations, comprising more than 60 convolutional layers, which brings increased capacity. We compared our generator to (1) a similar network without dense connections and (2) a 5-iteration network with a similar number of learned parameters. Employing dense connections significantly improved accuracy, and the use of the deeper network produced 12% lower mean NMSE than a shallower network that had a similar number of learned parameters.
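The 12% figure can be checked against the reported NMSE values, assuming the two networks being compared are the DCI-Net rows labelled 5I-5G-160K (NMSE 3.67) and 20I-5G-40K (NMSE 3.24) in the results table:

```latex
\frac{3.67 - 3.24}{3.67} = \frac{0.43}{3.67} \approx 0.117 \approx 12\%
```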

Visual Scoring   To assess the perceptual quality of the resulting images we report a visual scoring conducted by four experienced MRI scientists. The same test set was ranked for cWGAN-AGB, the baseline method and for the fully sampled images. The scoring was performed blindly and the images were randomly shuffled. The studies were taken from a cohort of seven healthy volunteers. Each study contained a full brain scan comprising 25-43 slices. For each study, image sharpness, signal-to-noise ratio (SNR), contrast, artifacts and overall image quality (IQ) were reported. Tab. 3 shows that cWGAN-AGB produced significantly sharper images than the baseline network, at the cost of somewhat weaker denoising of the images.

5 Conclusions

We present a novel sparse MRI reconstruction model that employs a cWGAN loss term, and a novel GAN training procedure. By leveraging GANs to their fullest, the method generates sharper images with more fine detail and natural appearance than would otherwise be possible. In addition, dense connections are used to improve the performance of our unrolled iterative generator network. In the context of MRI reconstruction, a GAN based model can raise concerns about hallucination, where image details that do not appear in the ground truth are generated. We found that our method produces significantly less hallucination than other GANs. This may be due to the usage of (1) a pixel-wise loss term, that prioritizes reconstruction accuracy, (2) data consistency layers embedded inside the network, (3) a conditional GAN architecture that allows the discriminator to penalize low-fidelity reconstruction and (4) our AGB training, that continuously upper bounds the gradients of the GAN loss. Moreover, we believe our AGB training can be beneficial for any GAN-based model employing a multi-term loss objective, especially in the medical domain where there is more variability in the input and less experience in balancing GAN loss terms.


  • [1] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning. pp. 214–223 (2017)
  • [2] Beatty, P., Brau, A., et al.: A method for autocalibrating 2-D accelerated volumetric parallel imaging with clinically practical reconstruction times. In: Proceedings of the International Society for Magnetic Resonance in Medicine. vol. 1749 (2007)
  • [3] Hammernik, K., Klatzer, T., et al.: Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine (2018)
  • [4] Hammernik, K., Kobler, E., et al.: Variational adversarial networks for accelerated MR image reconstruction. In: ISMRM-ESMRMB (2018)
  • [5] Heusel, M., Ramsauer, H., et al.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NIPS (2017)
  • [6] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
  • [7] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
  • [8] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR (2017)
  • [9] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [10] Li, C., Wand, M.: Precomputed real-time texture synthesis with Markovian generative adversarial networks. In: ECCV (2016)
  • [11] Lustig, M., Donoho, D., Pauly, J.M.: Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine (2007)
  • [12] Mardani, M., Gong, E., et al.: Deep generative adversarial neural networks for compressive sensing MRI. Transactions on Medical Imaging (2019)
  • [13] Schlemper, J., Caballero, J., et al.: A deep cascade of convolutional neural networks for dynamic MR image reconstruction. Transactions on Medical Imaging (2018)
  • [14] Tsai, C.M., Nishimura, D.G.: Reduced aliasing artifacts using variable-density k-space sampling trajectories. Magnetic Resonance in Medicine 43(3), 452–458 (2000)
  • [15] Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015)
  • [16] Yang, G., Yu, S., et al.: DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. Transactions on Medical Imaging (2018)
  • [17] Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S.: Image reconstruction by domain-transform manifold learning. Nature 555(7697), 487 (2018)

Appendix 0.A Network architectures

0.a.1 Data-consistency unit

Each data-consistency unit shades the input image with each coil sensitivity map, transforms the resulting images to k-space, imposes the sampling mask, calculates the difference relative to the acquired k-space and returns the result to the image domain, multiplied by a learned weight (Fig. 5). By utilizing the acquired k-space data as a prior, the data-consistency units, embedded as operations inside the network, keep the network from drifting away from the acquired data. For this use, the undersampled k-space data were also input directly into each iterative block of the network (Fig. 2A,B of the main text).

Figure 5: Data Consistency (DC) unit. Each iteration contains a DC unit that operates only on the iteration’s direct input image. The DC unit calculates the inconsistencies between (i) the undersampled k-space of the iteration’s input image and (ii) the acquired k-space. Using an inverse Fourier transform, the calculated inconsistency is transformed to image space, then multiplied by a learned weight and subtracted from the iteration’s input image (not shown in the figure).
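The sequence of operations in a data-consistency unit can be sketched in NumPy as follows. This is a minimal illustration, not the authors' TensorFlow implementation: the function name, the toy data, the uncentred FFT convention, and in particular the use of conjugate sensitivity maps to recombine the per-coil residuals in the image domain are assumptions made here. The learned weight is passed in as a plain number.

```python
import numpy as np

def data_consistency(image, sens, mask, acquired, weight):
    """One DC step: subtract the weighted image-domain k-space inconsistency.

    image: (H, W) complex; sens, acquired: (coils, H, W); mask: (H, W) of 0/1.
    Recombining coils with conj(sens) is an assumption of this sketch.
    """
    coil_kspace = np.fft.fft2(sens * image, axes=(-2, -1))   # shade, go to k-space
    residual = mask * coil_kspace - acquired                 # mask, compare to data
    residual_img = np.fft.ifft2(residual, axes=(-2, -1))     # back to image domain
    correction = np.sum(np.conj(sens) * residual_img, axis=0)
    return image - weight * correction

# Toy data (hypothetical): 2 coils, 8x8 image, every 2nd phase-encode line sampled.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
sens = rng.standard_normal((2, 8, 8)) + 1j * rng.standard_normal((2, 8, 8))
sens /= np.sqrt(np.sum(np.abs(sens) ** 2, axis=0, keepdims=True))
mask = np.zeros((8, 8))
mask[::2, :] = 1
acquired = mask * np.fft.fft2(sens * x, axes=(-2, -1))

# An image that already agrees with the acquired data is left unchanged.
consistent = data_consistency(x, sens, mask, acquired, weight=0.5)
# Starting from zeros with weight 1 yields the zero-filled coil combination.
from_zero = data_consistency(np.zeros_like(x), sens, mask, acquired, weight=1.0)
zero_filled = np.sum(np.conj(sens) * np.fft.ifft2(acquired, axes=(-2, -1)), axis=0)
```

The two toy calls show the intended behaviour: the unit is a no-op for images consistent with the measurements, and otherwise pulls the image toward the acquired k-space samples by an amount set by the weight.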

0.a.2 Convolutional unit

Each convolutional unit (Fig. 2C of the main text) has three sequences consisting of 5x5 convolution, bias, and LeakyReLU [15] layers. The output of the final iteration (Fig. 2A of the main text) is (1) compared to the fully sampled reference image to generate a pixel-wise loss function, using MSE, and (2) paired with its corresponding zero-filled image and fed into a discriminator network to evaluate the WGAN loss [1].

0.a.3 Discriminator architecture

For our discriminator architecture we use a convolutional “PatchGAN” [10]. The discriminator receives a pair of (1) the zero-filled image $x_u$ and (2) the fully sampled image $x$ or the generated image $\hat{x}$, concatenated as two channels, and is able to penalize structure at the scale of image patches, from both channels. The architecture incorporates four convolutional layers with a stride of 2, each followed by batch normalization [7] and LeakyReLU [15]. The output of the last convolutional layer is flattened and then fed into a linear layer, for which each input value corresponds to a different patch in the input channels. The linear layer outputs a single value, which is used to calculate the discriminator’s WGAN loss.

Appendix 0.B Results

0.b.1 Qualitative results

Fig. 6 exhibits more qualitative results of our proposed model, along with the zero filled (ZF) and the fully sampled images. Fig. 7 provides more qualitative results from our ablation study, comparing three different GAN models and a baseline model, where the baseline is our proposed DCI-Net trained solely to optimize MSE loss.

Figure 6: More results of our approach. Left to right: fully sampled, proposed method, zero filled images (R=4).
Figure 7: More qualitative results from our ablation analysis.

For the sake of completeness, we provide a qualitative comparison of our proposed model to compressed sensing methods using wavelets or Total Variation (TV)  [11] and to Autocalibrating Reconstruction for Cartesian imaging (ARC)  [2], as shown in Fig. 8. It can be seen that our proposed method produces higher-quality images than baseline methods, both in terms of perceptual quality and reconstruction error.

Figure 8: Comparison with baseline methods. Left to right: fully sampled, proposed method, wavelets, Total Variation, ARC, zero filled.

0.b.2 Implementation Details

The Adam optimizer [9] is used with the same learning rate for both generator and discriminator networks (stated in Sec. 4), with the momentum parameter $\beta_1$ = 0.9. Training is performed with the TensorFlow interface on a GeForce GTX TITAN X GPU, 12GB RAM. For the proposed model with AGB training, $\beta$ is initialized to 10 and increased in multiple steps during training to a value of 370 (see Fig. 9).

Figure 9: Beta value calculated per epoch, for our proposed cWGAN-AGB model.

0.b.3 Model Selection

In this study, we use both NMSE and FID [5] for model selection. Specifically, to select the best model for each experiment, we evaluate FID and NMSE on the validation set for each epoch (see Fig. 3 of the main text). Then, we calculate the mean of both scores per experiment (starting from epoch 200), and normalize each series separately, by subtracting and dividing by their corresponding mean and STD, respectively. The epoch for which the model minimizes the sum of normalized FID and normalized NMSE has been selected for evaluation on the test set.
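The selection rule can be sketched in a few lines. The function name and the convention that the two lists are indexed by epoch starting at 0 are assumptions for illustration; the toy call uses a small `start_epoch` in place of 200 so that the example stays short.

```python
import numpy as np

def select_epoch(fid, nmse, start_epoch=200):
    """Return the epoch (>= start_epoch) minimising z-scored FID + z-scored NMSE."""
    fid = np.asarray(fid, dtype=float)[start_epoch:]
    nmse = np.asarray(nmse, dtype=float)[start_epoch:]
    # Normalise each series separately by its own mean and standard deviation.
    z_fid = (fid - fid.mean()) / fid.std()
    z_nmse = (nmse - nmse.mean()) / nmse.std()
    return start_epoch + int(np.argmin(z_fid + z_nmse))

# Toy validation curves (hypothetical values), one entry per epoch.
fid_curve = [30.0, 20.0, 25.0, 22.0]
nmse_curve = [4.0, 3.6, 3.4, 3.5]
best_epoch = select_epoch(fid_curve, nmse_curve, start_epoch=1)
```

Normalising the two series before summing keeps either metric from dominating the choice merely because of its numerical scale.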

0.b.4 Sampling Pattern

Our method is applied to accelerated multi-slice 2D scanning, where k-space is undersampled in the phase-encode direction using a 1D variable-density sampling (VDS) pattern [14] whose density decreases linearly between the central and outer regions of k-space (with a net undersampling factor of four), but with the central 12 lines of k-space fully sampled (see Fig. 10).

Figure 10: Fully-sampled k-space multiplied by an acquisition sampling pattern, with acceleration factor of 4, results in highly undersampled k-space (top right). Conventional reconstruction of the undersampled k-space using zero-filling generates a low-quality image with heavy artifacts that is completely non-diagnostic (bottom right). A non-accelerated acquisition that uses fully-sampled k-space results in a high-quality image (bottom left). In this study, we focus on 2D data acquisition, which utilizes a 1D sampling pattern in the phase-encoding direction.
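A sampling pattern of this kind could be generated as in the sketch below. The paper states only that the density decreases linearly away from the centre and that 12 central lines are fully sampled. The random draw without replacement, the exact weight profile, the number of phase-encode lines (256), the seed, and the convention that the k-space centre sits in the middle of the array are assumptions made here.

```python
import numpy as np

def vds_mask(n_lines=256, accel=4, n_center=12, seed=0):
    """Boolean 1D mask over phase-encode lines with a fully sampled centre."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_lines, dtype=bool)
    start = (n_lines - n_center) // 2
    mask[start:start + n_center] = True          # fully sampled central lines

    outer = np.flatnonzero(~mask)
    # Sampling weight falls linearly with distance from the k-space centre.
    dist = np.abs(outer - (n_lines - 1) / 2)
    weights = dist.max() + 1 - dist
    # Number of extra lines needed to reach the net undersampling factor.
    n_extra = n_lines // accel - n_center
    picked = rng.choice(outer, size=n_extra, replace=False, p=weights / weights.sum())
    mask[picked] = True
    return mask

mask = vds_mask()
net_factor = mask.size / mask.sum()
```

Repeating the 1D mask along the readout direction gives the 2D binary pattern that multiplies each coil's k-space.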