MRI data acquisition is inherently slow and can often exceed 30 minutes. One way to accelerate MR scanning is to undersample k-space, i.e., to reduce the number of k-space traversals by a factor R and shorten the scan proportionately. Reconstruction is then performed using parallel imaging (PI) or compressed sensing (CS) techniques.
Among the most promising Deep Learning (DL) techniques, unrolled iterative networks (also called cascading networks) have emerged as a leading method [3, 13]. Inspired by CS, this technique uses a DNN composed of a sequence of iterations that include data-consistency and convolutional units. The data-consistency units utilize the acquired k-space lines as a prior that keeps the network from drifting away from the acquired data, while the convolutional layers are trained to regularize the reconstruction.
As with other image generation problems, using a naive pixel-wise distance for training DL-based sparse MRI reconstruction models can result in image blurring and unrealistic appearance. In a clinical setting, avoidance of blurring can be crucial for proper diagnosis. Recently, Generative Adversarial Networks (GANs) have been used to promote the naturalness of MRI reconstructions[4, 12, 16]. In our work, we harness the power of conditional Wasserstein GANs (cWGANs) to further improve image quality.
The main contributions of this paper are as follows: (1) We propose a cWGAN method for sparse MRI reconstruction, in which both the generator and discriminator are conditioned on the acquired undersampled data. (2) We introduce a novel training algorithm called Adaptive Gradient Balancing (AGB), which balances the losses in multi-term adversarial objectives. (3) We provide an extensive comparison between different models and training techniques. In particular, we report results of four different techniques: an unrolled iterative network, a WGAN-based network, a cWGAN network, and a cWGAN network trained with our AGB. (4) We propose and evaluate a novel Densely Connected Iterative Network (DCI-Net) for sparse MRI reconstruction, which is inspired by DenseNets. (5) We are the first to adopt the Fréchet Inception Distance (FID) as a score metric for sparse MRI reconstruction.
Related work DL-based sparse MRI reconstruction has attracted considerable attention recently. Schlemper et al. used a cascade of CNNs optimized to minimize a pixel-wise distance. Hammernik et al. proposed Variational Networks (VN) for sparse MRI reconstruction: first, a VN that minimizes a pixel-wise loss, then a GAN-based VN to reduce the blurring artifacts. Mardani et al. proposed a GAN-based model that uses a deep residual network as a generator. Yang et al. introduced a GAN-based model trained to optimize a mixture of a pixel-wise loss, a perceptual loss and a GAN loss, in which only the generator is conditioned on the input. Yang et al. reported that a GAN-based model without a perceptual loss generates unrealistic jagged artifacts.
2 Problem Formulation
Let y be the k-space signal acquired by an MRI scanner. For a single-coil receiver, an image x can be recovered by applying an inverse Fourier transform, x = F^{-1}(y). In multi-coil MRI, an array of coils acquires different 2D k-space measurements y_i of the same object x. Each coil i, positioned at a different location, is typically highly sensitive in one region of space. This position-dependent sensitivity can be represented by a complex-valued coil sensitivity map in real space, S_i.
During reconstruction, the images from each coil are combined into a fully sampled image x = Σ_i S_i^* F^{-1}(y_i), where F^{-1} is the reconstruction (inverse Fourier) function and S_i^* is the complex conjugate of the sensitivity map of coil i. To accelerate imaging, a binary sampling pattern M is used to undersample each coil’s k-space signal for each slice. The undersampled k-space signal, denoted by ŷ_i, can be written as ŷ_i = M ⊙ y_i, where ⊙ denotes element-wise multiplication. The undersampled zero-filled image can be calculated by x̂ = Σ_i S_i^* F^{-1}(ŷ_i). The learning task is to find a reconstruction function G
that minimizes an expected loss function (Sec. 1) over a population of scans: min_G E[ℓ(G(x̂), x)]. For a given M, y and x̂, we will denote by x_G = G(x̂) the generated image.
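The zero-filled image computation described above can be sketched in NumPy. This is a minimal illustration of the formulation, not the authors' code; the array shapes and FFT conventions are assumptions.

```python
import numpy as np

def zero_filled_recon(kspace, sens_maps, mask):
    """Combine undersampled multi-coil k-space into a zero-filled image.

    kspace:    (num_coils, H, W) complex k-space signal y_i per coil
    sens_maps: (num_coils, H, W) complex coil sensitivity maps S_i
    mask:      (H, W) binary sampling pattern M
    """
    undersampled = mask * kspace                              # y_hat_i = M ⊙ y_i
    coil_images = np.fft.ifft2(undersampled, axes=(-2, -1))   # F^{-1}(y_hat_i)
    # Weight each coil image by the conjugate sensitivity map and sum over coils
    return np.sum(np.conj(sens_maps) * coil_images, axis=0)
```

A fully-on mask reproduces the fully sampled combination x = Σ_i S_i^* F^{-1}(y_i); a partial mask yields the zero-filled image x̂ fed to the generator.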
Our method learns a DL-based sparse MRI reconstruction model from training samples, each of which is a pair of fully sampled and matched undersampled k-space data. We propose a conditional GAN architecture that conditions the reconstruction on the zero-filled image. Specifically, our model is composed of generator and discriminator networks. The generator reconstructs an image from an undersampled k-space dataset. The discriminator receives a pair of input images: (i) a ground-truth image or a generated (“fake”) image reconstructed from undersampled k-space, and (ii) a zero-filled image (see Fig. 1).
While it is possible to use a non-conditional GAN architecture, in this case the discriminator can only enforce general style properties learned from the distribution of the fully sampled images; for a given undersampled k-space signal, it is not guaranteed that the generator would learn to reconstruct a realistic image that perceptually matches its corresponding fully sampled image.
where G and D are the generator and discriminator networks, respectively; ŷ is a random undersampled k-space dataset, x is a random fully sampled image, and x̂_ŷ and x̂_x denote their corresponding undersampled zero-filled images. In addition to the adversarial loss L_adv, we also add a pixel-wise Mean Square Error (MSE) loss L_MSE = (1/(W·H))·||x − G(x̂)||², where W and H are the width and height of the image x. The final generator loss is: L_G = L_MSE + (1/α)·L_adv, where α is a balancing weight.
Adaptive Gradient Balancing In WGAN training, the discriminator network is used as a learned loss function, which dynamically changes during training, and thus may generate gradients with variable norm. To stabilize the WGAN training and to avoid drifting away from the ground-truth spatial information, we introduce the Adaptive Gradient Balancing (AGB) algorithm for continually balancing the gradients of the pixel-wise and the WGAN loss functions.
In order to keep the gradients of both terms at the same level, and since the WGAN gradients tend to vary, we choose to adaptively upper-bound the WGAN gradients. Specifically, we define α to be an adaptive weight that will be used to bound the WGAN loss gradients. We calculate two moving-average variables, σ_adv and σ_mse,
corresponding to the WGAN loss and the pixel-wise loss, respectively. These moving averages capture the standard deviation (STD) of the gradients calculated at every backward step on the generated image, with respect to each one of the losses separately. At every training step, if σ_adv > β·σ_mse for a predefined value β, we update α and σ_adv as follows: α ← δ·α, σ_adv ← σ_adv/δ, where δ > 1 is a predefined decay rate. During training, we divide the WGAN loss by α to carefully decay the WGAN loss gradients to roughly the same order of magnitude as those of the pixel-wise loss. Moreover, in order to keep a reasonable ratio between the generator’s WGAN loss gradients and the discriminator loss gradients, we also decay the discriminator loss by the same factor (see Alg. 1).
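The balancing logic can be sketched as a single NumPy update step. This is an illustrative sketch, not Alg. 1 itself: the names `beta`, `delta`, and `momentum`, their default values, and the exact moving-average update are assumptions.

```python
import numpy as np

def agb_step(grad_adv, grad_mse, state, beta=2.0, delta=2.0, momentum=0.99):
    """One Adaptive Gradient Balancing update (sketch).

    grad_adv, grad_mse: flattened gradients of the generated image with
                        respect to the WGAN loss and the pixel-wise loss.
    state: dict holding the adaptive weight 'alpha' and the moving-average
           STDs 'sigma_adv' and 'sigma_mse'.
    """
    # Moving-average STD of each loss's gradients on the generated image
    state['sigma_adv'] = momentum * state['sigma_adv'] + (1 - momentum) * grad_adv.std()
    state['sigma_mse'] = momentum * state['sigma_mse'] + (1 - momentum) * grad_mse.std()
    # If the WGAN gradients grow too large relative to the pixel-wise ones,
    # enlarge alpha; the WGAN loss (and the discriminator loss) is divided by alpha.
    if state['sigma_adv'] > beta * state['sigma_mse']:
        state['alpha'] *= delta
        state['sigma_adv'] /= delta
    return state
```

In a training loop, the generator's WGAN loss term and the discriminator loss would both be divided by `state['alpha']` before back-propagation.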
Our AGB algorithm extends WGAN training and ensures one invariant during the entire training: the STD of the WGAN loss gradients is upper-bounded by a factor of the STD of the pixel-wise loss gradients. This invariant maintains the effectiveness of both loss terms over the entire course of training.
Network architectures We propose a new generator architecture (Fig. 2), called Densely Connected Iterative Network (DCI-Net), which is based on the iterative convolutional network [3, 13]. The key new developments are (1) dense connections across all iterations, which strengthen feature propagation and make the network more robust, and (2) a relatively deep architecture of over 60 convolutional layers, which increases capacity. Our generator receives M coils of undersampled k-space data and uses N = 20 iterations, each of which includes a data-consistency unit and a convolutional unit for regularization (Fig. 2B). Dense skip-layer connections between the output of each iteration and the following G iterations (typically G = 5) are represented as curved lines in Fig. 2A. As a result, the input to each block is composed of skip and direct connections concatenated to form a (G+1)-channel complex image. For our discriminator architecture we use a convolutional “PatchGAN”. More information can be found in Appendix 0.A.
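The dense connectivity bookkeeping can be sketched as follows; the point is the skip-connection pattern, so `conv_unit` and `dc_unit` are trivial placeholders standing in for the learned convolutional and data-consistency blocks (their names and defaults are assumptions).

```python
import numpy as np

def dci_forward(x0, num_iters=20, growth=5, conv_unit=None, dc_unit=None):
    """Dense connectivity pattern of a DCI-Net-style generator (sketch).

    Each iteration receives its direct input concatenated with the outputs of
    up to `growth` preceding iterations (at most G+1 channels), applies a
    convolutional unit, then a data-consistency unit, and stores its output.
    """
    conv_unit = conv_unit or (lambda feats: feats.mean(axis=0))  # placeholder block
    dc_unit = dc_unit or (lambda img: img)                       # placeholder block
    outputs = [x0]
    for _ in range(num_iters):
        # Skip connections from the last `growth` iterations plus the direct input
        skips = outputs[-(growth + 1):]
        feats = np.stack(skips, axis=0)  # channel stack of <= G+1 images
        outputs.append(dc_unit(conv_unit(feats)))
    return outputs[-1]
```

With the placeholder blocks the function simply propagates the input; in the real network each block carries learned 5x5 convolutions and the k-space data-consistency operation.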
Dataset Fully sampled brain MRI datasets (T1, T2, T1-FLAIR and T2-FLAIR in axial, coronal and sagittal orientations) were acquired with various k-space data sizes and various numbers of coils, along with sensitivity maps estimated from separate calibration scans. In total, 2267 slices were acquired, of which 1901 were used to train the networks, 151 for validation and 215 for testing. In addition, during training we applied random horizontal flips and rotations (bounded to 20 degrees) to augment the training set. The data were retrospectively down-sampled using 12 central lines of k-space and a 1D variable-density sampling pattern outside the central region, resulting in a net under-sampling factor of four.
As evaluation metrics, we compute both the normalized mean square error (NMSE) and the Fréchet Inception Distance (FID), a similarity measure between two datasets that correlates well with human judgment of visual quality and is most often used to evaluate the quality of GAN-generated images. The Adam optimizer is used with a learning rate of 5x for both generator and discriminator networks. For the traditional GAN training,
the balancing weight is initialized to 100, after a hyperparameter search over the values 10, 100 and 1000. All models were trained for 600 epochs over 2 weeks, and the inference run time is 100 ms per slice on a single GPU.
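NMSE can be computed in the standard way; the exact normalization used in the paper is not shown, so this definition (squared error normalized by the reference energy) is an assumption.

```python
import numpy as np

def nmse(ref, rec):
    """Normalized mean square error between a reference image and a
    reconstruction: ||ref - rec||^2 / ||ref||^2 (standard definition)."""
    ref, rec = np.asarray(ref, dtype=float), np.asarray(rec, dtype=float)
    return np.sum((ref - rec) ** 2) / np.sum(ref ** 2)
```

A perfect reconstruction yields 0, and a zero image yields 1, which makes the metric comparable across slices of different intensity scales.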
| DCI-Net (20I-1G-40K, no dense) | NMSE 3.46 | FID 19.3 |
Comparison with baseline methods We compare our cWGAN-AGB on the test set to compressed sensing methods using wavelets or Total Variation (TV) and to Autocalibrating Reconstruction for Cartesian imaging (ARC). As can be seen in Tab. 3, our proposed model produces significantly more accurate reconstructions than the other methods.
Comparing GAN convergence To show the effectiveness of our method, we compared the convergence of our cWGAN-AGB model to that of cWGAN and WGAN trained without AGB. During the training phase, FID and NMSE were evaluated on a hold-out validation set after each epoch. As can be seen in Fig. 3, our proposed model converges better, with both scores decreasing significantly faster than with the other techniques.
Ablation analysis We compare, in Tab. 3, our cWGAN-AGB with 3 other models: 1) cWGAN, 2) WGAN, and 3) a baseline DCI-Net for sparse MRI reconstruction without any GAN technique. All models were evaluated with NMSE and FID on the test set. We found that (a) cWGAN and cWGAN-AGB have better SNR and fewer artifacts than WGAN, (b) cWGAN-AGB converges much faster than cWGAN and WGAN (see Fig. 3) and performs better in both FID and NMSE measures (Tab. 3) and (c) although cWGAN-AGB has higher NMSE than the baseline model, it performs better in FID and yields sharper images with more fine details while maintaining a natural image texture (see Fig. 4).
In Tab. 3, we also compare to baseline architectures, demonstrating the effectiveness of our key architectural developments: (1) dense connections across all iterations, which strengthen feature propagation and make the network more robust, and (2) a relatively deep architecture of 20 iterations, comprising more than 60 convolutional layers, which increases capacity. We compared our generator to (1) a similar network without dense connections and (2) a 5-iteration network with a similar number of learned parameters. Employing dense connections significantly improved accuracy, and the deeper network produced a 12% lower mean NMSE than the shallower network with a similar number of learned parameters.
Visual Scoring To assess the perceptual quality of the resulting images we report a visual scoring conducted by four experienced MRI scientists. The same test set was ranked for cWGAN-AGB, the baseline method and for the fully sampled images. The scoring was performed blindly and the images were randomly shuffled. The studies were taken from a cohort of seven healthy volunteers. Each study contained a full brain scan comprising 25-43 slices. For each study, image sharpness, signal-to-noise ratio (SNR), contrast, artifacts and overall image quality (IQ) were reported. Tab. 3 shows that cWGAN-AGB produced significantly sharper images than the baseline network, at the cost of somewhat weaker denoising of the images.
We present a novel sparse MRI reconstruction model that employs a cWGAN loss term, together with a novel GAN training procedure. By leveraging GANs to their fullest, the method generates sharper images, with more fine detail and a more natural appearance than would otherwise be possible. In addition, dense connections are used to improve the performance of our unrolled iterative generator network. In the context of MRI reconstruction, a GAN-based model can raise concerns about hallucination, where image details that do not appear in the ground truth are generated. We found that our method produces significantly fewer hallucinations than other GANs. This may be due to the use of (1) a pixel-wise loss term that prioritizes reconstruction accuracy, (2) data-consistency layers embedded inside the network, (3) a conditional GAN architecture that allows the discriminator to penalize low-fidelity reconstructions, and (4) our AGB training, which continuously upper-bounds the gradients of the GAN loss. Moreover, we believe our AGB training can be beneficial for any GAN-based model employing a multi-term loss objective, especially in the medical domain, where there is more variability in the input and less experience in balancing GAN loss terms.
-  Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning. pp. 214–223 (2017)
-  Beatty, P., Brau, A., et al.: A method for autocalibrating 2-D accelerated volumetric parallel imaging with clinically practical reconstruction times. In: Proceedings of the International Society for Magnetic Resonance in Medicine. vol. 1749 (2007)
-  Hammernik, K., Klatzer, T., et al.: Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine (2018)
-  Hammernik, K., Kobler, E., et al.: Variational adversarial networks for accelerated MR image reconstruction. In: ISMRM-ESMRMB (2018)
-  Heusel, M., Ramsauer, H., et al.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NIPS (2017)
-  Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
-  Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
-  Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
-  Li, C., Wand, M.: Precomputed real-time texture synthesis with Markovian generative adversarial networks. In: ECCV (2016)
-  Lustig, M., Donoho, D., Pauly, J.M.: Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine (2007)
-  Mardani, M., Gong, E., et al.: Deep generative adversarial neural networks for compressive sensing MRI. IEEE Transactions on Medical Imaging (2019)
-  Schlemper, J., Caballero, J., et al.: A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Transactions on Medical Imaging (2018)
-  Tsai, C.M., Nishimura, D.G.: Reduced aliasing artifacts using variable-density k-space sampling trajectories. Magnetic Resonance in Medicine 43(3), 452–458 (2000)
-  Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015)
-  Yang, G., Yu, S., et al.: DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. IEEE Transactions on Medical Imaging (2018)
-  Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S.: Image reconstruction by domain-transform manifold learning. Nature 555(7697), 487 (2018)
Appendix 0.A Network architectures
0.A.1 Data-consistency unit
Each data-consistency unit weights the input image with each coil sensitivity map, transforms the resulting images to k-space, imposes the sampling mask, calculates the difference relative to the acquired k-space, and returns the result to the image domain, multiplied by a learned weight (Fig. 5). By utilizing the acquired k-space data as a prior, the data-consistency units, embedded as operations inside the network, keep the network from drifting away from the acquired data. To this end, the undersampled k-space data are also input directly into each iterative block of the network (Fig. 2A,B of the main text).
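The sequence of operations described above can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the learned weight is fixed to a scalar here, and shapes and FFT conventions are assumptions.

```python
import numpy as np

def data_consistency(image, kspace_acq, sens_maps, mask, weight=1.0):
    """Data-consistency unit (sketch): push the current image estimate
    toward the acquired k-space samples.

    image:      (H, W) current complex image estimate
    kspace_acq: (num_coils, H, W) acquired undersampled k-space
    sens_maps:  (num_coils, H, W) complex coil sensitivity maps
    mask:       (H, W) binary sampling pattern
    weight:     learned scalar in the real network; fixed here
    """
    coil_images = sens_maps * image                       # weight by sensitivity maps
    kspace_est = np.fft.fft2(coil_images, axes=(-2, -1))  # transform to k-space
    diff = mask * (kspace_est - kspace_acq)               # error on acquired lines only
    # Return the error to the image domain, combine coils, and apply the weight
    correction = np.sum(np.conj(sens_maps) * np.fft.ifft2(diff, axes=(-2, -1)), axis=0)
    return image - weight * correction
```

An image that already agrees with the acquired k-space lines passes through unchanged, which is how the unit anchors the network to the measured data.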
0.A.2 Convolutional unit
Each convolutional unit (Fig. 2C of the main text) has three sequences consisting of 5x5 convolution, bias, and leaky ReLU layers. The output of the final iteration (Fig. 2A of the main text) is (1) compared to the fully sampled reference image to generate a pixel-wise MSE loss and (2) paired with its corresponding zero-filled image and fed into a discriminator network to evaluate the WGAN loss.
0.A.3 Discriminator architecture
For our discriminator architecture we use a convolutional “PatchGAN”. The discriminator receives a pair of (1) a zero-filled image and (2) its corresponding fully sampled image or generated reconstruction, concatenated as two channels, and is able to penalize structure at the scale of image patches, from both channels. The architecture incorporates four convolutional layers with a stride of 2, each followed by batch normalization and leaky ReLU. The last convolutional layer is flattened and then fed into a linear layer, in which each input value corresponds to a different patch in the input channels. The linear layer outputs a single value, which is used to calculate the discriminator’s WGAN loss.
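The patch-level receptive behavior follows from the stride-2 convolutions: each spatial unit of the final feature map scores one overlapping input patch. A small helper makes the output-size arithmetic concrete; the kernel size and padding values are assumptions (the paper does not state them).

```python
def patchgan_output_size(h, w, num_layers=4, kernel=4, stride=2, pad=1):
    """Spatial size of a PatchGAN feature map after `num_layers` stride-2
    convolutions (sketch). Each output unit corresponds to one overlapping
    patch of the two-channel input pair."""
    for _ in range(num_layers):
        h = (h + 2 * pad - kernel) // stride + 1
        w = (w + 2 * pad - kernel) // stride + 1
    return h, w
```

For a 256x256 input pair this gives a 16x16 grid of patch scores, which the final linear layer collapses into the single WGAN value.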
Appendix 0.B Results
0.B.1 Qualitative results
Fig. 6 exhibits more qualitative results of our proposed model, along with the zero filled (ZF) and the fully sampled images. Fig. 7 provides more qualitative results from our ablation study, comparing three different GAN models and a baseline model, where the baseline is our proposed DCI-Net trained solely to optimize MSE loss.
For the sake of completeness, we provide a qualitative comparison of our proposed model to compressed sensing methods using wavelets or Total Variation (TV)  and to Autocalibrating Reconstruction for Cartesian imaging (ARC) , as shown in Fig. 8. It can be seen that our proposed method produces higher-quality images than baseline methods, both in terms of perceptual quality and reconstruction error.
0.B.2 Implementation Details
The Adam optimizer is used with the same learning rate for both the generator and discriminator networks, with the momentum parameter β1 = 0.9. Training is performed with the TensorFlow interface on a GeForce GTX TITAN X GPU with 12 GB RAM. For the proposed model with AGB training, the adaptive weight is initialized to 10 and increased in multiple steps during training to a value of 370 (see Fig. 9).
0.B.3 Model Selection
In this study, we use both NMSE and FID for model selection. Specifically, to select the best model for each experiment, we evaluate FID and NMSE on the validation set for each epoch (see Fig. 3 of the main text). Then, we calculate the mean and STD of both score series per experiment (starting from epoch 200), and normalize each series separately by subtracting its mean and dividing by its STD. The epoch that minimizes the sum of normalized FID and normalized NMSE is selected for evaluation on the test set.
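The selection rule can be written in a few lines of NumPy; the function name and the choice to restrict both the statistics and the search to epochs from 200 onward are assumptions consistent with the description above.

```python
import numpy as np

def select_epoch(fid, nmse, start=200):
    """Pick the epoch minimizing the sum of z-normalized FID and NMSE.

    fid, nmse: per-epoch validation scores for one experiment.
    start:     first epoch considered when computing mean/STD and searching.
    """
    fid, nmse = np.asarray(fid, dtype=float), np.asarray(nmse, dtype=float)
    f, n = fid[start:], nmse[start:]
    z_fid = (f - f.mean()) / f.std()     # normalize each series separately
    z_nmse = (n - n.mean()) / n.std()
    return start + int(np.argmin(z_fid + z_nmse))
```

Normalizing each series by its own mean and STD puts FID and NMSE on a common scale before summing, so neither metric dominates the selection.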
0.B.4 Sampling Pattern
Our method is applied to accelerated multi-slice 2D scanning, where k-space is undersampled in the phase-encode direction using a 1D variable-density sampling (VDS) pattern whose density decreases linearly between the central and outer regions of k-space (with a net undersampling factor of four), but with the central 12 lines of k-space fully sampled (see Fig. 10).
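A mask of this kind can be sketched as follows. This is an illustrative generator, not the authors' sampling code: the exact linear density profile and the line-selection procedure are assumptions, while the fully sampled 12-line center and the net factor of four match the description above.

```python
import numpy as np

def vds_mask(num_lines, center=12, accel=4, seed=0):
    """1D variable-density sampling pattern (sketch) over phase-encode lines.

    The central `center` lines are fully sampled; remaining lines are drawn
    without replacement with a probability that decreases linearly toward
    the edges of k-space, targeting a net undersampling factor `accel`.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_lines, dtype=bool)
    lo = num_lines // 2 - center // 2
    hi = num_lines // 2 + center // 2
    mask[lo:hi] = True                           # fully sampled central region
    budget = num_lines // accel - center         # lines left within the budget
    dist = np.abs(np.arange(num_lines) - num_lines / 2)
    prob = 1.0 - dist / dist.max()               # linearly decreasing density
    prob[lo:hi] = 0.0                            # center is already sampled
    idx = rng.choice(num_lines, size=budget, replace=False, p=prob / prob.sum())
    mask[idx] = True
    return mask
```

For a 256-line phase-encode axis this yields 64 sampled lines in total, i.e., a net acceleration of four with the 12 central lines always kept.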