1 Introduction
MRI data acquisition is inherently slow and can often exceed 30 minutes. One way to accelerate MR scanning is to undersample k-space, i.e., to reduce the number of k-space traversals by a factor $R$, accelerating the scan proportionately. Reconstruction is then performed using parallel imaging (PI) or compressed sensing (CS) techniques.
More recently, Deep Neural Networks (DNNs) have been used to push $R$ values even higher [3, 13, 17]. Among the most promising Deep Learning (DL) techniques, unrolled iterative networks (also called cascading networks) have emerged as a leading method [3, 13]. Inspired by CS, this technique uses a DNN composed of a sequence of iterations that include data-consistency and convolutional units. The data-consistency units utilize the acquired k-space lines as a prior that keeps the network from drifting away from the acquired data, while the convolutional layers are trained to regularize the reconstruction.

As with other image generation problems, using a naive pixel-wise distance for training DL-based sparse MRI reconstruction models can result in image blurring and an unrealistic appearance. In a clinical setting, avoiding blur can be crucial for proper diagnosis. Recently, Generative Adversarial Networks (GANs) have been used to promote the naturalness of MRI reconstructions [4, 12, 16]. In our work, we harness the power of conditional Wasserstein GANs (cWGANs) to further improve image quality.
The main contributions of this paper are as follows: (1) We propose a cWGAN method for sparse MRI reconstruction, in which both the generator and the discriminator are conditioned on the acquired undersampled data. (2) We introduce a novel training algorithm called Adaptive Gradient Balancing (AGB), which balances the losses in multi-term adversarial objectives. (3) We provide an extensive comparison between different models and training techniques. In particular, we report results for four different techniques: an unrolled iterative network, a WGAN-based network, a cWGAN network and a cWGAN network trained with our AGB. (4) We propose and evaluate a novel Densely Connected Iterative Network (DCI-Net) for sparse MRI reconstruction, which is inspired by DenseNets [6]. (5) We are the first to adopt the Fréchet Inception Distance as a score metric for sparse MRI reconstruction.
Related work. DL-based sparse MRI reconstruction has attracted considerable attention recently. Schlemper et al. [13] used a cascade of CNNs optimized to minimize a pixel-wise distance. Hammernik et al. proposed Variational Networks (VNs) for sparse MRI reconstruction: first, a VN that minimizes a pixel-wise loss [3], and then a GAN-based VN [4] that addresses the blurring artifacts. Mardani et al. [12] proposed a GAN-based model that uses a deep residual network as a generator. Yang et al. [16] introduced a GAN-based model trained to optimize a mixture of a pixel-wise loss, a perceptual loss and a GAN loss that conditions only the generator input. Yang et al. reported that a GAN-based model without a perceptual loss generates unrealistic jagged artifacts.
2 Problem Formulation
Let $y$ be the k-space signal acquired by an MRI scanner. For a single-coil receiver, an image $x$ can be estimated by performing an inverse Fourier transform, $x = \mathcal{F}^{-1}(y)$. In multi-coil MRI, an array of coils acquires different 2D k-space measurements $y_i$ of the same object $x$. Each coil $i$, positioned at a different location, is typically highly sensitive in one region of space. This position-dependent sensitivity can be represented by a complex-valued coil sensitivity map in real space, $S_i$. During reconstruction, the images from each coil are combined into a fully sampled image $x = f(y) = \sum_i S_i^* \mathcal{F}^{-1}(y_i)$, where $f$ is a reconstruction function and $S_i^*$ is the complex conjugate of the sensitivity map of coil $i$. To accelerate imaging, a binary sampling pattern $M$ is used to undersample each coil's k-space signal for each slice. The undersampled k-space signal, denoted by $\tilde{y}_i$, can be written as $\tilde{y}_i = M \odot y_i$. The undersampled zero-filled image can be calculated by $\tilde{x} = \sum_i S_i^* \mathcal{F}^{-1}(\tilde{y}_i)$. The learning task is to find a reconstruction function $G$ that minimizes an expected loss function $\mathcal{L}$ (Sec. 1) over a population of scans: $\min_G \mathbb{E}\left[\mathcal{L}(G(\tilde{y}), x)\right]$. For a given $G$, $\tilde{y}$ and $M$, we will denote by $\hat{x} = G(\tilde{y})$ the generated image.

3 Method
Our method learns a DL-based sparse MRI reconstruction model from training samples, each of which is a pair of fully sampled and matched undersampled k-space data. We propose a conditional GAN architecture, which conditions the reconstruction on the zero-filled image. Specifically, our model is composed of a generator and a discriminator network. The generator reconstructs an image from an undersampled k-space dataset. The discriminator receives a pair of input images: (i) a ground truth image or a generated ("fake") reconstruction from undersampled k-space, and (ii) a zero-filled image (see Fig. 1).
While it is possible to use a non-conditional GAN architecture, in that case the discriminator can only enforce general style properties learned from the distribution of fully sampled images; for a given undersampled k-space signal $\tilde{y}$, there is no guarantee that the generator would learn to reconstruct a realistic image that perceptually matches its corresponding ground truth $x$.
Objective. Following the success of the Wasserstein GAN (WGAN) [1] and the framework proposed by Isola et al. [8], we adopt a conditional WGAN objective:
$$\min_G \max_D \; \mathbb{E}_{x}\big[D(x, \tilde{x}_x)\big] - \mathbb{E}_{\tilde{y}}\big[D\big(G(\tilde{y}), \tilde{x}_{\tilde{y}}\big)\big] \qquad (1)$$
where $G$ and $D$ are the generator and discriminator networks, respectively, $\tilde{y}$ is a random undersampled k-space sample, $x$ is a random fully sampled image, and their corresponding zero-filled images are $\tilde{x}_{\tilde{y}}$ and $\tilde{x}_x$, respectively. In addition to the adversarial loss, we also add a pixel-wise Mean Square Error (MSE) loss $\mathcal{L}_{\text{MSE}} = \frac{1}{WH}\|G(\tilde{y}) - x\|_2^2$, where $W$ and $H$ are the width and height of the image $x$. The final generator loss is:
$$\mathcal{L}_G = \mathcal{L}_{\text{MSE}} - \frac{1}{\lambda}\,\mathbb{E}_{\tilde{y}}\big[D\big(G(\tilde{y}), \tilde{x}_{\tilde{y}}\big)\big] \qquad (2)$$
Adaptive Gradient Balancing. In WGAN training, the discriminator network serves as a learned loss function, which changes dynamically during training and thus may generate gradients with variable norm. To stabilize WGAN training and to avoid drifting away from the ground-truth spatial information, we introduce the Adaptive Gradient Balancing (AGB) algorithm, which continually balances the gradients of the pixel-wise and WGAN loss functions.
In order to keep the gradients of both terms at the same level, and since the WGAN gradients tend to vary, we choose to adaptively upper-bound the WGAN gradients. Specifically, we define $\lambda$ to be an adaptive weight that is used to bound the WGAN loss gradients. We calculate two moving-average variables, $m_{\text{adv}}$ and $m_{\text{mse}}$, corresponding to the WGAN loss and the pixel-wise loss, respectively. These moving averages capture the standard deviation (STD) of the gradients computed at every backward step on the generated image, with respect to each of the two losses separately. At every training step, if $m_{\text{adv}} > \alpha \cdot m_{\text{mse}}$ for a predefined value $\alpha$, we increase $\lambda$ and update the moving averages, where $\delta$ is a predefined decay rate. During training, we divide the WGAN loss by $\lambda$ to carefully decay the WGAN loss gradients to roughly the same order of magnitude as those of the pixel-wise loss. Moreover, in order to keep a reasonable ratio between the generator's WGAN loss gradients and the discriminator loss gradients, we also decay the discriminator loss by the same factor (see Alg. 1).

Our AGB algorithm extends WGAN training and maintains one invariant during the entire training: the STD of the WGAN loss gradients is upper-bounded by a factor of the STD of the pixel-wise loss gradients. This invariant maintains the effectiveness of both loss terms over the entire course of training.
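The AGB bookkeeping described above can be sketched as follows. This is a minimal, illustrative interpretation, not the paper's Alg. 1: the constants `alpha`, `decay` and `step`, the moving-average form, and the multiplicative update of `lam` are all assumptions, chosen only to satisfy the stated invariant that the WGAN-gradient STD stays bounded by a factor of the pixel-wise-gradient STD.

```python
import numpy as np

def agb_step(grad_adv, grad_mse, state, alpha=2.0, decay=0.99, step=1.1):
    """One Adaptive Gradient Balancing bookkeeping step (illustrative sketch).

    `grad_adv` / `grad_mse`: generator gradients of the WGAN and MSE losses
    on the generated image. `state` holds the moving-average gradient STDs
    and the adaptive weight `lam`. The constants are hypothetical.
    """
    # Moving-average STDs of the two gradient signals.
    state["m_adv"] = decay * state["m_adv"] + (1 - decay) * np.std(grad_adv)
    state["m_mse"] = decay * state["m_mse"] + (1 - decay) * np.std(grad_mse)
    # Invariant: STD of WGAN gradients <= alpha * STD of pixel-wise gradients.
    if state["m_adv"] > alpha * state["m_mse"]:
        state["lam"] *= step  # dividing the WGAN loss by a larger lam
    return state["lam"]       # shrinks its gradients at the next step
```

In a training loop, the returned `lam` would divide both the generator's WGAN loss and the discriminator loss before the backward pass, as described above.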
Network architectures. We propose a new generator architecture (Fig. 2), called the Densely Connected Iterative Network (DCI-Net), which is based on the iterative convolutional network [3, 13]. The key new developments are (1) dense connections [6] across all iterations, which strengthen feature propagation and make the network more robust, and (2) a relatively deep architecture of over 60 convolutional layers, which increases capacity. Our generator receives M coils of undersampled k-space data and uses N = 20 iterations, each of which includes a data-consistency unit and a convolutional unit for regularization (Fig. 2B). Dense skip-layer connections between the output of each iteration and the following G iterations, where typically G = 5, are represented as curved lines in Fig. 2A. As a result, the input to each block is composed of skip and direct connections, concatenated to form a (G+1)-channel complex image. For our discriminator architecture we use a convolutional "PatchGAN" [10]. More information can be found in Appendix 0.A.
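The dense connectivity pattern can be sketched in a few lines. The `block` below is only a placeholder for the real data-consistency plus convolutional unit (not reproduced here); what the sketch shows is how each iteration's input is assembled from the direct connection and up to `growth` skip connections, giving a (G+1)-channel input.

```python
import numpy as np

def dense_unroll(x0, n_iters=20, growth=5):
    """Sketch of the dense skip connections in DCI-Net (illustrative).

    Each iteration receives the channel-wise concatenation of up to the
    previous `growth`+1 iteration outputs (skip + direct connections).
    """
    def block(stacked):
        # Placeholder for a data-consistency + convolutional unit:
        # here it simply averages the concatenated channels.
        return stacked.mean(axis=0, keepdims=True)

    outputs = [x0[None]]  # each entry has shape (1, H, W)
    for _ in range(n_iters):
        # Concatenate the last growth+1 outputs along the channel axis.
        stacked = np.concatenate(outputs[-(growth + 1):], axis=0)
        outputs.append(block(stacked))
    return outputs[-1][0]
```

With `n_iters=20` and `growth=5` this mirrors the connectivity drawn in Fig. 2A; the real network replaces `block` with learned layers.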
4 Results
Dataset. Fully sampled brain MRI datasets (T1, T2, T1-FLAIR and T2-FLAIR in axial, coronal and sagittal orientations) were acquired with various k-space data sizes and various numbers of coils, along with sensitivity maps estimated from separate calibration scans. In total, 2267 slices were acquired, of which 1901 were used to train the networks, 151 for validation and 215 for testing. In addition, during training we applied random horizontal flips and rotations (bounded to 20 degrees) to augment the training set. The data were retrospectively downsampled using 12 central lines of k-space and a 1D variable-density sampling pattern outside the central region, resulting in a net undersampling factor of four. As evaluation metrics, we compute both the normalized mean square error (NMSE) and the Fréchet Inception Distance (FID)
[5], which is a similarity measure between two datasets that correlates well with human judgment of visual quality and is most often used to evaluate the quality of images generated by GANs. The Adam optimizer is used with a learning rate of 5x for both the generator and discriminator networks. For traditional GAN training, $\lambda$ is initialized to 100, following a hyperparameter search over the values 10, 100 and 1000. All models were trained for 600 epochs in about 2 weeks, and the inference run time is 100 ms per slice on a single GPU.

Table 1. Comparison with baseline methods.

Images       NMSE   FID
ZF           115    173.0
Wavelets     18.7   138.4
TV           14.1   117.0
ARC          18.9   109.0
cWGAN-AGB    3.39   18.7
Table 2. Ablation analysis.

Experiment                      NMSE   FID
DCI-Net (5I-5G-160K)            3.67   20.2
DCI-Net (20I-1G-40K, no dense)  3.46   19.3
DCI-Net (20I-5G-40K)            3.24   19.4
WGAN                            3.71   19.7
cWGAN                           3.61   19.9
cWGAN-AGB (proposed)            3.39   18.7
Table 3. Visual scoring results.

Images                Sharpness  SNR  Contrast  Artifacts  Overall IQ
Fully sampled         5.0        3.3  4.0       4.0        4.5
Baseline (DCI-Net)    2.3        4.5  4.0       3.8        2.3
cWGAN-AGB (proposed)  3.8        3.8  4.0       3.8        3.5
Comparison with baseline methods. We compare, on the test set, our cWGAN-AGB to compressed sensing methods using wavelets or Total Variation (TV) [11] and to Autocalibrating Reconstruction for Cartesian imaging (ARC) [2]. As can be seen in Tab. 1, our proposed model produces significantly more accurate reconstructions than the other methods.
Comparing GAN convergence. To show the effectiveness of our method, we compared the convergence of our cWGAN-AGB model to that of cWGAN and WGAN trained without AGB. During the training phase, FID and NMSE were evaluated on a held-out validation set for each epoch. As can be seen in Fig. 3, our proposed model converges better, with both scores decreasing significantly faster than for the other techniques.
Ablation analysis. We compare, in Tab. 2, our cWGAN-AGB with three other models: (1) cWGAN, (2) WGAN, and (3) a baseline DCI-Net for sparse MRI reconstruction without any GAN technique. All models were evaluated with NMSE and FID on the test set. We found that (a) cWGAN and cWGAN-AGB have better SNR and fewer artifacts than WGAN, (b) cWGAN-AGB converges much faster than cWGAN and WGAN (see Fig. 3) and performs better on both the FID and NMSE measures (Tab. 2), and (c) although cWGAN-AGB has a higher NMSE than the baseline model, it achieves a better FID and yields sharper images with more fine details while maintaining a natural image texture (see Fig. 4).
In Tab. 2, we also compare against baseline architectures, demonstrating the effectiveness of our key new architectural developments: (1) dense connections across all iterations, which strengthen feature propagation and make the network more robust, and (2) a relatively deep architecture of 20 iterations, comprising more than 60 convolutional layers, which increases capacity. We compared our generator to (1) a similar network without dense connections and (2) a 5-iteration network with a similar number of learned parameters. Employing dense connections significantly improved accuracy, and the deeper network produced a 12% lower mean NMSE than the shallower network with a similar number of learned parameters.
Visual Scoring. To assess the perceptual quality of the resulting images, we report a visual scoring conducted by four experienced MRI scientists. The same test set was ranked for cWGAN-AGB, the baseline method and the fully sampled images. The scoring was performed blindly, and the images were randomly shuffled. The studies were taken from a cohort of seven healthy volunteers. Each study contained a full brain scan comprising 25–43 slices. For each study, image sharpness, signal-to-noise ratio (SNR), contrast, artifacts and overall image quality (IQ) were scored. Tab. 3 shows that cWGAN-AGB produced significantly sharper images than the baseline network, at the cost of somewhat weaker denoising.
5 Conclusions
We present a novel sparse MRI reconstruction model that employs a cWGAN loss term, together with a novel GAN training procedure. By leveraging GANs to their fullest, the method generates sharper images with more fine detail and a more natural appearance than would otherwise be possible. In addition, dense connections are used to improve the performance of our unrolled iterative generator network. In the context of MRI reconstruction, a GAN-based model can raise concerns about hallucination, where image details that do not appear in the ground truth are generated. We found that our method produces significantly less hallucination than other GANs. This may be due to the use of (1) a pixel-wise loss term that prioritizes reconstruction accuracy, (2) data-consistency layers embedded inside the network, (3) a conditional GAN architecture that allows the discriminator to penalize low-fidelity reconstructions, and (4) our AGB training, which continuously upper-bounds the gradients of the GAN loss. Moreover, we believe our AGB training can benefit any GAN-based model employing a multi-term loss objective, especially in the medical domain, where there is more variability in the input and less experience in balancing GAN loss terms.
References

[1] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017)
[2] Beatty, P., Brau, A., et al.: A method for autocalibrating 2D accelerated volumetric parallel imaging with clinically practical reconstruction times. In: Proceedings of the International Society for Magnetic Resonance in Medicine, vol. 1749 (2007)
[3] Hammernik, K., Klatzer, T., et al.: Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine (2018)
[4] Hammernik, K., Kobler, E., et al.: Variational adversarial networks for accelerated MR image reconstruction. In: ISMRM-ESMRMB (2018)
[5] Heusel, M., Ramsauer, H., et al.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NIPS (2017)
[6] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
[7] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
[8] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR (2017)
[9] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
[10] Li, C., Wand, M.: Precomputed real-time texture synthesis with Markovian generative adversarial networks. In: ECCV (2016)
[11] Lustig, M., Donoho, D., Pauly, J.M.: Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine (2007)
[12] Mardani, M., Gong, E., et al.: Deep generative adversarial neural networks for compressive sensing MRI. Transactions on Medical Imaging (2019)
[13] Schlemper, J., Caballero, J., et al.: A deep cascade of convolutional neural networks for dynamic MR image reconstruction. Transactions on Medical Imaging (2018)
[14] Tsai, C.M., Nishimura, D.G.: Reduced aliasing artifacts using variable-density k-space sampling trajectories. Magnetic Resonance in Medicine 43(3), 452–458 (2000)
[15] Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015)
[16] Yang, G., Yu, S., et al.: DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. Transactions on Medical Imaging (2018)
[17] Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S.: Image reconstruction by domain-transform manifold learning. Nature 555(7697), 487 (2018)
Appendix 0.A Network architectures
0.A.1 Data-consistency unit
Each data-consistency unit multiplies the input image by each coil sensitivity map, transforms the resulting images to k-space, imposes the sampling mask, computes the difference relative to the acquired k-space and returns it to the image domain, multiplied by a learned weight (Fig. 5). By utilizing the acquired k-space data as a prior, the data-consistency units, embedded as operations inside the network, keep the network from drifting away from the acquired data. For this purpose, the undersampled k-space data are also input directly into each iterative block of the network (Fig. 2A,B of the main text).
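The steps above can be sketched for a single slice. This is a minimal NumPy illustration of the described operation, not the paper's implementation; in the real network the scalar `weight` is a learned parameter and the Fourier transforms follow the scanner's conventions.

```python
import numpy as np

def data_consistency(x, sens, mask, y_acq, weight=1.0):
    """Sketch of one data-consistency unit (single slice, illustrative).

    x:      current image estimate, shape (H, W), complex
    sens:   coil sensitivity maps, shape (C, H, W), complex
    mask:   binary k-space sampling mask, shape (H, W)
    y_acq:  acquired undersampled k-space, shape (C, H, W)
    weight: learned scalar of the unit (here a plain float)
    """
    y_est = np.fft.fft2(sens * x)       # per-coil k-space of the estimate
    diff = mask * (y_est - y_acq)       # mismatch on acquired lines only
    # Coil-combine the difference image and subtract the weighted correction.
    correction = np.sum(np.conj(sens) * np.fft.ifft2(diff), axis=0)
    return x - weight * correction
```

When the current estimate already agrees with the acquired k-space on the sampled locations, the correction vanishes and the unit leaves the image unchanged, which is exactly the prior behaviour described above.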
0.A.2 Convolutional unit
Each convolutional unit (Fig. 2C of the main text) consists of three sequences of 5×5 convolution, bias and leaky-ReLU [15] layers. The output of the final iteration (Fig. 2A of the main text) is (1) compared to the fully sampled reference image to compute a pixel-wise MSE loss, and (2) paired with its corresponding zero-filled image and fed into a discriminator network to evaluate the WGAN [1] loss.
0.A.3 Discriminator architecture
For our discriminator architecture we use a convolutional "PatchGAN" [10]. The discriminator receives a pair of (1) a zero-filled image and (2) either a fully sampled or a generated image, concatenated as two channels, and is able to penalize structure at the scale of image patches in both channels. The architecture incorporates four convolutional layers with a stride of 2, each followed by batch normalization [7] and leaky-ReLU [15]. The last convolutional layer is flattened and fed into a linear layer, in which each input value corresponds to a different patch of the input channels. The linear layer outputs a single value, which is used to calculate the discriminator's WGAN loss.

Appendix 0.B Results
0.B.1 Qualitative results
Fig. 6 exhibits more qualitative results of our proposed model, along with the zero-filled (ZF) and the fully sampled images. Fig. 7 provides more qualitative results from our ablation study, comparing three different GAN models and a baseline model, where the baseline is our proposed DCI-Net trained solely to optimize the MSE loss.
For the sake of completeness, we provide a qualitative comparison of our proposed model to compressed sensing methods using wavelets or Total Variation (TV) [11] and to Autocalibrating Reconstruction for Cartesian imaging (ARC) [2], as shown in Fig. 8. Our proposed method produces higher-quality images than the baseline methods, both in terms of perceptual quality and reconstruction error.
0.B.2 Implementation Details
The Adam optimizer [9] is used with a learning rate of 5x for both the generator and discriminator networks, with the momentum parameter $\beta_1$ = 0.9. Training is performed with the TensorFlow interface on a GeForce GTX TITAN X GPU with 12 GB of RAM. For the proposed model with AGB training, $\lambda$ is initialized to 10 and increased in multiple steps during training to a value of 370 (see Fig. 9).

0.B.3 Model Selection
In this study, we use both NMSE and FID [5] for model selection. Specifically, to select the best model for each experiment, we evaluate FID and NMSE on the validation set after each epoch (see Fig. 3 of the main text). Then, we calculate the mean and STD of both score series per experiment (starting from epoch 200), and normalize each series separately by subtracting its mean and dividing by its STD. The epoch that minimizes the sum of the normalized FID and normalized NMSE is selected for evaluation on the test set.
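The selection rule above can be written as a short function. This is a direct transcription of the described procedure; the function and parameter names are ours.

```python
import numpy as np

def select_epoch(fid, nmse, start=200):
    """Pick the epoch minimizing the sum of z-normalized FID and NMSE.

    `fid` and `nmse` are per-epoch validation scores; statistics are
    computed from epoch `start` onward, as described in the text.
    """
    fid, nmse = np.asarray(fid, float), np.asarray(nmse, float)
    idx = np.arange(start, len(fid))
    # z-normalize each series over the considered epochs.
    z = lambda s: (s[idx] - s[idx].mean()) / s[idx].std()
    return idx[np.argmin(z(fid) + z(nmse))]
```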
0.B.4 Sampling Pattern
Our method is applied to accelerated multi-slice 2D scanning, where k-space is undersampled in the phase-encode direction using a 1D variable-density sampling (VDS) pattern [14] whose density decreases linearly between the central and outer regions of k-space (with a net undersampling factor of four), while the central 12 lines of k-space remain fully sampled (see Fig. 10).
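A mask of this kind can be sketched as follows. This is an illustrative construction under our own assumptions (a linear density falloff rescaled to hit the target line budget); the exact density law of the pattern in Fig. 10 is not specified in the text.

```python
import numpy as np

def vds_mask(n_lines=256, n_center=12, accel=4, seed=0):
    """Sketch of a 1D variable-density phase-encode sampling pattern.

    Fully samples `n_center` central lines; outside the center, the
    sampling probability falls off linearly toward the k-space edge and
    is rescaled so the expected net undersampling factor is ~`accel`.
    """
    rng = np.random.default_rng(seed)
    center = np.zeros(n_lines, bool)
    center[n_lines // 2 - n_center // 2 : n_lines // 2 + n_center // 2] = True
    # Linear density falloff with distance from the center line.
    dist = np.abs(np.arange(n_lines) - n_lines // 2) / (n_lines / 2)
    prob = 1.0 - dist
    # Rescale outer probabilities to meet the target line budget.
    budget = n_lines / accel - center.sum()
    prob = prob * (budget / prob[~center].sum())
    return center | (rng.random(n_lines) < prob)
```

Applying the resulting boolean mask along the phase-encode axis of each coil's k-space yields the undersampled signal used for retrospective downsampling.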