Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

by Morteza Mardani, et al.

Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear inverse task demanding time and resource intensive computations that can substantially trade off accuracy for speed in real-time imaging. In addition, state-of-the-art compressed sensing (CS) analytics are not cognizant of the image diagnostic quality. To cope with these challenges we put forth a novel CS framework that draws on generative adversarial networks (GANs) to learn a (low-dimensional) manifold of diagnostic-quality MR images from historical patients. Leveraging a mixture of least-squares (LS) GAN and pixel-wise ℓ_1 costs, a deep residual network with skip connections is trained as the generator that learns to remove the aliasing artifacts by projecting onto the manifold. The LSGAN cost learns the texture details, while the ℓ_1 cost controls the high-frequency noise. A multilayer convolutional neural network is then jointly trained on diagnostic-quality images to discriminate the projection quality. The test phase performs feed-forward propagation over the generator network, which demands very low computational overhead. Extensive evaluations are performed on a large contrast-enhanced MR dataset of pediatric patients. In particular, ratings by expert radiologists corroborate that GANCS retrieves high-contrast images with detailed texture relative to conventional CS and pixel-wise schemes. In addition, it offers reconstruction within a few milliseconds, two orders of magnitude faster than state-of-the-art CS-MRI schemes.



1 Introduction

Owing to its superb soft tissue contrast, magnetic resonance imaging (MRI) nowadays serves as the major imaging modality in clinical practice. Real-time MRI visualization is of paramount importance for diagnostic and therapeutic guidance, for instance in next-generation platforms for MR-guided, minimally invasive neurosurgery [5]. However, the scan is quite slow, taking several minutes to acquire clinically acceptable images. This becomes more pronounced for high-resolution and volumetric images. As a result, the acquisition typically undergoes significant undersampling, which turns reconstruction into a seriously ill-posed linear inverse problem. To render it well-posed, conventional compressed sensing (CS) incorporates prior image information by means of sparsity regularization in a proper transform domain such as Wavelet (WV) or Total Variation (TV); see e.g., [23]. This, however, demands running iterative optimization algorithms that are time and resource intensive, which in turn hinders real-time MRI visualization and analysis.

Recently, a few attempts have been carried out to automate medical image reconstruction by leveraging historical patient data; see e.g., [9, 24]. They train a network that maps the aliased image to the gold-standard one, using convolutional neural networks (CNNs) with residuals for computed tomography (CT) [24], and denoising auto-encoders for MRI [9]. Although they speed up reconstruction, they suffer from blurring and aliasing artifacts. This is mainly due to adopting a pixel-wise ℓ_1/ℓ_2 cost that is oblivious to high-frequency texture details, which are crucial for drawing diagnostic decisions. See also the recent DeepADMM scheme in [19] for CS MRI that improves the quality, but is as slow as conventional CS. Generative adversarial networks (GANs) [20, 17] have lately proved very successful in modeling a low-dimensional distribution (manifold) of natural images that are perceptually appealing [12]. In particular, for image super-resolution tasks GANs achieve state-of-the-art perceptual quality when upscaling natural images, e.g., from ImageNet [7, 6]. GANs have also been deployed for image inpainting [21], style transfer [15], and visual manipulation [12].

Despite the success of GANs for local image restoration such as super-resolution and inpainting, to date, they have not been studied for removing aliasing artifacts in biomedical image reconstruction tasks. This is indeed a more difficult image restoration task. In essence, aliasing artifacts (e.g., in MRI) emanate from data undersampling in a different domain (e.g., Fourier, projections), which globally impacts image pixels. Inspired by the high texture quality offered by GANs, and the high contrast of MR images, we employ GANs to learn a low-dimensional manifold of diagnostic-quality MR images. To this end, we train a tandem network of a generator (G) and a discriminator (D), where the generator aims to generate the ground-truth images from the complex-valued aliased ones using a deep residual network (ResNet) with skip connections, with refinement to ensure its output is consistent with the measurements (data consistency). The aliased input image is simply obtained via the inverse Fourier Transform (FT) of the undersampled data. The D network then scores the G output using a multilayer convolutional neural network (CNN) that outputs one if the image is of diagnostic quality, and zero if it contains artifacts. For training we adopt a mixture of LSGAN [11] and pixel-wise ℓ_1 criteria to retrieve high-frequency texture while controlling the noise. We performed evaluations on a large cohort of pediatric patients with contrast-enhanced abdominal images. The retrieved images are rated by expert radiologists for diagnostic quality. Our observations indicate that GANCS results have almost the same quality as the gold-standard fully-sampled images, and are superior in terms of diagnostic quality relative to the existing alternatives including conventional CS (e.g., TV and WV), and ℓ_2- and ℓ_1-based criteria. Moreover, the reconstruction takes only a few milliseconds, that is, two orders of magnitude faster than state-of-the-art conventional CS toolboxes.

Last but not least, the advocated GANCS scheme is tailored to inverse imaging tasks that appear in a wide range of applications with budgeted acquisition and reconstruction speed. All in all, relative to past work, this paper's main contributions are summarized as follows:

  • Propose GANCS as a data-driven regularization scheme for solving ill-posed linear inverse problems that appear in imaging tasks dealing with (global) aliasing artifacts

  • First work to apply GANs as an automated (non-iterative) technique for aliasing artifact suppression in MRI with state-of-the-art image diagnostic quality and reconstruction speed

  • Propose and evaluate a novel network architecture to achieve better trade-offs between data-consistency (affine projection) and manifold learning

  • Extensive evaluations on a large contrast-enhanced MRI dataset of pediatric patients, with the reconstructed images rated by expert radiologists

The rest of this paper is organized as follows. Section 2 states the problem. Manifold learning using LSGANs is proposed in Section 3. Section 4 reports the experimental evaluations, while the conclusions are drawn in Section 5.

2 Problem Statement

Consider an ill-posed linear system y = Φx + v with Φ ∈ C^{M×N}, where M ≪ N, and v captures the noise and unmodeled dynamics. Suppose the unknown and complex-valued image x lies in a low-dimensional manifold, say ℳ. No information is known about the manifold besides the training samples X := {x_k} drawn from it, and the corresponding (possibly) noisy observations Y := {y_k}. Given a new observation y, the goal is to recover x. For instance, in the MRI context that motivates this paper, Φ refers to the partial 2D FT that results in undersampled k-space data y. To retrieve the image, in the first step we learn the manifold ℳ. Subsequently, the second step projects the aliased image, obtained via e.g., the pseudo-inverse Φ†y, onto ℳ to discard the artifacts. For the sake of generality, the ensuing discussion is presented for a generic linear map Φ.
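As a toy illustration of this setup, the following sketch takes the linear map to be a 1D unitary DFT restricted to a few frequencies (the paper's setting is the partial 2D FT), and forms the aliased input by zero-filling the unsampled frequencies and inverting. The helper names `dft`, `idft`, `measure`, `zero_filled`, the toy signal, and the sampled index set are illustrative assumptions, not taken from the paper.

```python
import cmath

def dft(x):
    """Unitary 1D discrete Fourier transform (naive O(N^2) version)."""
    n = len(x)
    scale = n ** -0.5
    return [scale * sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    """Inverse of the unitary DFT above."""
    n = len(X)
    scale = n ** -0.5
    return [scale * sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n))
            for t in range(n)]

def measure(x, omega):
    """Apply the partial DFT: keep only the Fourier coefficients indexed by omega."""
    X = dft(x)
    return [X[f] for f in omega]

def zero_filled(y, omega, n):
    """Pseudo-inverse of the partial unitary DFT: place y on omega, zeros elsewhere, invert."""
    X = [0j] * n
    for f, value in zip(omega, y):
        X[f] = value
    return idft(X)

x = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, -0.5]  # toy "image" (assumption)
omega = [0, 1, 7]                                # sampled frequencies: 3 of 8
y = measure(x, omega)                            # undersampled observation
x_tilde = zero_filled(y, omega, len(x))          # aliased input image
```

The zero-filled image reproduces the measurements exactly but differs from the true signal, which is precisely the ambiguity that the learned manifold is meant to resolve.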

3 Manifold Learning via Generative Adversarial Networks

The inverse imaging solution lies at the intersection of two sets, one defined by the acquisition model and the other by the image manifold. In order to effectively learn the image manifold from the available (limited number of) training samples we first need to address the following important questions:

  • How to ensure the trained manifold contains plausible images?

  • How to ensure the points on the manifold are data consistent, namely y ≈ Φx?

To address the first question we adopt GANs, which have recently proven very successful in estimating the prior distribution of images. GANs provide sharp images that are visually plausible. In contrast, variational autoencoders [7], an important class of generative models, use pixel-wise MSE costs that result in high peak signal-to-noise ratios but often produce overly smooth images with poor perceptual quality. A standard GAN consists of a tandem network of G and D networks. Consider the undersampled image x̃ := Φ†y as the input to the G network. The G network then projects x̃ onto the low-dimensional manifold ℳ containing the high-quality images X. Let x̌ denote the output of G. It then passes through the discriminator network D, which outputs one if x̌ ∈ X, and zero otherwise.

The output of G, namely x̌, may however not be consistent with the data. To tackle this issue, we add another layer after G that projects x̌ onto the feasible set of y = Φx to arrive at x̂ = Φ†y + P_N x̌. Alternatively, we can add a soft LS penalty when training the G network, as will be seen later in (P1). To further ensure that x̂ lies in the intersection of the manifold and the space of data-consistent images, we can use a multilayer network that alternates between residual units and data consistency projection, as depicted in Fig. 1 (b). We have observed that using only a couple of residual units may improve the performance of G in discarding the aliasing artifacts. The overall network architecture is depicted in Fig. 1 (a), where P_N := I − Φ†Φ signifies projection onto the nullspace of Φ.
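For a partial unitary Fourier map, the affine projection described above reduces to overwriting the sampled Fourier coefficients of the generator output with the measurements while leaving the unsampled ones untouched. The sketch below illustrates this in 1D; the function name `project_data_consistent` and the toy values are assumptions made for illustration only.

```python
import cmath

def dft(x):
    """Unitary 1D discrete Fourier transform (naive O(N^2) version)."""
    n = len(x)
    scale = n ** -0.5
    return [scale * sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    """Inverse of the unitary DFT above."""
    n = len(X)
    scale = n ** -0.5
    return [scale * sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n))
            for t in range(n)]

def project_data_consistent(x_check, y, omega):
    """Project the generator output onto the set of images that match y on omega."""
    X = dft(x_check)
    for f, value in zip(omega, y):
        X[f] = value      # trust the measurements on the sampled frequencies
    return idft(X)        # unsampled frequencies still come from the generator output

omega = [0, 2]                      # sampled frequencies (assumption)
y = [2.0 + 0j, 0.5 - 0.5j]          # toy measurements (assumption)
x_check = [1.0, 0.0, -1.0, 0.5]     # hypothetical generator output
x_hat = project_data_consistent(x_check, y, omega)
```

After the projection, the sampled coefficients equal the measurements, and the remaining coefficients are the nullspace component contributed by the generator.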

Figure 1: (a) GANCS structure for manifold learning, where the dashed module is projection on the feasible set. (b) The multilayer residual blocks (RB) for data consistency.

Training the network in Fig. 1 amounts to playing a game with conflicting objectives between the adversary G and the discriminator D. The D network aims to score one the real images drawn from the data distribution, and to score zero the rest. The G network in turn aims to map the aliased input images to fake images that fool the D network. Various strategies have been devised to reach the equilibrium. They mostly differ in terms of the cost function adopted for the G and D networks [20], [11]. The standard GAN uses a sigmoid cross-entropy loss that leads to vanishing gradients, which renders the training unstable, and as a result it suffers from severe degrees of mode collapse. In addition, no cost is incurred for generated images that are classified as real with high confidence (i.e., a large decision variable). Hence, the standard GAN tends to pull samples away from the decision boundary, which introduces non-realistic images [11]. LSGAN instead pulls the generated samples towards the decision boundary by using an LS cost.

One issue with GAN, however, is that it introduces high-frequency noise all over the image. The ℓ_1 criterion has proven effective in discarding the noise from natural images as it appropriately penalizes the low-intensity noise [22]. Accordingly, to reveal fine texture details while discarding noise, we are motivated to adopt a mixture of LSGAN and ℓ_1 costs to train the generator. The overall procedure (P1) aims to jointly minimize the discriminator cost (P1.1) and the generator cost (P1.2).

The first LS fitting term in (P1.2) is a soft penalty to ensure the input to the D network is data consistent. The two weighting parameters in (P1.2) control the balance between manifold projection, noise suppression and data consistency.
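To make the structure of the two costs concrete, the sketch below evaluates batch-averaged versions of them under stated assumptions: the discriminator cost takes the least-squares form with targets one for real and zero for generated images, and the generator cost sums an LS data-consistency term, a weighted ℓ_1 term, and a weighted LS adversarial term. The function names, the argument layout, and the weight names `eta` and `lam` are hypothetical and need not match the paper's exact formulation.

```python
def discriminator_cost(d_real, d_fake):
    """LS cost for D: scores of real images are pushed to 1, generated ones to 0."""
    real_term = sum((1.0 - s) ** 2 for s in d_real) / len(d_real)
    fake_term = sum(s ** 2 for s in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_cost(y, y_gen, x, x_gen, d_fake, eta, lam):
    """Batch-averaged G cost: LS data consistency + eta * l1 + lam * LS adversarial term.

    y, y_gen : observations and re-measured generator outputs (map applied by the caller)
    x, x_gen : gold-standard images and generator outputs
    d_fake   : discriminator scores of the generator outputs
    """
    n = len(x)
    data_term = sum(sum(abs(a - b) ** 2 for a, b in zip(yk, gk))
                    for yk, gk in zip(y, y_gen)) / n
    l1_term = sum(sum(abs(a - b) for a, b in zip(xk, gk))
                  for xk, gk in zip(x, x_gen)) / n
    adv_term = sum((1.0 - s) ** 2 for s in d_fake) / n
    return data_term + eta * l1_term + lam * adv_term

# Toy batch with a single sample (all values are illustrative assumptions).
d_cost = discriminator_cost(d_real=[1.0, 0.5], d_fake=[0.5])
g_cost = generator_cost(y=[[1 + 0j, 0j]], y_gen=[[0j, 0j]],
                        x=[[1.0, 2.0]], x_gen=[[1.5, 2.0]],
                        d_fake=[0.5], eta=2.0, lam=4.0)
```

Raising `lam` relative to `eta` favors texture over noise suppression in this sketch, which mirrors the balance discussed in the text.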

Looking carefully into (P1.2), the generator reconstructs the image x from the data y using an expected regularized-LS estimator, where the regularization is learned from training data via LSGAN and ℓ_1-net. Different from the conventional CS formulation, which also optimizes the reconstruction via ℓ_1-regularized LS estimation, the entire optimization here happens only in training, and the learned generator can be directly applied to new samples to achieve fast reconstruction.

As argued in [11], it can be shown that the LSGAN game amounts to minimizing the Pearson-χ² divergence. For (P1), following the same arguments as for the standard GANs in [20] and [11], it can be readily shown that even in the presence of the LS data consistency and ℓ_1 penalty, the distribution modeled by the G network, say p_g, coincides with the true data distribution p_x. This is formally stated next.

Lemma 1. For the noise-free scenario (v = 0), suppose D and G have infinite capacity. Then, for a given generator network G, i) the optimal discriminator D is D*(x) = p_x(x) / (p_x(x) + p_g(x)); and ii) p_g = p_x achieves the equilibrium for the game (P1).

Proof. The first part is similar to the one in [11] with the same cost for D. The second part also readily follows as the LS data consistency and ℓ_1 penalty are non-negative, and become zero when p_g = p_x. Thus, the Pearson-χ² divergence still bounds the (P1.2) objective from below, and the bound is achievable when p_g = p_x.
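The first part of the lemma can be sanity-checked pointwise. Assuming the discriminator cost has the least-squares form with targets one and zero, its integrand at a given image is p_x (1 − d)² + p_g d², which a simple grid search should minimize at d = p_x / (p_x + p_g). The function names and the probability values below are illustrative assumptions.

```python
def optimal_score(p_x, p_g):
    """Closed-form pointwise minimizer of p_x * (1 - d)^2 + p_g * d^2."""
    return p_x / (p_x + p_g)

def grid_minimizer(p_x, p_g, steps=10000):
    """Brute-force minimizer of the same pointwise cost over d in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda d: p_x * (1.0 - d) ** 2 + p_g * d ** 2)

closed_form = optimal_score(0.3, 0.1)    # real images more likely than generated ones
brute_force = grid_minimizer(0.3, 0.1)
at_equilibrium = optimal_score(0.2, 0.2) # matching densities give an undecided score
```

When the generated and true densities coincide, the optimal score is one half everywhere, so the discriminator can no longer tell the two apart.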

3.1 Stochastic alternating minimization

To train the G and D networks, a mini-batch stochastic alternating minimization scheme is adopted. At the k-th iteration, with a mini-batch of training data and assuming that G is fixed, we first update the discriminator weights Θ_d by taking a single descent step with momentum along the gradient of the D cost. Similarly, given the updated Θ_d, the generator weights Θ_g are updated by taking a gradient descent step with momentum along the gradient of the G cost. The resulting iterations are listed under Algorithm 1, where the gradients with respect to Θ_d and Θ_g are readily obtained via backpropagation over the D and G networks. Pixel-wise quantities are indexed by the output pixel of the G network.

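  input: training images with their observations, cost weights, learning rate, mini-batch size.
  initialize the G and D network weights at random.
  for each epoch do
     for each mini-batch do
         S1) Random mini-batch selection
         Sample the mini-batch, and form the corresponding aliased input images
         S2) Discriminator update: gradient-descent with momentum along the gradient of the D cost in (P1)
         S3) Generator update: gradient-descent with momentum along the gradient of the G cost in (P1.2)
     end for
  end for
Algorithm 1 Training algorithm using BP based stochastic alternating minimization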
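The alternating updates can be sketched on a deliberately tiny problem. The example below is not the paper's network: it assumes scalar "images", a scalar acquisition gain `phi`, a one-parameter linear generator, a linear discriminator, and hand-derived gradients in place of backpropagation. All names and hyperparameter values are illustrative assumptions; only the S1 to S3 ordering and the momentum steps follow the algorithm above.

```python
import random

def train(num_epochs=50, batch_size=8, lr=0.01, beta=0.9,
          eta=0.1, lam=0.1, phi=0.5, seed=0):
    rng = random.Random(seed)
    images = [rng.gauss(2.0, 0.5) for _ in range(64)]  # toy scalar "images"
    a = 0.5              # generator weight: G(x_tilde) = a * x_tilde
    w, b = 0.0, 0.0      # discriminator weights: D(x) = w * x + b
    va = vw = vb = 0.0   # momentum buffers
    for _ in range(num_epochs):
        rng.shuffle(images)
        for start in range(0, len(images), batch_size):
            # S1) mini-batch selection and aliased inputs
            xs = images[start:start + batch_size]
            ys = [phi * x for x in xs]       # noise-free observations
            x_tildes = ys                    # crude aliased input (assumption)
            gs = [a * xt for xt in x_tildes]
            n = len(xs)
            # S2) discriminator step on mean[(1 - D(x))^2] + mean[D(G)^2], G fixed
            gw = (sum(-2 * (1 - (w * x + b)) * x for x in xs)
                  + sum(2 * (w * g + b) * g for g in gs)) / n
            gb = (sum(-2 * (1 - (w * x + b)) for x in xs)
                  + sum(2 * (w * g + b) for g in gs)) / n
            vw = beta * vw - lr * gw
            vb = beta * vb - lr * gb
            w, b = w + vw, b + vb
            # S3) generator step on data consistency + eta * l1 + lam * LS adversarial
            ga = 0.0
            for x, y, xt, g in zip(xs, ys, x_tildes, gs):
                sign = (x > g) - (x < g)     # subgradient of |x - g| up to sign
                dg = (-2 * phi * (y - phi * g) - eta * sign
                      - 2 * lam * (1 - (w * g + b)) * w)
                ga += dg * xt / n
            va = beta * va - lr * ga
            a += va
    return a, w, b

a, w, b = train()
```

In this toy, a data-consistent generator corresponds to `a * phi` close to one; the discriminator is updated first in every iteration so that the generator step sees the refreshed scores.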

4 Experiments

Effectiveness of the novel GANCS scheme is assessed in this section via tests for MRI reconstruction. A single-coil MR acquisition model is considered, where for each patient the acquired k-space data abides by the model in Section 2, with Φ given by the 2D FT restricted to the set Ω that indexes the sampled Fourier coefficients. As is conventionally done in CS MRI, we select Ω based on variable density sampling with radial view ordering, which tends to pick low-frequency components from the center of k-space (see the sampling mask in Fig. 4 (left) of the supplementary document). Throughout the tests, Ω collects only a fraction of the Fourier coefficients.
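A sampling set that favors the center of k-space can be sketched as follows. This is a simple random variable-density mask and does not reproduce the radial view ordering used in the experiments; the function name, the fully sampled center radius, and the exponential decay rate are assumptions chosen for illustration.

```python
import math
import random

def variable_density_mask(n, center_radius=2.0, decay=0.35, seed=0):
    """Return an n x n boolean mask that samples low frequencies more densely."""
    rng = random.Random(seed)
    c = (n - 1) / 2.0                      # k-space center (after centering shift)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            r = math.hypot(i - c, j - c)   # distance from the k-space center
            if r <= center_radius:
                keep = True                # always keep the central region
            else:
                # sampling probability decays with distance from the center
                keep = rng.random() < math.exp(-decay * (r - center_radius))
            row.append(keep)
        mask.append(row)
    return mask

mask = variable_density_mask(16)
fraction = sum(sum(row) for row in mask) / (16 * 16)   # share of sampled coefficients
```

The resulting `fraction` plays the role of the sampling rate; tightening `decay` lowers it while keeping the central coefficients.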

Dataset. High contrast abdominal image volumes are acquired for pediatric patients after gadolinium-based contrast enhancement. Each 3D volume contains a stack of axial slices, which are used as input images for training a neural network. One group of patients is considered for training, and another group for test. All in vivo scans were acquired at Stanford's Lucile Packard Children's Hospital on a 3T MRI scanner (GE MR750).

Under this setting, the ensuing parts address the following questions:

Q1. How does the perceptual cost learned by GANCS improve the image quality compared with the pixel-wise ℓ_2 and ℓ_1 costs?

Q2. How much speed-up and quality improvement can one achieve using GANCS relative to conventional CS?

Q3. What MR image features drive the network to learn the manifold and remove the aliasing artifacts?

Q4. How many samples/patients are needed to achieve a reasonable image quality?

4.1 Training and network architecture

The input and output are complex-valued images of the same size, each including two channels for the real and imaginary components. The input image is simply generated using the inverse 2D FT of the sampled k-space data, and is severely contaminated by artifacts. Input channels are then convolved with different kernels and added up in the next layer. Note that all network kernels are assumed real-valued. Inspired by super-resolution ideas in [15, 7], and the network architecture in [4], we adopt a deep residual network for the generator that stacks a number of residual blocks. Each block consists of two convolutional layers with small kernels whose feature maps are followed by batch normalization and ReLU activation. The residual blocks are followed by three convolutional layers, where the first two layers undergo ReLU activation, while the last layer has sigmoid activation to return the output. The G network learns the projection onto the manifold while ensuring data consistency at the same time, where the manifold dimension is controlled by the number of residual blocks and feature maps, and by the settings of the discriminator D network.

To satisfy the data consistency term, previous work in the context of image super-resolution [6] used a (hard) affine projection after the G network. However, the affine projection drifts away from the manifold landscape. As argued in Section 3, we instead use a multilayer succession of affine projections and convolutional residual units that project back onto the manifold. We can repeat this procedure a few times to ensure the output lies close to the intersection. This amounts to a soft yet flexible data consistency penalty.

The D network starts from the output of the G network with two channels. It is composed of a stack of convolutional layers. In all the layers except the last one, the convolution is followed by batch normalization, and subsequently ReLU activation. No pooling is used. For the first four layers, the number of feature maps is doubled from layer to layer, while at the same time strided convolution is used to reduce the image resolution. One kernel size is adopted for the first 5 layers, while the last two layers use a different kernel size. In the last layer, the convolution output is averaged out to form the decision variable for binary classification. No soft-max is used.

The Adam optimizer is used with a fixed momentum parameter and mini-batch size, and an initial learning rate that is halved at regular iteration intervals. Training is performed with the TensorFlow interface on an NVIDIA Titan X Pascal GPU with 12GB RAM. Training runs for a fixed number of epochs and takes on the order of hours. The implementation is available online at [2].

As figures of merit for image quality assessment we adopt SNR (dB), and SSIM defined on a cropped window from the center of the axial slices. In addition, we collected the Radiologists' Opinion Score (ROS) regarding the diagnostic quality of the images. ROS ranges from worst to excellent based on the overall image quality in terms of sharpness/blurriness and the appearance of residual artifacts.
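The text does not spell out its SNR formula. A common definition, assumed here, is the ratio in decibels between the energy of the gold-standard image and the energy of the reconstruction error; the function name `snr_db` and the flattened-image inputs are illustrative choices.

```python
import math

def snr_db(reference, estimate):
    """SNR in dB: energy of the reference over energy of the reconstruction error."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / noise)

reference = [1.0, 0.0]   # flattened gold-standard pixels (toy values)
estimate = [0.9, 0.0]    # flattened reconstruction (toy values)
value = snr_db(reference, estimate)   # about 20 dB for a 10% amplitude error
```

The same function accepts complex-valued pixels, since only magnitudes of the reference and of the error enter the ratio.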

4.2 Observations and discussion

Images retrieved by various methods are depicted in Fig. 2 under undersampling of k-space. For a random test patient, representative slices from the axial and coronal orientations are shown from top to bottom. Columns from left to right show, respectively, the images reconstructed by zero-filling (ZF), CS-WV, CS-TV, ℓ_2-net, ℓ_1-net, GAN, GANCS, and the gold-standard (GS). Note that ℓ_2-net and ℓ_1-net use the same network structure and training as in Section 4.1, changing only the G network cost function in (P1). CS reconstruction is performed using the Berkeley Advanced Reconstruction Toolbox (BART) [18], where the tuning parameters are optimized for the best performance. GANCS, ℓ_1-net and ℓ_2-net are trained with ZF images, which evidently contain aliasing artifacts.

Quantitative metrics including the SNR (dB), SSIM, and the reconstruction time (sec) are also reported in Table 1. These metrics are averaged over all axial slices of the test patients. As apparent from the magnified regions, GANCS returns the most detailed images, with high contrast and texture details that can reveal small structures. ℓ_2-net images appear somewhat over-smoothed, as the ℓ_2 cost encourages finding pixel-wise averages of plausible solutions. Also, ℓ_1-net performs better than ℓ_2-net, which was already reported in a different setting [22], but it is still not as sharp as GANCS, which leverages both ℓ_1-net and GAN. GAN alone also yields sharp images, but noise is still present all over the image. CS-based results are depicted as well, since CS is the benchmark MR reconstruction scheme nowadays; they evidently introduce blurring artifacts.

CS-based schemes achieve higher SNR and SSIM, but they miss the high-frequency textures, as evidenced by Fig. 2. In addition, they demand iterative algorithms for solving non-smooth optimization programs, which take a few seconds for reconstruction using the optimized BART toolbox [18]. In contrast, the elapsed time for GANCS is only a few milliseconds, which makes it a suitable choice for real-time MRI visualization tasks. Regarding convergence, we empirically observe faster and more stable training when more weight is imposed on the data consistency, which restricts the search space for the network weights.

To assess the perceptual quality of the resulting images we also asked the opinion of expert radiologists. We normalize the scores so that the gold-standard images are rated excellent. Statistical ROS is evaluated for the image quality, residual artifacts, and image sharpness. It is shown in the bar plot of Fig. 3, which confirms that GANCS is almost as perceptually pleasing as the gold-standard scan. This demonstrates the superior diagnostic quality of GANCS images relative to the other alternatives.

For the sake of completeness, the evolution of the different (empirical) costs associated with the generator cost in (P1.2) over batches is also depicted in Fig. 5. It is observed that the data consistency cost and GAN loss tend to improve alternately to find the distribution at the intersection of the manifold and the data consistency space.

Figure 2: Representative coronal (1st row) and axial (3rd row) images for a test patient retrieved by ZF (1st), CS-WV (2nd), ℓ_2-net (3rd), ℓ_1-net (4th), GAN (5th), GANCS (6th), and gold-standard (7th).
Figure 3: Mean and standard deviation of image quality, artifacts, and blurriness scored by expert radiologists for various reconstruction techniques. Scores rate from poor to excellent.
Scheme ZF CS-WV CS-TV ℓ_2-net ℓ_1-net GAN GANCS
Recon. time

Table 1: Average SNR (dB), SSIM, ROS, and reconstruction time (sec) comparison of different schemes under k-space undersampling.
Figure 4: Representative k-space axial image retrieved by ZF (1st column), CS-WV (2nd), CS-TV (3rd), GANCS (4th), and gold-standard (5th).
Figure 5: Evolution of different costs contributing in the overall training cost of G network.

Manifold landscape. We visualize what the discriminator learns by showing the feature maps in different layers as heat-maps superimposed on the original images. Since there are several feature maps per layer, we computed the principal component maps for each layer and visualize the first dominant ones. Fig. 6 indicates that after learning from tens of thousands of MRI images generated by the G network, together with their gold standards covering different organs, the D network is able to detect anatomically valuable features. It is observed that the first layers reveal the edges, while the last layers, closer to the classification output, reveal more regions of interest that include both anatomy and texture details. This observation is consistent with the way expert radiologists inspect images for their diagnostic quality.

Figure 6: Heat-map of discriminator feature maps at four layers for four different images. Each group of four rows, from top to bottom, represents the results for one MR image. The first row shows the MR image and the principal components of the network features from the first layer. The second row shows an overlay view of the MR image and the heat-map. The third row shows the MR slice image with the principal components of the network features from the last layer of the discriminator, while the fourth row shows the overlay view of the MR image and the heat-maps.

Performance with different number of patients. We also experimented with the number of patients needed for training to achieve good reconstruction quality in the test phase. Knowing how much training data is needed is generally valuable for clinicians, since in medical applications patient data is not easily accessible due to privacy concerns. Fig. 7 plots the normalized RMSE on a test set versus the percentage of patients used for training (normalized by the maximum number of patients). Note that the differences in variance across training-set sizes may arise because training with fewer samples converges better, since we use the same number of epochs for all training cases. A more detailed study is the subject of our ongoing research.

Figure 7: Performance changes with different size of dataset used for training (output about 45,300 images)

5 Conclusions and Future Work

This paper offers a novel CS framework that leverages historical data for faster and more diagnostically valuable image reconstruction from highly undersampled observations. A low-dimensional manifold is learned where the images are not only sharp and high contrast, but also consistent with both the real MRI data and the acquisition model. To this end, a neural network based on LSGANs is trained that consists of a generator network to map a readily obtainable undersampled image to the gold-standard one. Experiments based on a large cohort of abdominal MR data, and the evaluations performed by expert radiologists, confirm that GANCS retrieves images with better diagnostic quality in real time (a few milliseconds, about two orders of magnitude faster than a state-of-the-art CS MRI toolbox). This achieves a significant speed-up and improved diagnostic accuracy relative to standard CS MRI. Last but not least, the scope of the novel GANCS goes beyond MR reconstruction, and extends to other image restoration tasks dealing with aliasing artifacts. There are still important questions to address, such as using 3D spatial correlations for improved quality imaging, robustifying against patients with abnormalities, and handling variations in the acquisition model, for instance as a result of different sampling strategies.