GANCS
Compressed Sensing MRI based on Deep Generative Adversarial Network
Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear inverse task demanding time- and resource-intensive computations that can substantially trade off accuracy for speed in real-time imaging. In addition, state-of-the-art compressed sensing (CS) analytics are not cognizant of the image diagnostic quality. To cope with these challenges we put forth a novel CS framework that draws benefits from generative adversarial networks (GAN) to learn a (low-dimensional) manifold of diagnostic-quality MR images from historical patients. Leveraging a mixture of least-squares (LS) GAN and pixel-wise ℓ_1 costs, a deep residual network with skip connections is trained as the generator that learns to remove the aliasing artifacts by projecting onto the manifold. The LSGAN learns the texture details, while the ℓ_1 cost controls the high-frequency noise. A multilayer convolutional neural network is then jointly trained on diagnostic-quality images to discriminate the projection quality. The test phase amounts to a feed-forward pass through the generator network, which demands very low computational overhead. Extensive evaluations are performed on a large contrast-enhanced MR dataset of pediatric patients. In particular, ratings by expert radiologists corroborate that GANCS retrieves high-contrast images with detailed texture relative to conventional CS and pixel-wise schemes. In addition, it offers reconstruction in a few milliseconds, two orders of magnitude faster than state-of-the-art CS-MRI schemes.
Owing to its superb soft-tissue contrast, magnetic resonance imaging (MRI) nowadays serves as a major imaging modality in clinical practice. Real-time MRI visualization is of paramount importance for diagnostic and therapeutic guidance, for instance in next-generation platforms for MR-guided, minimally invasive neurosurgery [5]. However, the scan is quite slow, taking several minutes to acquire clinically acceptable images. This becomes more pronounced for high-resolution and volumetric imaging. As a result, the acquisition typically undergoes significant undersampling, rendering reconstruction a seriously ill-posed linear inverse problem. To render it well-posed, conventional compressed sensing (CS) incorporates prior image information by means of sparsity regularization in a proper transform domain such as Wavelet (WV) or Total Variation (TV); see e.g., [23]. This, however, demands running iterative optimization algorithms that are time- and resource-intensive, which in turn hinders real-time MRI visualization and analysis.
Recently, a few attempts have been carried out to automate medical image reconstruction by leveraging historical patient data; see e.g., [9, 24]. They train a network that maps the aliased image to the gold-standard one, using convolutional neural networks (CNN) with residuals for computed tomography (CT) [24] and denoising autoencoders for MRI [9]. Despite the speedup, they suffer from blurry and aliasing artifacts. This is mainly due to adopting a pixel-wise ℓ_1/ℓ_2 cost that is oblivious to high-frequency texture details, which are crucial for drawing diagnostic decisions. See also the recent Deep-ADMM scheme in [19] for CS MRI, which improves the quality but is as slow as conventional CS. Generative adversarial networks (GANs) have lately proven very successful [20, 17] at modeling a low-dimensional distribution (manifold) of natural images that are perceptually appealing [12]. In particular, for image super-resolution tasks GANs achieve state-of-the-art perceptual quality under large upscaling factors for natural images, e.g., from ImageNet [7, 6]. GANs have also been deployed for image inpainting [21], style transfer [15], and visual manipulation [12]. Despite the success of GANs for local image restoration tasks such as super-resolution and inpainting, to date they have not been studied for removing aliasing artifacts in biomedical image reconstruction. This is indeed a more difficult image restoration task. In essence, aliasing artifacts (e.g., in MRI) emanate from data undersampling in a different domain (e.g., Fourier, projections) which globally
impact image pixels. Inspired by the high texture quality offered by GANs and the high contrast of MR images, we employ GANs to learn a low-dimensional manifold of diagnostic-quality MR images. To this end, we train a tandem network of a generator (G) and a discriminator (D), where the generator aims to recover the ground-truth images from the complex-valued aliased ones using a deep residual network (ResNet) with skip connections, with a refinement to ensure the output is consistent with the measurements (data consistency). The aliased input image is simply obtained via an inverse Fourier transform (FT) of the undersampled data. The D network then scores the G output, using a multilayer convolutional neural network (CNN) that outputs one if the image is of diagnostic quality and zero if it contains artifacts. For training we adopt a mixture of LSGAN
[11] and a pixel-wise ℓ_1 criterion to retrieve high-frequency texture while controlling the noise. We performed evaluations on a large cohort of pediatric patients with contrast-enhanced abdominal images. The retrieved images were rated by expert radiologists for diagnostic quality. Our observations indicate that GANCS results have almost similar quality to the gold-standard fully-sampled images, and are superior in terms of diagnostic quality relative to the existing alternatives, including conventional CS (e.g., TV and WV) and pixel-wise ℓ_1/ℓ_2 criteria. Moreover, the reconstruction only takes a few milliseconds, which is two orders of magnitude faster than state-of-the-art conventional CS toolboxes. Last but not least, the advocated GANCS scheme tailors to inverse imaging tasks appearing in a wide range of applications with budgeted acquisition and reconstruction speed. All in all, relative to the past work this paper's main contributions are summarized as follows:
Propose GANCS as a data-driven regularization scheme for solving ill-posed linear inverse problems that appear in imaging tasks dealing with (global) aliasing artifacts
First work to apply GAN as an automated (non-iterative) technique for aliasing artifact suppression in MRI, with state-of-the-art image diagnostic quality and reconstruction speed
Propose and evaluate a novel network architecture to achieve better trade-offs between data consistency (affine projection) and manifold learning
Extensive evaluations on a large contrast-enhanced MRI dataset of pediatric patients, with the reconstructed images rated by expert radiologists
The rest of this paper is organized as follows. Section 2 states the problem. Manifold learning using LSGANs is proposed in Section 3. Section 4 reports the data evaluations, while conclusions are drawn in Section 5.
Consider an ill-posed linear system y = Φx + v, where Φ is a wide (underdetermined) linear map and v captures the noise and unmodeled dynamics. Suppose the unknown complex-valued image x lies in a low-dimensional manifold, say M. No information is known about the manifold besides the training samples {x_k} drawn from it and the corresponding (possibly) noisy observations {y_k}. Given a new observation y, the goal is to recover the image x. For instance, in the MRI context motivated for this paper, Φ refers to the partial 2D FT that results in undersampled k-space data. To retrieve the image, in the first step we learn the manifold M. Subsequently, the second step projects the aliased image, obtained e.g., via the pseudo-inverse, onto M to discard the artifacts. For the sake of generality, the ensuing discussion is presented for a generic linear map Φ.
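The acquisition model and the pseudo-inverse (zero-filled) estimate can be sketched in a few lines of NumPy. The helper names and the random sampling pattern below are illustrative stand-ins, not the paper's exact pipeline:

```python
import numpy as np

def undersample_kspace(x, mask):
    """Forward model y = F_Omega(x): full 2D FFT followed by masking."""
    return np.fft.fft2(x) * mask

def zero_filled_recon(y, mask):
    """Pseudo-inverse: inverse FFT of the zero-filled k-space (the aliased input)."""
    return np.fft.ifft2(y * mask)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))   # stand-in for a real image slice
mask = rng.random((64, 64)) < 0.2   # ~20% random sampling pattern (illustrative)
y = undersample_kspace(x, mask)
x_zf = zero_filled_recon(y, mask)   # complex-valued, aliased estimate
```

The zero-filled estimate is data consistent by construction (its sampled k-space coefficients equal the acquired ones), which is why it serves as a natural input to the generator.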
The inverse imaging solution then lies at the intersection of two sets: one defined by the acquisition model and one by the image manifold. In order to effectively learn the image manifold from the available (limited number of) training samples, we first need to address the following important questions:
How can we ensure the trained manifold contains plausible images?
How can we ensure the points on the manifold are data consistent, namely y ≈ Φx?
To address the first question we adopt GANs, which have recently proven very successful in estimating prior distributions for images. GANs provide sharp images that are visually plausible [20]. In contrast, variational autoencoders [7], an important class of generative models, use pixel-wise MSE costs that result in high peak signal-to-noise ratios but often produce overly-smooth images with poor perceptual quality. A standard GAN consists of a tandem network of G and D networks. Consider the undersampled image x̃ as the input to the G network. The G network then projects x̃ onto the low-dimensional manifold M containing the high-quality images. Let x̂ = G(x̃) denote the output of G; it then passes through the discriminator network D, which outputs one if x̂ lies on M, and zero otherwise. The output of G, namely x̂, however, may not be consistent with the data. To tackle this issue, we add another layer after G that projects x̂ onto the feasible set {x : y = Φx}. Alternatively, we can add a soft LS penalty when training the G network, as will be seen later in (P1). To further ensure that x̂ lies in the intersection of the manifold and the space of data-consistent images, we can use a multilayer network that alternates between residual units and data-consistency projections, as depicted in Fig. 1 (b). We have observed that using only a couple of residual units may improve the performance of G in discarding the aliasing artifacts. The overall network architecture is depicted in Fig. 1 (a), where the final stage signifies projection onto the nullspace of Φ.
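A hard data-consistency (affine) projection of this kind simply overwrites the acquired k-space coefficients of the generator output while retaining the generator's prediction elsewhere. A minimal NumPy sketch (the helper name is hypothetical):

```python
import numpy as np

def data_consistency(x_hat, y, mask):
    """Project x_hat onto {x : F_Omega(x) = y}: keep the acquired k-space
    entries from y, retain the generator's k-space everywhere else."""
    k = np.fft.fft2(x_hat)
    k = np.where(mask, y, k)
    return np.fft.ifft2(k)

rng = np.random.default_rng(1)
mask = rng.random((32, 32)) < 0.3                      # sampled k-space locations
y = np.fft.fft2(rng.standard_normal((32, 32))) * mask  # acquired data
x_hat = rng.standard_normal((32, 32))                  # stand-in generator output
x_dc = data_consistency(x_hat, y, mask)
```

Because the Fourier transform is unitary, this replacement is exactly the affine projection onto the feasible set: sampled coefficients match the data, unsampled ones are untouched.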
Training the network in Fig. 1 amounts to playing a game with conflicting objectives between the generator G and the discriminator D. The D network aims to assign score one to real images drawn from the data distribution and score zero to the rest. The G network in turn aims to map input images to fake images that fool the D network. Various strategies have been devised to reach the equilibrium; they mostly differ in the cost functions adopted for the G and D networks [20], [11]. The standard GAN uses a sigmoid cross-entropy loss that leads to vanishing gradients, which renders the training unstable; as a result, it suffers from severe mode collapse. In addition, for generated images classified as real with high confidence (i.e., a large decision variable), no cost is incurred. Hence, the standard GAN tends to pull samples away from the decision boundary, which introduces non-realistic images
[11]. LSGAN instead pulls the generated samples toward the decision boundary by using an LS cost. One issue with GAN, however, is that it introduces high-frequency noise all over the image. The ℓ_1 criterion has proven effective in discarding such noise from natural images, as it appropriately penalizes the low-intensity noise [22]. Accordingly, to reveal fine texture details while discarding noise, we are motivated to adopt a mixture of LSGAN and ℓ_1 costs to train the generator. The overall procedure aims to jointly minimize the discriminator cost

(P1.1)   min_D  E_x[(1 − D(x))^2] + E[(D(G(x̃)))^2]

and the generator cost

(P1.2)   min_G  E[‖y − Φ G(x̃)‖_2^2] + λ E[‖x − G(x̃)‖_1] + η E[(1 − D(G(x̃)))^2].

The first LS fitting term in (P1.2) is a soft penalty to ensure the input to the D network is data consistent. The parameters λ and η control the balance between manifold projection, noise suppression, and data consistency.
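The generator objective combines an LS data fit, a pixel-wise ℓ_1 fidelity term, and an LSGAN term. A NumPy sketch of the empirical cost for one image follows; the weights `lam`/`eta` and all names are illustrative, not the paper's tuned values:

```python
import numpy as np

def generator_loss(y, mask, x_gt, x_hat, d_out, lam=1.0, eta=0.1):
    """Empirical mixed cost: LS data fit + lam * l1 fidelity + eta * LSGAN term."""
    data_fit = np.sum(np.abs(np.fft.fft2(x_hat) * mask - y) ** 2)
    l1_term = np.sum(np.abs(x_gt - x_hat))   # pixel-wise l1 to the gold standard
    gan_term = np.sum((1.0 - d_out) ** 2)    # least-squares GAN generator cost
    return data_fit + lam * l1_term + eta * gan_term
```

The cost vanishes exactly when the generator output matches the gold standard, is data consistent, and fools the discriminator (D output equal to one).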
Looking carefully into (P1.2), the generator reconstructs the image from the data using an expected regularized-LS estimator, where the regularization is learned from training data via the LSGAN and ℓ_1 costs. Different from the conventional CS formulation, which also performs reconstruction via regularized LS estimation, the entire optimization happens only during training, and the learned generator can be applied directly to new samples to achieve fast reconstruction.
As argued in [11], it can be shown that the LSGAN game amounts to minimizing the Pearson χ² divergence. For (P1), following the same arguments as for the standard GAN in [20] and [11], it can readily be shown that even in the presence of the LS data-consistency and ℓ_1 penalties, the distribution modeled by the G network, say p_g, coincides with the true data distribution. This is formally stated next.
Lemma 1. For the noise-free scenario (v = 0), suppose D and G have infinite capacity. Then, for a given generator network G, i) the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)); and ii) p_g = p_data achieves the equilibrium for the game (P1).
Proof. The first part is similar to the one in [11] with the same cost for D. The second part also readily follows since the LS data-consistency and ℓ_1 penalties are non-negative and become zero when p_g = p_data. Thus, the Pearson χ² divergence still bounds the (P1.2) objective from below, and the bound is achieved when p_g = p_data.
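For completeness, part (i) follows from a pointwise minimization of the discriminator cost. Under the 0/1 label coding used here, the objective can be minimized for each x separately (a standard argument, sketched below):

```latex
% Pointwise discriminator objective for the LS cost with labels 1 (real), 0 (fake)
\min_{D(\mathbf{x})} \;
  p_{\mathrm{data}}(\mathbf{x})\,\bigl(1 - D(\mathbf{x})\bigr)^{2}
  + p_{g}(\mathbf{x})\,D(\mathbf{x})^{2}
% Setting the derivative with respect to D(x) to zero:
-2\,p_{\mathrm{data}}(\mathbf{x})\bigl(1 - D^{*}(\mathbf{x})\bigr)
  + 2\,p_{g}(\mathbf{x})\,D^{*}(\mathbf{x}) = 0
\;\;\Longrightarrow\;\;
D^{*}(\mathbf{x})
  = \frac{p_{\mathrm{data}}(\mathbf{x})}
         {p_{\mathrm{data}}(\mathbf{x}) + p_{g}(\mathbf{x})}
```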
To train the G and D networks, a mini-batch stochastic alternating minimization scheme is adopted. At each iteration, with the current mini-batch of training data and assuming G is fixed, we first update the discriminator by taking a single gradient-descent step with momentum along the gradient of the D cost. Similarly, given the updated D, the G network is updated by taking a gradient-descent step with momentum along the gradient of the G cost. The resulting iterations are listed under Algorithm 1, where the gradients are readily obtained via backpropagation over the D and G networks.

The effectiveness of the novel GANCS scheme is assessed in this section via tests for MRI reconstruction. A single-coil MR acquisition model is considered, where for the i-th patient the acquired k-space data obeys y_i = F_Ω x_i + v_i. Here, F is the 2D FT, and the set Ω indexes the sampled Fourier coefficients. As is conventionally done in CS MRI, we select Ω based on variable-density sampling with radial view ordering, which tends to pick low-frequency components from the center of k-space (see the sampling mask in Fig. 4 (left) of the supplementary document). Throughout the tests, Ω collects only a fraction of the Fourier coefficients.
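A variable-density mask of this flavor can be mimicked with a radius-dependent Bernoulli pattern. The density profile and parameters below are illustrative, not the paper's exact radial view ordering:

```python
import numpy as np

def variable_density_mask(n, frac=0.2, decay=3.0, seed=0):
    """Random mask whose sampling probability decays with distance from the
    k-space center; 'frac' is the target fraction of sampled coefficients."""
    rng = np.random.default_rng(seed)
    ky, kx = np.meshgrid(np.arange(n) - n // 2, np.arange(n) - n // 2,
                         indexing="ij")
    r = np.sqrt(kx ** 2 + ky ** 2) / (n / 2)  # normalized k-space radius
    prob = np.exp(-decay * r)                 # denser near the center
    prob *= frac * n * n / prob.sum()         # rescale toward the target fraction
    return rng.random((n, n)) < np.minimum(prob, 1.0)
```

The center of k-space, which carries most of the image energy, is sampled densely, while the high-frequency periphery is sampled sparsely.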
Dataset. High-contrast abdominal image volumes are acquired for pediatric patients after gadolinium-based contrast enhancement. Each 3D volume contains a stack of axial slices, which are used as the input images for training the network. The patients are split into disjoint training and test sets. All in vivo scans were acquired at Stanford's Lucile Packard Children's Hospital on a 3T MRI scanner (GE MR750).
Under this setting, the ensuing parts address the following questions:
Q1. How does the perceptual cost learned by GANCS improve the image quality compared with pixel-wise ℓ_1 and ℓ_2 costs?
Q2. How much speedup and quality improvement can one achieve using GANCS relative to conventional CS?
Q3. What MR image features drive the network to learn the manifold and remove the aliasing artifacts?
Q4. How many samples/patients are needed to achieve a reasonable image quality?
The input and output are complex-valued images of the same size, each with two channels for the real and imaginary components. The input image is simply generated using an inverse 2D FT of the sampled k-space, and is severely contaminated by artifacts. The input channels are convolved with different kernels and added up in the next layer. Note, all network kernels are assumed real-valued. Inspired by super-resolution ideas in [15, 7] and the network architecture in [4], we adopt a deep residual network for the generator with multiple residual blocks. Each block consists of two convolutional layers with small kernels and a fixed number of feature maps, each followed by batch normalization and ReLU activation. These are followed by three convolutional layers, where the first two undergo ReLU activation, while the last layer has a sigmoid activation to return the output. The G network learns the projection onto the manifold while ensuring data consistency at the same time, where the manifold dimension is controlled by the number of residual blocks and feature maps as well as the settings of the discriminator D network. To satisfy the data-consistency term, previous work in the context of image super-resolution [6] used a (hard) affine projection after the G network. However, the affine projection drifts away from the manifold landscape. As argued in Section 3, we instead use a multilayer succession of affine projections and convolutional residual units that project back onto the manifold. We can repeat this procedure a few times to ensure the output lies close to the intersection. This amounts to a soft yet flexible data-consistency penalty.
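The residual blocks above can be illustrated with a single-channel NumPy sketch. Batch normalization and the real network's kernel/feature-map counts are omitted, and the conv layers are implemented as naive "same" cross-correlations (which is what a conv layer computes):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' cross-correlation, for illustration only."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def residual_block(x, k1, k2):
    """Two conv stages with a ReLU in between, plus an identity skip connection."""
    h = np.maximum(conv2d_same(x, k1), 0.0)  # conv + ReLU
    h = conv2d_same(h, k2)                   # second conv
    return x + h                             # skip connection
```

The skip connection means each block only needs to learn a residual correction, which eases training of deep generators and helps preserve the input's low-frequency content.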
The D network starts from the output of the G network, with two channels. It is composed of a stack of convolutional layers. In all layers except the last, the convolution is followed by batch normalization and subsequently ReLU activation. No pooling is used. For the first four layers, the number of feature maps is doubled at each layer, while strided convolution is used to reduce the image resolution. One kernel size is adopted for the first five layers, and a different one for the last two. In the last layer, the convolution output is averaged to form the decision variable for binary classification. No softmax is used. The Adam optimizer is used with momentum, a fixed mini-batch size, and an initial learning rate that is halved periodically. Training is performed with the TensorFlow interface on an NVIDIA Titan X Pascal GPU with 12 GB RAM. Training runs for a fixed number of epochs and takes several hours. The implementation is available online at [2]. As figures of merit for image quality assessment we adopt SNR (dB) and SSIM, computed on a cropped window from the center of the axial slices. In addition, we collected a Radiologist Opinion Score (ROS) regarding the diagnostic quality of the images. ROS ranges from poor to excellent based on the overall image quality in terms of sharpness/blurriness and the appearance of residual artifacts.
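The SNR figure of merit is the standard ratio of reference energy to error energy, in dB. A small helper (the paper's exact cropping window is not reproduced here):

```python
import numpy as np

def snr_db(x_ref, x_hat):
    """SNR(dB) = 10 log10(||x_ref||^2 / ||x_ref - x_hat||^2)."""
    err = np.linalg.norm(x_ref - x_hat) ** 2
    return 10.0 * np.log10(np.linalg.norm(x_ref) ** 2 / err)
```

For example, a reconstruction whose error energy is 1% of the reference energy scores 20 dB.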
Images retrieved by the various methods are depicted in Fig. 2 under the adopted undersampling of k-space. For a random test patient, representative slices from the axial and coronal orientations, respectively, are shown from top to bottom. Columns from left to right show, respectively, the images reconstructed by zero-filling (ZF), CS-WV, CS-TV, ℓ_1-net, ℓ_2-net, GAN, GANCS with the mixed cost, and the gold-standard (GS). Note, we propose ℓ_1-net and ℓ_2-net using the same network structure and training as in Section 4.1, only changing the G-net cost function in (P1). CS reconstruction is performed using the Berkeley Advanced Reconstruction Toolbox (BART) [18], where the tuning parameters are optimized for the best performance. GANCS, ℓ_1-net, and ℓ_2-net are trained with ZF input images that evidently contain aliasing artifacts.
Quantitative metrics, including SNR (dB), SSIM, and the reconstruction time (sec), are reported in Table I. These metrics are averaged over all axial slices of the test patients. As apparent from the magnified regions, GANCS returns the most detailed images, with high contrast and texture details that can reveal small structures. The ℓ_2-net images appear somewhat over-smoothed, as the ℓ_2 cost encourages finding pixel-wise averages of plausible solutions. Also, ℓ_1-net performs better than ℓ_2-net, as already reported in a different setting [22], but its results are still not as sharp as those of GANCS, which leverages both the ℓ_1 cost and the GAN. Pure GAN also produces sharp images, but noise is present all over the image. CS-based results are depicted as the benchmark MR reconstruction schemes in use today, and they evidently introduce blurring artifacts.
CS-based schemes achieve higher SNR and SSIM, but they miss the high-frequency textures, as evidenced by Fig. 2. In addition, they demand iterative algorithms for solving non-smooth optimization programs, which take a few seconds per reconstruction even with the optimized BART toolbox [18]. In contrast, the elapsed time for GANCS is only a few milliseconds, which allows reconstructing many frames per second and thus makes it a suitable choice for real-time MRI visualization tasks. Regarding convergence, we empirically observe faster and more stable training when imposing more weight on the data consistency, which restricts the search space for the network weights.
To assess the perceptual quality of the resulting images, we also asked the opinion of expert radiologists. We normalize the scores so that the gold-standard images are rated excellent (i.e., the maximum ROS). Statistical ROS is evaluated for the image quality, residual artifacts, and image sharpness. It is shown in the bar plot of Fig. 3, which confirms that GANCS is almost as perceptually pleasing as the gold-standard scan. This demonstrates the superior diagnostic quality of GANCS images relative to the other alternatives.
For the sake of completeness, the evolution of the different (empirical) costs associated with the generator cost in (P1.2) over mini-batches is also depicted in Fig. 5. It is observed that the data-consistency cost and the GAN loss tend to improve in alternation to find the distribution at the intersection of the manifold and the data-consistency space.
Mean and standard deviation of image quality, artifacts, and blurriness scored by expert radiologists for the various reconstruction techniques; scores rate from poor to excellent.

Scheme            | ZF | CS-WV | CS-TV | ℓ_1-net | ℓ_2-net | GAN | GANCS
SNR (dB)          |    |       |       |         |         |     |
SSIM              |    |       |       |         |         |     |
Recon. time (sec) |    |       |       |         |         |     |
Manifold landscape. We visualize what the discriminator learns by showing the feature maps in different layers as heatmaps superimposed on the original images. Since there are several feature maps per layer, we compute the principal-component maps for each layer and visualize the first dominant ones. The figure indicates that, after learning from tens of thousands of MR images generated by the G network and their gold standards spanning different organs, D is able to detect anatomically valuable features. It is observed that the first layers reveal the edges, while the last layers, closer to the classification output, reveal regions of interest that include both anatomy and texture details. This observation is consistent with the way expert radiologists inspect images for diagnostic quality.
Performance for different numbers of patients. We also experimented with the number of patients needed for training to achieve good reconstruction quality in the test phase. It is generally valuable for clinicians to know how much training data is needed, since in medical applications patient data is not easily accessible due to privacy concerns. Fig. 7 plots the normalized RMSE on a test set versus the percentage of patients used for training. Note, the variance differences across training sizes may arise because training with fewer samples converges better, since we use the same number of epochs for all the training cases. A more detailed study is the subject of our ongoing research.
This paper presents a novel CS framework that leverages historical data for faster and more diagnostically valuable image reconstruction from highly undersampled observations. A low-dimensional manifold is learned where the images are not only sharp and high contrast, but also consistent with both the real MRI data and the acquisition model. To this end, a neural network based on LSGANs is trained, consisting of a generator network that maps a readily obtainable undersampled image to the gold-standard one. Experiments based on a large cohort of abdominal MR data, and evaluations performed by expert radiologists, confirm that GANCS retrieves images with better diagnostic quality in real time (a few milliseconds, about two orders of magnitude faster than state-of-the-art CS MRI toolboxes). This achieves a significant speedup and diagnostic-accuracy gain relative to standard CS MRI. Last but not least, the scope of the novel GANCS goes beyond MR reconstruction and tailors to other image restoration tasks dealing with aliasing artifacts. There are still important questions to address, such as using 3D spatial correlations for improved imaging quality, robustifying against patients with abnormalities, and handling variations in the acquisition model, for instance as a result of different sampling strategies.
“Accelerating magnetic resonance imaging via deep learning,” Proc. IEEE International Symposium on Biomedical Imaging, Melbourne, Australia, Apr. 2016.
“Loss functions for image restoration with neural networks,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, Mar. 2017.