CT Super-resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE)

08/10/2018 ∙ by Chenyu You, et al. ∙ Sichuan University, The University of Iowa, Rensselaer Polytechnic Institute, NetEase, Inc.

Computed tomography (CT) is a popular medical imaging modality for screening, diagnosis, and image-guided therapy. However, CT has its limitations, especially the involved ionizing radiation dose. Practically, it is highly desirable to have ultrahigh-quality CT imaging that resolves fine structural details at a minimized radiation dosage. In this paper, we propose a semi-supervised deep learning approach to recover high-resolution (HR) CT images from low-resolution (LR) counterparts. Specifically, with the generative adversarial network (GAN) as the basic component, we enforce cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised HR outputs. In this deep imaging process, we incorporate deep convolutional neural networks (CNNs), residual learning, and network-in-network techniques for feature extraction and restoration. In contrast to the current trend of increasing network depth and complexity to boost CT imaging performance, which limits real-world applications by imposing considerable computational and memory overheads, we apply a parallel 1x1 CNN to reduce the dimensionality of the output of the hidden layer. Furthermore, we optimize the number of layers and the number of filters for each CNN layer. Quantitative and qualitative evaluations demonstrate that our proposed model is accurate, efficient, and robust for SR image restoration from noisy LR input images. In particular, we validate our composite SR networks on two large-scale CT datasets, and obtain very encouraging results as compared with other state-of-the-art methods.


I Introduction

X-ray computed tomography (CT) is one of the most popular medical imaging methods for screening, diagnosis, and image-guided intervention [1]. Potentially, high-resolution (HR) CT (HRCT) imaging may enhance the fidelity of radiomic features as well. Therefore, super-resolution (SR) methods in the CT field are receiving major attention [2, 3]. The image resolution of a CT imaging system is constrained by the x-ray focal spot size, the detector element pitch, the reconstruction algorithms, and other factors. While physiological and pathological units in the human body are on the order of 10 microns, the in-plane and through-plane resolution of clinical CT systems are on the order of a submillimeter or 1 mm [3, 4]. Even though modern CT imaging and visualization software can generate arbitrarily small voxels, the intrinsic resolution is still far lower than what is ideal in important applications such as early tumor characterization and coronary artery analysis [5]. Consequently, how to produce HRCT images at a minimum radiation dose level is a holy grail of the CT field.

Fig. 1: Proposed GAN framework for SR CT imaging. Our approach uses two generators G and F, with the corresponding adversarial discriminators D_Y and D_X respectively, where x denotes a LR CT image and y is the HR CT counterpart. To regularize the training and deblurring processes, we utilize the generator-adversarial loss (adv), cycle-consistency loss (cyc), identity loss (idt), and joint sparsifying transform loss (jst) synergistically. In the supervised/semi-supervised mode, we also apply a supervision loss (sup) on (G(x), y) and (F(y), x). For brevity, we denote the mappings G: X → Y and F: Y → X as G and F respectively.

In general, there are two strategies for improving CT image resolution: (1) hardware-oriented and (2) computational. First, more sophisticated hardware components can be used, including an x-ray tube with a fine focal spot size, detector elements of small pitch, and better mechanical precision for CT scanning. These hardware-oriented methods are generally expensive, increase the CT system cost and radiation dose, and compromise the imaging speed. In particular, it is well known that a high x-ray radiation dose can induce genetic damage and cancer in a patient [6, 7]. As a result, the second type of method for resolution improvement [8, 9, 10, 11, 12, 13, 14], which obtains HRCT images from LRCT images computationally, is more attractive. This computational deblurring job is a major challenge, representing a seriously ill-posed inverse problem [3, 15]. Our neural network approach proposed in this paper is computational, utilizing advanced network architectures. More details are as follows.

To reconstruct HRCT images, various algorithms have been proposed. These algorithms can be broadly categorized into the following classes: (1) model-based reconstruction methods [16, 17, 18, 19, 20], which explicitly model the image degradation process and regularize the reconstruction according to the characteristics of projection data; these algorithms promise an optimal image quality under the assumption that model-based priors can be effectively imposed; and (2) learning-based (before deep learning) SR methods [21, 22, 23, 24, 25], which learn a nonlinear mapping from a training dataset consisting of paired LR and HR images to recover missing high-frequency details. In particular, sparse representation-based approaches have attracted increasing interest since they exhibit strong robustness in preserving image features while suppressing noise and artifacts. Dong et al. [25] applied adaptive sparse domain selection and adaptive regularization to obtain excellent SR results in terms of both visual perception and PSNR. Zhang et al. [24] proposed a patch-based technique for SR enhancement of 4D-CT images. These results demonstrate that learning-based SR methods can greatly enhance overall image quality, but the outcomes may still lose image subtleties and yield a blocky appearance.

Recently, deep learning (DL) has been instrumental for computer vision tasks [26, 27, 28]. Hierarchical features and representations derived from a convolutional neural network (CNN) are leveraged to enhance the discriminative capacity of visual quality, and researchers have thus started developing SR models for natural images [29, 30, 31, 32, 33, 34]. The key to the success of DL-based methods is their independence from explicit imaging models and their backing by big domain-specific data. The image quality is optimized by learning features in an end-to-end manner. More importantly, once a CNN-based SR model is trained, achieving SR is a purely feed-forward propagation, which demands a very low computational overhead.

In the medical imaging field, DL is an emerging approach which has exhibited great potential [35, 36, 37]. For several imaging modalities, DL-based SR methods have been successfully developed [38, 39, 40, 41]. Chen et al. [38] proposed a deep densely connected super-resolution network to reconstruct HR brain magnetic resonance (MR) images. Chaudhari et al. [41] developed a CNN-based network termed DeepResolve to learn a residual transformation from LR images to the corresponding HR images. More recently, Yu et al. [39] proposed two advanced CNN-based models with skip connections to promote high-frequency textures which are then fused with up-sampled images to produce SR images.

Fig. 2: Architecture of the SR generators. Each generator is composed of a feature extraction network and a reconstruction network. A default stride is used for all conv layers, except in the designated feature blocks. Up-scaling is performed to embed the residual layer for supervised training, and no interpolation method is used in the network for unsupervised feature learning.

Fig. 3: Architecture of the discriminators. n stands for the number of convolutional kernels, and s stands for the stride; e.g., n32s1 means a convolutional layer of 32 kernels with stride 1.
Feature extraction network:
  Layer:   1   2   3   4   5   6   7   8   9   10  11  12
  Filters: 64  54  48  43  39  35  31  28  25  22  18  16

Reconstruction network:
  Layer:   A1  B1  B2  C1  C2  Output
  Filters: 24  8   8   32  16  1

TABLE I: Number of filters on each convolution (conv) layer of the generative network.

Very recently, adversarial learning [42, 43] has become increasingly popular; it enables a CNN to learn feature representations from complex data distributions, with unprecedented success. Adversarial learning is performed with a generative adversarial network (GAN), defined as a mini-max game in which the two competing players are a generator G and a discriminator D. In the game, G is trained to learn a mapping from source images x in a source domain X to target images y in a target domain Y, while D distinguishes the generated images from the target images with a binary label. Once well trained, a GAN is able to model a high-dimensional distribution of target images. Wolterink et al. [44] proposed an unsupervised conditional GAN to optimize the nonlinear mapping from LR images to HR images, successfully enhancing the overall image quality.

However, there are still several major limitations in DL-based SR imaging. First, existing supervised DL-based algorithms cannot address blind SR tasks without LR-HR pairs. In clinical practice, the limited number of paired LR and HR CT images makes supervised learning methods impractical, since it is infeasible to ask patients to undergo multiple CT scans, with additional radiation doses, just to obtain paired CT images. Thus, it is essential to resort to semi-supervised learning. Second, utilizing the adversarial strategy can push the generator G to learn an inter-domain mapping and produce compelling target images [45], but there is a potential risk that the network may yield features that are not exhibited in target images due to the degeneracy of the mapping. Since an optimal G is capable of translating X to outputs distributed identically to Y, the GAN alone cannot ensure that the noisy input x and the predicted output G(x) are paired in a meaningful way: there exist many mappings G that yield the same distribution over Y. Consequently, the mapping is highly under-constrained. Furthermore, it is undesirable to optimize the adversarial objective in isolation: the mode collapse problem may occur, mapping all inputs to the same output image [43, 46, 47]. To address this issue, cycle-consistent GANs (CycleGAN) were designed to improve the performance of the generic GAN, and have been utilized for SR imaging [34]. Third, other limitations of GANs were also pointed out in [48, 49]. Steering a GAN learning process is not easy, since G may collapse into a narrow distribution which cannot represent diverse samples from the real data distribution; also, there is no interpretable metric for training progress. Fourth, as the number of layers increases, deep neural networks can derive a hierarchy of increasingly complex and abstract features, and to improve the SR imaging capability of a network, complex models with hundreds of millions of parameters are often tried. However, given the associated computational overheads, they are hard to use in real-world applications. Fifth, local features in a CT image have different scales. This feature hierarchy can provide more information to reconstruct images, but most DL-based methods [31, 32] neglect to use hierarchical features. Finally, the L2 distance between the prediction G(x) and the ground truth y is commonly used as the loss function to guide the training process. However, an output optimized with the L2 norm may suffer from over-smoothing, as discussed in [50, 51], since minimizing the L2 distance amounts to maximizing the peak signal-to-noise ratio (PSNR) [30].

Motivated by the aforementioned drawbacks, in this study we made major efforts in the following aspects. First, we present a novel residual CNN-based network in the CycleGAN framework to preserve high-resolution anatomical details with no task-specific regularization. Specifically, we utilize the cycle-consistency constraint to enforce a strong cross-domain consistency between X and Y. Second, to address the training problems of GANs [43, 49], we use the Wasserstein distance, or "Earth Mover's" distance (EM distance), instead of the Jensen-Shannon (JS) divergence. Third, inspired by the recent work [52], we optimize the network according to several fundamental design principles to alleviate the computational overheads [53, 54, 55], which also helps prevent the network from over-fitting. Fourth, we cascade multiple layers to learn highly interpretable and disentangled hierarchical features, and we enable the information flow across the skip-connected layers to prevent gradient vanishing [53]. Finally, we employ the L1 norm instead of the L2 norm to refine deblurring, and we propose to use a jointly constrained total variation-based regularization as well, which leverages prior information to reduce the noise with a minimal loss in spatial resolution or anatomical information. Extensive experiments with three real datasets demonstrate that our proposed composite network can achieve an excellent CT SR imaging performance comparable to, or better than, that of the state-of-the-art methods [31, 30, 33, 32, 21].

II Methods

Let us first review the SR problems in the medical imaging field. Then, we introduce the proposed adversarial nets framework and also present our SR imaging network architecture. Finally, we describe the optimization process.

II-A Problem Statement

Let x be an input LR image and y an output HR image. The conventional formulation of the ill-posed linear SR problem [21] is

x = SHy + ε,    (1)

where S and H denote the down-sampling and blurring system matrices respectively, and ε the noise and other unmodeled factors. Note that in practice both the system matrix and the unmodeled factors can be non-linear, instead of being linear (i.e., neither scalable nor additive).
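As an illustration, the degradation model in Eq. (1) can be sketched in a few lines of NumPy; the Gaussian kernel, scale factor, and noise level below are illustrative assumptions rather than the paper's actual acquisition settings:

```python
import numpy as np

def degrade(hr, factor=2, sigma=0.5, noise_std=0.01, rng=None):
    """Simulate Eq. (1): blur (H), down-sample (S), then add noise (epsilon)."""
    rng = rng or np.random.default_rng(0)
    # separable Gaussian blur as a stand-in for the blurring operator H
    radius = 2
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, hr)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    lr = blurred[::factor, ::factor]                        # down-sampling operator S
    return lr + noise_std * rng.standard_normal(lr.shape)   # additive noise term

hr = np.ones((64, 64))
lr = degrade(hr)
```

In practice the true S, H, and noise of a CT system are unknown, which is precisely why the paper learns the inverse mapping from data instead.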

Our goal is to computationally improve noisy LRCT images obtained under a low-dose CT (LDCT) protocol to HRCT images. The main challenges in recovering HRCT images are as follows. First, LRCT images contain different and more complex spatial variations, correlations, and statistical properties than natural images, which limit the SR imaging performance of traditional methods. Second, the noise in raw projection data is carried into the image domain during the reconstruction process, resulting in unique noise and artifact patterns; this makes it difficult for algorithms to deliver high image quality. Finally, since the sampling and degradation operations are coupled and ill-posed, SR tasks cannot be performed beyond a marginal degree using traditional methods, which cannot effectively restore some fine features and risk producing blurry appearances and new artifacts. To address these limitations, we develop an advanced neural network composed of a number of non-linear SR functional blocks for SR CT (SRCT) imaging, along with a residual module to learn high-frequency details. Then, we perform adversarial learning in a cyclic manner to generate perceptually and quantitatively superior SRCT images.

II-B Deep Cycle-Consistent Adversarial SRCT Model

II-B1 Cycle-Consistent Adversarial Model

Current DL-based algorithms use feed-forward CNNs to learn a non-linear mapping parametrized by θ, which can be written as:

ŷ = G(x; θ).    (2)

In order to obtain a decent θ, a suitable loss function must be specified to encourage G to generate a SR image based on the training samples so that

θ̂ = arg min_θ Σ_i L(G(x_i; θ), y_i),    (3)

where (x_i, y_i) are paired LRCT and HRCT images for training. To address the limitations mentioned in II-A, our cyclic SRCT model is shown in Fig. 1. The proposed model includes two generative mappings G: X → Y and F: Y → X, given training samples x ∈ X and y ∈ Y. Note that we denote the two mappings simply as G and F for brevity. The two mappings G and F are jointly trained to produce synthesized images that confuse the adversarial discriminators D_Y and D_X respectively, which intend to identify whether the output of each generative mapping is real or artificial; i.e., given an LRCT image x, G attempts to generate a synthesized image G(x) highly similar to a real image y so as to fool D_Y. In a similar way, D_X attempts to discriminate between a reconstructed F(y) and a real x. The key idea is that the generators and discriminators are jointly and alternately trained to improve their performance synergistically. Thus, we have the following optimization problem:

min_{G, F} max_{D_X, D_Y} L_adv(G, D_Y) + L_adv(F, D_X).    (4)

To enforce the mappings between the source and target domains and regularize the training procedure, our proposed network combines four types of loss functions: adversarial loss (adv); cycle-consistency loss (cyc); identity loss (idt); joint sparsifying transform loss (jst).

II-B2 Adversarial Loss

For marginal matching [42], we employ adversarial losses to urge the generated images to obey the empirical distributions in the source and target domains. To improve the training quality, we apply the Wasserstein distance [56] instead of the negative log-likelihood used in [42]. Thus, we have the adversarial objective with respect to G:

min_G max_{D_Y} E_y[D_Y(y)] − E_x[D_Y(G(x))] − λ E_ŷ[(||∇_ŷ D_Y(ŷ)||_2 − 1)^2],    (5)

where E denotes the expectation operator; the first two terms form the Wasserstein estimate, and the third term penalizes the deviation of the gradient norm of D_Y with respect to its input from one; ŷ is uniformly sampled along straight lines between pairs of y and G(x), and λ is a regularization parameter. A similar adversarial loss is defined for marginal matching in the reverse direction.
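As a numerical sketch of this objective, the helper below combines precomputed critic (discriminator) scores with the gradient norms at the interpolated samples ŷ; the function name and the default λ = 10 are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def wgan_gp_loss(d_real, d_fake, grad_norms, lam=10.0):
    """Sketch of Eq. (5) from the critic's perspective.
    d_real / d_fake: critic scores on real and generated images;
    grad_norms: ||grad D(y_hat)|| at points interpolated between real and fake."""
    wasserstein = np.mean(d_fake) - np.mean(d_real)    # critic minimizes this estimate
    penalty = lam * np.mean((grad_norms - 1.0) ** 2)   # keeps the critic near 1-Lipschitz
    return wasserstein + penalty
```

In an actual training loop the gradient norms come from automatic differentiation at ŷ = ε·y + (1 − ε)·G(x) with ε uniform in [0, 1], matching the straight-line sampling described above.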

II-B3 Cycle Consistency Loss

Adversarial training is for marginal matching [42, 43]. However, earlier studies [46, 57] found that using adversarial losses alone cannot ensure that the learned function transforms a source input successfully to a target output. To promote consistency between F(G(x)) and x, and between G(F(y)) and y, the cycle-consistency loss can be expressed as:

L_cyc(G, F) = E_x[||F(G(x)) − x||_1] + E_y[||G(F(y)) − y||_1],    (6)

where ||·||_1 denotes the L1 norm. Since the cycle-consistency loss encourages F(G(x)) ≈ x and G(F(y)) ≈ y, the two terms are referred to as the forward and backward cycle consistency respectively. The domain adaptation mapping refers to the cycle-reconstruction mapping. In effect, it imposes shared-latent-space constraints to encourage the source content to be preserved during the cycle-reconstruction mapping. In other words, the cycle consistency penalizes latent codes that deviate from the prior distribution in the cycle-reconstruction mapping. Additionally, the cycle consistency can help prevent degeneracy in adversarial learning [58].
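A minimal sketch of the cycle-consistency loss, with the two generators passed in as plain callables:

```python
import numpy as np

def cycle_loss(x, y, G, F):
    """Eq. (6) sketch: L1 forward cycle F(G(x)) ~ x plus backward cycle G(F(y)) ~ y."""
    forward = np.mean(np.abs(F(G(x)) - x))
    backward = np.mean(np.abs(G(F(y)) - y))
    return forward + backward
```

When G and F are exact inverses of each other the loss is zero, which is the behavior the constraint pushes the trained generators toward.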

II-B4 Identity Loss

Since a HR image should be a refined version of its LR counterpart, it is necessary to use the identity loss to regularize the training procedure [46, 47]. Compared with the L2 loss, the L1 loss does not over-penalize large differences or tolerate small errors between estimated and target images. Thus, the L1 loss is preferred to alleviate the limitations of the L2 loss in this context. Additionally, the L1 loss enjoys the same fast convergence speed as the L2 loss. The identity loss is formulated as follows:

L_idt(G, F) = E_y[||G(y) − y||_1] + E_x[||F(x) − x||_1].    (7)

We follow the same training baseline as in [47]; i.e., in the bi-directional mapping, the size of G(y) (or F(x)) is the same as that of y (or x).
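Under the equal-size convention just stated, the identity loss reduces to two L1 terms; a minimal sketch:

```python
import numpy as np

def identity_loss(x, y, G, F):
    """Eq. (7) sketch: G should leave a true HR image y (already in its target
    domain) unchanged, and likewise F should leave a true LR image x unchanged."""
    return np.mean(np.abs(G(y) - y)) + np.mean(np.abs(F(x) - x))
```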

II-B5 Joint Sparsifying Transform Loss

The total variation (TV) has demonstrated state-of-the-art performance in promoting image sparsity and reducing noise in piecewise-constant images [59, 60, 61, 62, 63, 64, 65]. To express image sparsity, we formulate a nonlinear TV-based loss with joint constraints as follows:

L_jst = τ ||G(x)||_TV + (1 − τ) ||y − G(x)||_TV,    (8)

where τ is a scaling factor. Intuitively, the above constrained minimization combines two components: the first is used for sparsifying reconstructed images and alleviating conspicuous artifacts, and the second helps preserve anatomical characteristics by minimizing the difference image y − G(x). Essentially, these two components require a joint minimization under the bidirectional constraints. In this paper, the control parameter τ was fixed empirically. In the case of τ = 1, L_jst is regarded as the conventional TV loss.
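A minimal sketch of this joint loss, using an anisotropic TV and an illustrative τ = 0.5 (the paper's own value is not reproduced here):

```python
import numpy as np

def tv(img):
    """Anisotropic total variation: sum of absolute finite differences
    along both image axes."""
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

def jst_loss(sr, hr, tau=0.5):
    """Eq. (8) sketch: tau * TV(sr) sparsifies the reconstruction, while
    (1 - tau) * TV(hr - sr) preserves anatomy through the difference image."""
    return tau * tv(sr) + (1.0 - tau) * tv(hr - sr)
```

With tau=1.0 the second term vanishes and the loss degenerates to the conventional TV of the reconstruction, matching the limiting case noted above.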

II-B6 Overall Objective Function

In the training process, our proposed network is fine-tuned in an end-to-end manner to minimize the following overall objective function:

L_GAN-CIRCLE = L_adv + λ_1 L_cyc + λ_2 L_idt + λ_3 L_jst,    (9)

where λ_1, λ_2, and λ_3 are parameters for balancing the different penalties. We call this modified CycleGAN the GAN-CIRCLE, as summarized in the title of this paper.

II-B7 Supervised Learning with GAN-CIRCLE

In the case where we have access to a paired dataset, we can also train our model for SRCT in a supervised fashion. Given paired training data from the true joint distribution, i.e., (x, y) ∼ p(x, y), we can define a supervision loss on (G(x), y) and (F(y), x) as follows:

L_sup = E_(x,y)[||G(x) − y||_1 + ||F(y) − x||_1].    (10)
Fig. 20: Visual comparison of an SRCT case from the Tibia dataset, comparing (a) ground-truth HR, (b) noisy LR, (c) NN, (d) Bilinear, (e) Bicubic, (f) Lanczos, (g) A+, (h) FSRCNN, (i) ESPCN, (j) LapSRN, (k) SRGAN, (l) G-Fwd, (m) G-Adv, and (n)-(p) GAN-CIRCLE variants. The restored bony structures are shown in the red and yellow boxes in Fig. 53. The display window is [-900, 2000] HU.
Fig. 53: Zoomed regions of interest (ROIs) marked by the red rectangle in Fig. 20, with panels in the same order as Fig. 20. The restored image with GAN-CIRCLE reveals subtle structures better than the other variations of the proposed neural network, especially in the marked regions. The display window is [-900, 2000] HU.

II-C Network Architecture

II-C1 Generative Networks

Although more layers and a larger model size usually bring a performance gain, for real applications we designed a lightweight model to validate the effectiveness of GAN-CIRCLE. The two generative networks G and F are shown in Fig. 2. The network architecture has been optimized for SR CT imaging. It consists of two processing streams: the feature extraction network and the reconstruction network.

In the feature extraction network, we concatenate sets of non-linear SR feature blocks, each composed of convolution (Conv) kernels, biases, Leaky ReLU, and a dropout layer. We utilize Leaky ReLU, f(x) = max(0, x) + α min(0, x), to prevent the 'dead ReLU' problem. The dropout layer is applied to prevent overfitting. The numbers of filters are shown in Table I. In practice, we avoid normalization, which is not suitable for SR, because we observe that it discards the range flexibility of the features. Then, to capture both local and global image features, all outputs of the hidden layers are concatenated before the reconstruction network through skip connections. The skip connections help prevent training saturation and overfitting. Diverse features, representing different details of the HRCT components, are thereby available at the end of the feature extraction network.
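The block structure just described (a conv stand-in, Leaky ReLU, optional dropout, and dense concatenation of all hidden outputs) can be sketched as follows; the blocks are passed in as plain callables, and the slope and dropout rate are illustrative assumptions:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """Leaky ReLU keeps a small negative-side slope, avoiding 'dead' units."""
    return np.where(x > 0, x, alpha * x)

def feature_extraction(x, blocks, drop=0.0, rng=None):
    """Sketch of the extraction stream: each block is a callable standing in for
    conv + bias; every hidden output is kept and concatenated along the channel
    axis, mimicking the dense skip connections feeding the reconstruction network."""
    rng = rng or np.random.default_rng(0)
    outputs = []
    h = x
    for block in blocks:
        h = leaky_relu(block(h))
        if drop > 0:  # inverted dropout regularization
            h = h * (rng.random(h.shape) >= drop) / (1 - drop)
        outputs.append(h)
    return np.concatenate(outputs, axis=-1)
```

The concatenation is what makes every level of the feature hierarchy directly visible to the reconstruction network, rather than only the last layer's output.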

             Tibia Case            Abdominal Case        Real Case             Real Case
             PSNR   SSIM  IFC      PSNR   SSIM  IFC      PSNR   SSIM  IFC      PSNR   SSIM  IFC
NN           24.754 0.645 2.785    26.566 0.592 1.919    28.072 0.798 0.246    27.903 0.798 0.234
Bilinear     24.667 0.612 2.588    27.726 0.605 1.933    28.162 0.812 0.255    27.937 0.799 0.235
Bicubic      25.641 0.662 2.835    28.599 0.619 2.008    28.117 0.805 0.250    27.929 0.798 0.235
Lanczos      25.686 0.663 2.848    28.644 0.619 2.01     28.116 0.806 0.251    27.377 0.800 0.235
A+           26.496 0.696 3.028    28.154 0.589 1.899    27.877 0.804 0.249    27.037 0.778 0.236
FSRCNN       28.360 0.923 3.533    30.950 0.924 2.285    35.384 0.830 0.265    33.643 0.805 0.237
ESPCN        28.361 0.921 3.520    30.507 0.923 2.252    35.378 0.830 0.278    33.689 0.805 0.245
LapSRN       28.486 0.923 3.533    30.985 0.925 2.299    35.372 0.830 0.277    33.711 0.805 0.244
SRGAN        21.924 0.389 1.620    28.550 0.871 1.925    33.002 0.737 0.232    31.775 0.701 0.220
G-Fwd        28.649 0.931 3.618    31.282 0.925 2.348    35.227 0.829 0.276    33.589 0.803 0.236
G-Adv        26.945 0.676 2.999    26.930 0.889 1.765    32.518 0.725 0.199    31.712 0.700 0.210
GAN-CIRCLE   27.742 0.895 3.944    30.720 0.924 2.435    -      -     -        -      -     -
GAN-CIRCLE   27.071 0.887 3.893    29.988 0.902 2.367    33.194 0.829 0.285    31.252 0.804 0.245
GAN-CIRCLE   27.255 0.891 2.713    28.439 0.894 2.019    32.138 0.824 0.283    30.641 0.796 0.232
TABLE II: Quantitative evaluation of state-of-the-art SR algorithms. Red and blue indicate the best and the second best performance, respectively.

In the reconstruction network, we stack two reconstruction branches and integrate the information flows. Because all the outputs from the feature extraction network are densely connected, we propose parallelized CNNs (network in network) [66], which utilize shallow multilayer perceptrons (MLPs) to perform a nonlinear projection in the spatial domain. There are several benefits to the network-in-network strategy. First, the 1x1 Conv layer can significantly reduce the dimensionality of the filter space for faster computation with less information loss [66]. Second, the 1x1 Conv layer can increase the non-linearity of the network to better learn a complex mapping at the finer levels. For up-sampling, we adopt transposed convolutional (up-sampling) layers [67]. The last Conv layer fuses all the feature maps, resulting in an entire residual image containing mostly high-frequency details. In the supervised setting, the up-sampled image produced by the bicubic interpolation layer is combined (via element-wise addition) with the residual image to produce the HR output. In the unsupervised and semi-supervised settings, no interpolation is involved across the skip connection.
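The dimensionality-reduction role of the 1x1 Conv layer can be illustrated as a per-pixel linear projection across channels; the channel counts and the uniform weights below are hypothetical:

```python
import numpy as np

def conv1x1(feats, w):
    """A 1x1 convolution is a per-pixel linear map over channels:
    (H, W, C_in) x (C_in, C_out) -> (H, W, C_out). It shrinks the channel
    dimension cheaply before the costlier reconstruction layers."""
    return feats @ w

feats = np.ones((8, 8, 64))        # concatenated hidden features (hypothetical size)
w = np.full((64, 16), 1.0 / 64)    # hypothetical learned 1x1 kernel weights
reduced = conv1x1(feats, w)        # 64 channels projected down to 16
```

Because no spatial neighborhood is involved, this projection costs only H·W·C_in·C_out multiplications, which is the source of the speedup cited above.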

It should be noted that the generator F shares the same architecture as G in both the supervised and unsupervised scenarios. A default stride is used throughout; the exceptions are the Conv layers in certain feature blocks, whose stride differs between the unsupervised and supervised feature-learning settings. We refer to the forward generative network G as G-Forward.

Fig. 90: Visual comparison of an SRCT case from the abdominal dataset, with panels in the same order as Fig. 20, plus zoomed GAN-CIRCLE results. The display window is [-160, 240] HU. The restored anatomical features are shown in the red and yellow boxes. (Zoomed for visual clarity).

II-C2 Discriminative Networks

As shown in Fig. 3, in reference to recent successes with GANs [68, 30], the discriminator is designed to have stages of Conv, bias, instance norm [69] (IN), and Leaky ReLU, followed by two fully-connected layers, of which the first has a fixed number of units and the other has a single output. In addition, inspired by [49], no sigmoid cross-entropy layer is applied at the end of the discriminator. We apply a small, fixed filter size for the Conv layers, which have different numbers of filters, as indicated in Fig. 3.

Fig. 123: Visual comparison of an SRCT case from the real dataset, with panels in the same order as Fig. 20. The display window is [180, 4096] HU. The restored anatomical features are shown in the red and yellow boxes. (Zoomed for visual clarity).

III Experiments and Results

We discuss our experiments in this section. We first introduce the datasets and then describe the implementation details and parameter settings of our proposed methods. We also compare our proposed algorithms with the state-of-the-art SR methods [33, 32, 30, 31] quantitatively and qualitatively. We further evaluate our results against the state-of-the-art, and demonstrate the robustness of our methods in real SR scenarios. Finally, we present detailed diagnostic quality assessments from expert radiologists. Note that we use the default parameters of all the evaluated methods.

III-A Training Datasets

In this study, we used two high-quality sets of training images to demonstrate the fidelity and robustness of the proposed GAN-CIRCLE. As shown in Figs. 20 and 90, these two datasets have very different characteristics.

III-A1 Tibia dataset

This micro-CT image dataset reflects twenty-five fresh-frozen cadaveric ankle specimens which were removed at mid-tibia from 17 body donors. After the soft tissue was removed and the tibia was dislocated from the ankle joint, each specimen was scanned on a Siemens microCAT II (Preclinical Solutions, Knoxville, TN, USA) in the cone-beam imaging geometry. The micro-CT acquisition parameters (tube voltage, tube current, number of projections over the angular range, and exposure time per projection) are summarized in [70], and the filtered backprojection (FBP) method was utilized to produce isotropic voxels. Since the CT images are not isotropic in each direction, for convenience of our previous analysis [70], we converted the micro-CT images using a windowed sinc interpolation method. In this study, the micro-CT images utilized as HR images were prepared at a fine voxel size, as the targets for SR imaging based on the corresponding LR images at a coarser voxel size. The full description is in [70]. We target resolution improvement by a fixed scale factor.

Fig. 156: Visual comparison of an SRCT case from the real dataset, with panels in the same order as Fig. 20. The display window is [180, 4096] HU. The restored bony structures are shown in the red and yellow boxes. (Zoomed for visual clarity).

III-A2 Abdominal dataset

This clinical dataset was authorized by the Mayo Clinic for the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge. The dataset contains full-dose CT images from 10 patients, with fixed reconstruction interval and slice thickness. The original CT images were generated by multidetector-row CT (MDCT). The HR images were reconstructed using the FBP method from all projection views. More detailed information on the dataset is given in [71].

We performed image pre-processing for all CT images through the following workflow. The original CT images were first scaled from CT Hounsfield units (HU) to the unit interval [0, 1], and treated as the ground-truth HRCT images. In addition, we followed the convention in [23, 72] to generate LR images by adding noise to the original images and then lowering the spatial resolution by a constant factor. For convenience in training our proposed network, we up-sampled each LR image via proximal interpolation to ensure that x and y are of the same size.

Since the amount of training data plays a significant role in training neural networks [73], we extracted overlapping patches from the LRCT and HRCT images instead of directly feeding entire CT images to the training pipeline. The overlapping patches were obtained with a predefined sliding step. This strategy preserves local anatomical details and boosts the number of samples. For supervised learning, we randomly cropped HRCT images into fixed-size patches, along with their corresponding LRCT patches centered at the same points. For unsupervised learning, HRCT and LRCT patches of a fixed size were cropped independently and grouped into mini-batches.
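The overlapping patch extraction can be sketched as a sliding window; the patch size and stride below are illustrative, not the paper's settings:

```python
import numpy as np

def extract_patches(img, size, stride):
    """Slide a size x size window with a fixed stride over one CT slice,
    harvesting overlapping training patches."""
    patches = []
    for i in range(0, img.shape[0] - size + 1, stride):
        for j in range(0, img.shape[1] - size + 1, stride):
            patches.append(img[i:i + size, j:j + size])
    return np.stack(patches)

patches = extract_patches(np.zeros((64, 64)), size=32, stride=16)
```

With a stride smaller than the patch size, neighboring patches overlap, which is what multiplies the effective number of training samples per slice.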

III-B Implementation Details

In the proposed GAN-CIRCLE, we initialized the weights of each Conv layer based on [74]: the standard deviation is computed as std = sqrt(2/(k^2 n)), where k is the filter size and n the number of filters, and all biases were initialized to zero. In the training process, we empirically set the weighting parameters of the loss terms. Dropout regularization [75] was applied to each Conv layer. All Conv and transposed Conv layers were followed by a Leaky ReLU with a small negative slope. To make the size of all feature maps the same as that of the input, we padded zeros around the boundaries before each convolution. We utilized the Adam optimizer [76] to minimize the loss function of the proposed network. We set the same learning rate for all layers, decreased it by a fixed factor at regular epoch intervals, and terminated the training after a predefined number of epochs. All experiments were conducted using the TensorFlow library on an NVIDIA TITAN XP GPU.
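For illustration, the He initialization [74] referenced above amounts to drawing Conv weights from a zero-mean Gaussian whose standard deviation depends on the filter size and filter count; this is a sketch of that scheme, not the paper's TensorFlow code:

```python
import numpy as np

def he_std(k, n):
    """Std of He initialization [74] in fan-in form: sqrt(2 / (k*k*n)) for
    a Conv layer with k x k filters over n input feature maps."""
    return np.sqrt(2.0 / (k * k * n))

rng = np.random.default_rng(0)
k, n = 3, 64                                            # illustrative sizes
w = rng.normal(0.0, he_std(k, n), size=(k, k, n, n))    # Conv weights
b = np.zeros(n)                                         # biases start at zero
assert abs(w.std() - he_std(k, n)) < 2e-3               # empirical std matches
```

In TensorFlow/Keras the same scheme is available as a variance-scaling ("He normal") initializer, so in practice one would not compute it by hand.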

III-C Performance Comparison

In this study, we compared the proposed GAN-CIRCLE with the state-of-the-art methods: nearest-neighbor (NN), bilinear, bicubic, and Lanczos interpolation, adjusted anchored neighborhood regression (A+) [timofte2014aplus], FSRCNN [31], ESPCN [33], LapSRN [32], and SRGAN [30]. For clarity, we categorized the methods into the following classes: interpolation-based, dictionary-based, PSNR-oriented, and GAN-based. In particular, we trained the publicly available FSRCNN, ESPCN, LapSRN, and SRGAN with our paired LR and HR images. To benchmark against the DL-based methods, we first denoised the input LR images and then super-resolved the denoised CT images using the classical interpolation methods: nearest-neighbor up-sampling, bilinear interpolation, bicubic interpolation, and Lanczos interpolation. BM3D [77] is a classic, efficient, and powerful image-domain denoising algorithm; thus, we pre-processed the noisy LRCT images with BM3D, and then super-resolved the denoised images with the interpolation methods and A+. We refer to NN, Bilinear, Bicubic, and Lanczos as the interpolation-based methods.
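A minimal sketch of the denoise-then-interpolate baseline described above; a Gaussian filter stands in for BM3D [77] so the example stays dependency-free, and the scale factor and filter width are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def denoise_then_upsample(lr, scale=2, order=3):
    """Denoise-then-interpolate baseline. A Gaussian filter is a stand-in
    for BM3D; spline order 3 gives bicubic interpolation (order=0 would be
    nearest-neighbor, order=1 bilinear)."""
    denoised = gaussian_filter(lr, sigma=1.0)
    return zoom(denoised, scale, order=order)

lr = np.random.default_rng(0).random((32, 32))  # stand-in noisy LR slice
sr = denoise_then_upsample(lr, scale=2)
assert sr.shape == (64, 64)
```

In the actual experiments the denoiser is BM3D (available, e.g., through third-party Python bindings), not a Gaussian filter.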

We evaluated three variations of the proposed method: (1) G-Forward (G-Fwd), the forward generator of GAN-CIRCLE; (2) G-Adversarial (G-Adv), which uses the adversarial learning strategy; and (3) the full-fledged GAN-CIRCLE. To emphasize the effectiveness of the GAN-CIRCLE structure, we first trained the three models using the supervised learning strategy, then trained the proposed network in the semi-supervised scenario (GAN-CIRCLE (semi)), and finally in the unsupervised manner (GAN-CIRCLE (unsup)). In the semi-supervised settings, the dataset was randomly split into paired and unpaired subsets with respect to three variants of the paired fraction. For a fair evaluation of each method, we used the same dataset sizes for training and testing.
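The random paired/unpaired split used in the semi-supervised setting can be sketched as follows (the paired fraction is a placeholder; the paper's split ratios are not reproduced here):

```python
import numpy as np

def split_paired(n, paired_frac, seed=0):
    """Randomly split n LR/HR index pairs into a paired subset (alignment
    kept) and an unpaired subset (alignment discarded thereafter)."""
    idx = np.random.default_rng(seed).permutation(n)
    k = int(round(paired_frac * n))
    return idx[:k], idx[k:]

paired, unpaired = split_paired(100, 0.3)
assert len(paired) == 30 and len(unpaired) == 70
```

The supervised loss terms then see only the paired subset, while the cycle-consistency terms can use both subsets.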

We validated the SR performance in terms of three widely used image quality metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [78], and the information fidelity criterion (IFC) [79]. Through extensive experiments, we compared all the above-mentioned methods on the two benchmark datasets described in Section III-A.
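Of the three metrics, PSNR has a simple closed form; a minimal implementation for images scaled to [0, 1], as in the pre-processing above (SSIM and IFC require windowed local statistics and are omitted from this sketch):

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """PSNR between a restored image x and reference y:
    10 * log10(data_range^2 / MSE)."""
    mse = np.mean((x - y) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1           # constant error of 0.1 -> MSE = 0.01
assert abs(psnr(noisy, ref) - 20.0) < 1e-9
```

For SSIM, an off-the-shelf implementation such as `skimage.metrics.structural_similarity` is a reasonable choice.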

III-D Experimental Results on the Tibia Dataset

We evaluated the proposed algorithms against the state-of-the-art algorithms on the tibia dataset, with typical results in Fig. 20. BM3D can effectively remove the noise, but it over-smooths the noisy LR images. The interpolation-based methods (NN, Bilinear, Bicubic, Lanczos) yield noticeable artifacts caused by partial aliasing, whereas the DL-based methods suppress such artifacts effectively. Our proposed GAN-CIRCLE recovers more fine subtle details and captures more anatomical information in Fig. 53. It is worth mentioning that, although Fig. 20 shows severe distortions of the original images, SRGAN generates compelling results in Figs. 90-156, which indicates that the VGG network can serve as a task-specific component for generating images of high perceptual quality. We argue that a possible reason is that the VGG network [68] is pre-trained on natural images whose structural characteristics correlate with the content of medical images [80]. Fig. 53 shows that the proposed GAN-CIRCLE predicts images with sharper boundaries and richer textures than its semi-supervised and unsupervised variants, which learn additional anatomical information from the unpaired samples. The quantitative results are listed in Table II. They demonstrate that G-Forward achieves the highest PSNR and SSIM scores, outperforming all the other methods. However, it has been pointed out in [81, 82] that high PSNR and SSIM values cannot guarantee a visually favorable result. Non-GAN-based methods (FSRCNN, ESPCN, LapSRN) may fail to recover some fine structures important for diagnostic evaluation, as shown by the zoomed boxes in Fig. 53. Quantitatively, GAN-CIRCLE achieves the second best values in terms of SSIM and IFC; it has been noted in [83] that the IFC value correlates well with human perception of SR images. Our GAN-CIRCLE obtained comparable results both qualitatively and quantitatively. Table II shows that the proposed semi-supervised method performs similarly to the fully supervised methods on the tibia dataset. In general, the proposed GAN-CIRCLE generates more visually pleasant results with sharper image content.

Method | Tibia Dataset (Image Sharpness / Image Noise / Contrast Resolution / Diagnostic Acceptance / Overall Quality) | Abdominal Dataset (same metrics)
NN | 1.89±0.27 / 2.43±0.21 / 1.89±0.78 / 1.52±0.69 / 1.72±0.33 | 1.98±0.46 / 3.22±1.45 / 1.47±0.23 / 1.67±0.85 / 2.34±0.42
Bilinear | 1.87±0.41 / 2.52±0.73 / 2.01±0.83 / 1.65±0.73 / 1.95±0.47 | 2.02±0.41 / 3.12±1.58 / 1.85±0.96 / 1.75±0.83 / 2.43±0.45
Bicubic | 2.55±0.43 / 2.34±0.82 / 2.19±0.91 / 1.82±0.21 / 2.12±0.23 | 2.52±0.53 / 2.87±1.05 / 2.51±0.53 / 2.27±0.45 / 2.61±0.67
Lanczos | 2.25±0.39 / 2.50±0.85 / 2.36±0.82 / 1.93±0.43 / 2.23±0.29 | 2.53±0.59 / 2.99±0.86 / 2.55±0.64 / 2.21±0.35 / 2.68±0.31
A+ | 2.34±0.47 / 2.54±0.68 / 2.52±0.67 / 1.98±0.59 / 2.37±0.97 | 2.74±0.75 / 3.07±0.96 / 2.61±0.69 / 2.35±0.57 / 2.74±0.71
FSRCNN | 2.85±0.94 / 3.16±0.57 / 2.54±0.96 / 2.77±0.69 / 3.27±0.76 | 3.07±0.89 / 3.55±0.50 / 2.94±0.78 / 2.92±0.58 / 3.09±0.53
ESPCN | 2.82±0.86 / 3.18±0.51 / 2.58±0.46 / 2.95±0.46 / 3.49±0.66 | 2.95±1.43 / 3.39±0.80 / 2.85±0.63 / 2.76±0.83 / 3.06±0.85
LapSRN | 2.91±0.88 / 3.49±0.70 / 2.69±0.56 / 3.01±0.78 / 3.63±0.61 | 3.01±0.56 / 3.58±0.81 / 2.83±0.71 / 3.25±0.92 / 3.11±0.78
SRGAN | 1.94±0.37 / 2.71±0.23 / 1.91±0.71 / 1.75±0.83 / 1.93±1.01 | 3.35±0.97 / 3.23±1.01 / 3.27±0.92 / 3.46±1.11 / 3.41±0.94
G-Fwd | 2.99±0.42 / 3.59±0.57 / 3.07±0.91 / 3.45±1.02 / 3.70±0.71 | 3.25±0.94 / 3.53±0.70 / 2.95±0.57 / 3.38±0.93 / 3.09±0.55
G-Adv | 2.89±0.86 / 3.13±1.02 / 3.02±0.58 / 3.29±0.69 / 3.62±0.67 | 3.45±1.12 / 3.34±0.81 / 3.31±0.86 / 3.48±0.77 / 3.32±0.82
GAN-CIRCLE | 3.12±0.73 / 3.40±0.43 / 3.17±0.46 / 3.61±0.36 / 3.79±0.72 | 3.59±0.41 / 3.41±0.42 / 3.51±0.66 / 3.64±0.54 / 3.62±0.41
GAN-CIRCLE (semi) | 3.02±0.78 / 3.14±0.68 / 3.12±0.88 / 3.47±0.67 / 3.71±0.76 | 3.48±0.81 / 3.29±0.80 / 3.42±0.78 / 3.57±0.68 / 3.51±0.46
GAN-CIRCLE (unsup) | 2.91±0.82 / 3.32±0.89 / 3.08±0.94 / 3.32±0.48 / 3.57±0.52 | 3.46±0.73 / 3.39±1.04 / 3.39±0.50 / 3.54±0.53 / 3.34±1.01
TABLE III: Diagnostic quality assessment in terms of subjective quality scores for different algorithms (mean±std). Red and blue indicate the best and the second best performance, respectively.

III-E Experimental Results on the Abdominal Dataset

We further compared the above-mentioned algorithms on the abdominal benchmark dataset, where a similar trend can be observed. The proposed GAN-CIRCLE preserves anatomical information better and visualizes the portal vein more clearly, as shown in Fig. 90. These results demonstrate that the PSNR-oriented methods (FSRCNN, ESPCN, LapSRN) can significantly suppress noise and artifacts. However, they yield lower image quality as judged by human observers, since the per-pixel loss assumes that the impact of noise is independent of local image features, whereas the sensitivity of the human visual system (HVS) to noise depends on local contrast, intensity, and structural variations. Fig. 90 displays the LRCT images processed by the GAN-based methods (SRGAN, G-Adv, and the GAN-CIRCLE variants) with improved structural identification. It can also be observed that the GAN-based models introduce noticeable noise into the results; for example, tiny artifacts appear in the results of GAN-CIRCLE. As shown in Fig. 90, the proposed GAN-CIRCLE variants are capable of retaining high-frequency details to reconstruct more realistic images with relatively lower noise than the other GAN-based methods (G-Adv, SRGAN). Table II shows that G-Fwd achieves the best performance in PSNR, while the proposed GAN-CIRCLE variants obtain pleasing results in terms of SSIM and IFC. In other words, the proposed GAN-CIRCLE variants generate more visually pleasant results with sharper edges on the abdominal dataset than the competing state-of-the-art methods.

III-F Super-resolving Real-world Images

We analyzed the performance of the SR methods in simulated SRCT scenarios in Sections III-D and III-E. These experimental results show that the DL-based methods are very effective in addressing the ill-posed SRCT problem, with two important caveats. First, SRCT aims at recovering an HRCT image from LRCT images acquired under a low-dose protocol. Second, most DL-based methods assume that the LRCT and HRCT images are perfectly paired, an assumption that is likely to be violated in clinical practice. In other words, the above-evaluated datasets were simulated, so the fully supervised algorithms can easily cope with SRCT tasks given exactly matched training samples. Our further goal is to derive a semi-supervised scheme that handles unmatched/unpaired data given a relative lack of matched/paired data, so as to address real SRCT tasks. In this subsection, we demonstrate a strong capability of the proposed methods in real applications using a small amount of mismatched LRCT and HRCT image pairs, as well as a high flexibility in adapting to various noise distributions.

III-F1 Practical SRCT Implementation Details

We first obtained LRCT and HRCT images of a deceased mouse on the same scanner with two scanning protocols. The micro-CT acquisition used circular X-ray source scanning, with the corresponding kVp, mAs, number of projections over the scanning range, and exposure time per projection set by each protocol; the micro-CT images were reconstructed using a conventional Feldkamp-type filtered back-projection (FDK) algorithm, yielding HRCT and LRCT volumes at isotropic voxel sizes. We then compared against the state-of-the-art super-resolution methods. Since the real data are unpaired, we evaluated the proposed semi-supervised and unsupervised GAN-CIRCLE networks for resolution improvement.

III-F2 Comparison With the State-of-the-Art Methods

The quantitative results for all the methods involved are summarized in Table II. The PSNR-oriented approaches, such as FSRCNN, ESPCN, LapSRN, and our G-Fwd, yield higher PSNR and SSIM values than the GAN-based methods. It is not surprising that the PSNR-oriented methods obtain favorable PSNR values, since their goal is to minimize the per-pixel distance to the ground truth. However, our semi-supervised and unsupervised GAN-CIRCLE variants achieved the highest IFC among all the SR methods, and GAN-CIRCLE obtained the second best result in terms of SSIM. The visual comparisons are given in Figs. 123 and 156. To demonstrate the robustness of our methods, we examined anatomical features in the lung regions and the bone structures of the mice, as shown in Figs. 123 and 156, respectively. The GAN-based approaches perform favorably over the PSNR-oriented methods in terms of perceptual quality, as illustrated in Figs. 123 and 156. Fig. 123 confirms that the PSNR-oriented methods produce blurry results, especially in the lung regions, while the GAN-based methods restore anatomical content satisfactorily. In Fig. 156, it is notable that our GAN-CIRCLE variants perform better than the other methods in terms of recovering structural information and preserving edges. These SR results demonstrate that the proposed methods can provide better visualization of bone and lung microarchitecture with sharp edges and rich textures.

III-G Diagnostic Quality Assessment

We invited three board-certified radiologists with a mean clinical CT experience of 12.3 years to perform independent qualitative image analysis on sets of images from the two benchmark datasets (tibia and abdominal). Each set includes the same image slice generated by the different methods, with the HRCT and LRCT images labeled as references. The image sets from the two datasets were randomized and de-identified so that the radiologists were blinded to the post-processing algorithms. Image sharpness, image noise, contrast resolution, diagnostic acceptability, and overall image quality were graded on a scale from 1 (worst) to 5 (best), where a score of 1 refers to a ‘non-diagnostic’ image and a score of 5 denotes ‘excellent’ diagnostic image quality. The mean scores with their standard deviations are presented in Table III. The radiologists confirmed that the GAN-based methods (G-Adv, SRGAN, and the GAN-CIRCLE variants) provide sharper images with better texture details, while the PSNR-oriented algorithms (FSRCNN, ESPCN, LapSRN, G-Fwd) receive higher noise suppression scores. Table III shows that the proposed GAN-CIRCLE and its semi-supervised variant achieve comparable results, outperforming the other methods in terms of image sharpness, contrast resolution, diagnostic acceptability, and overall image quality.

IV Discussions

SR imaging holds tremendous promise for practical medical applications, e.g., depicting bony details, lung structures, and implanted stents, and potentially enhancing radiomics analysis. As a result, X-ray computed tomography can provide compelling practical benefits in biological evaluation.

High-resolution micro-CT is well suited for bone imaging. Osteoporosis, characterized by reduced bone density and structural degeneration of bone, greatly diminishes bone strength and increases the risk of fracture [84]. Histologic studies have convincingly demonstrated that bone micro-structural properties are strong determinants of bone strength and fracture risk [85, 86, 87]. Modern whole-body CT technologies, benefiting from high spatial resolution, ultra-high-speed scanning, relatively low radiation dose, and large scan length, allow quantitative characterization of bone micro-structure [70]. However, state-of-the-art CT imaging technologies only allow a spatial resolution comparable to, or slightly higher than, the human trabecular bone thickness [88], leading to a fuzzy representation of individual trabecular bone micro-structure with significant partial volume effects, which add significant errors to measurements and interpretations. Improving the spatial resolution of bone micro-structural representation will largely reduce such errors and improve the generalizability of bone micro-structural measures across multi-vendor CT scanners by homogenizing spatial resolution.

Besides revealing micro-architecture, CT scans of the abdomen and pelvis are diagnostic imaging tests used to help detect diseases of the small bowel and colon, kidney stones, and abnormalities of other internal organs, and are often used to determine the cause of unexplained symptoms. With rising concerns over the increased lifetime risk of cancer from the radiation dose associated with CT, several studies have assessed the manipulation of scanning parameters, newer technological developments, and the adoption of advanced reconstruction techniques for radiation dose reduction [89, 90, 91, 92, 93]. However, in practice, the physical constraints of system hardware components and radiation dose considerations limit the imaging performance, and computational means are necessary to optimize image resolution. For the same reason, high-quality/high-dose CT images are often unavailable, which means that there are often not enough paired data to train a hierarchical deep generative model.

Our results suggest an interesting topic: how to utilize unpaired data so that the imaging performance can be improved. In this regard, the use of adversarial learning as a regularization term for SR imaging is a new mechanism to capture anatomical information. However, it should be noted that the existing GAN-based methods introduce additional noise into the results, as seen in Sections III-D and III-E. To cope with this limitation, we have incorporated the cycle-consistency so that the network can learn a complex deterministic mapping to improve image quality. The enforcement of identity and supervision allows the model to capture more latent structural information and thus improve image resolution. We have also used the Wasserstein distance to stabilize the GAN training process. Moreover, typical prior studies used complex inference to learn a hierarchy of latent variables for HR imaging, which is hard to utilize in medical applications. Thus, we have designed an efficient CNN-based network with skip-connection and network-in-network techniques. In the feature extraction network, we have optimized the network structure and reduced the computational complexity by applying a small number of filters in each Conv layer and utilizing the ensemble learning model. Both local and global features are cascaded through skip connections before being fed into the restoration/reconstruction network.
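The cycle-consistency and identity constraints discussed above can be summarized in a toy numpy sketch (illustrative only; the actual model couples these terms with adversarial Wasserstein losses and trains CNN generators):

```python
import numpy as np

def cycle_losses(x_lr, y_hr, G, F):
    """Sketch of the constraints for two mappings G: LR -> HR and
    F: HR -> LR. The cycle terms force F(G(x)) ~ x and G(F(y)) ~ y;
    the identity terms keep G near-identity on already-HR inputs and
    F near-identity on already-LR inputs."""
    cyc = np.mean(np.abs(F(G(x_lr)) - x_lr)) + np.mean(np.abs(G(F(y_hr)) - y_hr))
    idt = np.mean(np.abs(G(y_hr) - y_hr)) + np.mean(np.abs(F(x_lr) - x_lr))
    return cyc, idt

x = np.random.default_rng(0).random((16, 16))   # stand-in LR sample
y = np.random.default_rng(1).random((16, 16))   # stand-in HR sample
ident = lambda z: z
cyc, idt = cycle_losses(x, y, ident, ident)
assert cyc == 0.0 and idt == 0.0   # identity mappings satisfy both constraints
```

In the full model these penalties are weighted against the adversarial losses, which is what the empirically set loss-weighting parameters in Section III-B control.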

Although our model has achieved compelling results, some limitations remain. First, the proposed GAN-CIRCLE requires much longer training time than other standard GAN-based methods, generally 1-2 days. Future work should consider more principled ways of designing efficient architectures that can learn complex structural features at lower computational cost and model complexity. Second, although the proposed model can generate more plausible and anatomically faithful details, not all subtle structures are always faithfully recovered. It has also been observed in the recent literature [94] that the Wasserstein distance may yield biased sample gradients, is subject to the risk of converging to an incorrect minimum, and is not well suited for stochastic gradient descent. In the future, experimenting with variants of GANs is highly recommended. Finally, we notice that a network with adversarial training can generate more realistic images, but the restored images are not uniformly consistent with the original high-resolution images. To make further progress, we may add more constraints, such as sinogram consistency and a low-dimensional manifold constraint, to decipher the relationship between noise, blurry image appearances, and the ground truth, and even develop an adaptive and/or task-specific loss function.

V Conclusions

In this paper, we have established a cycle-consistent Wasserstein regression adversarial training framework for CT SR imaging. Aided by unpaired data, our approach learns complex structured features more effectively with a limited amount of paired data. At a low computational cost, the proposed G-Forward network can achieve a significant SR gain. In general, the proposed GAN-CIRCLE has produced promising results in terms of preserving anatomical information and suppressing image noise in both the supervised and semi-supervised learning fashions. Visual evaluations by expert radiologists confirm that the proposed GAN-CIRCLE networks deliver superior diagnostic quality, consistent with systematic quantitative evaluations in terms of traditional image quality measures.

Acknowledgment

The authors would like to thank the NVIDIA Corporation for the donation of the TITAN XP GPU to Dr. Ge Wang’s laboratory, which was used for this study. The authors would like to thank Dr. Shouhua Luo (Southeast University, China) for providing small animal data collected on an in vivo micro-CT system.

References

  • [1] D. J. Brenner, C. D. Elliston, E. J. Hall, and W. E. Berdon, “Estimated risks of radiation-induced fatal cancer from pediatric CT,” Am. J. Roentgenol., vol. 176, no. 2, pp. 289–296, 2001.
  • [2] S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: a technical overview,” IEEE Signal Process. Mag., vol. 20, no. 3, pp. 21–36, 2003.
  • [3] H. Greenspan, “Super-resolution in medical imaging,” Comput. J., vol. 52, no. 1, pp. 43–63, 2008.
  • [4] G. Schwarzband and N. Kiryati, “The point spread function of spiral CT,” Phys. Med. Biol., vol. 50, no. 22, p. 5307, 2005.
  • [5] A. Hassan, S. A. Nazir, and H. Alkadhi, “Technical challenges of coronary CT angiography: today and tomorrow,” Eur. J. Radiol., vol. 79, no. 2, pp. 161–171, 2011.
  • [6] D. J. Brenner and E. J. Hall, “Computed tomography — an increasing source of radiation exposure,” New Eng. J. Med., vol. 357, no. 22, pp. 2277–2284, 2007.
  • [7] A. B. De Gonzalez and S. Darby, “Risk of cancer from diagnostic X-rays: estimates for the UK and 14 other countries,” Lancet., vol. 363, no. 9406, pp. 345–351, Jan. 2004.
  • [8] P. J. La Rivière, J. Bian, and P. A. Vargas, “Penalized-likelihood sinogram restoration for computed tomography,” IEEE Trans. Med. Imag., vol. 25, no. 8, pp. 1022–1036, 2006.
  • [9] M. W. Vannier, “Iterative deblurring for CT metal artifact reduction,” IEEE Trans. Med. Imag., vol. 15, no. 5, p. 651, 1996.
  • [10] G. Wang, M. W. Vannier, M. W. Skinner, M. G. Cavalcanti, and G. W. Harding, “Spiral CT image deblurring for cochlear implantation,” IEEE Trans. Med. Imag., vol. 17, no. 2, pp. 251–262, 1998.
  • [11] D. D. Robertson, J. Yuan, G. Wang, and M. W. Vannier, “Total hip prosthesis metal-artifact suppression using iterative deblurring reconstruction,” J. Comput. Assist. Tomogr., vol. 21, no. 2, pp. 293–298, 1997.
  • [12] M. Jiang, G. Wang, M. W. Skinner, J. T. Rubinstein, and M. W. Vannier, “Blind deblurring of spiral CT images,” IEEE Trans. Med. Imag., vol. 22, no. 7, pp. 837–845, 2003.
  • [13] M. Jiang, G. Wang, M. Skinner, J. Rubinstein, and M. Vannier, “Blind deblurring of spiral CT images—comparative studies on edge-to-noise ratios,” Med. Phys., vol. 29, no. 5, pp. 821–829, 2002.
  • [14] J. Wang, G. Wang, and M. Jiang, “Blind deblurring of spiral CT images based on enr and wiener filter,” J. X-Ray Sci. Technol., vol. 13, no. 1, pp. 49–60, 2005.
  • [15] J. Tian and K.-K. Ma, “A survey on super-resolution imaging,” Signal Image Video P., vol. 5, no. 3, pp. 329–342, 2011.
  • [16] R. Zhang, J.-B. Thibault, C. A. Bouman, K. D. Sauer, and J. Hsieh, “Model-based iterative reconstruction for dual-energy X-ray CT using a joint quadratic likelihood model,” IEEE Trans. Med. Imag., vol. 33, no. 1, pp. 117–134, 2014.
  • [17] C. A. Bouman and K. Sauer, “A unified approach to statistical tomography using coordinate descent optimization,” IEEE Trans. Image Process., vol. 5, no. 3, pp. 480–492, 1996.
  • [18] Z. Yu, J.-B. Thibault, C. A. Bouman, K. D. Sauer, and J. Hsieh, “Fast model-based X-ray CT reconstruction using spatially nonhomogeneous ICD optimization,” IEEE Trans. Image Process., vol. 20, no. 1, pp. 161–175, 2011.
  • [19] K. Sauer and C. Bouman, “A local update strategy for iterative reconstruction from projections,” IEEE Trans. Signal Process., vol. 41, no. 2, pp. 534–548, 1993.
  • [20] J.-B. Thibault, K. D. Sauer, C. A. Bouman, and J. Hsieh, “A three-dimensional statistical approach to improved image quality for multislice helical CT,” Med. Phys., vol. 34, no. 11, pp. 4526–4544, 2007.
  • [21] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, 2010.
  • [22] Z. Wang, Y. Yang, Z. Wang, S. Chang, J. Yang, and T. S. Huang, “Learning super-resolution jointly from external and internal examples,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4359–4371, 2015.
  • [23] C. Jiang, Q. Zhang, R. Fan, and Z. Hu, “Super-resolution CT image reconstruction based on dictionary learning and sparse representation,” Sci. Rep., vol. 8, no. 1, p. 8799, 2018.
  • [24] Y. Zhang, G. Wu, P.-T. Yap, Q. Feng, J. Lian, W. Chen, and D. Shen, “Reconstruction of super-resolution lung 4D-CT using patch-based sparse representation,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2012, pp. 925–931.
  • [25] W. Dong, L. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Trans. Image Process., vol. 20, no. 7, pp. 1838–1857, 2011.
  • [26] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [27] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
  • [28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
  • [29] S. Wang, M. Kim, G. Wu, and D. Shen, “Scalable high performance image registration framework by unsupervised deep feature representations learning,” IEEE Trans. Biomed. Eng., 2017, pp. 245–269.
  • [30] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network.” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), vol. 2, no. 3, 2017, p. 4.
  • [31] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Eur. Conf. Comp. Vis. (ECCV), 2016, pp. 391–407.
  • [32] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep laplacian pyramid networks for fast and accurate super-resolution,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2017.
  • [33] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2016, pp. 1874–1883.
  • [34] Y. Yuan, S. Liu, J. Zhang, Y. Zhang, C. Dong, and L. Lin, “Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), June 2018.
  • [35] G. Wang, M. Kalra, and C. G. Orton, “Machine learning will transform radiology significantly within the next 5 years,” Med. Phys., vol. 44, no. 6, pp. 2041–2044, 2017.
  • [36] G. Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
  • [37] G. Wang, J. C. Ye, K. Mueller, and J. A. Fessler, “Image reconstruction is a new frontier of machine learning,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1289–1296, 2018.
  • [38] Y. Chen, F. Shi, A. G. Christodoulou, Z. Zhou, Y. Xie, and D. Li, “Efficient and accurate MRI super-resolution using a generative adversarial network and 3D multi-level densely connected network,” CoRR, vol. abs/1803.01417, 2018.
  • [39] H. Yu, D. Liu, H. Shi, H. Yu, Z. Wang, X. Wang, B. Cross, M. Bramler, and T. S. Huang, “Computed tomography super-resolution using convolutional neural networks,” in Proc. IEEE Intl. Conf. Image Process., 2017, pp. 3944–3948.
  • [40] J. Park, D. Hwang, K. Y. Kim, S. K. Kang, Y. K. Kim, and J. S. Lee, “Computed tomography super-resolution using deep convolutional neural network,” Phys. Med. Biol., 2018.
  • [41] A. S. Chaudhari, Z. Fang, F. Kogan, J. Wood, K. J. Stevens, E. K. Gibbons, J. H. Lee, G. E. Gold, and B. A. Hargreaves, “Super-resolution musculoskeletal MRI using deep learning,” Magn. Reson. Med., 2018.
  • [42] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
  • [43] I. Goodfellow, “NIPS 2016 tutorial: Generative adversarial networks,” CoRR, vol. abs/1701.00160, 2016.
  • [44] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2536–2545, 2017.
  • [45] D. Nie, R. Trullo, J. Lian, L. Wang, C. Petitjean, S. Ruan, Q. Wang, and D. Shen, “Medical image synthesis with deep convolutional adversarial networks,” IEEE Trans. Biomed. Eng., 2018.
  • [46] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), 2017.
  • [47] E. Kang, H. J. Koo, D. H. Yang, J. B. Seo, and J. C. Ye, “Cycle consistent adversarial denoising network for multiphase coronary CT angiography,” CoRR, vol. abs/1806.09748, 2018.
  • [48] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” Int. Conf. Learn. Representations. (ICLR), 2016.
  • [49] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 214–223.
  • [50] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imag., vol. 3, no. 1, pp. 47–57, 2017.
  • [51] C. You, Q. Yang, H. Shan, L. Gjesteby, L. Guang, S. Ju, Z. Zhang, Z. Zhao, Y. Zhang, W. Cong, and G. Wang, “Structure-sensitive multi-scale deep neural network for low-dose CT denoising,” IEEE Access, 2018.
  • [52] J. Yamanaka, S. Kuwashima, and T. Kurita, “Fast and accurate image super resolution by deep CNN with skip connection and network in network,” in Proc. NIPS.   Springer, 2017, pp. 217–225.
  • [53] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2016, pp. 770–778.
  • [54] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks.” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), vol. 1, no. 2, 2017, p. 3.
  • [55] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Machine Learning Res., vol. 15, no. 1, pp. 1929–1958, 2014.
  • [56] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein GANs,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5769–5779.
  • [57] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, “Adversarial discriminative domain adaptation,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2017.
  • [58] C. Li, H. Liu, C. Chen, Y. Pu, L. Chen, R. Henao, and L. Carin, “ALICE: Towards understanding adversarial learning for joint distribution matching,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5495–5503.
  • [59] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: nonlinear phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
  • [60] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Phys. Med. Biol., vol. 53, no. 17, p. 4777, 2008.
  • [61] G.-H. Chen, J. Tang, and S. Leng, “Prior image constrained compressed sensing (PICCS): a method to accurately reconstruct dynamic CT images from highly undersampled projection data sets,” Med. Phys., vol. 35, no. 2, pp. 660–663, 2008.
  • [62] C. R. Vogel and M. E. Oman, “Iterative methods for total variation denoising,” SIAM J. Sci. Comput., vol. 17, no. 1, pp. 227–238, 1996.
  • [63] J. Song, Q. H. Liu, G. A. Johnson, and C. T. Badea, “Sparseness prior based iterative image reconstruction for retrospectively gated cardiac micro-CT,” Med. Phys., vol. 34, no. 11, pp. 4476–4483, 2007.
  • [64] J. Yang, H. Yu, M. Jiang, and G. Wang, “High-order total variation minimization for interior tomography,” Inverse Problems, vol. 26, no. 3, p. 035013, 2010.
  • [65] S. Luo, T. Shen, Y. Sun, J. Li, G. Li, and X. Tang, “Interior tomography in microscopic CT with image reconstruction constrained by full field of view scan at low spatial resolution,” Phys. Med. Biol., vol. 63, no. 7, p. 075006, 2018.
  • [66] M. Lin, Q. Chen, and S. Yan, “Network in network,” in Int. Conf. Learn. Representations (ICLR), 2014.
  • [67] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, “Deconvolutional networks,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2010, pp. 2528–2535.
  • [68] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Int. Conf. Learn. Representations (ICLR), 2015.
  • [69] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” CoRR, vol. abs/1607.08022, 2016.
  • [70] C. Chen, X. Zhang, J. Guo, D. Jin, E. M. Letuchy, T. L. Burns, S. M. Levy, E. A. Hoffman, and P. K. Saha, “Quantitative imaging of peripheral trabecular bone microarchitecture using MDCT,” Med. Phys., vol. 45, no. 1, pp. 236–249, 2018.
  • [71] AAPM, “Low dose CT grand challenge,” 2017. [Online]. Available: http://www.aapm.org/GrandChallenge/LowDoseCT/#
  • [72] D. Glasner, S. Bagon, and M. Irani, “Super-resolution from a single image,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), 2009, pp. 349–356.
  • [73] D. Liu, Z. Wang, Y. Fan, X. Liu, Z. Wang, S. Chang, X. Wang, and T. S. Huang, “Learning temporal dynamics for video super-resolution: A deep learning approach,” IEEE Trans. Image Process., vol. 27, no. 7, pp. 3432–3445, 2018.
  • [74] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), 2015, pp. 1026–1034.
  • [75] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” CoRR, vol. abs/1207.0580, 2012.
  • [76] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Int. Conf. Learn. Representations (ICLR), 2015.
  • [77] P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder, “Block matching 3D random noise filtering for absorption optical projection tomography,” Phys. Med. Biol., vol. 55, no. 18, p. 5401, 2010.
  • [78] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  • [79] H. R. Sheikh, A. C. Bovik, and G. De Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, 2005.
  • [80] D. Shen, G. Wu, and H.-I. Suk, “Deep learning in medical image analysis,” Annu. Rev. Biomed. Eng., vol. 19, pp. 221–248, 2017.
  • [81] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1348–1357, 2018.
  • [82] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang, “3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1522–1534, 2018.
  • [83] C.-Y. Yang, C. Ma, and M.-H. Yang, “Single-image super-resolution: A benchmark,” in Eur. Conf. Comp. Vis. (ECCV), 2014, pp. 372–386.
  • [84] S. R. Cummings and L. J. Melton III, “Epidemiology and outcomes of osteoporotic fractures,” Lancet, vol. 359, no. 9319, pp. 1761–1767, 2002.
  • [85] M. Kleerekoper, A. Villanueva, J. Stanciu, D. S. Rao, and A. Parfitt, “The role of three-dimensional trabecular microstructure in the pathogenesis of vertebral compression fractures,” Calcif. Tissue Int., vol. 37, no. 6, pp. 594–597, 1985.
  • [86] E. Legrand, D. Chappard, C. Pascaretti, M. Duquenne, S. Krebs, V. Rohmer, M.-F. Basle, and M. Audran, “Trabecular bone microarchitecture, bone mineral density, and vertebral fractures in male osteoporosis,” J. Bone Miner. Res., vol. 15, no. 1, pp. 13–19, 2000.
  • [87] A. Parfitt, C. Mathews, A. Villanueva, M. Kleerekoper, B. Frame, and D. Rao, “Relationships between surface, volume, and thickness of iliac trabecular bone in aging and in osteoporosis: Implications for the microanatomic and cellular mechanisms of bone loss,” J. Clin. Invest., vol. 72, no. 4, pp. 1396–1409, 1983.
  • [88] M. Ding and I. Hvid, “Quantification of age-related changes in the structure model type and trabecular thickness of human tibial cancellous bone,” Bone, vol. 26, no. 3, pp. 291–295, 2000.
  • [89] M. K. Kalra, M. M. Maher, T. L. Toth, L. M. Hamberg, M. A. Blake, J.-A. Shepard, and S. Saini, “Strategies for CT radiation dose optimization,” Radiology, vol. 230, no. 3, pp. 619–628, 2004.
  • [90] D. Marin, R. C. Nelson, S. T. Schindera, S. Richard, R. S. Youngblood, T. T. Yoshizumi, and E. Samei, “Low-tube-voltage, high-tube-current multidetector abdominal CT: improved image quality and decreased radiation dose with adaptive statistical iterative reconstruction algorithm: initial clinical experience,” Radiology, vol. 254, no. 1, pp. 145–153, 2010.
  • [91] P. Prakash, M. K. Kalra, A. K. Kambadakone, H. Pien, J. Hsieh, M. A. Blake, and D. V. Sahani, “Reducing abdominal CT radiation dose with adaptive statistical iterative reconstruction technique,” Invest. Radiol., vol. 45, no. 4, pp. 202–210, 2010.
  • [92] D. M. Vasilescu, Z. Gao, P. K. Saha, L. Yin, G. Wang, B. Haefeli-Bleuer, M. Ochs, E. R. Weibel, and E. A. Hoffman, “Assessment of morphometry of pulmonary acini in mouse lungs by nondestructive imaging using multiscale microcomputed tomography,” Proc. Natl. Acad. Sci. U.S.A., p. 201215112, 2012.
  • [93] K. S. Iyer, J. D. Newell Jr, D. Jin, M. K. Fuld, P. K. Saha, S. Hansdottir, and E. A. Hoffman, “Quantitative dual-energy computed tomography supports a vascular etiology of smoking-induced inflammatory lung disease,” Am. J. Respir. Crit. Care Med., vol. 193, no. 6, pp. 652–661, 2016.
  • [94] M. G. Bellemare, I. Danihelka, W. Dabney, S. Mohamed, B. Lakshminarayanan, S. Hoyer, and R. Munos, “The Cramér distance as a solution to biased Wasserstein gradients,” CoRR, vol. abs/1705.10743, 2017.