Computed Tomography (CT) is one of the most widely used modalities for cancer diagnostics 21, 15. CT provides a flexible image acquisition and reconstruction protocol that allows the kernel function, amount of radiation, slice thickness, and other parameters to be adjusted to meet clinical requirements 17. These non-standard protocol setups effectively broaden the scope of CT use, but they also create a data discrepancy problem among the acquired images 13. For example, the same clinical observation captured with two different CT acquisition protocols may yield images with significantly different radiomic features, especially intensity and texture 3, 9. As a result, this discrepancy hinders inter-clinic data sharing and the performance of large-scale radiomics studies 3.
The CT image discrepancy problem could potentially be addressed by defining and using a standard image acquisition protocol. However, it is impractical to use the same image acquisition protocol in all clinical practices, because there are already multiple CT scanner manufacturers on the market 20, and using a fixed protocol for all patients in all situations would greatly limit the use of CT in diagnosis, staging, therapy selection, and therapy assessment of lung malignancies 4. Alternatively, we propose to develop an image standardization and normalization tool to “translate” any CT image acquired using a non-standard protocol into the standard one while preserving most of the anatomic details 13. Mathematically, let the target image $y$ be an image acquired using a standard protocol. Given any non-standard source image $x$, the image standardization and normalization tool aims to compose a synthetic image $x'$ from $x$ such that $x'$ is significantly more similar to $y$ than to $x$ with respect to radiomic features.
U-Net is a special kind of fully convolutional U-shaped neural network for image synthesis 25. A Generative Adversarial Network (GAN) is a class of deep learning models in which two neural networks contest with each other 8. As one of the most widely used deep learning architectures for image synthesis, GAN has been extended to CT image standardization 14. In GANai, a customized GAN model is trained using an alternative training strategy to effectively learn the data distribution, thus achieving significantly better performance than the classic GAN model and the traditional image processing algorithm called histogram matching 26, 10. However, GANai focuses on the relatively easier image patch synthesis problem rather than the whole DICOM image synthesis problem 14.
In the CT image standardization and normalization problem, the synthesized data and the target data must share a common feature space 14. This poses two computational challenges: 1) the effective mapping between target images and synthesized images with great pixel-level details, and 2) the texture consistency among the synthesized images. In this paper, to address the critical issues in CT image standardization and normalization, we present an end-to-end solution called STAN-CT. In STAN-CT, we introduce two new constraints in the GAN loss. Specifically, we adopt a latent space-based loss for the generator to establish a one-to-one mapping from target images to synthesized images. Also, a feature-based loss is adopted for the discriminator to critique the texture features of the standard and the synthesized images. To synthesize CT images in the Digital Imaging and Communications in Medicine (DICOM) format 18, STAN-CT introduces a DICOM reconstruction framework that integrates all the synthesized image patches to generate a DICOM file for clinical use. The framework ensures the quality of the synthesized DICOM by systematically identifying and pruning low-quality image patches. In our experiment, by comparing the synthesized images with the ground truth, we demonstrate that STAN-CT significantly outperforms the current state-of-the-art models. In summary, STAN-CT has the following advantages:
STAN-CT provides an end-to-end solution for CT image standardization and normalization. The outcomes of STAN-CT are DICOM image files that can be directly loaded into clinical systems.
STAN-CT adopts a novel one-to-one mapping loss function on the latent space. It forces the generator to draw samples from the same distribution to which the standard images belong.
STAN-CT uses a new feature-based loss to improve the performance of the discriminator.
STAN-CT is effective in model training. It quickly converges within a few rounds of training.
CT images are one of the key modalities in lung malignancy studies 24. The problem of CT image discrepancy due to the common use of non-standard imaging protocols poses a gap between CT imaging and radiomics studies. To fill the gap, clinical image synthesis tools must be developed to “translate” CT images acquired using the non-standard protocol into standard images. In the domain of deep learning, generator models and GAN models are the main tools for image synthesis.
Image or data synthesis
U-Net is a special kind of fully convolutional neural network originally proposed for medical image segmentation 25. Precise localization and relatively small training data requirements are the major advantages of using U-Net 25. A U-Net usually has three parts: a down-sampling path, a bottleneck, and an up-sampling path. The up-sampling and down-sampling paths are symmetric. There are also connections from the down-sampling layers to the corresponding up-sampling layers to restore feature information lost during down-sampling. However, while a standalone U-Net is effective at generating structural details, it struggles to learn and preserve texture details 22. This issue can be overcome by embedding U-Net in a more sophisticated deep generative model called a Generative Adversarial Network (GAN) 8.
Generative Adversarial Networks
Generative Adversarial Networks (GANs) are among the most widely used deep learning architectures for data and image synthesis 8. A GAN model normally consists of a generator $G$ and a discriminator $D$. The generator (e.g. U-Net) is responsible for generating fake data from noise, and the discriminator tries to identify whether its input is drawn from the real data or not. Among all the GANs, cGAN is capable of synthesizing new images based on a prior distribution 19. However, the image features of the synthesized data and those of the target data may not fall into the same distribution, so the vanilla cGAN may not be directly applicable to the CT image standardization problem. GANai is a customized cGAN model in which the generator and the discriminator are trained alternately to learn the data distribution, thus achieving significantly better performance than the vanilla cGAN model. However, GANai focuses on the relatively easier image patch synthesis problem rather than the whole DICOM image synthesis problem.
In generative models, the latent space often plays a vital role in target domain mapping. Appropriate latent space learning is crucial for generating high quality data. Disentanglement is an effective metric that provides a deep understanding of a particular layer in a neural network 30. Network disentanglement can help uncover the important factors that contribute to the data generation process 7.
Alternative Training Strategy
Model training is one of the most crucial parts of a GAN because of its special network architecture (i.e. the generator needs to fool the discriminator while the discriminator tries to distinguish the true data distribution from the false one). In the alternative training mechanism, when one component is in training, the other one remains frozen. Also, each component has a fixed number of training iterations. A variant of alternative training, named fully-optimized alternative training, was proposed in 14, where the model training is divided into two phases called G-phase and D-phase. In the G-phase, $D$ is fixed, and $G$ needs to achieve a certain accuracy or complete the maximum number of training steps. In the D-phase, $G$ is fixed, and $D$ needs to achieve a pre-defined performance or reach the maximum number of training steps. When one phase is completed, the other phase begins. The GANai training continues until an optimal result is achieved or the maximum number of epochs is reached. Also, instead of a competition between a single copy of $G$ and $D$, multiple copies of $G$s and $D$s compete with each other. For example, a $G$ needs to fool multiple $D$s before its phase is over. A rollback mechanism is implemented in GANai so that if a component is not able to fool its counterpart within a limited number of steps, it rolls back to the beginning of its phase and starts the training again. This training method has been successfully applied to the CT image standardization problem. In STAN-CT, we adopt and further advance this training method, aiming to achieve better performance.
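The phase schedule described above can be sketched with toy stand-ins for the two networks. This is only an illustration under stated assumptions, not the GANai or STAN-CT implementation: `run_phase`, `alternative_training`, `toy_step`, and `toy_score` are hypothetical names, each network is reduced to an integer "skill", and a single copy of each component is used instead of multiple copies.

```python
import copy


def run_phase(state, trainee, frozen, step_fn, score_fn, target, max_steps, max_retries=1):
    """Train `trainee` while `frozen` is left untouched.

    The phase ends as soon as score_fn reaches `target`. If `max_steps`
    updates are not enough, the trainee is rolled back to its state at the
    start of the phase and the phase is retried up to `max_retries` times.
    """
    snapshot = copy.deepcopy(state[trainee])  # checkpoint used for rollback
    for attempt in range(max_retries + 1):
        for step in range(1, max_steps + 1):
            step_fn(state, trainee)
            if score_fn(state, trainee, frozen) >= target:
                return {"phase": trainee, "steps": step, "rollbacks": attempt, "ok": True}
        state[trainee] = copy.deepcopy(snapshot)  # rollback to phase start
    return {"phase": trainee, "steps": max_steps, "rollbacks": max_retries + 1, "ok": False}


def alternative_training(state, step_fn, score_fn, target, max_steps, max_epochs):
    """Alternate G-phase and D-phase for a fixed number of epochs."""
    history = []
    for _ in range(max_epochs):
        history.append(run_phase(state, "G", "D", step_fn, score_fn, target, max_steps))
        history.append(run_phase(state, "D", "G", step_fn, score_fn, target, max_steps))
    return history


# Toy stand-ins (assumptions): one update adds one unit of skill, and the
# score is the trainee's margin over its frozen counterpart.
def toy_step(state, trainee):
    state[trainee]["skill"] += 1


def toy_score(state, trainee, frozen):
    return state[trainee]["skill"] - state[frozen]["skill"]


state = {"G": {"skill": 0}, "D": {"skill": 0}}
history = alternative_training(state, toy_step, toy_score, target=3, max_steps=10, max_epochs=2)
for record in history:
    print(record)
```

In a real setting, `step_fn` would run one optimizer update on the trainee and `score_fn` would measure how well it fools (or detects) the frozen counterpart on a validation batch.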
STAN-CT addresses the long-standing CT image standardization and normalization problem. It consists of a modified GAN model with two new loss functions and a dedicated DICOM image synthesis framework to meet the clinical requirements.
Standardizing CT image patches
Similar to conventional GAN models, the STAN-CT GAN model for standardizing CT images consists of two components, the generator $G$ and the discriminator $D$. $G$ is a U-shaped network 25 consisting of an encoder and a decoder. Both the encoder and the decoder consist of seven hidden layers. There is a skip connection from each layer of the encoder to the corresponding layer of the decoder to address the information loss during down-sampling. $D$ is a fully convolutional network with five convolutional layers. Fig. 1 illustrates the GAN architecture of STAN-CT. Mathematically, let $y$ be a standard image and $x$ be its corresponding non-standard image. The aim of the generator $G$ is to create a new image $x' = G(x)$ that has the same data distribution as $y$. Meanwhile, the discriminator $D$ determines whether $x'$ and $y$ are from the standard image distribution.
Loss function of $D$: In a GAN model, the performance of $G$ and $D$ improves in tandem. We propose to adopt two losses for the discriminator training, i.e. the WGAN 2 adversarial loss function to critique the standard and non-standard images, and the feature-based loss.
Adversarial Loss. WGAN is a stable GAN training mechanism that provides a learning balance between $G$ and $D$ 5. STAN-CT adopts the WGAN-based adversarial loss of the discriminator defined as:
where $\theta_D$ denotes the parameters of $D$, $m$ is the batch size, $x$ is the input (non-standard image), and $y$ is the corresponding standard image.
Feature-based Loss. In addition to the WGAN-based adversarial loss, STAN-CT introduces a new feature-based loss function $L_{feature}$. A similar feature-based loss function has been used in 28 to improve the generator diversity. Here, we use the feature space of $D$ instead of a secondary pre-trained network to maintain a balanced network (i.e. $G$ and $D$ are neither too strong nor too weak compared with each other). The feature-based loss is described in Eq 2:
where $f$ is the feature extractor, $d$ is the volume of the feature space, $x'$ is an image generated by $G$, and $y$ is the target image.
Finally, the total loss of $D$ consists of the WGAN-based loss $L_{WGAN}(D)$ and the feature-based loss $L_{feature}$. The loss function of $D$ is defined as:
where $\lambda$ is a weight factor.
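For orientation, the three discriminator losses described above can be written in one common form. This is a hedged sketch rather than the exact STAN-CT formulation: the sign convention (a loss to be minimized, following the critic objective of WGAN 2), the squared Euclidean norm in the feature term, and the symbol names are assumptions. Here $\theta_D$ denotes the parameters of $D$, $m$ the batch size, $x$ a non-standard input, $y$ its standard counterpart, $x' = G(x)$ the synthesized image, $f$ the feature extractor, $d$ the volume of the feature space, and $\lambda$ the weight factor.

```latex
\begin{align}
L_{WGAN}(D) &= -\frac{1}{m}\sum_{i=1}^{m}
  \left[ D_{\theta_D}\!\left(y^{(i)}\right)
       - D_{\theta_D}\!\left(G\!\left(x^{(i)}\right)\right) \right] \\
L_{feature}(D) &= \frac{1}{d}\,\left\lVert f(x') - f(y) \right\rVert_2^2 \\
L_D &= L_{WGAN}(D) + \lambda\, L_{feature}(D)
\end{align}
```

Minimizing the first term pushes the critic score up on standard images and down on synthesized ones, which is the usual WGAN critic update written as a loss.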
Loss function of $G$: The generator loss consists of three components, i.e. the WGAN-based loss, the latent loss, and the L1 regularization.
WGAN-based loss. The WGAN-based loss is used to improve network convergence. It is defined as:
where $\theta_G$ represents all the parameters of $G$, $x$ is a source image, and $f$ is a 1-Lipschitz function, which is used to estimate the Earth-Mover (EM) distance between two distributions.
Latent loss. In the CT image standardization problem, the anatomical properties should be preserved when generating synthesized images. Inspired by 16, we propose a new latent-vector-based loss function to enforce a one-to-one mapping between the synthesized image and the standard image. Specifically, the latent loss aims to minimize the distance between the latent distribution of the synthesized images and that of their corresponding standard images.
where $z$ stands for the latent vector, $x' = G(x)$ is the synthesized image, and $y$ is its corresponding standard image.
Finally, the total loss of $G$ is defined as:
where $\lambda_1$ and $\lambda_2$ are weight factors and $L_1$ is the regularization function.
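As a reading aid, the three generator terms can be combined as follows. This is a sketch under assumptions, not the authors' exact equations: $f$ is taken to be the critic (the role played by $D$), the Euclidean norm in the latent term and the pixel-wise form of the $L_1$ term are assumed, and $z(\cdot)$ is assumed to return the latent vector produced at the bottleneck of $G$. As before, $\theta_G$ denotes the parameters of $G$, $m$ the batch size, $x$ a source image, $y$ its standard counterpart, and $\lambda_1$, $\lambda_2$ the weight factors.

```latex
\begin{align}
L_{WGAN}(G) &= -\frac{1}{m}\sum_{i=1}^{m}
  f\!\left(G_{\theta_G}\!\left(x^{(i)}\right)\right) \\
L_{latent}(G) &= \frac{1}{m}\sum_{i=1}^{m}
  \left\lVert z\!\left(G\!\left(x^{(i)}\right)\right)
            - z\!\left(y^{(i)}\right) \right\rVert_2 \\
L_G &= L_{WGAN}(G) + \lambda_1\, L_{latent}(G) + \lambda_2\, L_1,
\qquad
L_1 = \frac{1}{m}\sum_{i=1}^{m}
  \left\lVert y^{(i)} - G\!\left(x^{(i)}\right) \right\rVert_1
\end{align}
```

The latent term is what ties each synthesized image to its own standard counterpart, while the $L_1$ term keeps the pixel-level content close to the target.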
DICOM Reconstruction Framework
STAN-CT presents a DICOM-to-DICOM reconstruction framework for systematic DICOM reconstruction. While the core of the framework is the GAN model introduced in Section 3, the DICOM-to-DICOM reconstruction framework includes four additional components to facilitate processes such as image patch generation and fusion (see Fig. 2). Note that each component has a unique quality control unit (red diamond box) that ensures the outputs are free from defects.
Step 1. soft tissue image patch generation: The first step of STAN-CT DICOM-to-DICOM image standardization is soft tissue image patch generation. Image patches with size between 100 and 256 are randomly generated using the input DICOM image. An image patch is a soft tissue image patch if at least 70% of the pixels are in the soft tissue range (Hounsfield unit value ranging from -1000 to 900). The process will continue until each soft-tissue image patch contains at least 50% overlapped pixels.
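The 70% rule in Step 1 can be illustrated with a short sketch. The function names `soft_tissue_fraction` and `is_soft_tissue_patch` are hypothetical, the patch is assumed to be a 2-D list of Hounsfield unit values, and the thresholds are simply the ones quoted in the text.

```python
def soft_tissue_fraction(patch_hu, hu_low=-1000, hu_high=900):
    """Return the fraction of pixels whose HU value lies in [hu_low, hu_high]."""
    pixels = [value for row in patch_hu for value in row]
    inside = sum(1 for value in pixels if hu_low <= value <= hu_high)
    return inside / len(pixels)


def is_soft_tissue_patch(patch_hu, min_fraction=0.70, hu_low=-1000, hu_high=900):
    """Apply the 70% rule from Step 1 to a 2-D list of HU values."""
    return soft_tissue_fraction(patch_hu, hu_low, hu_high) >= min_fraction


# Toy 2x5 patches (real patches would be 100 to 256 pixels wide).
mostly_tissue = [[40, 60, -200, 30, 1500],
                 [55, 70, 2000, 10, 1200]]   # 7 of 10 pixels in range
mostly_bone = [[1500, 1600, 40, 1700, 1800],
               [1900, 2000, 60, 2100, 30]]   # 3 of 10 pixels in range

print(is_soft_tissue_patch(mostly_tissue), is_soft_tissue_patch(mostly_bone))
```

A patch sampler would call `is_soft_tissue_patch` on each random crop and keep sampling until the overlap requirement stated in the text is met.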
Step 2. standard image patch synthesis: With a trained STAN-CT generator, a soft-tissue image patch obtained in the previous step will be standardized (see Section 3 for details).
Then, the synthesized image patches will be examined by STAN-CT discriminator. If a synthesized image patch can fool the discriminator, it is considered as a qualified synthesized image patch. Otherwise, the synthesized image patch will be discarded. This step ensures the quality of the synthesized image patches.
Step 3. standard DICOM image generation: Given all the qualified synthesized image patches, we first normalize the pixel intensity from gray-scale to the Hounsfield unit using:
where $p_h$ and $p_g$ are the pixel values in Hounsfield units and gray-scale units, respectively, $I$ is a qualified synthesized image patch, and $C_{max}$ and $C_{min}$ are the maximum and minimum CT numbers of the source DICOM.
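A minimal sketch of such a gray-scale to Hounsfield unit conversion is given below. It assumes a plain linear rescaling of 8-bit gray values onto the CT number range of the source DICOM; the exact mapping used by STAN-CT may differ, and `gray_to_hounsfield`, `patch_to_hounsfield`, and the `gray_max=255` default are illustrative assumptions.

```python
def gray_to_hounsfield(gray, hu_min, hu_max, gray_max=255):
    """Linearly map a gray-scale value in [0, gray_max] to [hu_min, hu_max]."""
    return gray / gray_max * (hu_max - hu_min) + hu_min


def patch_to_hounsfield(patch_gray, hu_min, hu_max, gray_max=255):
    """Convert every pixel of a 2-D gray-scale patch to Hounsfield units."""
    return [[gray_to_hounsfield(value, hu_min, hu_max, gray_max) for value in row]
            for row in patch_gray]


# hu_min and hu_max stand for the minimum and maximum CT number of the
# source DICOM; the values below are made up for the example.
patch = [[0, 51], [204, 255]]
converted = patch_to_hounsfield(patch, hu_min=-1000, hu_max=1000)
print(converted)
```

With this mapping the darkest gray value lands on the minimum CT number and the brightest on the maximum, so the synthesized patch occupies the same intensity range as its source DICOM.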
Meanwhile, with a soft tissue image mask created from the original DICOM images using the same soft-tissue Hounsfield unit range as in Step 1, the non-soft tissue parts of the synthesized and normalized image patches will be discarded. Finally, we integrate all the valid soft tissue patches to generate the integrated synthesized images.
The quality of the integrated synthesized images will be checked using a quality control unit, which inspects whether there are any box artifacts or missing values. If artifacts are identified, the corresponding image patches will be re-integrated by cropping boundary pixels.
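One simple way to integrate overlapping patches and to detect missing values is sketched below. Averaging the overlapping pixels is an assumption made for illustration, since the text does not specify the fusion rule, and `fuse_patches` is a hypothetical helper.

```python
def fuse_patches(height, width, patches):
    """Average overlapping patches into one slice and report uncovered pixels.

    `patches` is a list of (top, left, values) tuples, where `values` is a
    2-D list placed with its upper-left corner at row `top`, column `left`.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, values in patches:
        for r, row in enumerate(values):
            for c, value in enumerate(row):
                total[top + r][left + c] += value
                count[top + r][left + c] += 1
    # Pixels covered by no patch are reported as None (missing values).
    fused = [[total[r][c] / count[r][c] if count[r][c] else None
              for c in range(width)] for r in range(height)]
    missing = [(r, c) for r in range(height) for c in range(width) if count[r][c] == 0]
    return fused, missing


patch_a = (0, 0, [[10, 10], [10, 10]])
patch_b = (0, 1, [[20, 20], [20, 20]])
fused, missing = fuse_patches(2, 3, [patch_a, patch_b])
_, missing_without_b = fuse_patches(2, 3, [patch_a])
print(fused, missing, missing_without_b)
```

A non-empty `missing` list is the kind of defect the quality control unit would flag before the slice is written back to DICOM.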
Step 4. DICOM image evaluation:
Here, we evaluate DICOM image quality manually and automatically. First, both the synthesized and the original non-standard DICOM image files will be viewed side-by-side by radiologists using a PACS reading workstation. Radiologists will be asked to evaluate image quality, estimate the acquisition protocol, and extract tumor properties. The radiologists’ reports will be used to manually evaluate the quality of the standardized CT images. Meanwhile, with all the synthesized DICOM files generated in the previous step, image texture features will be automatically extracted and compared for performance evaluation.
4 Experimental result
For the training data, we used a total of 14,688 CT image slices captured using three different kernels (BL57, BL64, and BR40) and four different slice thicknesses (0.5, 1, 1.5, and 3 mm) on a Siemens CT Somatom Force scanner at the University of Kentucky Medical Center. STAN-CT adopted the BL64 kernel and 1 mm slice thickness as the standard protocol since it has been widely used in clinical practice. Image patches were extracted by random cropping and resized into pixel patches. Data augmentation was done by rotating and shifting image patches. Finally, a total of 49,000 soft-tissue image patches were generated from the CT slices and were used as the training data of STAN-CT.
Two testing data sets were prepared for STAN-CT performance evaluation. Both data sets were captured using Siemens CT Somatom Force scanner at the University of Kentucky Medical Center hospital. The first testing data were captured using the non-standard protocol BR40 and 1mm slice thickness. The second testing data were captured using the non-standard protocol BL57 and 1mm slice thickness. The image patch generation step was the same as that of the training data. Each test data set contains 3,810 image patches.
STAN-CT architecture and hyperparameters
STAN-CT GAN model consists of a U-net with fifteen hidden layers and an FCN with five hidden layers. The kernel is used in the convolutional layer. LeakyReLU 27 is adopted as the activation function in all the hidden layers. Softmax is used in the last layer of the FCN. Network weights are randomly initialized. The prediction thresholds for determining fake and real images are 0.01 and 0.99, respectively. The maximum number of training epochs was set to 100, with a learning rate of 0.0001 and momentum of 0.5. A fully optimized alternative training mechanism (the same as GANai) was used for the network training. STAN-CT was implemented in TensorFlow 1 on a Linux computer server with eight Nvidia GTX 1080 GPU cards. The model took about 36 hours to train from scratch. Once the model was trained, it took about 0.1 seconds to synthesize and normalize each image patch.
For performance evaluation, we computed five radiomic texture features (i.e. dissimilarity, contrast, homogeneity, energy, and correlation) using Gray Level Co-occurrence Matrix (GLCM). The absolute error of each radiomic texture feature was computed using:
where $f$ is the GLCM feature extractor, $x'$ and $y$ are the image synthesized by STAN-CT and the target image, respectively, and $F$ is the corresponding feature space.
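To make the evaluation metric concrete, the following sketch computes the five GLCM texture features for a single pixel offset and takes their absolute differences between a synthesized and a target image. It is an illustration under assumptions rather than the evaluation code used in the experiments: the images are small 2-D lists of integer gray levels with at least two columns, only the horizontal neighbor offset is used, the feature formulas follow the common textbook definitions, and no averaging over patches is performed.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def glcm_features(p):
    """Dissimilarity, contrast, homogeneity, energy, and correlation of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum(v * (i - mu_i) ** 2 for i, j, v in cells))
    sd_j = math.sqrt(sum(v * (j - mu_j) ** 2 for i, j, v in cells))
    cov = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    return {
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "energy": math.sqrt(sum(v * v for i, j, v in cells)),
        # A constant image has zero variance; correlation is set to 1 by convention.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 1.0,
    }


def absolute_errors(synth, target, levels):
    """Per-feature absolute difference between two images' GLCM features."""
    fs = glcm_features(glcm(synth, levels))
    ft = glcm_features(glcm(target, levels))
    return {name: abs(fs[name] - ft[name]) for name in fs}


target = [[0, 0], [1, 1]]   # horizontally uniform rows
synth = [[0, 1], [1, 0]]    # checkerboard
errors = absolute_errors(synth, target, levels=2)
print(errors)
```

For real CT patches the gray levels would first be quantized to a fixed number of bins, and the errors would typically be averaged over all test patches.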
Performance of image patch synthesis
Table 2 shows the absolute error of five GLCM-based texture features of STAN-CT, GANai (the current state-of-the-art model), and two ablated variants of STAN-CT. In the first variant, the latent loss function of $G$ was removed from STAN-CT. In the second variant, the feature-based loss of $D$ was removed from STAN-CT. All the models were tested using kernels BL57 and BR40 with the same slice thickness (1 mm). For kernel BL57, STAN-CT and its variants outperformed GANai in all the texture features. For kernel BR40, STAN-CT was significantly better than GANai in four out of five features. The first four generators of each GAN model were selected for further analysis. Fig. 3 illustrates the change of the absolute errors of the five GLCM-based texture features using the generators produced in the first four iterations of alternative training of STAN-CT or GANai. The result indicates that STAN-CT can quickly reduce the errors in the first few iterations of the alternative training, while no clear trend was observed in the results of GANai.
Performance of DICOM reconstruction
A straightforward patch-based image reconstruction approach has three steps: 1) splitting a DICOM slice into overlapped or non-overlapped image patches; 2) standardizing each image patch; and 3) merging the standardized image patches into one DICOM slice. A common problem in such a patch-based image reconstruction process is image artifacts, such as boundary artifact or inconsistent texture. As shown in Table 2, the straightforward approach has the highest absolute error on all the tested image features.
In STAN-CT, three quality control units were inserted into the framework, each being adopted to address a specific image quality problem. Table 2 shows that STAN-CT achieved significantly better performance than the straightforward method regarding the absolute errors on five selected texture features. Fig. 4 visualizes the reconstructed DICOM images using the two methods. For the straightforward method, the red circle highlights the boundary effect where two image patches were merged, and the green circle highlights texture inconsistency within a DICOM slice. In the same DICOM reconstructed using STAN-CT, no visual artifacts were found according to the radiologist’s report.
Also, we compared STAN-CT with its two variants. The method named “w/ overlapped check” used only the first quality control unit to check whether there were enough overlapped soft-tissue image patches. The method named “w/ real/fake check” used the first two quality control units, which not only checked if there were enough image patches, but also examined whether the image patches were successfully standardized. Table 2 shows that both approaches achieve better results than the straightforward method, but neither is better than STAN-CT, indicating that all three quality control units are critical for artifact detection and removal. The standardized DICOM images, along with the corresponding standard images, were reviewed by radiologists at the Department of Radiology, University of Kentucky using the picture archiving and communication system (PACS) viewer (Barco, GA, USA). The radiologists, who were blinded to the image reconstruction algorithms, reported that no obvious difference was observed in lung regions between the two kinds of images.
By systematically removing every single component in the GAN model and in the DICOM reconstruction pipeline using the leave-one-out approach, we analyzed the impact of every component of STAN-CT.
In STAN-CT, both the latent loss and the feature loss are key components. To evaluate the impact of the loss functions, two versions of STAN-CT GAN, where the latent loss or the feature loss has been removed respectively, were created. Table 2 shows that neither of them can achieve the same performance as STAN-CT regarding the GLCM-based texture features. In addition, Figure 5 shows that the latent loss of STAN-CT decreases during the G-training phases, indicating that the generator can effectively reduce the gap between the distributions of the target image and the synthetic image, while it remains flat during the D-training phases.
The DICOM reconstruction pipeline includes four quality control units, each contributing to the improvement of the quality of the resulting DICOM images. Table 2 shows that the contrast error of the straightforward DICOM reconstruction (without using any of the quality control units) is 0.727, which can be reduced to 0.485 by adding the overlapped soft tissue quality control, which provides consistent texture throughout the DICOM. It can be further reduced to 0.334 (54% improvement) by adding the discriminator checker that ensures the success of image synthesis. Eventually, when all four quality control units were used, the contrast error was reduced to 0.201 (72% improvement). In summary, our experiments demonstrate that STAN-CT is a robust framework for CT image standardization.
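The quoted improvements are relative reductions of the contrast error with respect to the straightforward reconstruction. The short check below reproduces them from the numbers in the text; `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(baseline, value):
    """Relative reduction of an error with respect to a baseline error."""
    return (baseline - value) / baseline


baseline = 0.727  # contrast error of the straightforward reconstruction
steps = [
    ("overlapped soft tissue check", 0.485),
    ("plus discriminator check", 0.334),
    ("all quality control units", 0.201),
]
improvements = {label: relative_improvement(baseline, err) for label, err in steps}
for label, gain in improvements.items():
    print(f"{label}: {gain:.0%}")
```

The same calculation shows that the overlapped soft tissue check alone accounts for roughly a one-third reduction of the contrast error.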
Data discrepancy in CT images due to the use of non-standard image acquisition protocols adds extra burden to radiologists and also creates a gap in large-scale cross-center radiomic studies. We propose STAN-CT, a novel tool for CT DICOM image standardization and normalization. In STAN-CT, new loss functions are introduced for efficient GAN training, and a dedicated DICOM-to-DICOM image reconstruction framework has been developed to automate the DICOM standardization and normalization process. The experimental results show that STAN-CT is significantly better than the existing tools on CT image standardization.
This research is supported by NIH NCI (grant no. 1R21CA231911) and Kentucky Lung Cancer Research (grant no. KLCR-3048113817).
1. TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
2. Wasserstein GAN. arXiv preprint arXiv:1701.07875.
3. Radiomics of CT features may be nonreproducible and redundant: influence of CT acquisition parameters. Radiology, pp. 172361.
4. Effects of CT section thickness and reconstruction kernel on emphysema quantification: relationship to the magnitude of the CT emphysema index. Academic Radiology 17 (2), pp. 146–156.
5. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5767–5777.
6. A testbed for realistic image synthesis. IEEE Computer Graphics and Applications 3 (8), pp. 10–20.
7. Towards a definition of disentangled representations. arXiv preprint arXiv:1812.02230.
8. An introduction to image synthesis with generative adversarial nets. arXiv preprint arXiv:1803.04469.
9. High quality machine-robust image features: identification in nonsmall cell lung cancer computed tomography images. Medical Physics 40 (12).
10. Fundamentals of digital image processing. Englewood Cliffs, NJ: Prentice Hall.
11. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410.
12. Predicting future frames using retrospective cycle GAN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1811–1820.
13. Radiomic features of lung cancer and their dependency on CT image acquisition parameters. Medical Physics 44 (6), pp. 3024.
14. GANai: standardizing CT images using generative adversarial network with alternative improvement. bioRxiv, pp. 460188.
15. Fundamentals of medical imaging. Medical Physics 38 (3), pp. 1735–1735.
16. Mode seeking generative adversarial networks for diverse image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1429–1437.
17. Influence of CT acquisition and reconstruction parameters on radiomic feature reproducibility. Journal of Medical Imaging 5 (1), pp. 011020.
18. Introduction to the DICOM standard. European Radiology 12 (4), pp. 920–927.
19. Conditional generative adversarial nets. arXiv:1411.1784v1.
20. Relationships of clinical protocols and reconstruction kernels with image quality and radiation dose in a 128-slice CT scanner: study with an anthropomorphic and water phantom. European Journal of Radiology 81 (5), pp. e699–e703.
21. Medical imaging signals and systems. Pearson Prentice Hall, Upper Saddle River.
22. Learning and incorporating shape models for semantic segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 203–211.
23. Generative adversarial text to image synthesis. In Proceedings of the 33rd International Conference on Machine Learning, Vol. 48, pp. 1060–1069.
24. Radiomics: the facts and the challenges of image analysis. European Radiology Experimental 2 (1), pp. 36.
25. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
26. Histogram specification of 24-bit color images in the color difference (C-Y) color space. Journal of Electronic Imaging 8 (3), pp. 290–301.
27. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.
28. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Transactions on Medical Imaging 37 (6), pp. 1348–1357.
29. Retinal image synthesis from multiple-landmarks input with generative adversarial networks. Biomedical Engineering Online 18 (1), pp. 62.
30. Visual interpretability for deep learning: a survey. Frontiers of Information Technology & Electronic Engineering 19 (1), pp. 27–39.