Accurate segmentation of the left ventricle (LV) myocardium in cardiac computed tomography angiography (CCTA) images is a prerequisite for its subsequent quantitative analysis. For instance, segmentations of the myocardium can be used to assess myocardial perfusion [Techasith] or to detect functionally significant coronary artery stenosis [Zreik]. Manual delineation of the myocardium is time-consuming and impractical in clinical practice, especially in cases where annotations in several phases of the cardiac cycle are required. Therefore, automatic segmentation methods have been proposed. These methods mainly include atlas-based approaches [Kirisli] and machine learning-based voxel classification [Zreik, Payer]. Several of the latter were compared in the recent MICCAI 2017 challenge on multi-modal whole-heart segmentation (MMWHS, http://www.sdspeople.fudan.edu.cn/zhuangxiahai/0/mmwhs/).
In order to generalize to new and unseen datasets, machine learning-based methods for CCTA segmentation require that training and test images have similar intensity histograms. Although Hounsfield units (HU) in CT are normalized, they can differ substantially from one dataset to another in areas perfused with contrast agent. This can be due to differences in the amount of contrast agent, the flow rate, and the time between administration and acquisition. Furthermore, heart rate, cardiac output, and pathology can lead to differences in contrast enhancement of images. Generalization could be improved by including training images with a range of different HU values for contrast agent. These can be obtained by transformations on already reconstructed images, but also by inclusion of CT images that have been acquired at different x-ray energy levels. CT image acquisition with spectral CT scanners allows for the reconstruction of virtual mono-energetic CT images, which have been previously shown to provide valuable information for segmentation in CT [Chen]. In this work, we investigate whether virtual mono-energetic CT images can be used to improve the generalization of a 3D deep learning segmentation method.
The dataset used in this study consists of 10 CCTA scans acquired on a spectral CT scanner and 40 CCTA scans acquired on a conventional CT scanner. The spectral scans were acquired on a Philips IQon CT scanner (Philips Healthcare, Best, The Netherlands) using a tube voltage of 120 kVp and a tube current of 120 mAs. For each acquisition, a conventional CCTA image was reconstructed, i.e. an image reconstructed from the raw data of the whole x-ray spectrum. Furthermore, virtual mono-energetic images at 60 and 90 keV were reconstructed from the same acquisition. All images were reconstructed to 0.34–0.43 mm in-plane resolution and 0.90 mm slice thickness with 0.45 mm slice increment. The conventional scans were acquired with a Philips Brilliance iCT scanner (Philips Healthcare, Best, The Netherlands) using a tube voltage of 120 kVp and a weight-dependent tube current between 210 and 300 mAs. These images were reconstructed to 0.38–0.56 mm in-plane resolution and 0.90 mm slice thickness with 0.45 mm slice increment.
Images acquired on the spectral CT scanner were used as a training set. Reference segmentations of seven cardiac substructures were obtained in each conventionally reconstructed image. These were: LV myocardium, LV blood cavity, right ventricle blood cavity, left atrium, right atrium, ascending aorta, and pulmonary artery. To obtain a reference segmentation in each image, an initial automatic segmentation was obtained based on the MMWHS training set. These segmentations were then manually corrected by voxel painting. As all reconstructed images for a patient were based on the same acquisition and due to the dual-layer design of the scanner, we assumed perfect voxel-wise correspondence between the conventional and virtual mono-energetic images. Therefore, reference segmentations obtained in the conventionally reconstructed images were directly propagated to the corresponding virtual mono-energetic reconstructions. The 40 images obtained on a conventional CT scanner were used as a test set. Unlike the images acquired with the spectral CT scanner, these images contained a manual annotation of the LV myocardium only, made in the short-axis view in a previous study [Zreik].
We trained a 3D fully convolutional network (FCN) to label each voxel in an input image as one of eight classes: the seven cardiac substructures or background. The network architecture is a 3D implementation of the encoder-decoder FCN proposed by Johnson et al. [Johnson]. It consists of two downsampling layers with strided convolutions, six residual blocks with padded convolutions, and two upsampling layers with transposed convolutions. After each convolutional layer, batch normalization and rectified linear units (ReLU) are applied.
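The described architecture could be sketched in PyTorch as follows. The channel widths, kernel sizes, and the final 1×1×1 classification layer are assumptions made for illustration; the text does not specify these details:

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """Residual block with two padded 3x3x3 convolutions (channel count is an assumption)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1), nn.BatchNorm3d(ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # identity skip connection

class FCN3D(nn.Module):
    """3D encoder-decoder FCN: two strided-conv downsampling layers,
    six residual blocks, two transposed-conv upsampling layers."""
    def __init__(self, n_classes=8, base=32):
        super().__init__()
        def conv(i, o, stride=1, transposed=False):
            layer = (nn.ConvTranspose3d(i, o, 4, stride=2, padding=1) if transposed
                     else nn.Conv3d(i, o, 3, stride=stride, padding=1))
            # batch normalization and ReLU after each convolutional layer
            return nn.Sequential(layer, nn.BatchNorm3d(o), nn.ReLU(inplace=True))
        self.net = nn.Sequential(
            conv(1, base),
            conv(base, base * 2, stride=2),             # downsampling by strided convolution
            conv(base * 2, base * 4, stride=2),
            *[ResBlock3D(base * 4) for _ in range(6)],  # six residual blocks
            conv(base * 4, base * 2, transposed=True),  # upsampling by transposed convolution
            conv(base * 2, base, transposed=True),
            nn.Conv3d(base, n_classes, 1),              # per-voxel class logits
        )

    def forward(self, x):
        return self.net(x)
```

A forward pass on a `(batch, 1, D, H, W)` tensor yields per-voxel logits of shape `(batch, 8, D, H, W)`, since the two downsampling steps are exactly undone by the two upsampling steps.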
The FCN was trained using mini-batches of eight 128×128×128 voxel patches that were randomly sampled from the training images. For all experiments, the network was trained for 13,000 iterations with the sum of soft Dice scores over all classes as the loss function and Adam as the optimizer. An initial learning rate of 0.001 was used, which was reduced by 70% every 4,000 iterations. Prior to training or testing, images were resampled to an isotropic resolution of 0.8×0.8×0.8 mm and image intensities between -1024 and 3071 HU were linearly scaled to the range [0, 1].
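The intensity scaling and the sum-of-soft-Dice loss could be sketched as below. The smoothing constant `eps` is an assumption; the paper does not specify how ties at empty classes are handled:

```python
import numpy as np

def rescale_intensities(img_hu):
    """Linearly map HU values in [-1024, 3071] to [0, 1], clipping outliers."""
    return (np.clip(img_hu, -1024, 3071) + 1024.0) / 4095.0

def soft_dice_loss(probs, one_hot, eps=1e-7):
    """Negative sum of per-class soft Dice scores, to be minimized.
    probs: (C, D, H, W) softmax outputs; one_hot: reference in the same shape."""
    axes = (1, 2, 3)
    inter = np.sum(probs * one_hot, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(one_hot, axis=axes)
    dice = (2.0 * inter + eps) / (denom + eps)  # soft Dice per class
    return -np.sum(dice)
```

For a perfect prediction the loss approaches the negative number of classes, since each per-class soft Dice score approaches one.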
During testing, the network processed full images. The output multi-class probability maps of the network were resampled to the original resolution and each voxel was assigned the class with the highest probability. Given that most clinical scans are acquired using conventional CT scanners that do not provide spectral information, evaluation was performed using the test set containing 40 CCTA images acquired on a conventional CT scanner. This set only contained reference segmentations for the LV myocardium, hence the segmentation performance was assessed using this anatomical structure only. We evaluated agreement between the reference segmentations and automatic segmentations using the Dice similarity coefficient (DSC) and the average symmetrical surface distance (ASSD). The latter was calculated between the largest connected component in the automatic segmentation and the reference segmentation.
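The two evaluation metrics, including the restriction of the ASSD to the largest connected component, could be computed as in the following sketch; the surface extraction via binary erosion is one common convention and an assumption here:

```python
import numpy as np
from scipy import ndimage

def dice_coefficient(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def largest_connected_component(mask):
    """Keep only the largest connected component of a binary mask."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask.astype(bool)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)

def assd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Average symmetrical surface distance between binary masks a and b."""
    def surface(m):
        m = m.astype(bool)
        return m & ~ndimage.binary_erosion(m)  # boundary voxels
    sa, sb = surface(a), surface(b)
    # distance from each surface voxel of one mask to the other mask's surface
    da = ndimage.distance_transform_edt(~sb, sampling=spacing)[sa]
    db = ndimage.distance_transform_edt(~sa, sampling=spacing)[sb]
    return (da.sum() + db.sum()) / (len(da) + len(db))
```

Following the evaluation described above, the ASSD would be called as `assd(largest_connected_component(automatic), reference, spacing)`.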
4 Experiments and Results
We investigated two data augmentation strategies for the training data. In the first strategy, we linearly scaled all HU values in the conventionally reconstructed images by a factor of 0.5 or 2.0 and, for each patient, added these altered images with their corresponding segmentations to the training set. In the second strategy, we added the two virtual mono-energetic spectral CT reconstructions of each patient with their corresponding segmentations to the training set. To evaluate the efficacy of these strategies, four experiments were performed. In Experiment 1, we trained an FCN using only the conventional reconstructions. In Experiment 2, data augmentation by intensity-based scaling was used. In Experiment 3, data augmentation by inclusion of virtual mono-energetic images was used. Finally, in Experiment 4, the results of both strategies were combined and evaluated.
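Assembling the augmented training set per patient could look as follows; the function names and the list-of-pairs representation are illustrative assumptions:

```python
import numpy as np

def hu_scaling_augmentations(img_hu, factors=(0.5, 2.0)):
    """Strategy 1: linearly scale all HU values; the segmentation is reused unchanged."""
    return [img_hu * f for f in factors]

def build_training_set(conventional, mono_60kev, mono_90kev, seg):
    """Combine both strategies for one patient: the conventional reconstruction,
    its HU-scaled copies, and the two virtual mono-energetic reconstructions,
    all paired with the same propagated reference segmentation."""
    images = ([conventional]
              + hu_scaling_augmentations(conventional)
              + [mono_60kev, mono_90kev])
    return [(img, seg) for img in images]
```

Because all reconstructions of a patient share voxel-wise correspondence, a single reference segmentation serves every augmented copy.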
To reflect application to typical CCTA acquisitions, which usually do not include virtual mono-energetic images, we evaluated all experiments on images obtained on a conventional CT scanner. Figure 1(a) shows box plots of the DSC obtained on the 40 test scans in all four experiments.
Training using only conventional images (Experiment 1) resulted in an average DSC of and an ASSD of mm. The intensity-based data augmentation (Experiment 2) led to an average DSC of and an ASSD of mm. Augmenting the dataset with the two virtual mono-energetic images (Experiment 3) increased the average DSC to and decreased the ASSD to mm. Finally, combining the results of both augmentations (Experiment 4) led to a further increase of the average DSC to and an ASSD of mm. A Friedman test and pair-wise Wilcoxon signed-rank tests with Bonferroni correction confirmed that the DSCs of all four experiments differed significantly from each other (all p-values below the corrected significance level). The largest improvements in DSC were obtained for images with high HU values in the blood pool, as can be seen in Fig. 1(b). The conventional training images had average blood-pool HU values of only 271 to 343. Therefore, test cases with higher blood-pool intensities benefited the most from augmentation. Although this figure shows a clear relation between contrast-agent attenuation, data augmentation, and segmentation performance, some failure cases, such as those caused by anatomical abnormalities, could not be resolved by these data augmentation approaches.
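The statistical comparison described above could be sketched with SciPy; the exact significance level and correction bookkeeping used in the study are not stated, so the `alpha`-free return of corrected p-values here is an assumption:

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def compare_experiments(dsc_per_experiment):
    """Friedman test across all experiments, then pair-wise Wilcoxon signed-rank
    tests with Bonferroni correction.
    dsc_per_experiment: list of arrays, one per experiment, paired over the same test scans."""
    _, p_friedman = friedmanchisquare(*dsc_per_experiment)
    n_exp = len(dsc_per_experiment)
    n_pairs = n_exp * (n_exp - 1) // 2
    pairwise = {}
    for i in range(n_exp):
        for j in range(i + 1, n_exp):
            _, p = wilcoxon(dsc_per_experiment[i], dsc_per_experiment[j])
            pairwise[(i, j)] = min(p * n_pairs, 1.0)  # Bonferroni correction
    return p_friedman, pairwise
```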
Figure 2 shows segmentation results for a test image where augmentation improved the segmentation performance.
The reference segmentation is shown in the second column; the segmentations obtained in the four experiments are shown in the subsequent columns. The network trained only on conventional images failed to segment the myocardium in several areas, whereas both types of augmentation improved the segmentation of the myocardium. However, each augmentation improved different parts of the myocardium, and by combining both predictions, a segmentation closely matching the reference was obtained. Although both augmentations led to similar improvements in DSC, it is likely that different data augmentations have different local effects.
5 New or Break-Through Work to Be Presented
This work shows that virtual mono-energetic CCTA reconstructions, which are routinely obtained alongside conventional CCTA reconstructions on a spectral CT scanner, can be used for data augmentation in the segmentation of cardiac structures.
The results indicate that using virtual mono-energetic image reconstructions as data augmentation can improve the generalization capabilities of an FCN trained for cardiac segmentation in CCTA. Combining these results with linear intensity scaling augmentation can further improve generalization.
Acknowledgements. This work is part of the research programme Deep Learning for Medical Image Analysis with project number P15-26, which is financially supported by the Netherlands Organisation for Scientific Research (NWO) and Philips Healthcare. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
This work has not been submitted for publication elsewhere.