Besides lung cancer, cardiovascular disease is a leading cause of death in the lung cancer screening population. Moreover, it has been shown that chest CT scans used for lung cancer screening are suitable for identification of participants at risk of cardiovascular disease (CVD)[2, 3, 4, 5]. Previous methods that investigated prediction of CVD events and all-cause mortality used known quantitative CVD image markers and combined them with subject data. Using weighted Cox proportional hazards regression, Chiles et al. showed that quantitative as well as visually assessed coronary artery calcium (CAC) scores extracted from screening low-dose chest CT are predictive of CVD and all-cause mortality in the National Lung Screening Trial (NLST). Similarly, Mets et al. used Cox regression based on semi-automatically detected CAC scores and thoracic aorta calcium (TAC) volume, as well as subject data, to predict CVD events and all-cause mortality in the Dutch-Belgian lung cancer screening trial (NELSON). De Vos et al. predicted CVD events in the same population using a support vector machine (SVM) classifier that employed automatically extracted CAC and TAC scores as features.
These approaches relied on hand-crafted features that are already established as CVD biomarkers. However, besides these known biomarkers, chest CT scans may contain yet unknown features predictive of CVD mortality. Hence, we propose a method based on unsupervised feature learning which is able to automatically predict CVD mortality directly from chest CT scans and is therefore not limited to known quantitative image markers related to CVD.
This study included 1,583 participants of the National Lung Screening Trial. NLST included current and former heavy smokers between the age of 50 and 74. All 395 participants who died of CVD within 5 years from acquisition of the baseline CT scan (non-survivors) were included. In addition, 1,188 participants who were still alive after this period (survivors) were randomly selected.
For each subject a CT scan acquired at baseline was analyzed. Low-dose chest CT scans were made with breath-hold, without contrast enhancement and without ECG synchronization. Scans were acquired using helical scanning mode and a tube voltage of 120 kVp or 140 kVp, depending on the subject’s weight. In-plane resolution ranged from 0.49 mm to 0.98 mm with a slice thickness between 1.0 mm and 2.5 mm. The scans were acquired at 32 different medical centers with 13 different scanner models.
To investigate whether analysis of the heart visualized in chest CT enables prediction of CVD mortality, the method first extracts a bounding box around the heart. This is done with our previously designed and trained algorithm that employs a CNN to determine the presence of the heart in axial, coronal and sagittal slices of the chest CT image and subsequently combines these to define a 3D bounding box around the heart. Thereafter, to ensure equal image resolution in our data set, we resample all extracted heart volumes to isotropic resolution of 1.0 mm. Moreover, to enhance differentiation among soft tissues in the heart (e.g. fat, muscle) and to preserve influence of high density structures (e.g. CAC, TAC), extracted volumes are clipped between [-160, 840] HU.
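The resampling and clipping steps described above can be illustrated with a minimal sketch. It assumes the heart crop is already available as a NumPy array with known voxel spacing; the function names, the example spacing, and the use of simple index lookup instead of interpolation are assumptions made for illustration, not details taken from this work.

```python
import numpy as np

HU_MIN, HU_MAX = -160, 840  # clipping range stated in the text


def resample_isotropic(volume, spacing_mm, target_mm=1.0):
    """Resample to isotropic voxels by index lookup (no interpolation; an assumption)."""
    index_grids = []
    for size, spacing in zip(volume.shape, spacing_mm):
        new_size = max(1, int(round(size * spacing / target_mm)))
        # Map each output voxel to the input voxel it falls in along this axis.
        idx = np.floor(np.arange(new_size) * target_mm / spacing).astype(int)
        index_grids.append(np.clip(idx, 0, size - 1))
    return volume[np.ix_(*index_grids)]


def preprocess_heart_volume(volume_hu, spacing_mm):
    """Resample a cropped heart volume to 1.0 mm and clip it to [-160, 840] HU."""
    resampled = resample_isotropic(volume_hu, spacing_mm)
    return np.clip(resampled, HU_MIN, HU_MAX)


# Hypothetical crop: 40 slices of 2.5 mm, 100 x 100 pixels of 0.7 mm.
rng = np.random.default_rng(0)
crop = rng.integers(-1000, 1500, size=(40, 100, 100)).astype(np.int16)
heart = preprocess_heart_volume(crop, spacing_mm=(2.5, 0.7, 0.7))
print(heart.shape, heart.min(), heart.max())
```

In practice a library resampling routine with linear or higher-order interpolation would likely be preferred; the lookup above only keeps the example dependency-light.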
Because of the relatively low number of available samples, training a CNN that would both extract features and classify subjects into survivors and non-survivors was not feasible. Therefore, similar to the work of Zreik et al., a convolutional autoencoder (CAE) is used to encode the volumes containing the heart in an unsupervised fashion. Thereafter, a conventional machine learning classifier exploiting the extracted encodings is employed to classify subjects into survivors and non-survivors.
The CAE used in this work (Figure 1) consists of an encoder that compresses the images into representative encodings and a decoder that reconstructs the images during training. The encoder analyzes images that are cropped to the heart volume and zero-padded to a fixed input size, and consists of 5 convolutional layers. A stride of 2 was applied to achieve spatial downsampling without the need for deterministic spatial functions such as max-pooling. The encoder ends in a dense layer with 100 units, which represents the encoding vector. The decoder consists of 5 upsampling layers and 5 convolutional layers with a stride of 1. All convolutional layers are followed by batch normalization and LeakyReLU activation.
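The effect of the strided convolutions on the spatial extent of the input can be illustrated with a short calculation. The sketch assumes 'same' padding, so that each stride-2 layer halves the edge length (rounding up), and the edge length of 128 voxels is a hypothetical value chosen only for illustration.

```python
import math


def encoder_spatial_sizes(input_size, n_layers=5, stride=2):
    """Edge length after each strided convolution, assuming 'same' padding."""
    sizes = [input_size]
    for _ in range(n_layers):
        sizes.append(math.ceil(sizes[-1] / stride))
    return sizes


# 128 is a hypothetical edge length, not the input size used in this work.
print(encoder_spatial_sizes(128))  # [128, 64, 32, 16, 8, 4]
```

Under these assumptions, five layers reduce each spatial dimension by a factor of 32 before the dense layer maps the result to the 100-unit encoding.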
Typically, in training the CAE, the mean squared error (MSE) between the reconstructed image and the original image is used as a loss function. However, to capture the contrast among soft tissues in low-dose CT without intravenous contrast enhancement, in this work the loss function of the CAE is defined employing the feature perceptual loss (FPL), which captures perceptual differences and spatial correlations better. To compute the FPL, both the input image and the reconstructed image are separately fed into a fixed VGG16 network pretrained on ImageNet (Figure 1). The FPL is then defined as the MSE between the feature maps in this network derived from the input image and the reconstructed image. Because VGG16 is a network designed for 2D images, the loss was calculated over 2D axial slices of the 3D image volumes.
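The structure of a slice-wise feature loss can be sketched as follows. To keep the example self-contained, the fixed VGG16 network is replaced by a toy feature extractor based on finite differences; this stand-in, the function names, and the assumption that axis 0 indexes axial slices are all illustrative and not part of the described method.

```python
import numpy as np


def toy_features(slice_2d):
    """Stand-in for fixed VGG16 feature maps: horizontal and vertical finite differences."""
    dx = np.diff(slice_2d, axis=1)[:-1, :]
    dy = np.diff(slice_2d, axis=0)[:, :-1]
    return np.stack([dx, dy])


def feature_perceptual_loss(original, reconstruction, feature_fn=toy_features):
    """Mean over axial slices of the MSE between feature maps of both volumes."""
    losses = []
    for orig_slice, recon_slice in zip(original, reconstruction):
        diff = feature_fn(orig_slice) - feature_fn(recon_slice)
        losses.append(np.mean(diff ** 2))
    return float(np.mean(losses))


rng = np.random.default_rng(0)
volume = rng.integers(-160, 841, size=(4, 16, 16)).astype(np.float64)
blurred = 0.5 * (volume + np.roll(volume, 1, axis=2))  # crude loss of sharpness

print(feature_perceptual_loss(volume, volume))   # identical inputs give zero loss
print(feature_perceptual_loss(volume, blurred))  # smoothed edges are penalized
```

With a real pretrained network, `feature_fn` would return the activations of one or more fixed layers, and the loss would be backpropagated only through the CAE.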
Thereafter, three different classifiers are trained using the encodings obtained with the CAE: a neural network (NN), a random forest classifier (RFC) and a support vector machine (SVM) classifier. For the RFC and SVM a grid search on the validation set is performed to find the optimal parameter settings.
4 Experiments and Results
To assess the performance, eight cross-validation experiments were performed. In each experiment, 100 images were selected as test set, 50 images were selected as validation set and the remaining 1,433 images were used as training set. Test and validation sets were balanced with respect to classes. In the test sets, images of the non-survivors were sampled such that each non-survivor was included once in the test sets. Images of the survivors were randomly selected from the available set.
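One way to construct such splits is sketched below. The subject identifiers are synthetic, and two details are assumptions: the validation set is taken as 25 subjects per class, and because 8 test sets of 50 non-survivors require 400 cases while 395 are available, the final test set is topped up with randomly re-drawn non-survivors.

```python
import random

N_NON_SURVIVORS, N_SURVIVORS = 395, 1188
N_FOLDS, TEST_PER_CLASS, VAL_PER_CLASS = 8, 50, 25


def make_splits(seed=0):
    rng = random.Random(seed)
    non_survivors = [("non_survivor", i) for i in range(N_NON_SURVIVORS)]
    survivors = [("survivor", i) for i in range(N_SURVIVORS)]
    shuffled = non_survivors[:]
    rng.shuffle(shuffled)

    splits = []
    for fold in range(N_FOLDS):
        # Consecutive chunks guarantee every non-survivor is tested at least once.
        test_pos = shuffled[fold * TEST_PER_CLASS:(fold + 1) * TEST_PER_CLASS]
        # A short final chunk is topped up with re-drawn cases (assumption).
        extra = [s for s in non_survivors if s not in test_pos]
        test_pos += rng.sample(extra, TEST_PER_CLASS - len(test_pos))
        test_neg = rng.sample(survivors, TEST_PER_CLASS)

        test = set(test_pos + test_neg)
        val_pos = rng.sample([s for s in non_survivors if s not in test], VAL_PER_CLASS)
        val_neg = rng.sample([s for s in survivors if s not in test], VAL_PER_CLASS)
        val = set(val_pos + val_neg)
        # Everything not used for testing or validation is training data.
        train = [s for s in non_survivors + survivors if s not in test and s not in val]
        splits.append({"train": train, "val": sorted(val), "test": sorted(test)})
    return splits


splits = make_splits()
print([(len(s["train"]), len(s["val"]), len(s["test"])) for s in splits])
```

Each experiment then has 1,433 training, 50 validation and 100 test images, matching the counts given above.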
To augment the training set for the CAE, random rotations around all three image axes were used. The angle of rotation was randomly chosen from a normal distribution with a mean of 0 and a standard deviation of 10 degrees. The CAE was trained in 100,000 iterations, with a batch size of 2 images zero-padded to the required input size. The Adam optimization algorithm was used with a learning rate of 0.001.
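The sampling of augmentation rotations can be sketched as follows. The example draws one angle per axis from a normal distribution with a standard deviation of 10 degrees and composes the three rotations into a single matrix; the order of composition and the helper names are assumptions, and applying the matrix to a volume would additionally require a resampling routine.

```python
import math
import random


def axis_rotation(axis, angle_deg):
    """3x3 rotation matrix about one image axis (0, 1 or 2)."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    i, j = [a for a in range(3) if a != axis]
    m = [[1.0 if r == col else 0.0 for col in range(3)] for r in range(3)]
    m[i][i], m[i][j], m[j][i], m[j][j] = c, -s, s, c
    return m


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def sample_augmentation_rotation(rng, sigma_deg=10.0):
    """Draw one angle per axis from N(0, sigma) and compose the three rotations."""
    angles = [rng.gauss(0.0, sigma_deg) for _ in range(3)]
    rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for axis, angle in enumerate(angles):
        rotation = matmul(axis_rotation(axis, angle), rotation)
    return angles, rotation


angles, rotation = sample_augmentation_rotation(random.Random(0))
print([round(a, 1) for a in angles])
```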
To classify subjects into survivors and non-survivors, the classifiers were trained using the encodings obtained with the CAE. The NN (one dense layer with 6 units and an output layer with 2 units, dropout p=0.5, categorical cross-entropy loss, learning rate 0.0001) was trained for 25,000 iterations with balanced batches of 100 examples, using the Adam optimization algorithm. The RFC consisted of 75 trees, and for the SVM the two tuned hyperparameters were set to 0.0001 and 100, respectively.
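Balanced batches for an imbalanced training set can be drawn as in the sketch below. It assumes that a balanced batch of 100 means 50 examples per class and that examples are sampled with replacement; the random encodings and the class counts (320 non-survivors and 1,113 survivors, which follow if 75 subjects per class are held out) are used only for illustration.

```python
import random


def balanced_batch(encodings, labels, rng, batch_size=100):
    """Draw batch_size // 2 examples per class, sampling with replacement."""
    per_class = batch_size // 2
    batch = []
    for cls in (0, 1):
        candidates = [i for i, y in enumerate(labels) if y == cls]
        batch += [rng.choice(candidates) for _ in range(per_class)]
    rng.shuffle(batch)
    return [encodings[i] for i in batch], [labels[i] for i in batch]


rng = random.Random(0)
labels = [1] * 320 + [0] * 1113  # 1 = non-survivor, 0 = survivor
encodings = [[rng.random() for _ in range(100)] for _ in labels]  # 100-unit encodings
x_batch, y_batch = balanced_batch(encodings, labels, rng)
print(len(x_batch), sum(y_batch))
```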
To determine the influence of FPL on the reconstructions, we trained an additional CAE with standard MSE as loss function. Training with MSE resulted in a mean absolute error of 19 (± 5) HU, and training with FPL resulted in a mean absolute error of 20 (± 6) HU. While the mean absolute reconstruction errors are similar, the CAE trained with FPL learned sharper contrast between structures, as shown in Figure 2A.
In each cross-validation experiment a CAE was trained and the classifiers were evaluated. The performance of the method was assessed with a receiver operating characteristic (ROC) curve. The SVM achieved the best performance with an area under the curve (AUC) of 0.72 (± 0.07 standard deviation, Figure 2B). The neural network and the RFC achieved AUCs of 0.71 (± 0.06) and 0.70 (± 0.06), respectively.
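For reference, the AUC of a single experiment and its summary over experiments can be computed as in the following sketch. The AUC is calculated as the fraction of non-survivor/survivor pairs that the classifier ranks correctly. The scores are made up for illustration, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs):
    """Mean and sample standard deviation over the cross-validation experiments."""
    return mean(fold_aucs), stdev(fold_aucs)


# Toy example with made-up scores (not data from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs are ordered correctly
```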
5 Discussion and Conclusion
In this work, a method for prediction of cardiovascular mortality from lung screening chest CT scans has been proposed. Unlike previous predictive models, the proposed method does not use hand-crafted image features, but performs prediction directly from the images containing the heart.
The experiments show that the CAE using FPL preserves structures likely containing important information for prediction of CVD mortality, such as the coronary arteries, aorta, and fat around the heart (Figure 2A). The CAE was trained to reconstruct heart images and was agnostic to the subsequent classification task. It would be interesting to investigate end-to-end training where the CAE is trained while optimizing for a subsequent classification task. Besides likely improvement in the performance by end-to-end training, such an approach might allow identification of image areas important for prediction and thereby allow the confirmation of known and possibly the discovery of novel image markers of CVD mortality. Alternatively, a CNN could be employed to perform classification directly. However, our preliminary experiments showed that end-to-end training and direct classification require a larger data set, which is often not available.
The three evaluated classifiers show a similar performance for prediction of CVD mortality based on the extracted encodings. Furthermore, the presented method shows a similar performance to previous methods proposed by Mets et al. and De Vos et al. (AUC = 0.71)[4, 5], that describe prediction of CVD events. This is remarkable since the CAE is unsupervised and does not incorporate prior knowledge about the image, the subsequent classification task or relevant biomarkers. Moreover, the proposed method achieved similar performance to work by Mets et al. and De Vos et al.[4, 5], without adding subject data like age, smoking history or sex, as they proposed. However, comparison with these methods is somewhat limited by different outcome definitions and populations. Although image analysis without additional subject data may be simpler in application, future work could investigate whether incorporating subject data in the proposed method would improve the performance.
In conclusion, this work demonstrates that the prediction of cardiovascular mortality directly from low-dose screening chest CT scans is feasible. This might allow identification of subjects undergoing lung screening with CT who are at risk of fatal CVD events and who might benefit from preventive treatment.
6 New or Breakthrough Work
A machine learning system is presented that is able to predict cardiovascular death within five years from a low-dose chest CT scan, without prior information about the image or subjects.
Acknowledgements. The authors thank the National Cancer Institute for access to NCI’s data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI. The authors hereby state that this work is not submitted elsewhere.
-  National Lung Screening Trial Research Team, “Reduced lung-cancer mortality with low-dose computed tomographic screening,” New England Journal of Medicine 365(5), 395–409 (2011).
-  Jacobs, P. C., Gondrie, M. J., van der Graaf, Y., de Koning, H. J., Isgum, I., van Ginneken, B., and Mali, W. P., “Coronary artery calcium can predict all-cause mortality and cardiovascular events on low-dose CT screening for lung cancer,” American Journal of Roentgenology 198(3), 505–511 (2012).
-  Chiles, C., Duan, F., Gladish, G. W., Ravenel, J. G., Baginski, S. G., Snyder, B. S., DeMello, S., Desjardins, S. S., Munden, R. F., and Team, N. S., “Association of coronary artery calcification and mortality in the national lung screening trial: a comparison of three scoring methods,” Radiology 276(1), 82–90 (2015).
-  Mets, O. M., Vliegenthart, R., Gondrie, M. J., Viergever, M. A., Oudkerk, M., de Koning, H. J., Willem, P. T. M., Prokop, M., van Klaveren, R. J., van der Graaf, Y., et al., “Lung cancer screening CT-based prediction of cardiovascular events,” JACC: Cardiovascular Imaging 6(8), 899–907 (2013).
-  de Vos, B. D., de Jong, P. A., Wolterink, J. M., Vliegenthart, R., Wielingen, G. V., Viergever, M. A., and Išgum, I., “Automatic machine learning based prediction of cardiovascular events in lung cancer screening data,” in [Medical Imaging 2015: Computer-Aided Diagnosis ], 9414, 94140D, International Society for Optics and Photonics (2015).
-  de Vos, B. D., Wolterink, J. M., de Jong, P. A., Leiner, T., Viergever, M. A., and Išgum, I., “Convnet-based localization of anatomical structures in 3-D medical images,” IEEE transactions on medical imaging 36(7), 1470–1481 (2017).
-  Zreik, M., Lessmann, N., van Hamersvelt, R. W., Wolterink, J. M., Voskuil, M., Viergever, M. A., Leiner, T., and Išgum, I., “Deep learning analysis of the myocardium in coronary CT angiography for identification of patients with functionally significant coronary artery stenosis,” Medical Image Analysis 44, 72–85 (2018).
-  Hou, X., Shen, L., Sun, K., and Qiu, G., “Deep feature consistent variational autoencoder,” in [2017 IEEE Winter Conference on Applications of Computer Vision (WACV)], 1133–1141, IEEE (2017).
-  Simonyan, K. and Zisserman, A., “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 (2014).