Knowledge-based Analysis for Mortality Prediction from CT Images

Recent studies have highlighted the high correlation between cardiovascular diseases (CVD) and lung cancer, both of which are associated with significant morbidity and mortality. Low-dose CT (LDCT) scans have led to significant improvements in the accuracy of lung cancer diagnosis and thus to a reduction in cancer deaths. However, the high correlation between lung cancer and CVD has not been well explored for mortality prediction. This paper introduces a knowledge-based analytical method using a deep convolutional neural network (CNN) for all-cause mortality prediction. The underlying approach combines structural image features extracted by CNNs from LDCT volumes at different scales with clinical knowledge obtained from quantitative measurements to comprehensively predict the mortality risk of lung cancer screening subjects. The introduced method is referred to here as the Knowledge-based Analysis of Mortality Prediction Network, or KAMP-Net. It constitutes a collaborative framework that utilizes both imaging features and anatomical information, instead of relying completely on automatic feature extraction. Our work demonstrates the feasibility of incorporating quantitative clinical measurements to assist CNNs in all-cause mortality prediction from chest LDCT images. The results of this study confirm that radiologist-defined features are an important complement to CNNs for achieving more comprehensive feature extraction. The proposed KAMP-Net is thus shown to achieve superior performance compared to other methods.



I Introduction

Low-dose CT (LDCT) has proven to be effective for lung cancer screening. For example, the National Lung Screening Trial (NLST) observed a 20% decrease in lung cancer related mortality in at-risk subjects (55 to 74 years, 30 pack-year cigarette-smoking history) [1]. The prevalence of lung cancer is highly correlated with CVDs, and both are associated with significant morbidity and mortality. More precisely, both share several risk factors that are predominantly attributed to unhealthy dietary habits, obesity, and tobacco use.

Based on the NLST, Chiles et al. [2] showed that coronary artery calcification (CAC), identified using various methods, is strongly associated with mortality. In a different study - the Dutch-Belgian Randomized Lung Cancer Screening Trial (NELSON) - it was found that CAC can predict all-cause mortality and cardiovascular events on lung cancer screening LDCT [3]. Our previous work [4] has also shown significant difference in CAC scores between the survivor and non-survivor groups, indicating that CAC can predict the mortality risk of lung cancer patients. The existing risk quantification methods, however, rely on directly counting the number of pixels that exceed a certain threshold [5, 6]. Although the calcium size carries valuable information, it may also miss some key mortality indicators, as argued in [4].

Over the past few years, the application of deep learning, a subdomain of machine learning, has led to a series of breakthroughs and a paradigm shift that resulted in numerous innovations in medicine, ranging from medical image processing, to computer-assisted diagnosis, to health record analysis. Deep learning has also been applied to automatic calcium scoring from chest LDCT images. For example, Lessmann et al. [7] report that (i) deep neural networks can measure the size of CAC from LDCT and (ii) the use of different filters during the reconstruction process can influence the quantification results. Training such networks, however, requires manually labeling the area of calcification in the images. This requires significant effort, so only a small number of images can be annotated, which may adversely affect the network performance. Moreover, CAC segmentation does not reveal other imaging markers that may predict the mortality risk.

Fig. 1: Overview of the proposed KAMP-Net, which combines the clinical knowledge based features and features discovered by the deep learning DSN for improved mortality risk prediction.

Recently, van Velzen et al. [8] introduced a convolutional autoencoder to extract image features for cardiovascular mortality prediction in a latent space. The features then serve as the input to a separate classifier, for example a neural network, a random forest classifier, or a support vector machine, to compute a risk value. However, such a two-phase method may not be able to extract the most distinctive features associated with CVD. Moreover, traditional convolutional neural networks (CNNs) rely on directly extracted image features to perform image classification. This, however, omits the clinical knowledge summarized by physicians through their diagnoses. Since various predefined imaging markers have been well recognized as indicators of mortality risk, it is advisable to utilize this information for estimating this risk.

This paper hypothesizes that incorporating clinical knowledge into deep learning based mortality risk prediction provides valuable complementary information that increases the prediction accuracy. To test the hypothesis, we introduce a novel method that combines features extracted by a CNN with clinical knowledge for predicting the all-cause mortality risk of lung cancer patients from their LDCT images. More precisely, the method introduced here relies on a dual-stream network (DSN), which takes whole slices as well as cropped cardiac patches as the input for feature extraction. The multi-scale input consequently carries both global image slice information and details of important local areas. The second component of the introduced method incorporates clinical knowledge based on four clinical measurements, namely CAC, muscle mass, fat attenuation, and emphysema. Inspired by the work in [9], we employ a support vector machine (SVM) classifier to combine the clinical measurements into a single mortality risk probability. The resultant method is referred to here as the knowledge-based analysis for mortality prediction or, in short, KAMP-Net.

The experimental results confirm that KAMP-Net is more accurate in predicting mortality when directly compared to other competitive networks. The network that forms part of the KAMP-Net architecture is the deep residual neural network (ResNet) [10], chosen for its reported effectiveness in training deep networks for extracting high-level representative imaging markers. The contributions of this work are summarized as follows:

  1. we utilize deep neural networks for lung cancer patient all-cause mortality risk prediction by automatically discovering imaging features instead of measuring the extent of CAC as a surrogate index [11, 12];

  2. we introduce a new gray-level image color-coding method to efficiently reuse the seminal deep CNN network structures, originally defined for 3-channel RGB natural images;

  3. we combine multiple scale images, i.e. the LDCT slices and cardiac image patches, as the input to the DSN, trained under 1), for both global and local feature extraction; and

  4. our results demonstrate that the DSN-extracted features, when combined with clinical knowledge from pre-defined imaging markers, can achieve a significantly better prediction accuracy than either component technology alone as well as other competitive network architectures.

This article is organized as follows. Section II presents the KAMP-Net method and the details of its component technologies. The experimental results of comparing the KAMP-Net method to competitive network architectures are given in Section III, followed by a discussion in Section IV. Finally, Section V provides a concluding summary of this work.

II Methods

In this section, we present our proposed method for mortality risk prediction using LDCT images. To accurately predict the all-cause mortality risk of a subject, we propose to combine multiscale heterogeneous features, which are either automatically obtained from the images through training or manually defined by physicians to effectively use their clinical knowledge. The proposed method is coined as the network of Knowledge-based Analysis for Mortality Prediction (KAMP-Net). The overall structure of the proposed network is shown in Fig. 1 and the details are given as follows.

Fig. 2: Two examples of anatomical-information based multi-channel image coding. With the proposed coding scheme, the large intensity range of CT images can be divided into three smaller segments to highlight the important imaging features.

II-A Multi-Channel Image Coding

LDCT images are 3D volume data containing information on internal structures such as organs, bones, blood vessels, and soft tissue. The value of each voxel varies from -1000 Hounsfield units (HU) to around 2000 HU. Directly compressing such a large value range into the typical range processed by deep CNNs may result in information loss. To make full use of the anatomical information in CT images, we divide the dynamic range of CT numbers into three segments according to the intensity distribution of specific tissues of interest. Each LDCT volume can then be decomposed into three channels by normalizing each intensity segment to [0,255]. Namely, values below -900 HU are extracted and normalized to form the first channel, the emphysema-concentrated interval. Similarly, voxels with values in the range (-900,0] are assigned to the second channel, representing the normal tissue and fat-concentrated intensity interval. CT numbers larger than 300 HU typically come from very strong calcification, so we cap the values there and normalize all values in (0,300] to form the third channel, which includes both normal tissue and calcification. For visualization purposes, the three channels are mapped to the red, blue, and green channels of a color image, as shown in Fig. 2. After separating different anatomical structures into separate channels, the intensity ranges of different tissue types throughout the CT slice become more balanced. For instance, the coronary artery calcification in the heart region appears as bright green and no longer suppresses other imaging components such as fat or emphysema.
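The coding scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `color_code_hu`, the lower bound of -1000 HU for the first channel, and the choice to saturate values that fall outside a segment at 0 or 255 are assumptions, since the text does not specify how out-of-segment voxels are handled.

```python
import numpy as np

def color_code_hu(slice_hu, air=-1000.0, emphysema=-900.0, tissue=0.0, calcium=300.0):
    """Split a CT slice given in HU into three 8-bit channels, one per intensity segment."""
    hu = np.asarray(slice_hu, dtype=np.float64)

    def scale(lo, hi):
        # Assumption: values outside [lo, hi] saturate at 0 or 255.
        return (np.clip(hu, lo, hi) - lo) / (hi - lo) * 255.0

    channels = [
        scale(air, emphysema),     # channel 1: emphysema interval, below -900 HU
        scale(emphysema, tissue),  # channel 2: normal tissue and fat, (-900, 0]
        scale(tissue, calcium),    # channel 3: tissue and calcification, (0, 300], capped
    ]
    # Result has shape (3, H, W) with dtype uint8.
    return np.stack(channels, axis=0).round().astype(np.uint8)
```

Calling `color_code_hu` on a 2D slice of shape (H, W) yields a (3, H, W) array that can be fed to a network expecting three input channels.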

II-B Network Design and Implementation

As shown in Fig. 1, the deep neural network consists of two streams and is referred to as the dual-stream network (DSN). The upper stream takes an entire LDCT image slice as input and extracts global image features from it. Each slice is an axial cross-sectional view and provides an overall picture of the status of a subject. The lower stream takes one manually selected image patch as input, which typically covers the heart region, especially the area with the most significant coronary artery calcification for the subject. The lower stream supplements the upper stream with local detailed visual cues to emphasize the importance of those local regions. The deep residual network (ResNet) [10], which is one of the top performing deep CNNs in various computer vision tasks, has been adopted as the backbone of DSN. By using only the convolutional layers of ResNet, image features can be extracted by ResNet-$n$, where $n$ denotes the depth of the network. At the end of the convolutional layers, 512 features are extracted by ResNet-18 and ResNet-34, and 2048 features by ResNet-50, ResNet-101, and ResNet-152. According to our previous work [13], ResNet-34 achieves the best accuracy in the patch-input network, so we chose it as the backbone architecture of the lower stream.

The proposed KAMP-Net was implemented in Python using the open source PyTorch library [14]. The training loss is defined as the cross-entropy between the prediction probability and the ground-truth label as

$$L = -\frac{1}{N}\sum_{i=1}^{N} \log p_{i,y_i},$$

where $N$ indicates the batch size, $y_i$ is the ground-truth label of the $i$-th sample, and $p_{i,y_i}$ is the network-derived probability for class $y_i$ after softmax. Training of the network is completed in two stages. The two streams of DSN are first trained separately in stage one and then combined for fine-tuning in stage two.
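As a worked illustration of the cross-entropy loss described above, the following pure-Python sketch computes the mean negative log-probability of the ground-truth class over a batch. The function names are hypothetical, and the sketch assumes the standard batch-averaged form of the loss; in PyTorch, `torch.nn.CrossEntropyLoss` with its default mean reduction computes the same quantity from raw logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def batch_cross_entropy(batch_logits, labels):
    """Mean negative log-probability assigned to the ground-truth class."""
    losses = []
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)
```

For a two-class network that outputs equal scores for both classes, every sample contributes log 2 to the loss, which is a convenient sanity check for an untrained model.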

In the first training stage, we implemented ResNet using the pre-defined structure provided by PyTorch [14]. The only difference between our network and the original ResNet is that the last fully connected (FC) layer outputs the classification probabilities of two categories, deceased or survived, instead of probabilities for 1000 classes. Both the patch-wise and slice-wise networks are trained from scratch using the Adam optimizer [15], with an initial learning rate that decays by a factor of 0.9 after every five epochs. We chose to train the networks from scratch instead of using networks pre-trained on ImageNet data because there is a large difference in image appearance between natural images from ImageNet and LDCT lung images. Each sample in our dataset has been labeled either 0 (deceased) or 1 (survived) for training and validation.
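The step decay described above can be written as a closed-form function of the epoch index. This is a sketch under assumptions: "decays by 0.9" is read as multiplying the learning rate by 0.9, the function name is hypothetical, and the initial learning rate must be supplied by the caller because its value is not recoverable from the text. In PyTorch, the same schedule would correspond to `torch.optim.lr_scheduler.StepLR` with `step_size=5` and `gamma=0.9`.

```python
def step_decay_lr(initial_lr, epoch, gamma=0.9, step=5):
    """Learning rate at a given zero-based epoch under step decay.

    The rate is multiplied by `gamma` once every `step` epochs.
    """
    return initial_lr * gamma ** (epoch // step)
```

With this schedule the rate stays constant during epochs 0 to 4, drops to 0.9 of its initial value for epochs 5 to 9, to 0.81 for epochs 10 to 14, and so on.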

In the second training stage, we remove the FC layers of the two sub-network streams pre-trained in stage one and combine the convolutional segments to form DSN. The output feature maps of the two sub-networks are concatenated and fed to a new FC layer, which generates two probabilities for survival and death prediction, respectively. The entire DSN with the newly added FC layer is trained for another 200 epochs of fine-tuning, again with the same learning rate as in stage one. As the pre-trained slice-wise and patch-wise networks have already gained the ability to extract informative medical image features, the training of DSN converges quickly.

Data augmentation has been shown to be an effective approach to improve the performance of deep CNNs [16]. In this paper, operations including random cropping and scaling are used for training the networks. Image patches of size 161×161 pixels are cropped from the LDCT images, centered around the locations clicked by the radiologists for computing CAC scores. The input patches are randomly cropped with a scaling ratio between 0.6 and 0.8 and resized to 224×224 pixels for network input. All the image patches are then normalized as three-channel images by subtracting the mean intensities and dividing by the standard deviation.
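The augmentation pipeline described above might look as follows in NumPy. Several details are assumptions made for the sake of a runnable sketch: the scaling ratio is applied to the side length (the text does not say whether it refers to side length or area), resizing uses nearest-neighbor sampling, and the per-channel mean and standard deviation are passed in by the caller. The function names are hypothetical.

```python
import numpy as np

def random_crop_resize(patch, out_size=224, ratio_range=(0.6, 0.8), rng=None):
    """Randomly crop a (C, H, W) patch and resize the crop to out_size x out_size."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = patch.shape[-2:]
    ratio = rng.uniform(*ratio_range)          # assumption: ratio of the side length
    crop_h = int(round(height * ratio))
    crop_w = int(round(width * ratio))
    top = rng.integers(0, height - crop_h + 1)
    left = rng.integers(0, width - crop_w + 1)
    crop = patch[..., top:top + crop_h, left:left + crop_w]
    # Nearest-neighbor resize: map each output pixel to a source row and column.
    rows = np.arange(out_size) * crop_h // out_size
    cols = np.arange(out_size) * crop_w // out_size
    return crop[..., rows[:, None], cols]

def normalize_channels(image, mean, std):
    """Subtract a per-channel mean and divide by a per-channel standard deviation."""
    mean = np.asarray(mean, dtype=np.float64).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float64).reshape(-1, 1, 1)
    return (image - mean) / std
```

Applied to a 161×161 three-channel patch, `random_crop_resize` returns a (3, 224, 224) array covering between 60% and 80% of the original side length.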

II-C Integration of Deep Learning and Clinical Knowledge

To further increase the accuracy of mortality prediction from LDCT images, we propose to combine clinical measurements with deep learning. Although CNNs are very powerful in extracting imaging markers, they lack the logical reasoning and high-level intelligence of human experts, which makes it difficult for them to find connections between seemingly distant concepts. On the other hand, expert-defined measurements from the images, including emphysema quantification, muscle mass, fat attenuation, and coronary artery calcification score, have been shown to be very useful in our previous work. Those measurements contain high-level information that may not be readily captured by CNNs. These knowledge-based features can be complementary to what CNNs extract. We thus propose to integrate the two groups of features to achieve more accurate prediction.

However, directly concatenating those measurements with the feature vectors from the CNN could have only trivial effects on the prediction results. Since the CNN-extracted feature vector has much higher dimensionality (e.g. 512 for ResNet-34) than the clinical measurements (4 in this case), the latter will be overwhelmed after simple concatenation and contribute little to the risk prediction. To balance the contributions of the two groups of features to the final output, we merge the two groups at a later stage, after obtaining the initial probabilities. As shown in Fig. 1, a linear SVM classifier with the four clinical measurements as input is trained for mortality prediction. This SVM classifier produces the probabilities of being deceased or survived, which add up to 1. On the DSN side, a softmax activation function is used to generate the probability output. The two sets of probabilities are then combined to obtain the overall chance of survival as

$$p = \lambda\, p_{\mathrm{DSN}} + (1-\lambda)\, p_{\mathrm{SVM}},$$

where $p$, $p_{\mathrm{DSN}}$, and $p_{\mathrm{SVM}}$ are the combined probability, DSN estimated probability, and SVM estimated probability of survival, respectively. The contribution ratio $\lambda$ is a weighting parameter in the range of $[0,1]$. The probability of death can be computed as $1-p$.
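The late-fusion step can be illustrated with a short function. The sketch assumes that the combination is a convex weighting of the two survival probabilities, with the DSN weighted by the contribution ratio and the SVM by its complement; the function and argument names are hypothetical.

```python
def fuse_survival_probability(p_dsn, p_svm, dsn_weight):
    """Combine DSN and SVM survival probabilities into one prediction.

    Returns a pair (probability of survival, probability of death).
    """
    if not 0.0 <= dsn_weight <= 1.0:
        raise ValueError("dsn_weight must lie in [0, 1]")
    p_survive = dsn_weight * p_dsn + (1.0 - dsn_weight) * p_svm
    return p_survive, 1.0 - p_survive
```

Because each classifier outputs a single probability regardless of how many features it consumes, the 4 clinical measurements and the high-dimensional CNN features enter the final decision on an equal footing, with the weight alone controlling their relative influence.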

III Experimental Results

This section presents experimental results of applying the KAMP-Net model to mortality risk prediction and provides a detailed analysis and comparison of its performance.

III-A Materials

All the study data used in this work are from the National Lung Screening Trial (NLST) [17], which are managed by the National Cancer Institute Cancer Data Access System. In this large scale clinical trial, NLST compared LDCT with the chest radiography for lung cancer screening in more than 50,000 current or former smokers who met the various inclusion criteria. Our hypothesis of the study is that the analysis of LDCT images acquired for lung cancer screening can effectively predict the all-cause mortality of the subjects by combining the clinical knowledge and advanced deep learning techniques. In our work, following the same protocol used in [4], 180 subjects (90 survivors, 90 non-survivors) were selected for the study, each group consisting of 49 subjects with stage I, 19 subjects with stage II, and 22 subjects with stage III lung cancers. Each patient went through three LDCT lung cancer screening exams, of which the first LDCT scan of each patient is used in this study. The survival label is used as the ground truth for training and evaluating the prediction algorithms.

III-B Performance Evaluation

A ten-fold cross validation scheme was applied to our dataset for evaluating the performance of the proposed method and other comparative methods. Since we aim to predict the ending points of subjects to be either “survivor” or “nonsurvivor” at the end of the follow-up period, receiver operating characteristic (ROC) curves are drawn to demonstrate the performance. Area under the curve (AUC) scores are used to compare the performances of different methods. When training the networks for each fold, the maximum number of epochs is set to be 200.

Fig. 3: Effects of varying DSN ratio with increment of 0.05.

To further evaluate the mutual influence between image-extracted features and the clinical information on the performance of KAMP-Net, we vary the DSN weight parameter $\lambda$ between 0 and 1 in increments of 0.05. Starting from 0, an increase of $\lambda$ results in a steady increase of the AUC value for the KAMP-Net model. This increase flattens for $\lambda$-values between 0.4 and 0.6 and is followed by declining AUC values as the parameter increases further. This trend of the $\lambda$-AUC curve implies that adding information from clinical knowledge allows the collaborative framework to achieve higher prediction accuracy, up to a point. The presence of the peak, on the other hand, indicates that there is a delicate balance point where the votes from the individual DSN and SVM techniques can be combined to reach the optimal risk prediction accuracy. At this balance point, the DL-based image features and the medical information from clinical measurements complement each other by mutually supplying missing clues that clearly improve the prediction of a patient's risk status.
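The weight sweep can be sketched as below. The sketch is illustrative only: the function names are hypothetical, the AUC is computed with a simple pairwise (Mann-Whitney) estimator rather than any particular library, and labels follow the convention stated earlier (1 for survived, 0 for deceased) with scores interpreted as survival probabilities.

```python
def auc(scores, labels):
    """Probability that a random positive scores higher than a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def sweep_dsn_weight(p_dsn, p_svm, labels, step=0.05):
    """Return (weight, AUC) pairs for DSN weights 0, step, 2*step, ..., 1."""
    n_steps = int(round(1.0 / step))
    results = []
    for k in range(n_steps + 1):
        weight = k * step
        fused = [weight * d + (1.0 - weight) * s for d, s in zip(p_dsn, p_svm)]
        results.append((round(weight, 2), auc(fused, labels)))
    return results
```

With a step of 0.05 the sweep evaluates 21 weights; the endpoints 0 and 1 reproduce the SVM-only and DSN-only AUC values, respectively.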

Fig. 4: Ten-fold cross validation ROC curves and AUC values of HyRiskNet-34, DSN, SVM and the proposed KAMP-Net.

The mean ROC curves over ten-fold cross validation of different methods are shown in Fig. 4. The corresponding mean AUC scores and standard deviations are also provided. The SVM model here is trained from four kinds of clinical measurements on the same cross validation fold as the DL methods. It can be seen that our proposed KAMP-Net achieves both the highest AUC score and the lowest standard deviation compared to other methods. Our previous work HyRiskNet [13] is included for comparison, which directly concatenates one additional CAC risk score with the high-dimensional deep CNN extracted feature vector.

We now contrast the performance of the KAMP-Net model with that of its individual components, i.e. the DSN and the SVM models. Fig. 4 allows the three models to be compared graphically based on their estimated ROC curves. For KAMP-Net, the weight $\lambda$ is fixed at a single selected value. To quantitatively test whether the increase in the AUC value is statistically significant, we test the null hypotheses that $\mathrm{AUC}_{\mathrm{KAMP}} = \mathrm{AUC}_{\mathrm{DSN}}$ and $\mathrm{AUC}_{\mathrm{KAMP}} = \mathrm{AUC}_{\mathrm{SVM}}$ against the one-sided alternative hypotheses $\mathrm{AUC}_{\mathrm{KAMP}} > \mathrm{AUC}_{\mathrm{DSN}}$ and $\mathrm{AUC}_{\mathrm{KAMP}} > \mathrm{AUC}_{\mathrm{SVM}}$, respectively. The two tests rely on three samples that store the AUC values obtained from the preceding ten-fold cross-validation: one sample stores the 10 AUC values for the KAMP-Net model, whilst the other two samples store the AUC values for the DSN and SVM models. This allows a pairwise comparison involving 10 pairs of values for testing $\mathrm{AUC}_{\mathrm{KAMP}} = \mathrm{AUC}_{\mathrm{DSN}}$ as well as $\mathrm{AUC}_{\mathrm{KAMP}} = \mathrm{AUC}_{\mathrm{SVM}}$. First, we confirmed that the two samples of differences were drawn from normal distributions by applying the Anderson-Darling test, and then applied a standard paired $t$-test. In both cases, the null hypothesis was rejected, and we therefore conclude that the increase in risk prediction accuracy achieved by the KAMP-Net model is statistically significant.
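The paired comparison can be sketched with the standard library alone. Everything numeric in this example is an assumption for illustration: the per-fold AUC lists are made up and are not the paper's values, and the 5% significance level is assumed because the text does not state one. The critical value 1.833 is the tabulated one-sided 5% point of the $t$-distribution with 9 degrees of freedom. With SciPy available, `scipy.stats.ttest_rel(a, b, alternative="greater")` and `scipy.stats.anderson(diffs, dist="norm")` would be the library routes for the test and the normality check.

```python
import math
import statistics

def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for the paired differences a - b."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Hypothetical per-fold AUCs, for illustration only (NOT the paper's values).
auc_kamp = [0.80, 0.84, 0.79, 0.83, 0.85, 0.78, 0.82, 0.81, 0.86, 0.80]
auc_dsn = [0.70, 0.75, 0.68, 0.74, 0.73, 0.69, 0.72, 0.70, 0.77, 0.71]

t_value, dof = paired_t_statistic(auc_kamp, auc_dsn)
T_CRIT_ONE_SIDED_5PCT_DOF9 = 1.833  # tabulated critical value for 9 degrees of freedom
reject_null = t_value > T_CRIT_ONE_SIDED_5PCT_DOF9
```

Pairing by fold matters here: each fold's difference removes the fold-to-fold variation shared by both models, so the test is sensitive to a consistent improvement even when the raw AUC values vary considerably across folds.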

It should be noted that even without using any clinical measurements, the current DSN has already outperformed the previous CNN based methods presented in [13], which use only patch image information as input. On the other hand, the performance of SVM shows that these four clinical measurements carry quantification information strongly associated with survival in our experiments. However, it is only a limited set of measurements. When being complemented with deep CNN discovered features, the performance has become even better.

Fig. 5: CAMs for the true positive and true negative predictions together with t-SNE visualization for DSN feature vectors.

Network      Color-Coding AUC   Color-Coding STD   Grey-Scale AUC   Grey-Scale STD
Slice-18     0.6754             0.11               0.6278           0.09
Slice-34     0.6568             0.10               0.6251           0.07
Slice-50     0.7001             0.12               0.6384           0.07
Slice-101    0.6671             0.06               0.6360           0.08
Slice-152    0.6773             0.13               0.6110           0.06
Patch-18     0.6596             0.09               0.6872           0.09
Patch-34     0.7010             0.13               0.6969           0.08
Patch-50     0.6604             0.08               0.6786           0.09
Patch-101    0.6689             0.10               0.6610           0.10
Patch-152    0.6917             0.10               0.6771           0.07
TABLE I: Comparison of the performances of all ResNets with color-coded and grey-scale LDCT image inputs, respectively.

III-C Effectiveness of Color-Coding

In this paper, we introduce the color-coding scheme to highlight anatomical differences for more effective feature extraction. To evaluate its effect, we conducted experiments on all the ResNet network structures using both the original LDCT images and the color-coded versions. The experimental results are shown in Table I. The only difference between the networks in the two groups lies in the input layer. While the networks in the Color-Coding group take the 3-channel pre-processed images as input, the networks in the other group take the single-channel grey-scale images as input. The grey-scale images are obtained by directly compressing the raw slices of the LDCT 3D volume from the wide range of Hounsfield units to the range [0,255]. Both groups of networks were trained on the same ten-fold cross-validation splits, with the same training strategy and parameters.

To statistically analyze the significance of color-coding, we applied a paired hypothesis test to the two groups of observations. Prior to that, we verified that the sample differences were drawn from a normal distribution by applying the well-known Anderson-Darling test. This allowed the use of the standard $t$-test [18] for the null hypothesis, which states that the use of color-coded images does not affect the overall performance compared to grey-scale images, against the one-sided alternative hypothesis that color-coding increases the prediction accuracy compared to grey-scale images. The computed test statistic for the slice-wise networks falls in the rejection region and we therefore rejected the null hypothesis, which confirms that the use of color-coding led to a statistically significant improvement in the mortality prediction accuracy. This indicates that directly compressing a whole slice from a large dynamic range to generate input for the networks may result in significant loss of information, and that the introduced color-coding scheme alleviates this problem. In contrast, there is no significant difference between the color-coding group and the grey-scale group when the pre-processing is applied to the patch-wise networks. Based on the results in Table I, we select ResNet-50 and ResNet-34 as the backbone networks for the color-coded input slices and patches, respectively, for DSN in KAMP-Net.

Method        AUC      STD
DSN-scratch   0.6455   0.06
Slice-50      0.7001   0.12
Patch-34      0.7010   0.13
DSN           0.7181   0.06
TABLE II: Comparison showing the effectiveness of DSN.

III-D Evaluation of Dual Stream Network

We then evaluate the performance of DSN by comparing network structures as well as training strategies. Table II shows the mean AUC values and standard deviations of DSN trained from scratch (DSN-scratch), Slice-50, Patch-34, and DSN. It can be seen that DSN, which combines the two sub-networks and fine-tunes them, outperforms both Slice-50 and Patch-34. This indicates that the slice and patch networks contain complementary information, which leads to improved performance in the final mortality risk prediction. It is also interesting to see that DSN outperforms the version trained from scratch by 10.8% in terms of AUC score. That may be due to the difficulties in training the large concatenated network. The superior performance of our proposed DSN demonstrates the importance of having both a well-designed network and a good training strategy.

III-E Feature Visualization

To help understand the features extracted by DSN, we compute the class activation map (CAM) as the weighted average of the feature maps from the patch-wise network, using the corresponding weights from the last FC layer as in [19]. We also used t-SNE to reduce the dimensionality of the feature maps to 2D for visualization [20]. Fig. 5 shows the projection into 2D, using t-SNE, of validation samples from a randomly selected fold of the ten-fold cross validation scheme. From the point scattering shown in this figure, we can see that the positive and negative samples are roughly separated from each other, which indicates that DSN is capable of extracting image features from LDCT images that are strongly associated with subject mortality.
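The CAM computation described above reduces to a weighted sum over feature channels. The NumPy sketch below assumes that the feature maps come from the last convolutional layer with shape (C, H, W) and that the FC weight matrix has shape (number of classes, C); the function name and the min-max normalization to [0, 1] for display are assumptions. Upsampling the map to the input image size for overlay is omitted.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Weight each feature map by the FC weight of the chosen class and sum them."""
    weights = np.asarray(fc_weights, dtype=np.float64)[class_idx]  # shape (C,)
    maps = np.asarray(feature_maps, dtype=np.float64)              # shape (C, H, W)
    cam = np.tensordot(weights, maps, axes=1)                      # shape (H, W)
    # Min-max normalize to [0, 1] so the map can be rendered as a heatmap.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Regions where the resulting map is close to 1 are those whose features contribute most strongly to the score of the chosen class.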

In Fig. 5, we also include several examples with CAMs superimposed on the grey-scale images as heatmaps. The closer a region is to red in the heatmap, the stronger the activation in the original image, which indicates that information from that area contributes more to the final decision. As can be seen from Fig. 5, the heatmaps for the deceased subjects predicted correctly by KAMP-Net tend to have strong activation over the coronary artery area in LDCT image patches, especially over the bright calcification region. This finding matches the clinical literature, which identifies CAC as one of the major risk factors for mortality [2]. For survived subjects, the heatmaps suggest that KAMP-Net looks more at surrounding lung tissue and muscles, as suggested by our previous work in [4].

IV Discussions

Fig. 6: Performance comparison of various methods on all-cause mortality risk prediction.

The developed KAMP-Net is then compared against several other clinically used scoring methods for further validation. The results are shown in Fig. 6. It can be seen that the traditional semi-automatic methods, such as Agatston score [5], Agatston risk, volume score [6], and square root of volume score [21], perform similarly and the mean AUC values are all at 0.61 or 0.62, slightly better than random guess. It is interesting to see that the visual inspection of CAC by radiologists outperforms the semi-automatic CAC scoring methods with AUC=0.64. This suggests that some information about the condition of cardiovascular vessels is not captured by those scoring methods, but has been taken into account by the radiologists.

The significant performance improvement comes from the proposed KAMP-Net, as shown in Fig. 6. The deep CNNs in DSN successfully extract and quantify features in cardiac patches and slices from chest LDCT images for all-cause mortality prediction, features that could not be directly measured by radiologists. The proposed KAMP-Net achieves the best performance with AUC=0.82, which improves the prediction performance by 28.1% over the visual inspection of radiologists.
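The 28.1% figure is a relative gain in AUC over the radiologists' visual inspection (AUC=0.64), which can be checked with a one-line calculation. The helper name below is hypothetical; the two AUC values are those quoted in the text.

```python
def relative_improvement(new_value, baseline):
    """Relative gain of new_value over baseline, in percent."""
    return (new_value - baseline) / baseline * 100.0

# AUC of KAMP-Net versus AUC of visual inspection, as quoted in the text.
gain_percent = relative_improvement(0.82, 0.64)
```

The result is approximately 28.1, in agreement with the reported improvement.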

V Conclusions

This paper has shown that patch-based and slice-based deep CNNs can complement each other in feature extraction for all-cause mortality prediction. Furthermore, incorporating the clinical measurements made by radiologists and summarized by an SVM model has yielded a significant performance improvement. This has led to the introduction of a novel method that combines CNNs with an SVM model, which we have shown to produce a symbiotic effect.

In our future work, we propose to investigate the features extracted by the deep CNNs to gain a better understanding of the approach. We believe that this will allow us to design a better network structure and further improve the prediction accuracy. Moreover, as the clinical measurements used in this work were semi-automatically extracted and quantified, we will consider developing novel methods for obtaining those statistics automatically. Finally, although the color-coding pre-processing of LDCT images has been shown to be beneficial, the current thresholds and channel arrangement were set manually, a process that can be automated.

VI Acknowledgments

The authors thank the National Cancer Institute for access to NCI’s data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI. The authors would also like to thank NVIDIA Corporation for the donation of the Titan Xp GPU used for this research.


  • [1] National Lung Screening Trial Research Team, D. R. Aberle, A. M. Adams, C. D. Berg, W. C. Black, J. D. Clapp, R. M. Fagerstrom, I. F. Gareen, C. Gatsonis, P. M. Marcus, and J. D. Sicks, “Reduced lung-cancer mortality with low-dose computed tomographic screening,” The New England Journal of Medicine, vol. 365, no. 5, pp. 395–409, Aug. 2011.
  • [2] C. Chiles, F. Duan, G. W. Gladish, J. G. Ravenel, S. G. Baginski, B. S. Snyder, S. DeMello, S. S. Desjardins, R. F. Munden, and NLST Study Team, “Association of coronary artery calcification and mortality in the national lung screening trial: A comparison of three scoring methods,” Radiology, vol. 276, no. 1, pp. 82–90, Jul. 2015.
  • [3] P. C. Jacobs, M. J. A. Gondrie, Y. van der Graaf, H. J. de Koning, I. Isgum, B. van Ginneken, and W. P. T. M. Mali, “Coronary artery calcium can predict all-cause mortality and cardiovascular events on low-dose ct screening for lung cancer,” American Journal of Roentgenology, vol. 198, no. 3, pp. 505–511, Mar. 2012.
  • [4] S. R. Digumarthy, R. De Man, R. Canellas, A. Otrakji, G. Wang, and M. K. Kalra, “Multifactorial analysis of mortality in screening detected lung cancer,” Journal of Oncology, vol. 2018, p. 7, 2018. [Online]. Available:
  • [5] A. S. Agatston, W. R. Janowitz, F. J. Hildner, N. R. Zusmer, M. Viamonte, and R. Detrano, “Quantification of coronary artery calcium using ultrafast computed tomography,” Journal of the American College of Cardiology, vol. 15, no. 4, pp. 827–832, Mar. 1990.
  • [6] T. Q. Callister, B. Cooil, S. P. Raya, N. J. Lippolis, D. J. Russo, and P. Raggi, “Coronary artery disease: improved reproducibility of calcium scoring with an electron-beam CT volumetric method.” Radiology, vol. 208, no. 3, pp. 807–814, Sep. 1998.
  • [7] N. Lessmann, B. van Ginneken, M. Zreik, P. A. de Jong, B. D. de Vos, M. A. Viergever, and I. Isgum, “Automatic calcium scoring in low-dose chest CT using deep neural networks with dilated convolutions,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 615–625, Feb. 2018.
  • [8] S. G. M. van Velzen, M. Zreik, N. Lessmann, M. A. Viergever, P. A. de Jong, H. M. Verkooijen, and I. Išgum, “Direct Prediction of Cardiovascular Mortality from Low-dose Chest CT using Deep Learning,” arXiv:1810.02277 [cs], Oct. 2018, arXiv: 1810.02277. [Online]. Available:
  • [9] H. Fu, S. Xu, Yanwu abd Lin, D. W. K. Wong, B. Mani, M. Mahesh, A. Tin, and J. Liu, “Multi-context deep network for angle-closure glaucoma screening in anterior segment OCT,” in Medical Image Computing and Computer Assisted Intervention (MICCAI), Oct. 2018, pp. 356–363.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    , Jun. 2016, pp. 770–778.
  • [11] J. Shemesh, “Coronary artery calcification in clinical practice: what we have learned and why should it routinely be reported on chest CT?” Annals of Translational Medicine, vol. 4, no. 8, Apr. 2016.
  • [12] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Automatic Coronary Calcium Scoring in Cardiac CT Angiography Using Convolutional Neural Networks,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, ser. Lecture Notes in Computer Science.   Springer, Cham, Oct. 2015, pp. 589–596.
  • [13] P. Yan, H. Guo, G. Wang, R. De Man, and M. K. Kalra, “Hybrid deep neural networks for all-cause mortality prediction from LDCT images,” arXiv:1810.08503 [cs.CV], Oct. 2018.
  • [14] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS 2017 Workshop Autodiff, 2017.
  • [15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference for Learning Representations (ICLR), Dec. 2014.
  • [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS), 2012.
  • [17] J. Chin, T. Syrek Jensen, L. Ashby, J. Hermansen, J. D. Hutter, and P. H. Conway, “Screening for Lung Cancer with Low-Dose CT — Translating Science into Medicare Coverage Policy,” New England Journal of Medicine, vol. 372, no. 22, pp. 2083–2085, May 2015.
  • [18] D. C. Montgomery and G. C. Runger, Applied Statistics and Probability for Engineers, 5th Edition.   Hoboken, NJ: John Wiley & Sons, 2010.
  • [19]

    B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in

    Computer Vision and Pattern Recognition (CVPR), Jun. 2016.
  • [20] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, pp. 2579–2605, 2008.
  • [21] J. E. Hokanson, T. MacKenzie, G. Kinney, J. K. Snell-Bergeon, D. Dabelea, J. Ehrlich, R. H. Eckel, and M. Rewers, “Evaluating Changes in Coronary Artery Calcium: An Analytic Method That Accounts for Interscan Variability,” American Journal of Roentgenology, vol. 182, no. 5, pp. 1327–1332, May 2004.