Non-small-cell lung cancer (NSCLC) represents approximately 80-85% of lung
cancer diagnoses and is the leading cause of cancer-related death worldwide.
Recent studies indicate that image-based radiomics features from positron
emission tomography-computed tomography (PET/CT) images have predictive power
on NSCLC outcomes. To this end, easily calculated functional features such as
the maximum and the mean of standard uptake value (SUV) and total lesion
glycolysis (TLG) are most commonly used for NSCLC prognostication, but their
prognostic value remains controversial. Meanwhile, convolutional neural
networks (CNNs) are rapidly emerging as a new approach to cancer image analysis,
with significantly enhanced predictive power compared to hand-crafted
radiomics features. Here we show that CNNs trained to perform the tumor
segmentation task, with no information other than physician contours, identify
a rich set of survival-related image features with remarkable prognostic value.
In a retrospective study of 96 NSCLC patients imaged before stereotactic body
radiotherapy (SBRT), we found that a CNN segmentation algorithm (U-Net)
trained for tumor segmentation in PET/CT images contained features that
correlated strongly with 2- and 5-year overall and disease-specific survival.
The U-Net algorithm saw no clinical information (e.g. survival,
age, smoking history) other than the images and the corresponding tumor contours
provided by physicians. Furthermore, through visualization of the U-Net, we
also found convincing evidence that the regions of progression appear to match
with the regions where the U-Net features identified patterns that predicted
higher likelihood of death. We anticipate our findings will be a starting point
for more sophisticated non-intrusive patient-specific cancer prognosis.
According to the World Health Organization (WHO), lung cancer remains the most common cancer and the leading cause of cancer-related death worldwide, with 2.1 million new cases diagnosed and 1.8 million deaths in 2018. NSCLC accounts for 80-85% of lung cancer diagnoses, and the five-year survival rate of NSCLC remains relatively low (23%) compared to other leading cancer sites such as colorectal (64.5%), breast (89.6%), and prostate (98.2%). Historically, the tumor, nodes, and metastases (TNM) staging system has served as the major prognostic factor in predicting therapeutic outcomes, but it does not differentiate responders from non-responders within the same stage. The maximum and the mean of standard uptake values (SUVMAX and SUVMEAN) have been reported to correlate with survival[6, 7, 8] but are of limited clinical value due to their unsatisfactory predictive power and lack of robustness[14, 15]. Other prognostic markers have also been studied, including TLG, which incorporates metabolic tumor volume (MTV) and metabolic activity (TLG = MTV × SUVMEAN). Reports[11, 12, 13] suggest that TLG may have better prognostic power than SUVMAX or SUVMEAN. These metrics, however, are not optimal and do not provide a comprehensive image-based analysis of tumors. More recently, radiomics approaches, which employ semi-automated analysis based on a few hand-crafted imaging features describing intratumoral heterogeneity, have demonstrated higher prognostic power[22, 23]. However, these features still have limited predictive power, with the area under the curve (AUC) ranging between 0.5 and 0.79[23, 24, 25]. Recent literature in deep learning demonstrates its strong potential in cancer prognostication[16, 26]; however, the clinical implications of deep learning remain in question due to the limited interpretability of CNNs.
Here, we propose an interpretable and highly accurate framework to solve this problem by capitalizing on the unprecedented success of deep convolutional neural networks (CNNs). More specifically, we investigate the U-Net, a convolutional encoder-decoder network that has demonstrated exceptional performance in tumor detection and segmentation tasks. As illustrated in Fig. 1a, the network takes a three-dimensional (3D) volume image as input, processes it through a “bottleneck layer” where the image features are compressed, and reconstructs a binary segmentation map indicating a pixel-wise tumor classification result. We focused on the information encoded at the bottleneck layer, which contains rich visual characteristics of the tumor, and reasoned that this encoded information might be relevant to tumor malignancy and thus to cancer survival. This is the central hypothesis of this paper.
In a prior study[27, 28], we analyzed PET/CT images of 96 non-small cell lung cancer (NSCLC) patients that were obtained within 3 months prior to stereotactic body radiation therapy (SBRT); summary statistics are illustrated in Fig. 1b. For each volume image, a region of interest (ROI) with dimensions of 96 mm × 96 mm × 48 mm was set around each tumor location and the image was cropped to the ROI volume. Two separate U-Net models were trained to perform tumor segmentation in PET and CT images, respectively. Each model was supervised with the corresponding physician contours, and no other information such as survival time was provided. After training, each U-Net model encoded 55,296 features at the bottleneck layer for each patient, resulting in a total of 110,592 features per patient.
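The feature counts above can be checked with simple arithmetic. The sketch below assumes 1 mm isotropic voxels, three 2× downsampling stages, and 64 bottleneck channels. None of these values are stated in the text; they are only one combination that happens to reproduce 55,296, and other combinations are possible.

```python
# Hypothetical architecture parameters (assumptions, not taken from the text).
ROI_MM = (96, 96, 48)        # ROI size stated in the text, in mm
VOXEL_MM = 1                 # assumed isotropic voxel spacing
N_DOWNSAMPLE = 3             # assumed number of 2x downsampling stages
BOTTLENECK_CHANNELS = 64     # assumed channel count at the bottleneck


def bottleneck_feature_count(roi_mm, voxel_mm, n_down, channels):
    """Number of scalar activations at the bottleneck of an encoder."""
    count = channels
    for side in roi_mm:
        voxels = side // voxel_mm
        count *= voxels // (2 ** n_down)
    return count


per_modality = bottleneck_feature_count(
    ROI_MM, VOXEL_MM, N_DOWNSAMPLE, BOTTLENECK_CHANNELS
)
per_patient = 2 * per_modality  # one U-Net for CT and one for PET
print(per_modality, per_patient)
```

Under these assumptions the bottleneck volume is 12 × 12 × 6 with 64 channels, which matches the per-model and per-patient totals reported above.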
These features are an intermediate output of the U-Net, which is then decoded to generate an automated segmentation in the network. Hence, it is likely that these features summarize some rich structural and functional geometry of the intratumoral and peritumoral area, some of which might be relevant to cancer survival. To test this proposition, we conducted a two-sample t-test and examined whether any features from the U-Net differed significantly between the survival and death groups. The t-test analysis was conducted separately for four categories of survival, namely 2-year overall survival (2-yr. OS), 5-year overall survival (5-yr. OS), 2-year disease-specific survival (2-yr. DS), and 5-year disease-specific survival (5-yr. DS). The analysis revealed that, on average, 3,042 features in CT and 2,908 features in PET had p-values below 0.05, as illustrated in Fig. 2a. Moreover, a group of 299 features in CT and 292 features in PET was observed in common across the four survival categories, as illustrated in Fig. 2b, suggesting that there were indeed strong survival-related markers among the U-Net learned features.
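The per-feature screening described above can be sketched as follows. This is a minimal illustration that assumes the pooled-variance (Student) form of the two-sample t-test, which the text does not specify; the feature names and values are hypothetical toy data. It returns the t statistic and degrees of freedom only. Converting these to p-values needs the t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_ind`) could be used.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic and degrees of freedom (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


def screen_features(features, died):
    """Return {feature_name: (t, df)} comparing the death and survival groups.

    features: dict mapping a feature name to one value per patient.
    died:     list of booleans, one per patient.
    """
    results = {}
    for name, values in features.items():
        dead = [v for v, d in zip(values, died) if d]
        alive = [v for v, d in zip(values, died) if not d]
        results[name] = pooled_t_statistic(dead, alive)
    return results


# Toy data: hypothetical feature names and values, not from the study.
died = [True] * 5 + [False] * 5
features = {
    "C00001": [1, 2, 3, 4, 5, 2, 3, 4, 5, 6],
    "C00002": [9, 8, 9, 8, 9, 1, 2, 1, 2, 1],
}
scores = screen_features(features, died)
```

In this toy example the second feature separates the two groups far more strongly than the first, which is the kind of contrast the screening step is meant to surface.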
We therefore conducted a separate analysis to select features via the least absolute shrinkage and selection operator (LASSO). The analysis was divided into four independent experiments and, in each experiment, LASSO selected a few features that have a strong relationship with one of the four survival categories using a linear logistic regression model. For more rigorous selection of features, the inclusion probability was computed via bootstrapping instead of one-shot selection. In descending order of inclusion probability, we selected the top 20 features for each survival category, which resulted in a total of 73 features in CT and 56 features in PET across all categories, counting features shared between categories only once. The top 20 features had inclusion probabilities greater than 0.3 and showed noticeably smaller p-values than the other features, as illustrated in Fig. 2c, which reconfirms the existence of strong survival-related radiomic markers in the U-Net learned features.
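Bootstrap inclusion probability can be illustrated with a small self-contained sketch. It assumes an L1-penalized logistic regression fitted by proximal gradient descent; the solver, the penalty weight `lam`, the number of resamples, and the synthetic data are all assumptions for illustration and are not taken from the study.

```python
import random
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty; returns exactly 0.0 when |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic(X, y, lam, lr=0.5, iters=200):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        # Residuals use the weights from the previous iteration for all updates.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
                 for xi, yi in zip(X, y)]
        b -= lr * sum(resid) / n
        for j in range(p):
            grad_j = sum(r * xi[j] for r, xi in zip(resid, X)) / n
            w[j] = soft_threshold(w[j] - lr * grad_j, lr * lam)
    return w


def inclusion_probability(X, y, lam, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        w = l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data: feature 0 carries signal, features 1 and 2 are pure noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[(1 if yi else -1) + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)]
     for yi in y]
probs = inclusion_probability(X, y, lam=0.2)
```

Ranking features by `probs` and keeping the top entries mirrors the selection step described above. Repeating the selection across resamples makes the result less sensitive to any single draw of patients than a one-shot fit.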
Here, it is worth reemphasizing that the U-Net was trained without any survival-related information and, hence, it is highly unlikely that the U-Net-learned features were overfitted to the survival data or biased towards them. Yet, while these U-Net features were identified independently from survival data, they demonstrated strong evidence of correlation with, and hence prognostic power for, NSCLC survival as discussed above. To quantify this, we performed linear logistic regression on each of the four survival categories and measured the accuracy, sensitivity, specificity, and receiver operating characteristics (ROC). The top 20 LASSO-selected features were used as independent variables in each category and no other covariates were provided. For rigorous validation, bootstrapping was employed to compute the 95% confidence interval (CI) of each performance measure. Each bootstrap sample was further split into a training set and a test set, and the performance measures are reported in Fig. 3 as averages across the test sets of the bootstrap samples. The contrast in performance between the U-Net features and the conventional imaging features was clear, demonstrating quantitatively the strong prognostic power of the U-Net features.
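The performance measures named above can be computed as in the following sketch. It is a simplification: it resamples a fixed set of predicted probabilities to obtain a percentile bootstrap interval, whereas the procedure described above refits the model on a training split of each bootstrap sample. The 0.5 threshold, the function names, and the toy data are assumptions.

```python
import random


def auc(y_true, scores):
    """Area under the ROC curve as the fraction of correctly ordered
    (positive, negative) pairs, counting ties as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed probability threshold."""
    pred = [int(s >= threshold) for s in scores]
    pairs = list(zip(pred, y_true))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((1, 0)), pairs.count((0, 1))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def bootstrap_ci(y_true, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(y_true, scores)."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # metric is undefined without both classes
            continue
        values.append(metric(ys, [scores[i] for i in idx]))
    values.sort()
    low = values[int((alpha / 2) * len(values))]
    high = values[int((1 - alpha / 2) * len(values)) - 1]
    return low, high


# Toy labels (1 = death) and predicted probabilities, not from the study.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
metrics = confusion_metrics(y_true, scores)
ci_low, ci_high = bootstrap_ci(y_true, scores, auc)
```

Resamples that contain only one class are skipped because sensitivity, specificity, and AUC are undefined for them.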
Meanwhile, the LASSO-selected U-Net features were further studied to shed light on their clinical implications and intuitive meanings. We first tested the correlation between the LASSO-selected U-Net features and the conventional radiomic features to see whether the U-Net features were capturing survival-related markers that were previously known to be effective. We name the U-Net features with a prefix ‘C’ or ‘P’ to indicate the imaging modality (CT or PET) they come from, followed by their indices. We found that features C16704 and P15398 had some correlation with TLG (R² = 0.25 and 0.28, with p = 0.02 and 0.01, respectively). We also found that 16 features in CT and 18 features in PET had noticeable correlation (p<0.05) with the 17 conventional radiomic features defined in Oikonomou et al. This might indicate that the U-Net features capture some aspects of conventional radiomic features while having a substantially larger amount of additional information coded into them.
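The correlation check against TLG can be sketched as below, assuming the reported R² is the squared Pearson correlation coefficient (the text does not name the estimator). TLG follows the definition given earlier, TLG = MTV × SUVMEAN. The patient values are hypothetical.

```python
from math import sqrt


def total_lesion_glycolysis(mtv, suv_mean):
    """TLG = metabolic tumor volume x mean standard uptake value."""
    return mtv * suv_mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical values for four patients, not from the study.
mtv = [10.0, 20.0, 30.0, 40.0]       # metabolic tumor volume
suv_mean = [2.0, 2.0, 2.0, 2.0]      # mean standard uptake value
tlg = [total_lesion_glycolysis(v, s) for v, s in zip(mtv, suv_mean)]
unet_feature = [1.0, 3.0, 2.0, 4.0]  # one bottleneck feature per patient

r = pearson_r(unet_feature, tlg)
r_squared = r * r
t_stat = correlation_t_statistic(r, len(tlg))
```

The p-value follows from comparing `t_stat` with a t distribution on n − 2 degrees of freedom.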
On the other hand, these U-Net features are essentially artificial neurons in deep neural networks. Hence, we may develop additional insight into the U-Net features by visualizing which patterns the corresponding neurons are looking for. Intuitively, one can show many different three-dimensional image patterns to the U-Net encoder and observe which image pattern activates each neuron the most. To facilitate this process, we employed an optimization-based approach where the objective is to maximize an individual neuron’s activation value by manipulating the input image pattern:

$$X^{*} = \arg\max_{X} \, q_i(X \mid W, b)$$

where $q_i(\cdot \mid W, b)$ is the activation of the $i$-th neuron of the U-Net encoder with the trained model parameters W and b, and X is the input image pattern. Displayed in Fig. 4a are different image patterns that activated the survival-related U-Net features. Many of the U-Net features appear to be capturing tumor-like blobs (e.g. C00048, C25988, P39051, P47258) or textural characteristics (e.g. C08680) in the image. Interestingly, some of the U-Net features, for example C01777 and C37399, were looking for tube-like structures near the tumor-like blobs, which might be capturing blood vessels and lymphatics in the peritumoral area. This is consistent with the widely accepted clinical knowledge that tumors can show enhanced growth towards nearby vessels and lymphatics, as these carry nutrients that supply tumor growth.
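The optimization described above can be illustrated with a toy example. The sketch below performs gradient ascent on the input pattern, estimating gradients by finite differences so that the neuron can be treated as a black box. The single tanh neuron standing in for a U-Net feature, the unit-norm rescaling, and all parameter values are assumptions; the text does not state which regularizer or optimizer was used.

```python
from math import sqrt, tanh


def activation_maximization(q, x0, steps=50, lr=0.1, eps=1e-4):
    """Gradient ascent on the input x to maximize the scalar activation q(x).

    Gradients are estimated by forward finite differences, so q can be any
    black-box function of a flat list of voxel values. After each step the
    pattern is rescaled to unit L2 norm to keep it bounded.
    """
    x = list(x0)
    for _ in range(steps):
        base = q(x)
        grad = []
        for j in range(len(x)):
            bumped = x[:]
            bumped[j] += eps
            grad.append((q(bumped) - base) / eps)
        x = [xj + lr * gj for xj, gj in zip(x, grad)]
        norm = sqrt(sum(v * v for v in x))
        if norm > 0:
            x = [v / norm for v in x]
    return x


# Toy stand-in for one bottleneck neuron: q(x) = tanh(w . x + b).
W = [3.0, -4.0, 0.0]
B = 0.1


def toy_neuron(x):
    return tanh(sum(wj * xj for wj, xj in zip(W, x)) + B)


pattern = activation_maximization(toy_neuron, [0.0, 0.0, 0.0])
```

For this toy neuron the maximizing unit-norm pattern is aligned with the weight vector, so `pattern` approaches (0.6, −0.8, 0). For a real encoder, an automatic-differentiation framework would replace the finite-difference loop.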
Moreover, we also visualized which regions in the patient images predicted low survival probability. We employed a guided gradient backpropagation approach. The main idea of the guided backpropagation algorithm is to compute $\partial P / \partial x_{i,j,k}$, where P is the probability of death and $x_{i,j,k}$ is the voxel value at position (i,j,k) in the patient image. The gradient $\partial P / \partial x_{i,j,k}$ can be interpreted as the change in the death probability when the voxel $x_{i,j,k}$ changes to a different value. If the voxel is not significant in predicting death, the gradient value is small, whereas if the voxel plays an important role in predicting a high probability of death, the gradient value is greater. Displayed in Fig. 4b are heatmaps representing the gradient. Heated regions (red) are the areas that lowered the probability of survival, whereas the other areas (blue) are the ones that had a negligible effect on survival. In all cases, tumoral regions were highlighted in red, which might be trivial. However, through comparison with post-therapeutic images and clinical records of the patients, we observed that some of the heated regions outside the tumoral volume overlap with the regions of progression (see Fig. 5), suggesting the potential of the visualization method for patient-tailored therapeutic planning in the future. This observation awaits a more rigorous and quantitative follow-up.
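The difference between an ordinary gradient map and a guided one can be shown on a tiny network. The sketch below uses a hypothetical two-layer ReLU model for the death probability P, with the image flattened to a list of voxel values. The weights are made up for illustration and the model is not the survival model used in the study.

```python
from math import exp


def forward(x, W1, b1, w2, b2):
    """Tiny ReLU network: P = sigmoid(w2 . relu(W1 x + b1) + b2)."""
    pre = [sum(wij * xj for wij, xj in zip(row, x)) + bi
           for row, bi in zip(W1, b1)]
    hidden = [max(0.0, p) for p in pre]
    z = sum(wi * hi for wi, hi in zip(w2, hidden)) + b2
    return pre, 1.0 / (1.0 + exp(-z))


def saliency(x, W1, b1, w2, b2, guided=True):
    """Gradient-style map of P with respect to each input voxel.

    guided=False gives the ordinary gradient dP/dx. guided=True also zeroes
    negative gradients at each ReLU, which is the guided-backpropagation rule.
    """
    pre, prob = forward(x, W1, b1, w2, b2)
    grad_hidden = [prob * (1.0 - prob) * wi for wi in w2]
    grad_pre = []
    for p, g in zip(pre, grad_hidden):
        passes = p > 0 and (g > 0 or not guided)
        grad_pre.append(g if passes else 0.0)
    return [sum(W1[i][j] * grad_pre[i] for i in range(len(W1)))
            for j in range(len(x))]


# Hypothetical two-voxel "image" and made-up weights.
x = [1.0, 2.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [1.0, -1.0, 1.0]
b2 = 0.0

plain_map = saliency(x, W1, b1, w2, b2, guided=False)
guided_map = saliency(x, W1, b1, w2, b2, guided=True)
```

Here the second voxel has a negative ordinary gradient, which the guided rule suppresses, so the guided map keeps only voxels whose increase raises the predicted probability of death through active units.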
In summary, we discovered that the U-Net segmentation algorithm trained for automated tumor segmentation on PET/CT was codifying rich structural and functional geometry at the bottleneck layer. These codified features could be used for survival prediction in cancer patients even though the U-Net was trained without any survival-related information. The survival model based on such U-Net features demonstrated significantly higher predictive power than conventional PET-based metabolic burden metrics such as TLG or relatively recent hand-crafted radiomics approaches. The validity of this finding was confirmed by several statistical tests. Furthermore, we visualized the survival-related U-Net features and observed that they were indeed depicting intratumoral and/or peritumoral structures that had been previously acknowledged as potentially relevant to cancer survival. Our approach awaits further validation on a larger number of observations and a wider variety of cancer types. Also, there was not enough clinical evidence to conclude that the visualization of the U-Net features can identify potential regions of recurrence and metastasis; thus, a follow-up study is suggested. Nevertheless, our findings may be a new starting point for quantitative image-based cancer prognosis, with a great deal of potentially important new knowledge yet to be discovered.
Research reported in this publication was supported by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) under award number 1R21CA209874 and partially by U01CA140206 and P30CA086862.
Satoh, Y., Onishi, H., Nambu, A. & Araki, T. Volume-based parameters measured by using FDG PET/CT in patients with stage I NSCLC treated with stereotactic body radiation therapy: Prognostic value. 275–281, DOI: 10.1148/radiol.13130652
Oikonomou, A. et al. Radiomics analysis at PET/CT contributes to prognosis of recurrence and survival in lung cancer treated with stereotactic body radiotherapy. Scientific Reports 8, 4003, DOI: 10.1038/s41598-018-22357-y (2018).
de Jong, E. E. et al. Applicability of a prognostic CT-based radiomic signature model trained on stage I-III non-small cell lung cancer in stage IV non-small cell lung cancer. 6–11, DOI: 10.1016/j.lungcan.2018.07.023
Berghmans, T. et al. Primary tumor standardized uptake value (SUVmax) measured on fluorodeoxyglucose positron emission tomography (FDG-PET) is of prognostic value for survival in non-small cell lung cancer (NSCLC): A systematic review and meta-analysis (MA) by the European Lung Cancer Working Party for the IASLC lung cancer staging Journal of Thoracic Oncology 3, 6–12,
Paesmans, M. et al. Primary tumor standardized uptake value measured on fluorodeoxyglucose positron emission tomography is of prognostic value for survival in non-small cell lung cancer: Update of a systematic review and meta-analysis by the European Lung Cancer Working Party for the International Association for the Study of Lung Cancer Journal of Thoracic Oncology 5, 612–619,
Bollineni, V. R., Widder, J., Pruim, J., Langendijk, J. A. & Wiegman, E. M. Residual 18F-FDG-PET uptake 12 weeks after stereotactic ablative radiotherapy for stage I non-small-cell lung cancer predicts local control. International Journal of Radiation e551–e555,
Larson, S. M. et al. Tumor treatment response based on visual and quantitative changes in global tumor glycolysis using PET-FDG imaging: The visual response score and the change in total lesion Clinical Positron Imaging 2, 159–171,
Liao, S. et al. Prognostic value of metabolic tumor burden on 18F-FDG PET in nonsurgical patients with non-small cell lung European Journal of Nuclear Medicine and Molecular Imaging 39, 27–38, DOI: 10.1007/s00259-011-1934-6 (2012).
Chen, H. H., Chiu, N.-T., Su, W.-C., Guo, H.-R. & Prognostic value of whole-body total lesion glycolysis at pretreatment FDG PET/CT in non-small cell lung
Zaizen, Y. et al. Prognostic significance of total lesion glycolysis in patients with advanced non-small cell lung cancer European Journal of Radiology 81, 4179–4184, Imaging in Acute Chest Pain.
Mehta, G., Chander, A., Huang, C., Kelly, M. & Feasibility study of FDG PET/CT-derived primary tumour glycolysis as a prognostic indicator of survival in patients with non-small-cell lung cancer. Clinical Radiology 69, 268–274,
Burdick, M. J. et al. Maximum standardized uptake value from staging FDG-PET/CT does not predict treatment outcome for early-stage non–small-cell lung cancer treated with stereotactic body radiotherapy. International Journal of Radiation 1033–1039,
Agarwal, M., Brahmanday, G., Bajaj, S. K., Ravikrishnan, K. P. & Wong, C.-Y. O. Revisiting the prognostic value of preoperative 18F-fluoro-2-deoxyglucose (18F-FDG) positron emission tomography (PET) in early-stage (I & II) non-small cell lung cancers European Journal of Nuclear Medicine and 691–698, DOI: 10.1007/s00259-009-1291-x
Lao, J. et al. A deep learning-based radiomics model for prediction of survival in glioblastoma multiforme. Scientific Reports 7, Article No. 10353, DOI: 10.1038/s41598-017-10649-8 (2017).
Ronneberger, O., Fischer, P. & U-Net: Convolutional networks for biomedical image In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI),
Woodard, G. A., Jones, K. D. & Jablons, D. M. Lung Cancer Staging and Prognosis, 47–75 (Springer International Publishing, Cham, 2016).
Chicklore, S. et al. Quantifying tumour heterogeneity in 18F-FDG PET/CT imaging by texture analysis. European Journal of Nuclear Medicine and 133–140, DOI: 10.1007/s00259-012-2247-0
Lee, G. et al. Radiomics and its emerging role in lung cancer research, imaging biomarkers and clinical management: State of European Journal of Radiology 86, 297–307,
Carvalho, S. et al. 18F-fluorodeoxyglucose positron-emission tomography (FDG-PET)-radiomics of metastatic lymph nodes and primary tumor in non-small cell lung cancer (NSCLC) – a prospective externally validated study. 1–16, DOI: 10.1371/journal.pone.0192859
Zhang, Y., Oikonomou, A., Wong, A., Haider, M. A. & Radiomics-based prognosis analysis for non-small cell lung cancer. Scientific Reports 7, Article No. 46349
Diamant, A., Avishek Chatterjee, M. V., Shenouda, G. & Seuntjens, J. Deep learning in head & neck cancer outcome prediction. Scientific Reports 9, Article No. 2764, DOI: 10.1038/s41598-019-39206-1 (2019).
Wu, X., Zhong, Z., Buatti, J. & Bai, J. Multi-scale segmentation using deep graph cuts: Robust lung tumor delineation in MVCBCT. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 514–518
Zhong, Z. et al. Simultaneous cosegmentation of tumors in PET-CT images using deep fully convolutional networks. Medical Physics 46, 619–633, DOI: 10.1002/mp.13331 (2019).
Yosinski, J., Clune, J., Fuchs, T. & Lipson, H. Understanding neural networks through deep In ICML Workshop on Deep Learning
Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization.