Hepatocellular Carcinoma Intra-arterial Treatment Response Prediction for Improved Therapeutic Decision-Making

12/01/2019 ∙ by Junlin Yang, et al.

This work proposes a pipeline to predict treatment response to intra-arterial therapy in patients with hepatocellular carcinoma (HCC) for improved therapeutic decision-making. Our graph neural network model combines heterogeneous inputs (baseline MR scans, pre-treatment clinical information, and planned treatment characteristics) and has been validated on patients with HCC treated by transarterial chemoembolization (TACE). It achieves an accuracy of 0.713 ± 0.075, an F1 score of 0.702 ± 0.082, and an AUC of 0.710 ± 0.108. In addition, the pipeline incorporates uncertainty estimation to select hard cases, most of which align with the misclassified cases. By improving model accuracy and incorporating uncertainty estimation, the proposed pipeline supports more informed intra-arterial therapeutic decisions for patients with HCC.




1 Introduction

Hepatocellular carcinoma (HCC), a primary liver cancer, has the fastest-rising incidence rate worldwide, especially in Western countries Siegel et al. (2019). Transarterial chemoembolization (TACE) is a well-established primary therapy for patients with unresectable HCC Lencioni et al. (2013). The assessment of patients after TACE treatment has been advanced by the quantitative European Association for the Study of the Liver (qEASL) response criterion, which quantitatively measures the degree of change in 3D enhancing tumor volume rather than relying on 2D measurement or visual estimation. However, predicting before treatment which patients will respond to TACE remains a clinical challenge Mannelli et al. (2013).

Why does early prediction before treatment matter? Data-driven methods for medical imaging have focused more on analytical models, such as segmentation and classification for diagnostic purposes, than on predictive models Zhang et al. (2018). However, predictive models that forecast future medical observations with high precision, such as disease progression Akbari et al. (2016), survival and prognosis Macyszyn et al. (2015), and treatment response Mannelli et al. (2013), could shape the development of treatment procedures and change treatment strategy. Here, we focus on predicting response to the TACE procedure. Identifying non-responders to TACE before therapy begins carries significant potential survival benefits for those patients, provided they can then enter alternative (e.g., systemic) therapies.

Previous work on treatment response relies heavily on clinical features and handcrafted radiomics features Mannelli et al. (2013); Abajian et al. (2018). Recently, convolutional neural networks that take images as inputs have increasingly been used for treatment response prediction Shi et al. (2019); Wu et al. (2019); Peng et al. (2019). However, integrating readily available non-imaging data, such as clinical information and treatment characteristics, would likely improve prediction accuracy. This paper leverages recent advances in graph neural networks, which have shown their strength in handling heterogeneous inputs. In medical imaging, they have been widely used for brain data analysis, e.g., to handle imaging and non-imaging data Parisot et al. (2017) and to identify biomarkers from complicated relationships Li et al. (2019).

This paper presents the first work to explore graph neural networks for prediction of treatment response. Some level of uncertainty exists not only in the model and data, but also in the ground truth labels (as illustrated in Sec. 2). The pipeline incorporates uncertainty estimation to select difficult cases that are often misclassified.

2 Method

Problem formulation Pre-treatment baseline data from HCC patients are collected. We aim to predict TACE treatment response (at one-month follow-up) from baseline data. Since treatment response can be assessed by changes in qEASL value Mannelli et al. (2013), and a reduction of more than 65% in qEASL between baseline and follow-up imaging indicates a responder, the prediction problem can be framed as a binary classification problem.
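The labelling rule above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and treating a reduction of exactly 65% as a non-responder is an assumption, since the text only says "over 65%".

```python
def qeasl_reduction(baseline: float, follow_up: float) -> float:
    """Fractional reduction in qEASL from baseline to follow-up."""
    if baseline <= 0:
        raise ValueError("baseline qEASL must be positive")
    return (baseline - follow_up) / baseline


def is_responder(baseline: float, follow_up: float, threshold: float = 0.65) -> bool:
    """Label a patient as a responder when the reduction exceeds the threshold.

    Assumption: the comparison is strict, so a reduction of exactly 65%
    is labelled as a non-responder.
    """
    return qeasl_reduction(baseline, follow_up) > threshold


# The two examples reported in the caption of Figure 1.
print(is_responder(40.17, 2.94))     # qEASL falls sharply
print(is_responder(246.12, 424.86))  # qEASL rises
```

A negative reduction, as in the second call, simply means the enhancing tumor volume grew between the two scans.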

Data The dataset consists of 83 patients with HCC treated by TACE. Both baseline and one-month follow-up multi-phasic MR scans are collected. Non-imaging data include pre-treatment clinical information (laboratory values, clinical history, etc.) and planned treatment characteristics. As mentioned above, a drop of more than 65% in qEASL indicates a responder. To generate ground truth labels, qEASL analysis was performed on both the baseline and the one-month follow-up MR scans, and the changes were computed accordingly. 20-second arterial phase images were selected for qEASL analysis, as shown in Fig. 1. qEASL values are essentially the enhancing tumor volume expressed as a percentage of the total tumor volume. To estimate qEASL, each measurement includes three parenchymal regions of interest (ROIs) whose average serves as the estimated parenchymal intensity.

Figure 1: Two examples of qEASL analysis: a responder (left) and a non-responder (right). In each example, the baseline scan is on the left and the follow-up scan is on the right. The qEASL value drops from 40.17 to 2.94 cm³ for the responder, while it rises from 246.12 to 424.86 cm³ for the non-responder. A similar estimation is performed three times for each patient and averaged to generate the final ground truth label.
Figure 2: Generating node feature vectors from 3D volumes using an autoencoder.

Pipeline The proposed prediction pipeline consists of three steps, as shown in Fig. 3. First, build the graph using both imaging and non-imaging data, where each patient serves as a node. Second, train the graph convolutional neural network (GCN) with softmax for semi-supervised classification on the above graph to get prediction results. Third, use Monte Carlo (MC) dropout as Bayesian estimation for uncertainty estimation to identify hard cases for more informed decision-making.

To build the graph, node feature vectors encoding imaging information are generated by a 3D autoencoder (AE) model, as shown in Fig. 2. The AE model is fed concatenated liver and tumor 3D volumes and trained on the self-reconstruction task. The AE latent vectors of length 128 are extracted as node feature vectors. Graph edges incorporate the non-imaging data. Based on prior knowledge from physicians, two binary features are selected: one from clinical information (cirrhosis presence) and one from treatment characteristics (sorafenib). To form the adjacency matrix, an edge is drawn for each binary feature whose status is shared between two patients. Correlations between each pair of nodes are then computed and applied as weights on this adjacency matrix.
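One way to read the graph construction above is sketched below with NumPy. The function name and array layout are hypothetical, and clipping negative correlations to zero is an assumption made here so that edge weights stay non-negative; the text does not say how negative correlations are handled.

```python
import numpy as np


def build_adjacency(node_features: np.ndarray, binary_attrs: np.ndarray) -> np.ndarray:
    """Weighted adjacency matrix for a patient population graph.

    node_features: (n_patients, d) array, e.g. autoencoder latent vectors.
    binary_attrs:  (n_patients, k) array of 0/1 non-imaging features.
    """
    # Number of binary features whose status two patients share (0..k).
    shared = (binary_attrs[:, None, :] == binary_attrs[None, :, :]).sum(axis=2)
    # Pearson correlation between every pair of node feature vectors.
    corr = np.corrcoef(node_features)
    # Assumption: clip negative correlations so edge weights stay non-negative.
    weights = np.clip(corr, 0.0, None)
    adjacency = shared * weights
    np.fill_diagonal(adjacency, 0.0)  # no self-loops at this stage
    return adjacency


# Toy example: 4 patients, 128-dimensional features, 2 binary attributes
# (standing in for cirrhosis presence and sorafenib).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 128))
attrs = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])
adjacency = build_adjacency(features, attrs)
print(adjacency.shape)
```

Patients 0 and 2 share neither attribute, so no edge connects them regardless of how similar their imaging features are.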

The graph convolutional neural network (GCN) Kipf and Welling (2016) was trained on the graph built above. The GCN consists of convolutional layers, Rectified Linear Units (ReLU), and a softmax activation function at the end. To avoid over-fitting and enable uncertainty estimation, a dropout rate of 0.15% was applied during both training and testing. To train the prediction model, the cross-entropy loss was calculated only over the labelled training nodes during the training stage and then used to update the parameters of the GCN. During the testing stage, unlabelled testing nodes are assigned labels according to the output of the softmax.
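The forward pass and masked loss described above can be illustrated with a minimal NumPy sketch of the propagation rule from Kipf and Welling (2016). The two-layer depth, the weight shapes, and the `rate` argument are assumptions for illustration; the text does not state the number of layers, and the gradient update step is omitted.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def dropout(x: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout; it stays active at test time for MC dropout."""
    if rate <= 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


def gcn_forward(a_norm, x, w0, w1, rate, rng):
    """Two-layer GCN (assumed depth): softmax(A ReLU(A X W0) W1)."""
    h = np.maximum(a_norm @ dropout(x, rate, rng) @ w0, 0.0)
    logits = a_norm @ dropout(h, rate, rng) @ w1
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def masked_cross_entropy(probs, labels, train_mask):
    """Mean cross-entropy over the labelled training nodes only."""
    idx = np.flatnonzero(train_mask)
    return float(-np.log(probs[idx, labels[idx]] + 1e-12).mean())


# Toy graph with 3 nodes, 4 input features, 5 hidden units, 2 classes.
rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
a_norm = normalize_adjacency(adj)
x = rng.normal(size=(3, 4))
w0 = rng.normal(size=(4, 5))
w1 = rng.normal(size=(5, 2))
probs = gcn_forward(a_norm, x, w0, w1, rate=0.0, rng=rng)
loss = masked_cross_entropy(probs, np.array([0, 1, 0]), np.array([True, True, False]))
print(probs.shape, loss)
```

Because every node takes part in the propagation step, the unlabelled test node still influences the hidden representations, while only the masked training nodes contribute to the loss.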

Figure 3: Pipeline for predicting treatment response using GCN and uncertainty estimation.

To generate the final prediction and the corresponding uncertainty estimate, MC dropout Gal and Ghahramani (2016) was used. For each sample, predictions were performed 100 times using the GCN model with dropout. The final prediction was decided by majority voting. The confidence of the prediction was estimated quantitatively as the ratio of predictions that agreed with the final prediction. The confidence level of each prediction can be used to select the most uncertain cases, which tend to be the difficult cases that are more likely to be misclassified.
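The voting and confidence computation can be sketched as follows. Here `stochastic_predict` is a hypothetical stand-in for one forward pass of the GCN with dropout enabled, and how ties are broken is an assumption, since the text does not address them.

```python
import random
from collections import Counter
from typing import Callable, Tuple


def mc_dropout_predict(
    stochastic_predict: Callable[[], int], n_passes: int = 100
) -> Tuple[int, float]:
    """Majority vote over repeated stochastic predictions.

    Returns the winning label and the fraction of passes that agreed with it.
    Assumption: on a tie, the label seen first wins (Counter ordering).
    """
    votes = Counter(stochastic_predict() for _ in range(n_passes))
    label, count = votes.most_common(1)[0]
    return label, count / n_passes


# Stand-in for a dropout-enabled model that predicts class 1 most of the time.
rng = random.Random(0)
label, confidence = mc_dropout_predict(lambda: 1 if rng.random() < 0.8 else 0)
print(label, confidence)
```

For a binary problem the confidence lies between 0.5 and 1.0, so values near 0.5 mark the cases where the stochastic passes disagree most.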

3 Results and Analysis

Classification results 10-fold cross-validation was applied for evaluation. Please refer to Table 1 for details. GCN is the graph convolutional neural network model in the proposed pipeline. RF is a random forest model that takes the same imaging and binary non-imaging features as inputs, with PCA used for dimensionality reduction. Ablations 1-3 (w/o Cirrhosis, w/o Sorafenib, w/o non-imaging) refer to building the graph without the cirrhosis feature, without the sorafenib feature, and without both non-imaging features, respectively. GCN shows a significant improvement in prediction performance compared to the random forest model. The ablation studies show that proper construction of the graph with prior knowledge is essential for the success of GCN. The drop in performance in each ablation study reflects the importance of the dropped non-imaging feature.

Method Accuracy (std) F1 (std) AUC (std)
Ablation 1 w/o Cirrhosis
Ablation 2 w/o Sorafenib
Ablation 3 w/o non-imaging
Table 1: Comparison of prediction performance

Uncertainty estimation Uncertainty estimation Gal and Ghahramani (2016) was achieved by MC dropout during the test stage. Ruling out the test cases with the lowest confidence generally improves classification performance, which suggests that the majority of low-confidence cases align with misclassified cases. Specifically, when cases with confidence lower than 85%, 90%, and 95% are ruled out and the metrics are computed on the remaining cases, F1 improves by 3.76%, 3.6%, and 5.76%, AUC improves by 2.11%, 3.56%, and 8.61%, and accuracy improves by 5.45%, 6.58%, and 10.61%, respectively.
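The filtering step above can be illustrated with a short sketch that recomputes accuracy on the cases that remain after low-confidence predictions are ruled out. The function name and the toy values are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def accuracy_above_confidence(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    confidence: Sequence[float],
    threshold: float,
) -> Tuple[float, int]:
    """Accuracy on cases whose confidence is at least `threshold`.

    Returns the accuracy and the number of cases that were kept.
    """
    kept = [(t, p) for t, p, c in zip(y_true, y_pred, confidence) if c >= threshold]
    if not kept:
        raise ValueError("no cases remain at this threshold")
    correct = sum(t == p for t, p in kept)
    return correct / len(kept), len(kept)


# Toy values: the only misclassified case is also the least confident one.
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 1, 0]
conf = [0.99, 0.60, 0.90, 0.95]
print(accuracy_above_confidence(y_true, y_pred, conf, 0.0))
print(accuracy_above_confidence(y_true, y_pred, conf, 0.85))
```

Reporting the number of kept cases next to the accuracy makes the trade-off visible: a higher threshold tends to raise accuracy but leaves fewer patients with an automated prediction.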

4 Conclusion

In summary, the proposed pipeline supports more informed intra-arterial therapeutic decisions for HCC patients by improving model accuracy and incorporating uncertainty estimation. GCN incorporates prior knowledge into the graph construction and combines both imaging and non-imaging features. Uncertainty estimation plays an essential role in more informed clinical decision-making. Yet much remains to be improved. In future research, more flexible graph construction, such as building multiple graphs instead of a single one, could help incorporate more prior information. Other uncertainty estimation methods should also be investigated.


  • [1] A. Abajian, N. Murali, L. J. Savic, F. M. Laage-Gaupp, N. Nezami, J. S. Duncan, T. Schlachter, M. Lin, J. Geschwind, and J. Chapiro (2018) Predicting treatment response to intra-arterial therapies for hepatocellular carcinoma with the use of supervised machine learning—an artificial intelligence concept. Journal of Vascular and Interventional Radiology 29 (6), pp. 850–857. Cited by: §1.
  • [2] H. Akbari, L. Macyszyn, X. Da, M. Bilello, R. L. Wolf, M. Martinez-Lage, G. Biros, M. Alonso-Basanta, D. M. O’Rourke, and C. Davatzikos (2016) Imaging surrogates of infiltration obtained via multiparametric imaging pattern analysis predict subsequent location of recurrence of glioblastoma. Neurosurgery 78 (4), pp. 572–580. Cited by: §1.
  • [3] Y. Gal and Z. Ghahramani (2016) Dropout as a bayesian approximation: representing model uncertainty in deep learning. In international conference on machine learning, pp. 1050–1059. Cited by: §2, §3.
  • [4] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §2.
  • [5] R. Lencioni, P. Petruzzi, and L. Crocetti (2013) Chemoembolization of hepatocellular carcinoma. In Seminars in interventional radiology, Vol. 30, pp. 003–011. Cited by: §1.
  • [6] X. Li, N. C. Dvornek, Y. Zhou, J. Zhuang, P. Ventola, and J. S. Duncan (2019) Graph neural network for interpreting task-fmri biomarkers. arXiv preprint arXiv:1907.01661. Cited by: §1.
  • [7] L. Macyszyn, H. Akbari, J. M. Pisapia, X. Da, M. Attiah, V. Pigrish, Y. Bi, S. Pal, R. V. Davuluri, L. Roccograndi, et al. (2015) Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques. Neuro-oncology 18 (3), pp. 417–425. Cited by: §1.
  • [8] L. Mannelli, S. Kim, C. H. Hajdu, J. S. Babb, and B. Taouli (2013) Serial diffusion-weighted mri in patients with hepatocellular carcinoma: prediction and assessment of response to transarterial chemoembolization. preliminary experience. European journal of radiology 82 (4), pp. 577–582. Cited by: §1, §1, §1, §2.
  • [9] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, R. G. Moreno, B. Glocker, and D. Rueckert (2017) Spectral graph convolutions for population-based disease prediction. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 177–185. Cited by: §1.
  • [10] J. Peng, S. Kang, Z. Ning, H. Deng, J. Shen, Y. Xu, J. Zhang, W. Zhao, X. Li, W. Gong, J. Huang, and L. Liu (2019) Residual convolutional neural network for predicting response of transarterial chemoembolization in hepatocellular carcinoma from ct imaging. European Radiology. ISSN 1432-1084. Cited by: §1.
  • [11] L. Shi, Y. Zhang, K. Nie, X. Sun, T. Niu, N. Yue, T. Kwong, P. Chang, D. Chow, J. Chen, et al. (2019) Machine learning for prediction of chemoradiation therapy response in rectal cancer using pre-treatment and mid-radiation multi-parametric mri. Magnetic resonance imaging 61, pp. 33–40. Cited by: §1.
  • [12] R. L. Siegel, K. D. Miller, and A. Jemal (2019) Cancer statistics, 2019. CA: a cancer journal for clinicians 69 (1), pp. 7–34. Cited by: §1.
  • [13] E. Wu, L. M. Hadjiiski, R. K. Samala, H. Chan, K. H. Cha, C. Richter, R. H. Cohan, E. M. Caoili, C. Paramagul, A. Alva, et al. (2019) Deep learning approach for assessment of bladder cancer treatment response. Tomography 5 (1), pp. 201. Cited by: §1.
  • [14] F. Zhang, J. Yang, N. Nezami, F. Laage-gaupp, J. Chapiro, M. De Lin, and J. Duncan (2018) Liver tissue classification using an auto-context-based deep neural network with a multi-phase training framework. In International Workshop on Patch-based Techniques in Medical Imaging, pp. 59–66. Cited by: §1.