Detecting Pancreatic Adenocarcinoma in Multi-phase CT Scans via Alignment Ensemble

03/18/2020 ∙ by Yingda Xia, et al. ∙ 0

Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers. Screening for PDAC in dynamic contrast-enhanced CT is beneficial for early diagnosis. In this paper, we investigate the problem of automatically detecting PDAC in multi-phase (arterial and venous) CT scans. Multiple phases provide more information than a single phase, but they are unaligned and inhomogeneous in texture, making it difficult to combine cross-phase information seamlessly. We study multiple phase alignment strategies, i.e., early alignment (image registration), late alignment (high-level feature registration) and slow alignment (multi-level feature registration), and suggest an ensemble of all these alignments as a promising way to boost the performance of PDAC detection. We provide an extensive empirical evaluation on two PDAC datasets and show that the proposed alignment ensemble significantly outperforms previous state-of-the-art approaches, illustrating strong potential for clinical use.




1 Introduction

Pancreatic ductal adenocarcinoma (PDAC) is the third most common cause of cancer death in the US with a dismal five-year survival of merely 9% [9]. Computed tomography (CT) is the most widely used imaging modality for the initial evaluation of suspected PDAC. However, due to the subtle early signs of PDACs in CTs, they are easily missed by even experienced radiologists.

Recently, automated PDAC detection in CT scans based on deep learning has received increasing attention [4, 3, 26, 22], which offers great opportunities for assisting radiologists in diagnosing early-stage PDAC. However, most of these methods utilize only one phase of the CT scans and thus fail to achieve satisfactory results.

In this paper, we aim to develop a deep learning based PDAC detection system that takes multiple phases of CT scans, i.e., arterial and venous, into account. This system consists of multiple encoders, each of which encodes information for one phase, and a segmentation decoder, which outputs PDAC detection results. Intuitively, multiple phases provide more information than a single phase, which should benefit PDAC detection. Nevertheless, combining this cross-phase information seamlessly is non-trivial. The challenges are twofold: 1) tumor texture changes are subtle and appear differently across phases; 2) image contents are not aligned across phases because patients inevitably move while the multiple phases are captured. Consequently, a sophisticated phase alignment strategy is indispensable for detecting PDAC in multi-phase CT scans. A visual illustration is shown in Fig. 1.

Figure 1: Visual illustration of opportunity (top row) and challenge (bottom row) for PDAC detection in multi-phase CT scans (normal pancreas tissue - blue, pancreatic duct - green, PDAC mass - red). Top: the tumor is barely visible in the venous phase alone but more obvious in the arterial phase. Bottom: the images in the two phases are misaligned, with different organ size/shape and image contrast.

We investigate several alignment strategies to combine the information across multiple phases. (1) Early alignment: the alignment can be done in image space by performing image registration between multiple phases; (2) Late alignment: it can be done late in feature space by performing spatial transformation between the encoded high-level features of multiple phases; (3) Slow alignment: it can also be done step-wise in feature space by aggregating multi-level feature transformations between multiple phases. Based on an extensive empirical evaluation on two PDAC datasets [26, 22], we observe that (1) all alignment strategies are beneficial for PDAC detection, (2) alignment in feature space leads to better PDAC (tumor) segmentation performance than image registration, and (3) different alignment strategies are complementary to each other, i.e., an ensemble of them (Alignment Ensemble) significantly boosts the results, e.g., a relative improvement of approximately 4% in tumor DSC over our best single alignment model.

Our contributions can be summarized as follows:

  • We provide extensive experimental evaluation of several phase alignment strategies for detecting PDAC in multi-phase CT scans.

  • We highlight an ensemble of early, late and slow alignments as a promising way to boost the performance of PDAC detection.

  • We validate our approach on two PDAC datasets [26, 22] and achieve state-of-the-art performances on both of them.

2 Related Work

Automated Pancreas and Pancreatic Tumor Segmentation With the recent advances of deep learning, automated pancreas segmentation has achieved tremendous improvements [16, 17, 2, 23, 21, 25, 20, 10], which is an essential prerequisite for pancreatic tumor detection. Meanwhile, researchers are moving towards automated detection of pancreatic adenocarcinoma (PDAC), the most common type of pancreatic tumor (85%) [18]. Zhu et al. [26] investigated using deep networks to detect PDAC in CT scans but only segmented PDAC masses in the venous phase. Zhou et al. [22] developed a deep learning based approach for segmenting PDAC in multi-phase CT scans, i.e., the arterial and venous phases. They used a traditional image registration [19] approach for pre-alignment and then applied a deep network that took both phases as input. In contrast to their method, we also investigate how to register multiple phases in feature space.

Multi-modal Image Registration and Segmentation Multi-modal image registration [14, 19, 6, 7] is a fundamental task in medical image analysis. Recently, several deep learning based approaches, motivated by Spatial Transformer Networks [8], have been proposed to address this task [1, 13, 24]. In terms of multi-modal segmentation, most previous works [11, 5, 22] perform segmentation on pre-registered multi-modal images. We also study this strategy for multi-modal segmentation, and additionally explore variants of end-to-end frameworks that jointly align multiple phases and segment target organs/tissues.

3 Methodology

3.1 Problem Statement

We aim at detecting PDACs from unaligned two-phase CT scans, i.e., the venous phase and the arterial phase. Following previous works [22, 26], the venous phase is our fixed phase and the arterial phase is the moving one. For each patient, we have an image and its corresponding label in the venous phase, as well as an arterial phase image without a label. The whole dataset is denoted as $\mathcal{S} = \{(X_i^A, X_i^V, Y_i^V) \mid i = 1, \dots, N\}$, where $X_i^A$, $X_i^V$ are 3D volumes representing the two-phase CT scans of the $i$-th patient. $Y_i^V$ is a voxel-wise annotated label map, which has the same three-dimensional size as $X_i^V$. Here, each voxel label takes a value in $\mathcal{L} = \{0, 1, 2, 3\}$, which represents our segmentation targets, i.e., background, healthy pancreas tissue, pancreatic duct (crucial for PDAC clinical diagnoses) and PDAC mass, following previous literature [22, 26]. Our goal is to find a mapping function $\mathcal{M}$ whose inputs and outputs are a pair of two-phase images $(X^A, X^V)$ and segmentation results $P^V$, respectively: $P^V = \mathcal{M}(X^A, X^V)$. The key problem here is how to align $X^A$ and $X^V$, either in image space or feature space.

Figure 2: An illustration of (a) early alignment (image registration) (b) late alignment and (c) slow alignment. Right: feature alignment block.

3.2 Cross-phase Alignment and Segmentation

As shown in Fig 2, we propose and explore three types of alignment strategies, i.e., early alignment, late alignment and slow alignment, for accurate segmentation.

3.2.1 Early (image) alignment

Early alignment, or the image alignment strategy, is adopted in [22] and in other multi-modal segmentation tasks such as the BraTS challenge [11], where multiple phases (modalities) are first aligned by image registration algorithms and then fed into deep networks for segmentation. Here, we utilize a well-known registration algorithm, DEEDS [7], to estimate the registration field $\Phi$ from an arterial image $X^A$ to its corresponding venous image $X^V$. After registration, we use a network, consisting of two separate encoders $F^A$, $F^V$ and a decoder $D$, to realize the mapping function $\mathcal{M}$:

$$P^V = \mathcal{M}(X^A, X^V) = D\big(F^A(\Phi \circ X^A) \oplus F^V(X^V)\big), \tag{1}$$

where $\oplus$ and $\circ$ denote the concatenation of two tensors and the element-wise deformation operation on a tensor, respectively.

This strategy relies on the accuracy of image registration algorithms for information alignment. If such algorithms produce errors, which is especially likely on the subtle texture changes of PDACs, these errors will propagate and cannot be corrected later (since alignment is only done at the image level). Also, it remains an open question how much performance gain a segmentation algorithm will achieve through this separate registration procedure.

3.2.2 Late alignment

An alternative way is late alignment, i.e., alignment in feature space. We first encode the pair of unaligned images $(X^A, X^V)$ with two phase-specific encoders $(F^A, F^V)$, respectively. The encoded features of the two images, i.e., $F^A(X^A)$ and $F^V(X^V)$, are presumably in a shared feature space. We then use a network $T$ to estimate the deformable transformation field $\Phi$ from arterial (moving) to venous (fixed) in the feature space by $\Phi = T(F^A(X^A), F^V(X^V))$. We apply the estimated transformation field $\Phi$ to the feature map $F^A(X^A)$, then concatenate this transformed feature map to $F^V(X^V)$. The segmentation result is obtained by feeding the concatenation to a decoder $D$:

$$P^V = D\big(\Phi \circ F^A(X^A) \oplus F^V(X^V)\big). \tag{2}$$

We name this operation “late alignment” since the alignment is performed at the last block of the feature encoders.
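The two operations that late alignment relies on, deforming the arterial feature map with a transformation field and concatenating it with the venous feature map, can be illustrated on a toy example. The following is a minimal sketch in plain Python on 1D feature maps, not the authors' implementation: the function names `warp_1d` and `late_align` are hypothetical, the field is supplied by hand rather than estimated by a network, and linear interpolation with border clamping is an assumed resampling rule.

```python
from typing import List


def warp_1d(feature: List[float], displacement: List[float]) -> List[float]:
    """Resample `feature` at positions i + displacement[i] (linear interpolation)."""
    last = len(feature) - 1
    warped = []
    for i, shift in enumerate(displacement):
        # Assumption: sampling positions outside the map are clamped to the border.
        pos = min(max(i + shift, 0.0), float(last))
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        warped.append((1.0 - frac) * feature[lo] + frac * feature[hi])
    return warped


def late_align(arterial_feat: List[float],
               venous_feat: List[float],
               field: List[float]) -> List[List[float]]:
    """Return the two-channel fusion [field applied to arterial, venous]."""
    return [warp_1d(arterial_feat, field), list(venous_feat)]


# Toy features: the same "bump" sits one position apart in the two phases.
arterial = [0.0, 0.0, 1.0, 0.0, 0.0]
venous = [0.0, 1.0, 0.0, 0.0, 0.0]
fused = late_align(arterial, venous, field=[1.0] * 5)
print(fused)
```

With a field that samples one position to the right, the warped arterial channel lines up with the venous channel, which is the situation a decoder would ideally receive.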

3.2.3 Slow alignment

Late alignment performs one-off registration between two phases by only using high level features. However, it is known that the low level features of the deep network contain more image details, which motivates us to gradually align and propagate the features from multiple levels of the deep network. Following this spirit, we propose slow alignment, which leverages a stack of convolutional encoders and feature alignment blocks to iteratively align feature maps of two phases.

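Let $k$ be an integer not less than 1, and let $(Z^{\mathrm{fused}}_{k-1}, Z^A_{k-1})$ be the fused (aligned to the venous phase) feature map and the arterial feature map output by the $(k-1)$-th convolutional encoder, respectively. First, they are encoded by a pair of convolutional encoders $(F^V_k, F^A_k)$, respectively, which results in the venous feature map $Z^V_k = F^V_k(Z^{\mathrm{fused}}_{k-1})$ and the arterial feature map $Z^A_k = F^A_k(Z^A_{k-1})$ at the $k$-th layer. Then a feature alignment block estimates a transformation field from the arterial (moving) phase to the venous (fixed) phase by

$$\Phi_k = T_k(Z^A_k, Z^V_k), \tag{3}$$

where $T_k$ is a small U-Net. We apply the transformation field $\Phi_k$ to the arterial (moving) phase, resulting in the transformed arterial feature map $\Phi_k \circ Z^A_k$. Finally, the transformed arterial feature map is concatenated with the venous feature map $Z^V_k$, resulting in the fused feature map at the $k$-th layer:

$$Z^{\mathrm{fused}}_k = (\Phi_k \circ Z^A_k) \oplus Z^V_k. \tag{4}$$

Let us rewrite the above process as a function $\mathcal{R}$: $Z^{\mathrm{fused}}_k = \mathcal{R}(Z^{\mathrm{fused}}_{k-1}, Z^A_{k-1})$, and define $Z^{\mathrm{fused}}_0 = X^V$ and $Z^A_0 = X^A$. Then we can iteratively derive the fused feature map at the $n$-th convolutional encoder:

$$Z^{\mathrm{fused}}_n = \mathcal{R}\big(\mathcal{R}(\cdots \mathcal{R}(Z^{\mathrm{fused}}_0, Z^A_0) \cdots, Z^A_{n-2}), Z^A_{n-1}\big), \tag{5}$$

where $Z^A_k = F^A_k(Z^A_{k-1})$. The final fused feature map $Z^{\mathrm{fused}}_n$ is fed to the decoder $D$ to compute the segmentation result $P^V$:

$$P^V = D(Z^{\mathrm{fused}}_n). \tag{6}$$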
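The recursion described above (encode both streams, estimate a field, warp the arterial features, concatenate, repeat at the next level) can be sketched as a simple loop. This is a minimal illustration in plain Python with toy stand-ins, not the authors' network: `slow_align`, `mean_over_channels`, `zero_field` and `shift_warp` are hypothetical names, feature maps are represented as lists of 1D channels, and the encoders and field estimators are trivial placeholders for the convolutional blocks and small U-Nets of the text.

```python
from typing import Callable, List, Sequence

Channel = List[float]
FeatureMap = List[Channel]  # a feature map as a list of channels


def slow_align(
    x_arterial: FeatureMap,
    x_venous: FeatureMap,
    arterial_encoders: Sequence[Callable[[FeatureMap], FeatureMap]],
    venous_encoders: Sequence[Callable[[FeatureMap], FeatureMap]],
    field_estimators: Sequence[Callable[[FeatureMap, FeatureMap], List[int]]],
    warp: Callable[[FeatureMap, List[int]], FeatureMap],
) -> FeatureMap:
    """Iteratively fuse the arterial stream into the venous stream."""
    fused, arterial = x_venous, x_arterial  # level-0 inputs are the raw images
    for enc_a, enc_v, estimate in zip(arterial_encoders, venous_encoders, field_estimators):
        venous_k = enc_v(fused)        # venous encoder sees the previous fusion
        arterial_k = enc_a(arterial)   # arterial encoder sees only arterial features
        field_k = estimate(arterial_k, venous_k)
        fused = warp(arterial_k, field_k) + venous_k  # list "+" = channel concatenation
        arterial = arterial_k
    return fused


# Toy stand-ins (assumptions for illustration only).
def mean_over_channels(feature: FeatureMap) -> FeatureMap:
    """Placeholder encoder: collapse all channels into one by averaging."""
    count = len(feature)
    return [[sum(values) / count for values in zip(*feature)]]


def zero_field(arterial: FeatureMap, venous: FeatureMap) -> List[int]:
    """Placeholder estimator: predicts no displacement."""
    return [0] * len(arterial[0])


def shift_warp(feature: FeatureMap, field: List[int]) -> FeatureMap:
    """Integer-shift warp with border clamping, applied to every channel."""
    last = len(feature[0]) - 1
    return [[ch[min(max(i + d, 0), last)] for i, d in enumerate(field)] for ch in feature]


levels = 2
fused = slow_align(
    [[1.0, 2.0, 3.0]], [[3.0, 2.0, 1.0]],
    [mean_over_channels] * levels, [mean_over_channels] * levels,
    [zero_field] * levels, shift_warp,
)
print(fused)
```

The point of the sketch is the data flow: the venous branch always consumes the previously fused map, so alignment errors at one level can still be corrected at the next.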


3.2.4 Alignment Ensemble

We ensemble the three proposed alignment variants by simple majority voting of their predictions. The goal of the ensemble is twofold: first, to improve overall performance, and second, to see whether the three alignment methods are complementary. Usually, an ensemble of complementary approaches can lead to large improvements.
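Voxel-wise majority voting over the three label maps can be sketched as follows. This is a minimal illustration on flattened label lists; the function name `majority_vote` is hypothetical, and the tie rule (falling back to the first model when all models disagree) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-voxel labels from several models by majority voting."""
    voted = []
    for labels in zip(*predictions):  # labels of one voxel across all models
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count > len(labels) / 2:
            voted.append(top_label)
        else:
            # Assumption: without a strict majority, keep the first model's label.
            voted.append(labels[0])
    return voted


# Labels: 0 background, 1 pancreas, 2 duct, 3 PDAC mass.
early = [0, 1, 3, 2]
late = [0, 3, 3, 1]
slow = [1, 1, 3, 0]
print(majority_vote([early, late, slow]))
```

In the last voxel all three models disagree, so the outcome depends entirely on the assumed tie rule.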

4 Experiments and discussion

4.1 Dataset and evaluation

We evaluate our approach on two PDAC datasets, proposed in [26] and [22] respectively. For ease of presentation, we refer to the former as PDAC dataset 1 and the latter as PDAC dataset 2. PDAC dataset 1 contains 439 CT scans in total, of which 136 cases are diagnosed with PDAC and 303 cases are normal. The annotation contains voxel-wise labels for the pancreas and the PDAC mass. Evaluation is done by 4-fold cross-validation on these cases following [26]. PDAC dataset 2 contains 239 CT scans, all from PDAC patients, with the pancreas, pancreatic duct (crucial for PDAC detection) and PDAC mass annotated. Evaluation is done by 3-fold cross-validation following [22].

All cases contain two phases: arterial phase and venous phase, with a spacing of 0.5mm in axial view, and all annotations are verified by experienced board-certified radiologists. The segmentation accuracy is evaluated using the Dice-Sørensen coefficient (DSC): $\mathrm{DSC}(Y, P) = \frac{2\,|Y \cap P|}{|Y| + |P|}$, where $Y$ and $P$ are the sets of ground-truth and predicted voxels of a class. DSC has a range of $[0, 1]$, with 1 implying a perfect prediction for each class. On dataset 1, we also evaluate classification accuracy by sensitivity and specificity following a “segmentation for classification” strategy proposed in [26].
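As a concrete illustration of the metric, the per-class DSC can be computed from two label maps as below. This is a minimal sketch on flattened label lists; the function name `dice_score` is hypothetical, and returning 1.0 when a class is absent from both maps is an assumed convention that the text does not specify.

```python
from typing import Sequence


def dice_score(prediction: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice-Sørensen coefficient of one class between two label maps."""
    pred_hits = [p == label for p in prediction]
    true_hits = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_hits, true_hits))
    denominator = sum(pred_hits) + sum(true_hits)
    if denominator == 0:
        # Assumption: a class missing from both maps counts as a perfect match.
        return 1.0
    return 2.0 * intersection / denominator


# Labels: 0 background, 1 pancreas, 2 duct, 3 PDAC mass.
target = [0, 1, 1, 3, 3, 3]
prediction = [0, 1, 0, 3, 3, 1]
print(dice_score(prediction, target, label=3))  # tumor DSC
print(dice_score(prediction, target, label=1))  # pancreas DSC
```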

4.2 Implementation details

We implemented our network with PyTorch. The CT scans are first truncated to the HU range [-100, 240] and normalized to zero mean and unit variance. In the training stage, we randomly crop a fixed-size patch at roughly the same position from both the arterial and venous phases. The optimization objective is Dice loss [12]. We use the SGD optimizer with an initial learning rate of 0.005 and a cosine learning rate schedule for 40k iterations. For all our experiments, we implement the encoder and decoder architecture as U-Net [15] with 4 downsampling layers, which determines the total number of alignments $n$ in Eq 6. The transformation fields are estimated by light-weight U-Nets in late alignment and slow alignment. The image registration algorithm for our early alignment is DEEDS [7].
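The intensity preprocessing and the learning rate schedule described above can be sketched in a few lines. This is a hedged illustration in plain Python rather than the authors' PyTorch code: the function names are hypothetical, the normalization statistics are assumed to be computed per scan, and the cosine schedule is assumed to decay to zero without warm-up, neither of which is stated in the text.

```python
import math
from typing import List, Sequence

HU_MIN, HU_MAX = -100.0, 240.0  # truncation window from the text


def clip_and_normalize(hu_values: Sequence[float]) -> List[float]:
    """Clip intensities to the HU window, then shift/scale to zero mean, unit variance."""
    clipped = [min(max(v, HU_MIN), HU_MAX) for v in hu_values]
    mean = sum(clipped) / len(clipped)
    variance = sum((v - mean) ** 2 for v in clipped) / len(clipped)
    std = math.sqrt(variance) or 1.0  # guard against constant inputs
    return [(v - mean) / std for v in clipped]


def cosine_lr(step: int, total_steps: int = 40_000, base_lr: float = 0.005) -> float:
    """Cosine-annealed learning rate (assumed to reach zero at the last step)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


print(clip_and_normalize([-1000.0, -100.0, 70.0, 240.0, 1000.0]))
print(cosine_lr(0), cosine_lr(20_000), cosine_lr(40_000))
```

In a PyTorch training loop the same schedule would typically come from a built-in scheduler; the closed form is shown here only to make the decay explicit.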

Method | N.Pancreas | A.Pancreas | Tumor | Misses | Sens. | Spec.
U-Net [15] | 86.9 ± 8.6 | 81.0 ± 10.8 | 57.3 ± 28.1 | 10/136 | 92.7 | 99.0
V-Net [12] | 87.0 ± 8.4 | 81.6 ± 10.2 | 57.6 ± 27.8 | 11/136 | 91.9 | 99.0
MS C2F [26] | 84.5 ± 11.1 | 78.6 ± 13.3 | 56.5 ± 27.2 | 8/136 | 94.1 | 98.5
Baseline - NA | 85.8 ± 8.0 | 79.5 ± 11.2 | 58.4 ± 27.4 | 11/136 | 91.9 | 96.0
Ours - EA | 86.7 ± 9.7 | 81.8 ± 10.0 | 60.9 ± 26.5 | 4/136 | 97.1 | 94.5
Ours - LA | 87.5 ± 7.6 | 82.0 ± 10.3 | 62.0 ± 27.0 | 7/136 | 94.9 | 96.0
Ours - SA | 87.0 ± 7.8 | 82.8 ± 9.4 | 60.4 ± 27.4 | 4/136 | 97.1 | 96.5
Ours - Ensemble | 87.6 ± 7.8 | 83.3 ± 8.2 | 64.4 ± 25.6 | 4/136 | 97.1 | 96.0
Table 1: Results on PDAC dataset 1 with both healthy and pathological cases. We compare our variants of alignment methods with the state-of-the-art method [26] as well as our baseline - no align (NA) version. “Misses” represents the number of cases failed in tumor detection. We also report healthy vs. pathological case classification (sensitivity and specificity) based on segmentation results. EA/LA/SA are short for early align/late align/slow align. The last row is the ensemble of the three alignments.
Method | A.Pancreas | Tumor | Panc. duct | Misses
U-Net [15] | 79.61 ± 10.47 | 53.08 ± 27.06 | 40.25 ± 27.89 | 11/239
ResDSN [25] | 84.92 ± 7.70 | 56.86 ± 26.67 | 49.81 ± 26.23 | 11/239
HPN-U-Net [22] | 82.45 ± 9.98 | 54.36 ± 26.34 | 43.27 ± 26.33 | -/239
HPN-ResDSN [22] | 85.79 ± 8.86 | 60.87 ± 24.95 | 54.18 ± 24.74 | 7/239
Ours - EA | 83.65 ± 9.22 | 60.87 ± 22.15 | 55.38 ± 29.47 | 5/239
Ours - LA | 86.82 ± 6.13 | 62.02 ± 24.53 | 64.35 ± 29.94 | 9/239
Ours - SA | 87.13 ± 5.85 | 61.24 ± 24.26 | 64.19 ± 29.46 | 8/239
Ours - Ensemble | 87.37 ± 5.67 | 64.14 ± 21.16 | 64.38 ± 29.67 | 6/239
Table 2: Results on PDAC dataset 2 with pathological cases only. We compare our variants of alignment methods with the state-of-the-art method [22]. “Misses” represents the number of cases failed in tumor detection. EA/LA/SA are short for early align/late align/slow align. The last row is the ensemble of the three alignments.

4.3 Results

Results on datasets 1 and 2 are summarized in Table 1 and Table 2, respectively, where our approach achieves state-of-the-art performance on both datasets. Based on the results, we make three observations.

Dual-phase alignments are beneficial for detecting PDACs in multi-phase CT scans. On both datasets, our approaches, i.e., early align (EA), late align (LA) and slow align (SA), outperform single-phase algorithms, i.e., U-Net [15], V-Net [12], ResDSN [25] and MS C2F [26], as well as our non-alignment dual-phase version (Baseline-NA).

Feature space alignments yield larger improvements in segmentation performance than early alignments. Generally speaking, on both datasets our feature space alignment models (LA, SA) outperform image registration based approaches, i.e., HPN and Ours-EA, in terms of segmentation performance. Since early alignment methods apply image registration in advance, they do not guarantee a final improvement in segmentation performance. In contrast, feature space alignment methods jointly align and segment the targets in an end-to-end fashion by optimizing the final segmentation objective function, which leads to larger improvements over single-phase methods and naive dual-phase methods without alignment. However, we do observe that early alignment leads to relatively fewer false negatives (misses).

An ensemble of the three alignment strategies significantly improves the performance. For both datasets, Ours-Ensemble achieves the best performance, illustrating that the three alignment strategies are complementary to each other. The ensemble leads to a significant performance gain (a relative improvement of about 4% in tumor segmentation DSC over the best single alignment model on dataset 1, from 62.0% to 64.4%) and achieves state-of-the-art performance on both datasets. A qualitative analysis is also shown in Fig 3.
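As a quick check of the relative gain quoted above, the ratio can be worked out from the tumor DSC values reported in Table 1 and Table 2 (best single alignment model, Ours - LA, versus Ours - Ensemble):

```latex
\begin{align*}
\text{Dataset 1:}\quad \frac{64.4 - 62.0}{62.0} &= \frac{2.4}{62.0} \approx 3.9\% \\
\text{Dataset 2:}\quad \frac{64.14 - 62.02}{62.02} &= \frac{2.12}{62.02} \approx 3.4\%
\end{align*}
```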

Last but not least, our alignment approaches also improve the sensitivity of healthy vs. pathological classification. On dataset 1, we adopt the same “segmentation for classification” strategy as in [22], which classifies a case as pathological if any tumor mass larger than 50 voxels is detected. Our approach improves the overall sensitivity from 94.1% to 97.1% by reducing misses from 8 to 4, which is beneficial for the early detection of PDAC. Our approach thus has the potential to win precious time for the early treatment of patients.
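The “segmentation for classification” rule and the resulting sensitivity figures can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, the tumor prediction is represented as a set of voxel coordinates, and a "tumor mass" is assumed to mean a 6-connected component of predicted tumor voxels, which the text does not define.

```python
from collections import deque
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]
# Assumption: 6-connectivity (face neighbours) defines a connected tumor mass.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def largest_component_size(tumor_voxels: Iterable[Voxel]) -> int:
    """Size of the largest connected component among predicted tumor voxels."""
    remaining = set(tumor_voxels)
    best = 0
    while remaining:
        queue = deque([remaining.pop()])
        size = 1
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                neighbour = (z + dz, y + dy, x + dx)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    queue.append(neighbour)
                    size += 1
        best = max(best, size)
    return best


def is_pathological(tumor_voxels: Iterable[Voxel], min_voxels: int = 50) -> bool:
    """Flag a case when a predicted tumor mass exceeds `min_voxels` (50 in the text)."""
    return largest_component_size(tumor_voxels) > min_voxels


def sensitivity(true_positives: int, false_negatives: int) -> float:
    return true_positives / (true_positives + false_negatives)


# Misses out of 136 PDAC cases, as reported in Table 1.
print(round(100 * sensitivity(136 - 8, 8), 1))  # MS C2F
print(round(100 * sensitivity(136 - 4, 4), 1))  # Ours - Ensemble
```

Under this reading, two small disconnected blobs that together exceed 50 voxels would not trigger a detection, whereas a rule based on the total tumor voxel count would; which variant was used is not recoverable from the text.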

Figure 3: An example of PDAC dataset 1 on venous phase. From left to right, we display ground-truth, prediction of our baseline without alignment, prediction of our early align, late align, slow align and alignment ensemble. Our feature space alignments (LA, SA) outperform no-align baseline and image registration (EA). Ensemble of the three alignment predictions also improves tumor segmentation DSC score.

5 Conclusion

In this paper, we study three types of alignment approaches for detecting pancreatic ductal adenocarcinoma (PDAC) in multi-phase CT scans. Early alignment first applies registration in image space and then segments with a deep network. Late alignment and slow alignment jointly align and segment with an end-to-end deep network. The former aligns in the final encoded feature space, while the latter aligns multi-stage features and propagates the alignment gradually. An ensemble of the three approaches improves the performance significantly, illustrating that these alignment variants are complementary to each other. We achieve state-of-the-art performance on two PDAC datasets.


  • [1] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca (2019) VoxelMorph: a learning framework for deformable medical image registration. IEEE transactions on medical imaging. Cited by: §2.
  • [2] J. Cai, L. Lu, Z. Zhang, F. Xing, L. Yang, and Q. Yin (2016) Pancreas segmentation in mri using graph-based decision fusion on convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 442–450. Cited by: §2.
  • [3] L. C. Chu, S. Park, S. Kawamoto, D. F. Fouladi, S. Shayesteh, E. S. Zinreich, J. S. Graves, K. M. Horton, R. H. Hruban, A. L. Yuille, et al. (2019) Utility of ct radiomics features in differentiation of pancreatic ductal adenocarcinoma from normal pancreatic tissue. American Journal of Roentgenology 213 (2), pp. 349–357. Cited by: §1.
  • [4] L. C. Chu, S. Park, S. Kawamoto, Y. Wang, Y. Zhou, W. Shen, Z. Zhu, Y. Xia, L. Xie, F. Liu, et al. (2019) Application of deep learning to pancreatic cancer detection: lessons learned from our initial experience. Journal of the American College of Radiology 16 (9), pp. 1338–1342. Cited by: §1.
  • [5] J. Dolz, K. Gopinath, J. Yuan, H. Lombaert, C. Desrosiers, and I. B. Ayed (2018) HyperDense-net: a hyper-densely connected cnn for multi-modal image segmentation. IEEE transactions on medical imaging 38 (5), pp. 1116–1126. Cited by: §2.
  • [6] T. Gaens, F. Maes, D. Vandermeulen, and P. Suetens (1998) Non-rigid multimodal image registration using mutual information. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 1099–1106. Cited by: §2.
  • [7] M. P. Heinrich, M. Jenkinson, M. Brady, and J. A. Schnabel (2013) MRF-based deformable registration and ventilation estimation of lung ct. IEEE transactions on medical imaging 32 (7), pp. 1239–1248. Cited by: §2, §3.2.1, §4.2.
  • [8] M. Jaderberg, K. Simonyan, A. Zisserman, et al. (2015) Spatial transformer networks. In Advances in neural information processing systems, pp. 2017–2025. Cited by: §2.
  • [9] A. L. Lucas and F. Kastrinos (2019) Screening for pancreatic cancer. Jama 322 (5), pp. 407–408. Cited by: §1.
  • [10] Y. Man, Y. Huang, J. Feng, X. Li, and F. Wu (2019) Deep q learning driven ct pancreas segmentation with geometry-aware u-net. IEEE transactions on medical imaging 38 (8), pp. 1971–1980. Cited by: §2.
  • [11] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. (2014) The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34 (10), pp. 1993–2024. Cited by: §2, §3.2.1.
  • [12] F. Milletari, N. Navab, and S. Ahmadi (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. Cited by: §4.2, §4.3, Table 1.
  • [13] C. Qin, B. Shi, R. Liao, T. Mansi, D. Rueckert, and A. Kamen (2019) Unsupervised deformable registration for multi-modal images via disentangled representations. In International Conference on Information Processing in Medical Imaging, pp. 249–261. Cited by: §2.
  • [14] A. Roche, G. Malandain, X. Pennec, and N. Ayache (1998) The correlation ratio as a new similarity measure for multimodal image registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 1115–1124. Cited by: §2.
  • [15] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §4.2, §4.3, Table 1, Table 2.
  • [16] H. R. Roth, L. Lu, A. Farag, H. Shin, J. Liu, E. B. Turkbey, and R. M. Summers (2015) Deeporgan: multi-level deep convolutional networks for automated pancreas segmentation. In International conference on medical image computing and computer-assisted intervention, pp. 556–564. Cited by: §2.
  • [17] H. R. Roth, L. Lu, A. Farag, A. Sohn, and R. M. Summers (2016) Spatial aggregation of holistically-nested networks for automated pancreas segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 451–459. Cited by: §2.
  • [18] D. P. Ryan, T. S. Hong, and N. Bardeesy (2014) Pancreatic adenocarcinoma. New England Journal of Medicine 371 (11), pp. 1039–1049. Cited by: §2.
  • [19] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache (2009) Diffeomorphic demons: efficient non-parametric image registration. NeuroImage 45 (1), pp. S61–S72. Cited by: §2, §2.
  • [20] Y. Xia, L. Xie, F. Liu, Z. Zhu, E. K. Fishman, and A. L. Yuille (2018) Bridging the gap between 2d and 3d organ segmentation with volumetric fusion net. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 445–453. Cited by: §2.
  • [21] Q. Yu, L. Xie, Y. Wang, Y. Zhou, E. K. Fishman, and A. L. Yuille (2018) Recurrent saliency transformation network: incorporating multi-stage visual cues for small organ segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8280–8289. Cited by: §2.
  • [22] Y. Zhou, Y. Li, Z. Zhang, Y. Wang, A. Wang, E. K. Fishman, A. L. Yuille, and S. Park (2019) Hyper-pairing network for multi-phase pancreatic ductal adenocarcinoma segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 155–163. Cited by: 3rd item, §1, §1, §2, §2, §3.1, §3.2.1, §4.1, §4.3, Table 2.
  • [23] Y. Zhou, L. Xie, W. Shen, Y. Wang, E. K. Fishman, and A. L. Yuille (2017) A fixed-point model for pancreas segmentation in abdominal ct scans. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 693–701. Cited by: §2.
  • [24] W. Zhu, A. Myronenko, Z. Xu, W. Li, H. Roth, Y. Huang, F. Milletari, and D. Xu (2020) NeurReg: neural registration and its application to image segmentation. In The IEEE Winter Conference on Applications of Computer Vision, pp. 3617–3626. Cited by: §2.
  • [25] Z. Zhu, Y. Xia, W. Shen, E. Fishman, and A. Yuille (2018) A 3d coarse-to-fine framework for volumetric medical image segmentation. In 2018 International Conference on 3D Vision (3DV), pp. 682–690. Cited by: §2, §4.3, Table 2.
  • [26] Z. Zhu, Y. Xia, L. Xie, E. K. Fishman, and A. L. Yuille (2019) Multi-scale coarse-to-fine segmentation for screening pancreatic ductal adenocarcinoma. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 3–12. Cited by: 3rd item, §1, §1, §2, §3.1, §4.1, §4.1, §4.3, Table 1.