Pancreatic ductal adenocarcinoma (PDAC) is the third most common cause of cancer death in the US, with a dismal five-year survival rate of merely 9%. Computed tomography (CT) is the most widely used imaging modality for the initial evaluation of suspected PDAC. However, because the early signs of PDAC in CT scans are subtle, tumors are easily missed even by experienced radiologists.
Recently, automated PDAC detection in CT scans based on deep learning has received increasing attention [4, 3, 26, 22], offering great opportunities to assist radiologists in diagnosing early-stage PDAC. However, most of these methods utilize only one phase of the CT scan and thus fail to achieve satisfactory results.
In this paper, we aim to develop a deep-learning-based PDAC detection system that takes multiple phases of CT scans, i.e., arterial and venous, into account. The system consists of multiple encoders, each encoding the information of one phase, and a segmentation decoder that outputs the PDAC detection results. Intuitively, multiple phases provide more information than a single phase, which should benefit PDAC detection. Nevertheless, combining this cross-phase information seamlessly is non-trivial. The challenges are twofold: 1) tumor texture changes are subtle and appear differently across phases; 2) image contents are not aligned across phases because of inevitable patient movement between the acquisitions of the different phases. Consequently, a sophisticated phase alignment strategy is indispensable for detecting PDAC in multi-phase CT scans. A visual illustration is shown in Fig. 1.
We investigate several alignment strategies for combining information across multiple phases. (1) Early alignment: the alignment is done in image space by registering the phases to each other; (2) Late alignment: the alignment is done late, in feature space, by spatially transforming the encoded high-level features of the phases; (3) Slow alignment: the alignment is done step by step in feature space, by aggregating feature transformations between the phases at multiple levels. Based on an extensive empirical evaluation on two PDAC datasets [26, 22], we observe that (1) all alignment strategies are beneficial for PDAC detection; (2) alignment in feature space leads to better PDAC (tumor) segmentation performance than image registration; and (3) the alignment strategies are complementary to each other, i.e., an ensemble of them (Alignment Ensemble) significantly boosts the results, e.g., a relative improvement of approximately 4% in tumor DSC over our best single alignment model.
Our contributions can be summarized as follows:
We provide an extensive experimental evaluation of several phase alignment strategies for detecting PDAC in multi-phase CT scans.
We highlight an ensemble of early, late and slow alignments as a promising way to boost the performance of PDAC detection.
2 Related Work
Automated Pancreas and Pancreatic Tumor Segmentation. With the recent advances of deep learning, automated pancreas segmentation, an essential prerequisite for pancreatic tumor detection, has achieved tremendous improvements [16, 17, 2, 23, 21, 25, 20, 10]. Meanwhile, researchers are moving toward automated detection of pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic tumor (85%) [18]. Zhu et al. [26] investigated deep networks for detecting PDAC in CT scans, but segmented PDAC masses only in the venous phase. Zhou et al. [22] developed a deep-learning-based approach for segmenting PDAC in multi-phase CT scans, i.e., the arterial and venous phases. They used a traditional image registration approach [7] for pre-alignment and then applied a deep network that takes both phases as input. Unlike their method, we also investigate how to register multiple phases in feature space.
Image registration is a fundamental task in medical image analysis. Recently, several deep-learning-based approaches motivated by Spatial Transformer Networks [8] have been proposed for this task [1, 13, 24]. For multi-modal segmentation, most previous works [11, 5, 22] perform segmentation on pre-registered multi-modal images. We also study this strategy, but go further and explore variants of end-to-end frameworks that jointly align multiple phases and segment the target organs/tissues.
3.1 Problem Statement
We aim at detecting PDAC from unaligned two-phase CT scans, i.e., the venous phase and the arterial phase. Following previous works [22, 26], the venous phase is our fixed phase and the arterial phase is the moving one. For each patient, we have an image and its corresponding label in the venous phase, as well as an arterial-phase image without a label. The whole dataset is denoted as $\mathcal{S}=\{(\mathbf{X}^{V}_{i}, \mathbf{X}^{A}_{i}, \mathbf{Y}_{i})\}_{i=1}^{N}$, where $\mathbf{X}^{V}_{i}$ and $\mathbf{X}^{A}_{i}$ are 3D volumes representing the venous and arterial CT scans of the $i$-th patient, and $\mathbf{Y}_{i}$ is a voxel-wise annotated label map with the same three-dimensional size as $\mathbf{X}^{V}_{i}$. Each voxel label takes a value in $\{0, 1, 2, 3\}$, representing our segmentation targets, i.e., background, healthy pancreas tissue, pancreatic duct (crucial for the clinical diagnosis of PDAC) and PDAC mass, following previous literature [22, 26]. Our goal is to find a mapping function $\mathbf{M}$ whose inputs and outputs are a pair of two-phase images and the segmentation result $\hat{\mathbf{Y}}$, respectively: $\hat{\mathbf{Y}} = \mathbf{M}(\mathbf{X}^{V}, \mathbf{X}^{A})$. The key problem is how to align $\mathbf{X}^{V}$ and $\mathbf{X}^{A}$, either in image space or in feature space.
3.2 Cross-phase Alignment and Segmentation
As shown in Fig. 2, we propose and explore three types of alignment strategies, i.e., early alignment, late alignment and slow alignment, for accurate segmentation.
3.2.1 Early (image) alignment
Early alignment, or the image alignment strategy, is adopted in [22] and in other multi-modal segmentation tasks such as the BraTS challenge [11]: multiple phases (modalities) are first aligned by an image registration algorithm and then fed into a deep network for segmentation. Here, we use a well-known registration algorithm, DEEDS [7], to estimate the registration field $\phi$ from an arterial image $\mathbf{X}^{A}$ to its corresponding venous image $\mathbf{X}^{V}$. After registration, we use a network consisting of two separate encoders $\mathcal{E}^{V}$, $\mathcal{E}^{A}$ and a decoder $\mathcal{D}$ to realize the mapping function $\mathbf{M}$:

$$\hat{\mathbf{Y}} = \mathbf{M}(\mathbf{X}^{V}, \mathbf{X}^{A}) = \mathcal{D}\left(\mathcal{E}^{V}(\mathbf{X}^{V}) \oplus \mathcal{E}^{A}(\phi \circ \mathbf{X}^{A})\right), \tag{1}$$

where $\oplus$ and $\circ$ denote the concatenation of two tensors and the element-wise deformation operation on a tensor, respectively.
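The element-wise deformation and concatenation used by early alignment can be illustrated with a minimal sketch. It is written in plain Python on a 2D grid rather than on 3D PyTorch tensors, and it assumes a dense displacement field applied by backward warping with bilinear interpolation (the convention of spatial-transformer-style methods). Estimating the field itself, which DEEDS does here, is not shown, and the names `bilinear_sample`, `warp` and `concat_channels` are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # a 2D grid indexed as [row][col]


def bilinear_sample(img: Image, y: float, x: float) -> float:
    """Bilinearly sample img at a real-valued (y, x); points outside the grid read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def px(r: int, c: int) -> float:
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - dy) * (1 - dx) * px(y0, x0) + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0) + dy * dx * px(y0 + 1, x0 + 1))


def warp(moving: Image, disp_y: Image, disp_x: Image) -> Image:
    """Backward warping: out[r][c] = moving sampled at (r + disp_y[r][c], c + disp_x[r][c])."""
    h, w = len(moving), len(moving[0])
    return [[bilinear_sample(moving, r + disp_y[r][c], c + disp_x[r][c]) for c in range(w)]
            for r in range(h)]


def concat_channels(*groups: List[Image]) -> List[Image]:
    """Channel-wise concatenation: each argument is a list of channels."""
    out: List[Image] = []
    for group in groups:
        out.extend(group)
    return out


# Toy example: the arterial (moving) image is the venous (fixed) image shifted right by one column.
fixed = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
moving = [[0.0, 0.0, 1.0, 2.0], [0.0, 4.0, 5.0, 6.0]]
zeros = [[0.0] * 4 for _ in range(2)]
ones = [[1.0] * 4 for _ in range(2)]
aligned = warp(moving, zeros, ones)  # [[0.0, 1.0, 2.0, 0.0], [4.0, 5.0, 6.0, 0.0]]
# Identity stand-ins for the two encoders: the "features" are the images themselves.
stacked = concat_channels([fixed], [aligned])
```

The last column of `aligned` is zero because its source location falls outside the moving image; a real implementation has to choose a padding policy for such voxels.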
This strategy relies on the accuracy of the image registration algorithm for information alignment. If the algorithm produces errors, which are especially likely around the subtle texture changes of PDAC, these errors propagate to the segmentation network and cannot be corrected, since alignment is performed only at the image level. Moreover, it remains unclear how much a segmentation algorithm gains from this separate registration procedure.
3.2.2 Late alignment
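An alternative is late alignment, i.e., alignment in feature space. We first encode the pair of unaligned images with two phase-specific encoders $\mathcal{E}^{V}$ and $\mathcal{E}^{A}$, respectively. The encoded features of the two images, i.e., $\mathbf{f}^{V} = \mathcal{E}^{V}(\mathbf{X}^{V})$ and $\mathbf{f}^{A} = \mathcal{E}^{A}(\mathbf{X}^{A})$, are presumably in a shared feature space. We then use a network $\mathcal{T}$ to estimate the deformable transformation field from arterial (moving) to venous (fixed) in the feature space by $\phi = \mathcal{T}(\mathbf{f}^{A}, \mathbf{f}^{V})$. We apply the estimated transformation field $\phi$ to the feature map $\mathbf{f}^{A}$, then concatenate the transformed feature map with $\mathbf{f}^{V}$. The segmentation result $\hat{\mathbf{Y}}$ is obtained by feeding the concatenation to a decoder $\mathcal{D}$:

$$\hat{\mathbf{Y}} = \mathcal{D}\left(\mathbf{f}^{V} \oplus \phi \circ \mathbf{f}^{A}\right), \tag{2}$$

where $\oplus$ denotes concatenation and $\circ$ the deformation operation.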
We call this operation “late alignment” since the alignment is performed at the last block of the feature encoders.
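The data flow of late alignment (encode each phase, estimate a transformation from arterial to venous features, warp only the deepest arterial features, concatenate, decode) can be shown in a toy sketch. To stay self-contained, it uses 1D signals, identity encoders and decoder, and replaces the learned field estimator with a brute-force search for a single global integer shift that maximizes feature correlation. All names are hypothetical; the described model predicts a dense deformable field with a small network instead.

```python
from typing import Callable, List

Signal = List[float]
Features = List[Signal]  # a list of channels, each a 1D signal


def shift(sig: Signal, s: int) -> Signal:
    """Integer backward warp: out[i] = sig[i + s]; samples outside the signal read as 0."""
    n = len(sig)
    return [sig[i + s] if 0 <= i + s < n else 0.0 for i in range(n)]


def estimate_shift(moving: Features, fixed: Features, max_shift: int = 2) -> int:
    """Hypothetical stand-in for the learned estimator T: the global shift that
    maximizes correlation between the warped moving and the fixed features."""
    def score(s: int) -> float:
        return sum(a * b for m, f in zip(moving, fixed) for a, b in zip(shift(m, s), f))
    return max(range(-max_shift, max_shift + 1), key=score)


def late_alignment(x_v: Features, x_a: Features,
                   enc_v: Callable[[Features], Features],
                   enc_a: Callable[[Features], Features],
                   decoder: Callable[[Features], Features]) -> Features:
    f_v, f_a = enc_v(x_v), enc_a(x_a)           # phase-specific encoders
    s = estimate_shift(f_a, f_v)                # arterial (moving) -> venous (fixed)
    f_a_aligned = [shift(ch, s) for ch in f_a]  # warp only the deepest arterial features
    return decoder(f_v + f_a_aligned)           # list concatenation = channel concatenation


def identity(feats: Features) -> Features:
    """Toy encoder/decoder used only for illustration."""
    return feats


venous = [[0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]]
arterial = [[0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]]  # same structure, displaced by one sample
fused = late_alignment(venous, arterial, identity, identity, identity)
```

Here the search recovers a shift of 1, so both channels of `fused` equal the venous signal.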
3.2.3 Slow alignment
Late alignment performs a one-off registration between the two phases using only high-level features. However, low-level features of a deep network are known to contain more image detail, which motivates us to gradually align and propagate features from multiple levels of the network. In this spirit, we propose slow alignment, which leverages a stack of convolutional encoders and feature alignment blocks to iteratively align the feature maps of the two phases.
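Let $k \geq 1$ be an integer, and let $\mathbf{f}^{F}_{k-1}$ and $\mathbf{f}^{A}_{k-1}$ be the fused (aligned to the venous phase) feature map and the arterial feature map output by the $(k-1)$-th convolutional encoder, respectively. First, they are encoded by a pair of convolutional encoders $(\mathcal{E}^{V}_{k}, \mathcal{E}^{A}_{k})$, respectively, which yields the venous feature map $\mathbf{f}^{V}_{k} = \mathcal{E}^{V}_{k}(\mathbf{f}^{F}_{k-1})$ and the arterial feature map $\mathbf{f}^{A}_{k} = \mathcal{E}^{A}_{k}(\mathbf{f}^{A}_{k-1})$ at the $k$-th layer. Then a feature alignment block estimates a transformation field $\phi_{k}$ from the arterial (moving) phase to the venous (fixed) phase by

$$\phi_{k} = \mathcal{T}_{k}(\mathbf{f}^{A}_{k}, \mathbf{f}^{V}_{k}), \tag{3}$$

where $\mathcal{T}_{k}$ is a small U-Net. We apply the transformation field $\phi_{k}$ to the arterial (moving) feature map, resulting in the transformed arterial feature map $\phi_{k} \circ \mathbf{f}^{A}_{k}$, where $\circ$ denotes the deformation operation. Finally, the transformed arterial feature map is concatenated ($\oplus$) with the venous feature map $\mathbf{f}^{V}_{k}$, resulting in the fused feature map at the $k$-th layer:

$$\mathbf{f}^{F}_{k} = \mathbf{f}^{V}_{k} \oplus \phi_{k} \circ \mathbf{f}^{A}_{k}. \tag{4}$$

Let us rewrite the above process as a function $\mathcal{F}_{k}$: $(\mathbf{f}^{F}_{k}, \mathbf{f}^{A}_{k}) = \mathcal{F}_{k}(\mathbf{f}^{F}_{k-1}, \mathbf{f}^{A}_{k-1})$, and define $\mathbf{f}^{F}_{0} = \mathbf{X}^{V}$ and $\mathbf{f}^{A}_{0} = \mathbf{X}^{A}$. Then we can iteratively derive the fused feature map at the $K$-th convolutional encoder:

$$(\mathbf{f}^{F}_{K}, \mathbf{f}^{A}_{K}) = \mathcal{F}_{K}\left(\mathcal{F}_{K-1}\left(\cdots \mathcal{F}_{1}(\mathbf{X}^{V}, \mathbf{X}^{A}) \cdots\right)\right), \tag{5}$$

where $K$ is the total number of alignment stages. The final fused feature map $\mathbf{f}^{F}_{K}$ is fed to the decoder $\mathcal{D}$ to compute the segmentation result $\hat{\mathbf{Y}}$:

$$\hat{\mathbf{Y}} = \mathcal{D}(\mathbf{f}^{F}_{K}). \tag{6}$$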
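The recursion of slow alignment can be sketched in the same toy setting. In this hypothetical sketch, each stage's encoders merely average their input channels (real encoders are convolutional and downsample), and the field estimator is again a global-shift search standing in for the small U-Net. What the sketch does reflect is the data flow: the venous encoder at stage k consumes the previous fused map, the arterial stream continues unwarped, and only the copy used for fusion is warped.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Features = List[Signal]  # a list of channels, each a 1D signal
Encoder = Callable[[Features], Features]


def shift(sig: Signal, s: int) -> Signal:
    """Integer backward warp: out[i] = sig[i + s]; samples outside the signal read as 0."""
    n = len(sig)
    return [sig[i + s] if 0 <= i + s < n else 0.0 for i in range(n)]


def estimate_shift(moving: Features, fixed: Features, max_shift: int = 2) -> int:
    """Hypothetical stand-in for the alignment block T_k: best global integer shift."""
    def score(s: int) -> float:
        return sum(a * b for m, f in zip(moving, fixed) for a, b in zip(shift(m, s), f))
    return max(range(-max_shift, max_shift + 1), key=score)


def channel_mean(feats: Features) -> Features:
    """Hypothetical toy encoder: collapse all channels into one averaged channel."""
    n = len(feats)
    return [[sum(vals) / n for vals in zip(*feats)]]


def slow_alignment(x_v: Features, x_a: Features, enc_v: Sequence[Encoder],
                   enc_a: Sequence[Encoder]) -> Tuple[Features, List[int]]:
    fused, f_a = x_v, x_a  # f^F_0 = X^V, f^A_0 = X^A
    shifts: List[int] = []
    for enc_v_k, enc_a_k in zip(enc_v, enc_a):  # stages k = 1..K
        f_v_k = enc_v_k(fused)                  # venous branch consumes the fused map
        f_a = enc_a_k(f_a)                      # arterial branch continues unwarped
        s = estimate_shift(f_a, f_v_k)
        shifts.append(s)
        fused = f_v_k + [shift(ch, s) for ch in f_a]  # concatenate warped copy
    return fused, shifts                        # f^F_K goes to the decoder


venous = [[0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]]
arterial = [[0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]]
stages = 3
fused, shifts = slow_alignment(venous, arterial, [channel_mean] * stages, [channel_mean] * stages)
```

With these toy encoders every stage finds the same shift of 1; with real downsampling encoders each stage would estimate its own field at its own resolution.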
3.2.4 Alignment Ensemble
We ensemble the three proposed alignment variants by simple majority voting over their predictions. The ensemble serves two purposes: first, to improve overall performance; second, to test whether the three alignment methods are complementary, since an ensemble of complementary approaches usually leads to large improvements.
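Voxel-wise majority voting over the three predicted label maps (flattened to lists) can be sketched as follows. The text does not say how voxels without a majority are resolved, which can happen with four classes and three models; as an assumption, such voxels here take the prediction of a designated model. The label encoding (0 background, 1 pancreas, 2 duct, 3 PDAC) is likewise assumed.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]], fallback: int = 0) -> List[int]:
    """Voxel-wise majority vote over flattened label maps.

    A label wins a voxel if strictly more than half of the models predict it;
    otherwise the voxel takes the label predicted by model `fallback` (assumed tie rule).
    """
    n_models = len(predictions)
    fused = []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        fused.append(label if count * 2 > n_models else votes[fallback])
    return fused


ea = [0, 1, 3, 3, 2]  # early alignment
la = [0, 1, 3, 1, 1]  # late alignment
sa = [0, 3, 1, 3, 3]  # slow alignment
fused = majority_vote([ea, la, sa])  # last voxel has no majority -> falls back to ea
```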
4 Experiments and discussion
4.1 Dataset and evaluation
We evaluate our approach on two PDAC datasets, proposed in [26] and [22], respectively. For ease of presentation, we refer to the former as PDAC dataset 1 and the latter as PDAC dataset 2. PDAC dataset 1 contains 439 CT scans in total, of which 136 cases are diagnosed with PDAC and 303 cases are normal. The annotations contain voxel-wise labels of the pancreas and the PDAC mass. Evaluation is done by 4-fold cross-validation on these cases, following [26]. PDAC dataset 2 contains 239 CT scans, all from PDAC patients, with the pancreas, the pancreatic duct (crucial for PDAC detection) and the PDAC mass annotated. Evaluation is done by 3-fold cross-validation, following [22].
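All cases contain two phases, the arterial phase and the venous phase, with a spacing of 0.5 mm in the axial view, and all annotations are verified by experienced board-certified radiologists. Segmentation accuracy is evaluated using the Dice-Sørensen coefficient (DSC), $\mathrm{DSC}(\mathcal{Y}, \mathcal{Z}) = \frac{2 \times |\mathcal{Y} \cap \mathcal{Z}|}{|\mathcal{Y}| + |\mathcal{Z}|}$, where $\mathcal{Y}$ and $\mathcal{Z}$ are the sets of voxels assigned to a given class in the ground truth and in the prediction; the DSC ranges over $[0, 1]$, with 1 implying a perfect prediction for that class. On dataset 1, we also evaluate classification accuracy by sensitivity and specificity, following the “segmentation for classification” strategy proposed in [26].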
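Per-class DSC can be computed directly from its set definition, treating each class as the set of voxel indices carrying that label. A minimal sketch on flattened label maps; returning 1.0 when a class is absent from both maps is an assumed convention that the text does not specify.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """DSC = 2|Y ∩ Z| / (|Y| + |Z|) for one class on flattened label maps."""
    p = {i for i, v in enumerate(pred) if v == label}
    t = {i for i, v in enumerate(target) if v == label}
    if not p and not t:
        return 1.0  # assumed convention: class absent from both maps counts as perfect
    return 2 * len(p & t) / (len(p) + len(t))


pred = [0, 1, 1, 3, 3, 0]
target = [0, 1, 3, 3, 3, 0]
tumor_dsc = dice(pred, target, 3)  # 2 * 2 / (2 + 3) = 0.8
```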
4.2 Implementation details
We implemented our network with PyTorch. The CT scans are first truncated to the HU range [-100, 240] and normalized to zero mean and unit variance. In the training stage, we randomly crop fixed-size patches at roughly the same position from both the arterial and venous phases. The optimization objective is the Dice loss [12]. We use the SGD optimizer with an initial learning rate of 0.005 and a cosine learning rate schedule for 40k iterations. For all experiments, we implement the encoder and decoder architecture as a U-Net [15] with 4 downsampling layers, which determines the total number of alignments $K$ in Eq. 6. The transformation fields in late alignment and slow alignment are estimated by lightweight U-Nets. The image registration algorithm for our early alignment is DEEDS [7].
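The preprocessing and optimization settings above can be sketched in a few lines. Assumptions, none stated in the text: normalization uses the statistics of each clipped volume, the cosine schedule decays from the initial rate to zero with no warm-up, and the Dice loss takes the soft form with squared terms in the denominator introduced in V-Net [12]. The sketch works on flat Python lists, whereas the described implementation uses PyTorch tensors.

```python
import math
from typing import List


def preprocess(hu: List[float], lo: float = -100.0, hi: float = 240.0) -> List[float]:
    """Clip HU values to [lo, hi], then normalize to zero mean and unit variance."""
    clipped = [min(max(v, lo), hi) for v in hu]
    mean = sum(clipped) / len(clipped)
    std = math.sqrt(sum((v - mean) ** 2 for v in clipped) / len(clipped))
    return [(v - mean) / (std + 1e-8) for v in clipped]  # eps guards constant volumes


def cosine_lr(step: int, base_lr: float = 0.005, total_steps: int = 40_000) -> float:
    """Cosine decay from base_lr at step 0 to 0 at total_steps (assumed: no warm-up)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def soft_dice_loss(prob: List[float], target: List[float], eps: float = 1e-6) -> float:
    """1 - 2*sum(p*g) / (sum(p^2) + sum(g^2)) for one class's foreground probabilities."""
    inter = sum(p * g for p, g in zip(prob, target))
    denom = sum(p * p for p in prob) + sum(g * g for g in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


volume = preprocess([-1000.0, 0.0, 500.0])  # clipped to [-100, 0, 240] before normalizing
lr_halfway = cosine_lr(20_000)              # half of the initial rate
```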
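Table 1. Results on PDAC dataset 1: DSC (%, mean ± standard deviation), missed PDAC cases, and case-level sensitivity and specificity (%).

| Method | Normal pancreas DSC | Abnormal pancreas DSC | PDAC DSC | Misses | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| MS C2F [26] | 84.5 ± 11.1 | 78.6 ± 13.3 | 56.5 ± 27.2 | 8/136 | 94.1 | 98.5 |
| Baseline - NA | 85.8 ± 8.0 | 79.5 ± 11.2 | 58.4 ± 27.4 | 11/136 | 91.9 | 96.0 |
| Ours - EA | 86.7 ± 9.7 | 81.8 ± 10.0 | 60.9 ± 26.5 | 4/136 | 97.1 | 94.5 |
| Ours - LA | 87.5 ± 7.6 | 82.0 ± 10.3 | 62.0 ± 27.0 | 7/136 | 94.9 | 96.0 |
| Ours - SA | 87.0 ± 7.8 | 82.8 ± 9.4 | 60.4 ± 27.4 | 4/136 | 97.1 | 96.5 |
| Ours - Ensemble | 87.6 ± 7.8 | 83.3 ± 8.2 | 64.4 ± 25.6 | 4/136 | 97.1 | 96.0 |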
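Table 2. Results on PDAC dataset 2: DSC (%, mean ± standard deviation) and missed PDAC cases.

| Method | Pancreas DSC | PDAC DSC | Duct DSC | Misses |
|---|---|---|---|---|
| Ours - EA | 83.65 ± 9.22 | 60.87 ± 22.15 | 55.38 ± 29.47 | 5/239 |
| Ours - LA | 86.82 ± 6.13 | 62.02 ± 24.53 | 64.35 ± 29.94 | 9/239 |
| Ours - SA | 87.13 ± 5.85 | 61.24 ± 24.26 | 64.19 ± 29.46 | 8/239 |
| Ours - Ensemble | 87.37 ± 5.67 | 64.14 ± 21.16 | 64.38 ± 29.67 | 6/239 |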
Results on datasets 1 and 2 are summarized in Table 1 and Table 2, respectively; our approach achieves state-of-the-art performance on both datasets. Based on these results, we make three observations.
Dual-phase alignment is beneficial for detecting PDAC in multi-phase CT scans. On both datasets, our approaches, i.e., early alignment (EA), late alignment (LA) and slow alignment (SA), outperform single-phase algorithms, i.e., U-Net [15], V-Net [12], ResDSN [25] and MS C2F [26], as well as our dual-phase version without alignment (Baseline - NA).
Feature-space alignment yields larger segmentation improvements than early alignment. Generally, on both datasets, our feature-space alignment models (LA, SA) outperform the image-registration-based approaches, i.e., HPN [22] and Ours - EA, in terms of segmentation performance. Because early alignment methods apply image registration as a separate step beforehand, they do not guarantee an improvement in the final segmentation. In contrast, feature-space alignment methods jointly align and segment the targets in an end-to-end fashion by optimizing the final segmentation objective, which leads to larger improvements over single-phase methods and naive dual-phase methods without alignment. However, we do observe that early alignment leads to relatively fewer false negatives (misses).
An ensemble of the three alignment strategies significantly improves performance. On both datasets, Ours - Ensemble achieves the best performance, indicating that the three alignment strategies are complementary to each other. The ensemble yields a substantial performance gain (a relative improvement of about 4% in tumor DSC over the best single alignment model, from 62.0% to 64.4% on dataset 1) and achieves state-of-the-art performance on both datasets. A qualitative analysis is shown in Fig. 3.
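The 4% figure for tumor DSC is a relative gain; in absolute terms the ensemble adds 2.4 percentage points over late alignment on dataset 1:

```latex
\frac{64.4 - 62.0}{62.0} = \frac{2.4}{62.0} \approx 0.039 \approx 4\%
```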
Last but not least, our alignment approaches also improve the sensitivity of healthy vs. pathological classification. On dataset 1, we adopt the same “segmentation for classification” strategy as in [26], which classifies a case as pathological if any detected tumor mass is larger than 50 voxels. Our approach improves the overall sensitivity from 94.1% to 97.1% by reducing misses from 8 to 4, which is beneficial for the early detection of PDAC and could help patients gain precious time for early treatment.
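The case-level rule and the reported sensitivity can be sketched as follows. Assumptions: a “tumor mass” is a 6-connected component of predicted PDAC voxels, and “larger than 50 voxels” means strictly more than 50. The sensitivity example uses the ensemble's 4 misses out of 136 PDAC cases from Table 1; the three normal cases are made-up toy values, and all function names are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def largest_component(tumor_voxels: Iterable[Voxel]) -> int:
    """Size of the largest 6-connected component among predicted tumor voxels (BFS)."""
    remaining: Set[Voxel] = set(tumor_voxels)
    best = 0
    while remaining:
        queue = deque([remaining.pop()])
        size = 0
        while queue:
            z, y, x = queue.popleft()
            size += 1
            for dz, dy, dx in NEIGHBORS:
                nb = (z + dz, y + dy, x + dx)
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def is_pathological(tumor_voxels: Iterable[Voxel], min_size: int = 50) -> bool:
    """Flag a case if any connected tumor mass has more than min_size voxels."""
    return largest_component(tumor_voxels) > min_size


def sensitivity_specificity(pred: List[bool], truth: List[bool]) -> Tuple[float, float]:
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    pos = sum(truth)
    return tp / pos, tn / (len(truth) - pos)


truth = [True] * 136 + [False] * 3  # 136 PDAC cases; 3 toy normal cases
pred = [True] * 132 + [False] * 4 + [False, False, True]  # 4 misses
sens, spec = sensitivity_specificity(pred, truth)  # sens = 132 / 136, about 97.1%
```

Two separate 30-voxel blobs do not trigger the rule, while a single 60-voxel blob does, so the connectivity assumption directly affects which cases are flagged.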
In this paper, we study three types of alignment approaches for detecting pancreatic ductal adenocarcinoma (PDAC) in multi-phase CT scans. Early alignment first applies registration in image space and then segments with a deep network. Late alignment and slow alignment jointly align and segment with an end-to-end deep network: the former aligns in the final encoded feature space, while the latter aligns features at multiple stages and propagates them progressively. An ensemble of the three approaches improves performance significantly, indicating that these alignment variants are complementary to each other. We achieve state-of-the-art performance on two PDAC datasets.
- [1] (2019) VoxelMorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging.
- [2] Pancreas segmentation in MRI using graph-based decision fusion on convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 442–450.
- [3] (2019) Utility of CT radiomics features in differentiation of pancreatic ductal adenocarcinoma from normal pancreatic tissue. American Journal of Roentgenology 213 (2), pp. 349–357.
- [4] (2019) Application of deep learning to pancreatic cancer detection: lessons learned from our initial experience. Journal of the American College of Radiology 16 (9), pp. 1338–1342.
- [5] (2018) HyperDense-Net: a hyper-densely connected CNN for multi-modal image segmentation. IEEE Transactions on Medical Imaging 38 (5), pp. 1116–1126.
- [6] (1998) Non-rigid multimodal image registration using mutual information. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 1099–1106.
- [7] (2013) MRF-based deformable registration and ventilation estimation of lung CT. IEEE Transactions on Medical Imaging 32 (7), pp. 1239–1248.
- [8] (2015) Spatial transformer networks. In Advances in Neural Information Processing Systems, pp. 2017–2025.
- [9] (2019) Screening for pancreatic cancer. JAMA 322 (5), pp. 407–408.
- [10] (2019) Deep Q learning driven CT pancreas segmentation with geometry-aware U-Net. IEEE Transactions on Medical Imaging 38 (8), pp. 1971–1980.
- [11] (2014) The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34 (10), pp. 1993–2024.
- [12] (2016) V-Net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
- [13] (2019) Unsupervised deformable registration for multi-modal images via disentangled representations. In International Conference on Information Processing in Medical Imaging, pp. 249–261.
- [14] (1998) The correlation ratio as a new similarity measure for multimodal image registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 1115–1124.
- [15] (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
- [16] (2015) DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 556–564.
- [17] (2016) Spatial aggregation of holistically-nested networks for automated pancreas segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 451–459.
- [18] (2014) Pancreatic adenocarcinoma. New England Journal of Medicine 371 (11), pp. 1039–1049.
- [19] (2009) Diffeomorphic demons: efficient non-parametric image registration. NeuroImage 45 (1), pp. S61–S72.
- [20] (2018) Bridging the gap between 2D and 3D organ segmentation with volumetric fusion net. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 445–453.
- [21] (2018) Recurrent saliency transformation network: incorporating multi-stage visual cues for small organ segmentation. pp. 8280–8289.
- [22] (2019) Hyper-pairing network for multi-phase pancreatic ductal adenocarcinoma segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 155–163.
- [23] (2017) A fixed-point model for pancreas segmentation in abdominal CT scans. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 693–701.
- [24] (2020) NeurReg: neural registration and its application to image segmentation. In The IEEE Winter Conference on Applications of Computer Vision, pp. 3617–3626.
- [25] (2018) A 3D coarse-to-fine framework for volumetric medical image segmentation. In 2018 International Conference on 3D Vision (3DV), pp. 682–690.
- [26] (2019) Multi-scale coarse-to-fine segmentation for screening pancreatic ductal adenocarcinoma. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 3–12.