## 1 Introduction

Pancreatic ductal adenocarcinoma (PDAC) is the 4th most common cause of cancer death, with an overall five-year survival rate of 8%.
Currently, detection or segmentation at the localized disease stage followed by complete resection offers the best chance of survival, *i.e.*, a 5-year survival rate of 32%. The accurate segmentation of the PDAC mass is also important for further quantitative analysis, *e.g.*, survival prediction [1].
Computed tomography (CT) is the most commonly used imaging modality for the initial evaluation of PDAC. However, textures of PDAC on CT are very subtle (Fig. 1) and therefore can be easily neglected by even experienced radiologists.
To our best knowledge, the state-of-the-art on this matter is [20], which reports an average Dice of .
For better detection of the PDAC mass, a dual-phase pancreas protocol using contrast-enhanced CT imaging, which comprises arterial and venous phases with intravenous contrast delay, is recommended.

In recent years, deep learning has largely advanced the field of computer-aided diagnosis (CAD), especially biomedical image segmentation [4, 10, 11, 18, 19, 13]. However, there are several challenges in applying existing segmentation algorithms to dual-phase images. First, segmentation of pancreatic lesions, *e.g.*, cysts [17], is more difficult than organ segmentation due to their smaller sizes, lower contrast, texture similarity, *etc*. Second, these algorithms are optimized for segmenting only one type of input, and therefore cannot be directly applied to multi-phase data. More importantly, properly handling the variations between different views requires a smart information exchange strategy between phases. While how to efficiently integrate information from multiple modalities has been widely studied [3, 6, 16], learning from multi-phase information has rarely been explored, especially for tumor detection and segmentation.

To address these challenges, we propose a multi-phase segmentation algorithm, Hyper-Pairing Network (HPN), to enhance the segmentation performance especially for pancreatic abnormality. Following HyperDenseNet [3] which is effective on multi-modal image segmentation, we construct a dual-path network for handling multi-phase data, where each path is intended for one phase.
To enable information exchange between different phases, we apply skip connections across different paths of the network [3], referred to as *hyper-connections*.
Moreover, a standard segmentation loss (cross-entropy loss, Dice loss [8]) only aims at minimizing the difference between the final prediction and the ground truth, and thus cannot handle the variance between different views well. We therefore introduce an additional *pairing loss term* to encourage commonality between the high-level features of both phases for better incorporation of multi-phase information. We exploit three structures together in HPN: PDAC mass, normal pancreatic tissues, and pancreatic duct, which serves as an important clue for localizing PDAC. Extensive experiments demonstrate that the proposed HPN outperforms prior art by a large margin on all 3 targets.

## 2 Methodology

We focus here on dual-phase inputs, although our approach can be generalized to multi-phase scans. With phase A and phase B aligned by deformable registration, we have the set , where is the -th 3D volumetric CT image of phase A with the dimension and is the corresponding aligned volume of phase B. denotes the corresponding voxel-wise label map of the -th volume, where is the label of the -th voxel in the -th image, and denotes the label of the target structures. In this study, ={normal pancreatic tissues, PDAC mass, pancreatic duct}. The goal is to learn a model to predict the label of each voxel by utilizing multi-phase information.

### 2.1 Hyper-connections

Segmentation networks (*e.g.*, UNet [10, 2], FCN [7]) usually contain a contracting encoder part and a successive expanding
decoder part to produce a full-resolution segmentation result as illustrated in Fig. 2(a). As the layer goes deeper, the output features evolve from low-level detailed representations to high-level abstract semantic representations. The encoder part and the decoder part share an equal number of resolution steps [10, 2].

However, this type of network can only handle single-phase data.
We construct a dual path network where each phase has a branch with a U-shape encoder-decoder architecture as mentioned above. These two branches are connected via hyper-connections which enrich feature representations by learning more complex combinations between the two phases. Specifically, hyper-connections are applied between layers which output feature maps of the same resolution across different paths as illustrated in Fig. 2(b). Let denote the intermediate feature maps of a general segmentation network, where and share the same resolution ( is on the encoder path and is on the decoder path).
Hyper-connections are applied as follows: , , , ,, , while maintaining the original skip connections that already occur within the same path, *i.e.*, , .
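
As a rough illustration of the idea, the sketch below tracks only the shapes of feature maps and wires one resolution level of a dual-path network. It is a minimal sketch under stated assumptions, not the network described here: the names `FeatureMap`, `concat` and `decoder_inputs` are hypothetical, features are assumed to be merged by channel-wise concatenation, and only encoder-to-decoder connections across paths are shown alongside the original within-path skip connections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    """Shape-only stand-in for a 3D feature map (hypothetical helper)."""
    path: str          # "A" or "B", the phase branch that owns the map
    channels: int
    resolution: tuple  # (depth, height, width)


def concat(*maps, path):
    """Channel-wise concatenation; every input must share one resolution."""
    resolutions = {m.resolution for m in maps}
    if len(resolutions) != 1:
        raise ValueError("connected feature maps must share the same resolution")
    return FeatureMap(path, sum(m.channels for m in maps), maps[0].resolution)


def decoder_inputs(enc_a, enc_b, dec_a, dec_b):
    """Build the decoder inputs of both paths at one resolution level.

    Each path keeps its own skip connection (encoder -> decoder of the same
    path) and additionally receives the encoder feature of the other path.
    """
    in_a = concat(dec_a, enc_a, enc_b, path="A")
    in_b = concat(dec_b, enc_b, enc_a, path="B")
    return in_a, in_b


if __name__ == "__main__":
    res = (16, 64, 64)
    in_a, in_b = decoder_inputs(
        FeatureMap("A", 32, res), FeatureMap("B", 32, res),
        FeatureMap("A", 64, res), FeatureMap("B", 64, res),
    )
    print(in_a.channels, in_b.channels)
```

The resolution check mirrors the constraint stated above: connections are only made between layers whose outputs have the same resolution.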

### 2.2 Pairing loss

The standard loss for segmentation networks only aims at minimizing the difference between the groundtruth and the final estimation, which cannot well handle the variance between different views. Applying this loss alone is inferior in our situation since the training process involves heavy integration of both arterial information and venous information. To this end, we propose to apply an additional pairing loss, which encourages the commonality between the two sets of high-level semantic representations, to reduce view divergence.

We instantiate this additional objective as a correlation loss [14]. Mathematically, for any pair of aligned images (, ) passing through the corresponding view sub-network, the two sets of high-level semantic representations (feature responses in later layers) corresponding to the two phases are denoted as and , where the two sub-networks are parameterized by and respectively. The outputs of the two branches are simultaneously fed to the final classification layer. In order to better integrate the outcomes from the two branches, we propose to use a pairing loss which exploits the consensus of and during training. The loss is formulated as follows:

, | (1) |

where denotes the total number of voxels in the -th sample and denotes the parameters of the entire network. During the training stage, we impose this additional loss to further encourage the commonality between the two intermediate outputs. The overall loss is the weighted sum of this additional penalty term and the standard voxel-wise cross-entropy loss:

, | (2) |

where denotes the probability of the -th voxel being classified as label on the -th sample, is the indicator function, and is the total number of classes. The overall objective function is optimized via stochastic gradient descent.
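
To make the structure of the objective concrete, the following is a minimal sketch of a weighted sum of a voxel-wise cross-entropy term and a correlation-based pairing term. The exact forms of Eq. (1) and Eq. (2) are not reproduced here; this sketch assumes the pairing term is the negative Pearson correlation between the two flattened feature vectors, and the function names and the weight `lam` are illustrative placeholders rather than values from the text.

```python
import math


def correlation_loss(f_a, f_b, eps=1e-8):
    """Negative Pearson correlation of two equally long feature vectors.

    Assumed instantiation of a correlation loss: the value approaches -1
    when the two sets of features are perfectly positively correlated.
    """
    if len(f_a) != len(f_b):
        raise ValueError("feature vectors must have the same length")
    n = len(f_a)
    mean_a = sum(f_a) / n
    mean_b = sum(f_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(f_a, f_b))
    var_a = sum((a - mean_a) ** 2 for a in f_a)
    var_b = sum((b - mean_b) ** 2 for b in f_b)
    return -cov / (math.sqrt(var_a * var_b) + eps)


def cross_entropy(probs, labels):
    """Mean voxel-wise cross-entropy; probs[j][k] is P(voxel j has label k)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def total_loss(probs, labels, f_a, f_b, lam):
    """Cross-entropy plus lam times the pairing term (lam is a free weight)."""
    return cross_entropy(probs, labels) + lam * correlation_loss(f_a, f_b)


if __name__ == "__main__":
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    labels = [0, 1]
    print(total_loss(probs, labels, [1.0, 2.0, 3.0], [1.5, 2.5, 2.0], lam=0.5))
```

Minimizing the negative correlation pushes the two branches toward features that vary together, which is the commonality the pairing loss is meant to encourage.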

## 3 Experiments

### 3.1 Experiment setup

#### Data acquisition.

This is an institutional review board approved, HIPAA compliant, retrospective case-control study. Patients with pathologically proven PDAC were retrospectively identified from the radiology and pathology databases from 2012 to 2017, and the cases with 4cm tumor (PDAC mass) diameter were selected for the experiment. PDAC patients were scanned on a 64-slice multidetector CT scanner (Sensation 64, Siemens Healthineers) or a dual-source multidetector CT scanner (FLASH, Siemens Healthineers). PDAC patients were injected with 100-120 mL of iohexol (Omnipaque, GE Healthcare) at an injection rate of 4-5 mL/sec. Scan protocols were customized for each patient to minimize dose. Arterial phase imaging was performed with bolus triggering, usually 30 seconds post-injection, and venous phase imaging was performed at 60 seconds.

#### Evaluation.

Denote $\mathcal{Y}$ and $\mathcal{Z}$ as the sets of foreground voxels in the ground truth and the prediction,
*i.e.*, $\mathcal{Y}=\{i \mid y_i=1\}$ and $\mathcal{Z}=\{i \mid z_i=1\}$.
The accuracy of segmentation is evaluated by the Dice-Sørensen coefficient (DSC):
$\mathrm{DSC}(\mathcal{Y},\mathcal{Z})=\frac{2\,|\mathcal{Y}\cap\mathcal{Z}|}{|\mathcal{Y}|+|\mathcal{Z}|}$.
We evaluate DSCs of all three targets, *i.e.*, abnormal pancreas, PDAC mass and pancreatic duct. All experiments are conducted by three-fold cross-validation, *i.e.*, training the models on two folds and testing them on the remaining one. Throughout our experiments, abnormal pancreas stands for the union of normal pancreatic tissues, PDAC mass and pancreatic duct. The average DSC over all cases and the standard deviation are reported.
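
The evaluation above can be sketched in a few lines. This is a minimal illustration that represents each mask as a set of foreground voxel indices; the function names are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not one stated in the text.

```python
def dice(truth, pred):
    """Dice-Sørensen coefficient between two sets of foreground voxel indices."""
    truth, pred = set(truth), set(pred)
    if not truth and not pred:
        return 1.0  # assumed convention when both masks are empty
    return 2 * len(truth & pred) / (len(truth) + len(pred))


def abnormal_pancreas(normal_tissue, pdac_mass, duct):
    """Union of the three structures, evaluated as a single target."""
    return set(normal_tissue) | set(pdac_mass) | set(duct)


if __name__ == "__main__":
    truth_mass = {(0, 1, 1), (0, 1, 2), (0, 2, 2)}
    pred_mass = {(0, 1, 2), (0, 2, 2), (0, 3, 3)}
    print(round(dice(truth_mass, pred_mass), 4))
```

A prediction that misses the target entirely shares no voxel with the ground truth and therefore scores a DSC of 0.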

### 3.2 Implementation details

Our experiments were performed on the whole CT scan and the implementations are based on PyTorch. We adopt a variation of diffeomorphic demons with direction-dependent regularizations [12, 9] for accurate and efficient deformable registration between the two phases. For data pre-processing, we truncated the raw intensity values within the range [-100, 240] HU and normalized each raw CT case to have zero mean and unit variance. The input sizes of all networks are set as . The coefficient of the correlation loss is set as . No further post-processing strategies were applied.

We also used data augmentation during training. Different from single-phase segmentation, which commonly uses rotation and scaling [5, 20], virtual sets [15] are also utilized in this work. Even though arterial and venous phase scanning are customized for each patient, the level of enhancement can differ between patients owing to variations in blood circulation, which causes inter-subject enhancement variations in each phase. Therefore, we construct virtual examples by interpolating between venous and arterial data, similar to [15]. The -th augmented training sample pair can be written as: where . The final outcome of HPN is obtained by taking the union of the predicted regions from models trained with the original paired sets and the virtual paired sets. We set the hyper-parameter following [15].

Table 1: DSC (%, mean ± standard deviation) for each method and target.

Method | Abnormal pancreas | PDAC mass | Pancreatic duct |
---|---|---|---|
3D-UNet-single-phase (Arterial) | 78.35 ± 11.89 | 52.40 ± 27.53 | 38.35 ± 28.98 |
3D-UNet-single-phase (Venous) | 79.61 ± 10.47 | 53.08 ± 27.06 | 40.25 ± 27.89 |
3D-UNet-multi-phase (fusion) | 80.05 ± 10.56 | 52.88 ± 26.97 | 39.06 ± 27.33 |
3D-UNet-multi-phase-HyperNet | 82.45 ± 9.98 | 54.36 ± 26.34 | 43.27 ± 26.33 |
3D-UNet-multi-phase-HyperNet-aug | 83.67 ± 8.92 | 55.72 ± 26.01 | 43.53 ± 25.94 |
3D-UNet-multi-phase-HPN (Ours) | 84.32 ± 8.59 | 57.10 ± 24.76 | 44.93 ± 24.88 |
3D-ResDSN-single-phase (Arterial) | 83.85 ± 9.43 | 56.21 ± 26.33 | 47.04 ± 26.42 |
3D-ResDSN-single-phase (Venous) | 84.92 ± 7.70 | 56.86 ± 26.67 | 49.81 ± 26.23 |
3D-ResDSN-multi-phase (fusion) | 85.52 ± 7.84 | 57.59 ± 26.63 | 48.49 ± 26.37 |
3D-ResDSN-multi-phase-HyperNet | 85.79 ± 8.86 | 60.87 ± 24.95 | 54.18 ± 24.74 |
3D-ResDSN-multi-phase-HyperNet-aug | 85.87 ± 7.91 | 61.69 ± 23.24 | 54.07 ± 24.06 |
3D-ResDSN-multi-HPN (Ours) | 86.65 ± 7.46 | 63.94 ± 22.74 | 56.77 ± 23.33 |
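
The intensity pre-processing and the virtual paired sets from Section 3.2 can be sketched as follows. This is an illustration under assumptions, not the implementation used in the experiments: volumes are flattened to plain lists, normalization is applied after truncation, the interpolation weight is drawn from a symmetric Beta distribution as in mixup [15], and `alpha` as well as all function names are placeholders.

```python
import random
import statistics


def preprocess(volume, lo=-100.0, hi=240.0):
    """Truncate HU values to [lo, hi], then scale to zero mean and unit variance."""
    clipped = [min(max(v, lo), hi) for v in volume]
    mean = statistics.fmean(clipped)
    std = statistics.pstdev(clipped)
    if std == 0:
        raise ValueError("constant volume cannot be normalized")
    return [(v - mean) / std for v in clipped]


def sample_lambda(alpha, rng=random):
    """Draw an interpolation weight in [0, 1] from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def virtual_pair(arterial, venous, lam):
    """Interpolate two aligned volumes voxel by voxel.

    Returns a virtual arterial volume weighted toward `arterial` by lam and
    a virtual venous volume weighted toward `venous` by lam.
    """
    if len(arterial) != len(venous):
        raise ValueError("volumes must be aligned and equally sized")
    virt_a = [lam * a + (1 - lam) * v for a, v in zip(arterial, venous)]
    virt_v = [lam * v + (1 - lam) * a for a, v in zip(arterial, venous)]
    return virt_a, virt_v


if __name__ == "__main__":
    arterial = preprocess([-1000.0, 20.0, 150.0, 900.0])
    venous = preprocess([-1000.0, 60.0, 110.0, 900.0])
    lam = sample_lambda(alpha=0.4, rng=random.Random(0))  # alpha is illustrative
    print(virtual_pair(arterial, venous, lam))
```

Because the two phases are registered beforehand, interpolating voxel by voxel leaves the label map unchanged, so each virtual pair can reuse the labels of the original pair.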

### 3.3 Results and Discussions

All results are summarized in Table 1.
We compare the proposed HPN with the following algorithms: 1) single-phase algorithms which are trained exclusively on one phase (denoted as “single-phase”); 2) multi-phase algorithm where both arterial and venous data are trained using a dual path network bridged with hyper connections (denoted as “HyperNet”).
In general, compared with single-phase algorithms, multi-phase algorithms (*i.e.*, HyperNet, HPN) observe significant improvements for all target structures. It is no surprise to observe such a phenomenon as more useful information is distilled for multi-phase algorithms.

#### Efficacy of hyper-connections.

To show the effectiveness of hyper-connections, outputs from different phases (using single-phase algorithms) are fused by taking the average probability at each position (denoted as “fusion”). However, we observe that simply fusing the outcomes from different phases usually yields either similar or only slightly better performance compared with single-phase algorithms. This indicates that simply fusing the estimations during the inference stage cannot effectively integrate multi-phase information. By contrast, hyper-connections allow the two phase branches to communicate during training and thus efficiently elevate the performance. Note that directly applying [3] yields unsatisfactory results. Our hyper-connections are not densely connected but are carefully designed based on the previous state of the art on PDAC segmentation [20] for better segmentation of PDAC. Meanwhile, we show much better performance of compared to reported in [20].
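
For reference, the “fusion” baseline amounts to the following inference-time operation. This is a minimal sketch with hypothetical function names, in which each model output is a list of per-voxel class probabilities.

```python
def fuse(prob_arterial, prob_venous):
    """Average two per-voxel class-probability lists position by position."""
    if len(prob_arterial) != len(prob_venous):
        raise ValueError("both predictions must cover the same voxels")
    return [
        [(pa + pv) / 2 for pa, pv in zip(voxel_a, voxel_v)]
        for voxel_a, voxel_v in zip(prob_arterial, prob_venous)
    ]


def argmax_labels(probs):
    """Pick the most probable class index for every voxel."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


if __name__ == "__main__":
    arterial = [[0.9, 0.1], [0.4, 0.6]]
    venous = [[0.5, 0.5], [0.2, 0.8]]
    print(argmax_labels(fuse(arterial, venous)))
```

Since the two models are trained independently and only their outputs are combined, neither branch can adapt its features to the other phase, which is consistent with the limited gains observed for this baseline.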

#### Efficacy of data augmentation.

From Table 1, compared with HyperNet, HyperNet-aug witnesses performance gain especially for PDAC mass (*i.e.*, from to for 3D-ResDSN; from to for 3D-UNet), which validates the usefulness of using virtual paired sets as data augmentation.

#### Efficacy of HPN.

We can observe an additional benefit of our HPN over HyperNet-aug (*e.g.*, abnormal pancreas: to , PDAC mass: to , pancreatic duct: to , 3D-ResDSN). Overall, HPN shows an evident improvement compared with HyperNet, *i.e.*, abnormal pancreas: to , PDAC mass: to , pancreatic duct: to (3D-ResDSN). The *p-value*s for testing the significance of the difference between HyperNet and our HPN on all 3 targets are , which suggests a general statistical improvement.
We also show two qualitative examples in Fig. 3, where HPN shows much better segmentation accuracy especially for PDAC mass.

Another noteworthy fact is that 11 cases are false negatives in which no PDAC mass is detected using either phase (Dice = 0). Out of these 11 cases, 7 are successfully detected by HPN. An example is shown in Fig. 4: the PDAC mass is missed in both single phases and almost missed by the original HyperNet (DSC=), but our HPN can detect a reasonable portion of the PDAC mass (DSC=).

The deformable registration error, computed as the pancreas surface distance between the two phases, is (mean ± standard deviation), which can be considered acceptable for this study. The effect of different alignment methods is left for further study.

## 4 Conclusions

Motivated by the fact that radiologists usually rely on analyzing multi-phase data for better image interpretations, we develop an end-to-end framework, HPN, for multi-phase image segmentation.
Specifically, HPN consists of a dual path network where different paths are connected for multi-phase information exchange, and an additional loss is added for removing view divergence.
Extensive experimental results demonstrate that the proposed HPN can substantially and significantly improve the segmentation performance, *i.e.*, HPN reports an improvement of up to in terms of DSC compared to prior art that uses single-phase data. In the future, we plan to examine the behaviour of HPN when using different alignment strategies and to extend the current approach to other multi-phase learning problems.

Acknowledgements. This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research. We thank Fengze Liu, Yingda Xia, Qihang Yu and Zhuotun Zhu for instructive discussions.

## References

- [1] (2018) Survival prediction in pancreatic ductal adenocarcinoma by quantitative computed tomography image analysis. Annals of surgical oncology 25.
- [2] (2016) 3D u-net: learning dense volumetric segmentation from sparse annotation. In MICCAI, pp. 424–432.
- [3] (2018) HyperDense-net: a hyper-densely connected cnn for multi-modal image segmentation. TMI.
- [4] (2016) Automatic detection of cerebral microbleeds from mr images via 3d convolutional neural networks. TMI 35 (5), pp. 1182–1195.
- [5] Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate Brain Lesion Segmentation. arXiv.
- [6] (2019) Multimodal hyper-connectivity of functional networks using functionally-weighted lasso for mci classification. Medical image analysis 52, pp. 80–96.
- [7] (2015) Fully Convolutional Networks for Semantic Segmentation. In CVPR.
- [8] (2016) V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In 3DV.
- [9] (2016) MIND demons: symmetric diffeomorphic deformable registration of mr and ct for image-guided spine surgery. TMI 35 (11), pp. 2413–2424.
- [10] (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI.
- [11] (2016) Spatial Aggregation of Holistically-Nested Networks for Automated Pancreas Segmentation. In MICCAI.
- [12] (2009) Diffeomorphic demons: efficient non-parametric image registration. NeuroImage 45 (1), pp. S61–S82.
- [13] (2019) Abdominal multi-organ segmentation with organ-attention networks and statistical fusion. Medical image analysis 55, pp. 88–102.
- [14] (2017) Deep correlational learning for survival prediction from multi-modality data. In MICCAI, pp. 406–414.
- [15] (2018) Mixup: beyond empirical risk minimization. In ICLR.
- [16] (2015) Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 108, pp. 214–224.
- [17] (2017) Deep supervision for pancreatic cyst segmentation in abdominal ct scans. In MICCAI, pp. 222–230.
- [18] (2017) A fixed-point model for pancreas segmentation in abdominal ct scans. In MICCAI, pp. 693–701.
- [19] (2019) AnatomyNet: deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Medical physics 46 (2), pp. 576–589.
- [20] (2018) Multi-scale coarse-to-fine segmentation for screening pancreatic ductal adenocarcinoma. arXiv.
