A Cross-Stitch Architecture for Joint Registration and Segmentation in Adaptive Radiotherapy

04/17/2020 ∙ by Laurens Beljaards, et al. ∙ Universiteit Leiden, Leiden University Medical Center

Recently, joint registration and segmentation has been formulated in a deep learning setting, by the definition of joint loss functions. In this work, we investigate joining these tasks at the architectural level. We propose a registration network that integrates segmentation propagation between images, and a segmentation network to predict the segmentation directly. These networks are connected into a single joint architecture via so-called cross-stitch units, allowing information to be exchanged between the tasks in a learnable manner. The proposed method is evaluated in the context of adaptive image-guided radiotherapy, using daily prostate CT imaging. Two datasets from different institutes and manufacturers were involved in the study. The first dataset was used for training (12 patients) and validation (6 patients), while the second dataset was used as an independent test set (14 patients). In terms of mean surface distance, our approach achieved 1.06 ± 0.3 mm, 0.91 ± 0.4 mm, 1.27 ± 0.4 mm, and 1.76 ± 0.8 mm on the validation set and 1.82 ± 2.4 mm, 2.45 ± 2.4 mm, 2.45 ± 5.0 mm, and 2.57 ± 2.3 mm on the test set for the prostate, bladder, seminal vesicles, and rectum, respectively. The proposed multi-task network outperformed single-task networks, as well as a network only joined through the loss function, thus demonstrating the capability to leverage the individual strengths of the segmentation and registration tasks. The obtained performance as well as the inference speed make this a promising candidate for daily re-contouring in adaptive radiotherapy, potentially reducing treatment-related side effects and improving quality-of-life after treatment.


1 Introduction

Adaptive image-guided radiation therapy aims to adapt the radiation dose to the daily anatomy of the patient. When executed properly, this may allow the use of smaller safety margins in radiotherapy and thus reduce the exposure of surrounding healthy tissue to radiation, thereby potentially reducing treatment-related side effects and improving quality of life after treatment. Such an adaptive treatment cycle requires re-imaging the patient on a daily basis and subsequently re-contouring the tumor and organs-at-risk. Since contouring demands a substantial amount of time from highly qualified radiation oncologists, adaptive treatment by manual procedures is generally infeasible. Thus, automating the procedure is crucial. The two prevalent methods for automatically contouring medical images are image segmentation and contour propagation using registration. In the context of adaptive image-guided radiotherapy, registration-based methods have the advantage of using prior knowledge of the patient's anatomy in the form of the manually delineated planning scan, and of being able to accurately deform low-contrast structures that are hard to identify, guided by nearby higher-contrast structures. Image segmentation has advantages of its own, most notably the ability to accurately contour organs that drastically vary in shape between visits, such as the bladder.

In an attempt to exploit the unique advantages of both methods, approaches for joint registration and segmentation (JRS) have been proposed. Earlier methods performed joint registration and segmentation using, for example, active contours [Yezzi et al.(2003)Yezzi, Zöllei, and Kapur] or Bayesian models [Pohl et al.(2006)Pohl, Fisher, Grimson, Kikinis, and Wells]. More recently, convolutional neural networks have become prevalent in medical imaging due to the rapid advancements in machine learning. Registration networks can now match the accuracy of iterative approaches, and segmentation networks have already been found to perform better than their conventional counterparts [Litjens et al.(2017)Litjens, Kooi, Bejnordi, Setio, Ciompi, Ghafoorian, van der Laak, van Ginneken, and Sánchez]. Several approaches have been proposed for joint registration and segmentation in combination with convolutional neural networks. In [Xu and Niethammer(2019)], a framework was presented for jointly training registration and segmentation networks. Other approaches have employed generative adversarial networks for joint registration and segmentation [Mahapatra et al.(2018)Mahapatra, Ge, Sedai, and Chakravorty, Elmahdy et al.(2019b)Elmahdy, Wolterink, Sokooti, Išgum, and Staring].

In this work we propose to join registration and segmentation further by merging the two tasks at the architectural level rather than only through the loss function, using concepts from the field of multi-task learning [Ruder(2017)]. In our novel approach, a single neural network is trained to both propagate contours through image registration and generate contours through image segmentation at the same time. We demonstrate that joint architectures outperform single-task segmentation and registration networks, and we show that our approach generates more accurate organ delineations than state-of-the-art methods on both our validation set and an independent test set of prostate CT scans in terms of median MSD.

2 Methods

2.1 Base Network Architecture

We use a 3D deep convolutional neural network derived from the U-Net [Ronneberger et al.(2015)Ronneberger, Fischer, and Brox] and inspired by [Fan et al.(2018)Fan, Cao, Yap, and Shen] as the base architecture for all networks presented in this paper. The network uses convolution layers without padding, each followed by LeakyReLU activation and batch normalization. Strided convolutions are used for downsampling in the contracting path, and upsampling layers are used in the expansive path. The network has three output resolutions and is deeply supervised at each of them: at the lower resolutions the network can focus on large organs and large deformations, and at the higher resolutions on fine detail. The sizes of the high-, middle-, and low-resolution output patches follow from the size of the input patch; a detailed schematic is given in the appendix.

Figure 1: The inputs, architecture, outputs, and losses of the fully hard parameter sharing network. Here, S stands for a segmentation layer, R for a registration layer, and S+R for a shared layer. Only the highest-resolution convolution layers and outputs are shown here for the sake of clarity.

2.2 Single-Task Segmentation and Registration Networks

Single-task segmentation and registration networks were trained to serve as baselines for the performance of the proposed joint networks. These networks have identical architectures except for their input and output layers. The segmentation network takes the daily CT scan as input, which we refer to as the fixed image, and predicts the corresponding segmentation. It is trained using the Dice Similarity Coefficient (DSC) loss, which quantifies the overlap between the predicted segmentation and the ground truth segmentation. The registration network takes both the planning scan, which we refer to as the moving image, and the daily scan as input, and establishes the correspondence between the two images in the form of a Deformation Vector Field (DVF). For this purpose, it is crucial that corresponding anatomical features in the two scans fit inside the network's field of view; therefore, the images have been affinely aligned beforehand. The predicted DVF is then used to warp the moving image such that, ideally, the warped moving image is identical to the fixed image. The registration network is trained using the Normalized Cross-Correlation (NCC) loss, which quantifies the dissimilarity between the fixed and warped moving images, and a bending energy loss as a regularization term to encourage smoothness of the DVF.
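To make the similarity term concrete, here is a minimal NumPy sketch of NCC between a fixed patch and a warped moving patch (the function name and scalar formulation are ours; the paper's TensorFlow implementation operates on 3D patches, and the training loss is the negated value):

```python
import numpy as np

def ncc(fixed: np.ndarray, warped: np.ndarray, eps: float = 1e-8) -> float:
    """Normalized cross-correlation between two patches of equal shape.

    Returns a value in [-1, 1]; training minimizes its negation, so a
    perfectly matching warp drives the loss towards -1.
    """
    f = fixed - fixed.mean()
    w = warped - warped.mean()
    return float((f * w).sum() / (np.sqrt((f * f).sum() * (w * w).sum()) + eps))
```

Because NCC normalizes out mean intensity and contrast, a warp that matches the fixed image up to an affine intensity change still scores close to 1, which makes the loss robust to intensity differences between scans.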

2.3 Joining Registration and Segmentation via the Loss

Similar to previous work [Elmahdy et al.(2019b)Elmahdy, Wolterink, Sokooti, Išgum, and Staring], the network in this approach joins registration and segmentation through the loss function. The network is similar to the registration network discussed in the previous section, with the addition that it also takes the moving segmentation as input and is jointly trained using a segmentation Dice loss in addition to the NCC and bending energy losses. This Dice loss penalizes discrepancies between the fixed ground truth segmentation and the warped moving segmentation.
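The Dice term can be sketched similarly (NumPy; a soft variant over probability maps, with the warped moving segmentation playing the role of the prediction; the function name and epsilon smoothing are our own choices):

```python
import numpy as np

def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss: 1 - DSC, where DSC = 2|P∩T| / (|P| + |T|).

    pred and target are (soft) segmentation maps in [0, 1];
    0 means perfect overlap, values near 1 mean no overlap.
    """
    inter = (pred * target).sum()
    return float(1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps))
```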

Figure 2: The inputs, architecture, outputs, and losses of the cross-stitch network.

2.4 Joint Registration and Segmentation using Hard Parameter Sharing

In this joint network, see Figure 1, the registration and segmentation sub-networks share all their parameters, except for the task-specific convolution layers. Apart from these two layers, the network is architecturally similar to the single-task networks. The network is trained with the Dice loss on the segmentation output (similar to the segmentation network), and the NCC, bending energy, and Dice losses on the registration output (similar to the JRS-registration network). Since the network predicts two segmentation maps, one for each path, the contours from one path can be discarded. A simple strategy is to keep the contours from the path that performed best on the validation set. The segmentations can also be selected on a per-organ basis.
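The per-organ selection described above amounts to a simple lookup keyed on validation performance; a sketch in plain Python (the dictionary layout and function name are ours):

```python
def select_per_organ(seg_masks, reg_masks, seg_msd, reg_msd):
    """Per organ, keep the output of the path with the lower validation MSD.

    seg_masks / reg_masks map organ name -> predicted mask;
    seg_msd / reg_msd map organ name -> validation MSD in mm.
    """
    chosen = {}
    for organ in seg_masks:
        if seg_msd[organ] <= reg_msd[organ]:
            chosen[organ] = seg_masks[organ]
        else:
            chosen[organ] = reg_masks[organ]
    return chosen
```

With the single-task validation MSDs from Table 1, for example, this would keep the segmentation output for the bladder and the registration output for the prostate.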

2.5 Joint Registration and Segmentation via a Cross-Stitch Network

We propose to architecturally join the 3D U-Net-like networks for registration and segmentation by connecting their paths using cross-stitch units [Misra et al.(2016)Misra, Shrivastava, Gupta, and Hebert]. The cross-stitch units linearly combine pairs of feature maps from the segmentation path and the registration path using learnable parameters. Given the segmentation path $S$ and the registration path $R$ of the joint network, the feature maps of filter $j$ in layer $i$, named $x_S^{i,j}$ and $x_R^{i,j}$ respectively, are connected to a cross-stitch unit with learnable parameters $\alpha_{SS}$, $\alpha_{SR}$, $\alpha_{RS}$, and $\alpha_{RR}$. This cross-stitch unit calculates $\hat{x}_S^{i,j} = \alpha_{SS} x_S^{i,j} + \alpha_{SR} x_R^{i,j}$ for the segmentation path and $\hat{x}_R^{i,j} = \alpha_{RS} x_S^{i,j} + \alpha_{RR} x_R^{i,j}$ for the registration path. The cross-stitch network has the advantage of being able to learn to strongly share feature maps between the tasks if that is beneficial. Conversely, if it is better for a pair of feature maps to be completely independent, the network can learn the identity matrix to separate them. This allows representations to be shared between the two paths in a flexible manner, at a negligible cost in the number of parameters.
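A cross-stitch unit for one filter pair can be sketched in a few lines (NumPy; in the network the 2×2 alpha matrix is a learnable variable, here it is simply passed in, and the function name is ours):

```python
import numpy as np

def cross_stitch(x_s: np.ndarray, x_r: np.ndarray, alpha: np.ndarray):
    """Linearly recombine a segmentation feature map x_s and a
    registration feature map x_r with a 2x2 matrix alpha:

        x_s_new = a_SS * x_s + a_SR * x_r
        x_r_new = a_RS * x_s + a_RR * x_r
    """
    x_s_new = alpha[0, 0] * x_s + alpha[0, 1] * x_r
    x_r_new = alpha[1, 0] * x_s + alpha[1, 1] * x_r
    return x_s_new, x_r_new
```

With alpha equal to the identity matrix the two paths stay fully independent; non-zero off-diagonal entries exchange information between them.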

We place the cross-stitch units after the downsampling and upsampling layers, i.e., at four positions in total. This is in line with the original cross-stitch paper, where the authors suggest placing cross-stitch units after every pooling activation map. We found that the number of units is more crucial than their location, as long as the units are distributed evenly across the network. For example, placing the cross-stitch units before the downsampling and upsampling layers instead of after them does not alter the performance, but placing a large number of cross-stitch units, such as a unit after every layer, will degrade the performance of the network. The proposed architecture is visualized in Figure 2.

3 Experiments and Results

3.1 Dataset

This study involves two different datasets from two different institutes and scanners, of patients who underwent intensity-modulated radiotherapy for prostate cancer. The first dataset is from Haukeland Medical Center (HMC), Norway. It contains 18 patients with 8 to 11 CT scans each, each scan corresponding to a treatment fraction. These scans were acquired using a GE scanner and have 90 to 180 slices with a voxel size of approximately 0.9 × 0.9 × 2.0 mm. The second dataset is from Erasmus Medical Center (EMC), The Netherlands. It consists of 14 patients with 3 follow-up CT scans each. These scans were acquired using a Siemens scanner and have 91 to 218 slices with a voxel size of approximately 0.9 × 0.9 × 1.5 mm. The target structures (prostate and seminal vesicles) as well as the organs-at-risk (bladder and rectum) were manually delineated by radiation oncologists. The networks were trained and validated on the HMC dataset, while the EMC dataset was used as an independent test set. Training was performed on a subset of 111 image pairs from 12 patients, and validation was carried out on the remaining 50 image pairs from 6 patients. All scans were resampled to an isotropic voxel size of 1 × 1 × 1 mm.

3.2 Implementation and Training Details

We implemented the networks using TensorFlow [Abadi et al.(2016)]. The convolution layers were initialized from a random normal distribution with a mean of 0 and a standard deviation of 0.02, and the trainable alpha parameters of the cross-stitch units were initialized between 0 and 1 from a truncated random normal distribution with a mean of 0.5 and a standard deviation of 0.25. The number of filters was set to {16, 32, 64, 32, 16} for the cross-stitch network and {23, 45, 91, 45, 23} (approximately √2 times as many) for the other networks, in order to ensure that each network has approximately the same number of trainable parameters. The patches were sampled equally from the organs-at-risk, the targets, and the remainder of the abdomen. We used the RAdam optimizer [Liu et al.(2019)Liu, Jiang, He, Chen, Liu, Gao, and Han]. The networks were trained for 200,000 iterations with an initial batch size of 2. In each batch, the training samples are doubled by switching the roles of the fixed and moving patches, resulting in an effective batch size of 4. The weights of the Dice and NCC losses were set to 1 and that of the bending energy loss to 0.5. For the total loss, all three resolutions are weighted equally, i.e., 1/3 each. Training, validation, and testing were performed on an NVIDIA GTX 1080 Ti GPU with 11 GB of memory.
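The role-switching trick that doubles the effective batch size can be sketched as follows (NumPy over batched patch arrays; the function name is ours):

```python
import numpy as np

def double_batch(fixed: np.ndarray, moving: np.ndarray):
    """Double the effective batch by re-using every (fixed, moving) patch
    pair a second time with the roles of fixed and moving swapped."""
    new_fixed = np.concatenate([fixed, moving], axis=0)
    new_moving = np.concatenate([moving, fixed], axis=0)
    return new_fixed, new_moving
```

This costs no extra sampling and exposes the registration task to both directions of every deformation.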

3.3 Evaluation Measures and Comparative Methods

The networks were evaluated in terms of their Mean Surface Distance (MSD) between the predicted segmentations and ground truth contours. The appendix contains results in terms of the DSC and the 95% Hausdorff Distance (HD).
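As an illustration of the main metric, here is a brute-force MSD sketch over extracted surface point sets (NumPy; O(N·M), fine for small surfaces, whereas real evaluations typically use a distance transform; the function name is ours):

```python
import numpy as np

def mean_surface_distance(surf_a: np.ndarray, surf_b: np.ndarray) -> float:
    """Symmetric mean surface distance between two point sets of shape
    (N, 3) and (M, 3): the average of the two directed mean
    nearest-neighbour distances."""
    d = np.linalg.norm(surf_a[:, None, :] - surf_b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1).mean()  # for each point in A, distance to nearest point in B
    b_to_a = d.min(axis=0).mean()  # and vice versa
    return float(0.5 * (a_to_b + b_to_a))
```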

We compare the proposed approach to three state-of-the-art methods in abdominal CT radiotherapy: one iterative method, one deep learning method, and one hybrid method.

The inference speed is less than a second for the deep learning methods, and on the order of minutes for the iterative and hybrid approaches.

Output (path)                        Prostate   Seminal vesicles   Rectum   Bladder
Segmentation                           1.49          2.09           2.73      1.13
Registration                           1.29          1.37           2.17      2.71
JRS-Registration                       1.13          1.16           1.82      1.90
Fully Hard Sharing (segmentation)      1.06          1.12           1.64      0.87
Fully Hard Sharing (registration)      1.11          1.10           1.85      1.90
Cross-Stitch (segmentation)            0.99          1.15           1.47      0.82
Cross-Stitch (registration)            1.06          1.13           1.75      1.81
Elastix [Qiao(2017)]                   1.59          2.45           3.50      4.72
JRS-GAN [Elmahdy et al.(2019b)]        1.04          1.44           1.89      1.54
Hybrid [Elmahdy et al.(2019a)]         1.25          1.32           1.85      1.26
Table 1: Median MSD (mm) values for the different approaches on the HMC dataset.

3.4 Evaluation of Architectures on the HMC Dataset

Quantitative results are given in Table 1, and example results in Figure 3. The first two rows of Table 1 show the results of the single-task networks in terms of MSD. The registration network performs better than the segmentation network on most organs, as it essentially uses prior knowledge of the patient's anatomy by warping the manually delineated planning scan. The segmentation network performed better on the bladder, since the registration network often had trouble establishing a correspondence between the bladder in the fixed and moving images, as this organ tends to deform considerably between visits. The segmentation network failed to classify any voxel as seminal vesicles in 5 cases. The seminal vesicles are hard to identify because of their small size and poor contrast, which explains the relatively poor performance of the segmentation network on this organ. The registration network has the benefit of being able to use context, namely the surrounding anatomical features and organs, to more accurately warp the seminal vesicles into place.

The results from the loss-joined JRS-registration network are shown in the third row of Table 1. It is clear that the additional segmentation loss during training improves the registration quality significantly.

The fourth and fifth rows in Table 1 show the results of the fully hard parameter sharing network. The contours from its segmentation path see substantial improvements in accuracy over the contours from the segmentation network. The registration path yields improvements over the single-task registration network, but it does not improve over the JRS-registration network. These results demonstrate that architecturally joining segmentation and registration can be very beneficial for the segmentation output and can yield more accurate segmentations than either of the single-task networks.

The cross-stitch network performs the best of all networks, as demonstrated by the results in Table 1. Both the segmentation path and the registration path improve over the corresponding paths of the hard parameter sharing network, though it is again the segmentation path that typically yields the most accurate contours. The proposed joint networks, particularly the cross-stitch network, yield significantly better contours than any of the state-of-the-art methods. These results confirm the effectiveness of architecturally joining registration and segmentation for generating accurate organ delineations.

Figure 3: Example contours generated by the single-task networks and the cross-stitch network on the HMC dataset. From left to right, the selected cases are the first, second, and third quartile in terms of prostate MSD of the cross-stitch network.

3.5 Evaluation on the Independent EMC Test Set

Output (path)                        Prostate   Seminal vesicles   Rectum   Bladder
Segmentation                           2.57          5.82           5.18      1.50
Registration                           1.18          1.18           2.23      4.44
JRS-Registration                       1.16          1.07           2.14      3.14
Fully Hard Sharing (segmentation)      1.34          1.98           2.10      1.38
Fully Hard Sharing (registration)      1.20          1.12           2.24      2.84
Cross-Stitch (segmentation)            1.21          1.42           2.18      1.24
Cross-Stitch (registration)            1.09          1.02           2.10      2.69
Elastix [Qiao(2017)]                   1.17          1.24           3.07      3.27
Hybrid [Elmahdy et al.(2019a)]         1.36          1.22           2.36      2.26
Table 2: Median MSD (mm) values for the different approaches on the independent EMC test set. Results for JRS-GAN are not available for this dataset.

Table 2 provides quantitative results on the independent test set. The segmentation network failed to classify any voxel as seminal vesicles in 5 cases, and the segmentation paths of the fully hard sharing network and the cross-stitch network each failed in 1 case. Note that the deep learning approaches were neither re-trained nor fine-tuned. Again, the joint networks outperform the single-task networks as well as the state-of-the-art methods in terms of the median values, which are less influenced by outliers. The mean values are relatively high compared to the median values; this discrepancy can be explained by intensity variations between the training and test populations, which cause more outliers.

4 Discussion and Conclusion

In this work, we proposed to architecturally join image registration and segmentation to generate the daily organ delineations essential for adaptive image-guided radiotherapy. We experimented with different ways of intertwining registration and segmentation in three-dimensional fully convolutional neural networks, and found that joining the tasks with cross-stitch units works best. Via the cross-stitch units the network learns to exchange information between its registration and segmentation paths. Moreover, we studied the potential bias of the segmentation network in an experiment where the single-task segmentation network was additionally fed the planning segmentation alongside the daily scan. This improved over feeding the daily scan only, but was still inferior to the cross-stitch network, and therefore it was not included in this paper; the segmentation network without this extra input was included instead, as it serves as a vanilla segmentation baseline.

Evaluation on a validation set and an independent test set demonstrated that the proposed joining of segmentation and registration significantly outperforms the single-task counterparts. On the validation set the proposed approach outperformed existing methods, sometimes by a clear margin. On the independent test set existing methods achieved better mean values for the prostate and seminal vesicles, while for the rectum the proposed methods performed better. For the bladder specifically, the single-task segmentation network achieved better mean values than the other networks because a bladder-filling protocol was in place for the HMC dataset, which was used for training and validation, so the deformation of the bladder between the planning scan and the daily scans is not large. This is not the case for the EMC dataset, the test set. Since the registration-based and joint networks were trained on small bladder deformations, they had trouble with larger ones. The segmentation network was not affected, since it does not depend on the deformation but rather on the underlying texture to segment the bladder. This issue could be addressed relatively easily by including larger synthetic deformations during training or by including a few patients from the EMC dataset in the training set. Nevertheless, in terms of median values the proposed method was superior for all organs, even though we did not use domain-specific strategies similar to the ones presented in [Elmahdy et al.(2019a)Elmahdy, Jagt, Zinkstok, Qiao, Shahzad, Sokooti, Yousefi, Incrocci, Marijnen, Hoogeman, and Staring]. Retraining or fine-tuning for this patient population and scanner type could further improve the results for the proposed methods.

A promising direction for future research is to add a third task to the joint networks, notably the generation of the radiotherapy treatment plan. This may allow the joint networks to generate delineations with favorable dosimetric features. Further investigation could address the generalization of the network across different patient populations and scanners. Finally, we hypothesize that the accuracy of the networks could be further improved by including more organs, such as the lymph nodes, as this provides extra guidance.

In conclusion, on the validation set and the independent test set, the proposed approach yielded mean surface distances around the slice thickness. Our approach achieved 1.06 ± 0.3 mm, 0.91 ± 0.4 mm, 1.27 ± 0.4 mm, and 1.76 ± 0.8 mm on the validation set and 1.82 ± 2.4 mm, 2.45 ± 2.4 mm, 2.45 ± 5.0 mm, and 2.57 ± 2.3 mm on the test set for the prostate, bladder, seminal vesicles, and rectum, respectively. With an inference speed of less than a second, our approach is well suited to generating the daily contours in online adaptive image-guided radiotherapy, potentially reducing treatment-related side effects and improving quality-of-life for patients after treatment.

The HMC dataset with contours was collected at Haukeland University Hospital, Bergen, Norway, and was provided to us by responsible oncologist Svein Inge Helle and physicist Liv Bolstad Hysing. The EMC dataset with contours was collected at Erasmus University Medical Center, Rotterdam, The Netherlands, and was provided to us by radiation therapist Luca Incrocci and physicist Mischa Hoogeman. They are gratefully acknowledged.


Appendix A of the paper “A Cross-Stitch Architecture for Joint Registration and Segmentation in Adaptive Radiotherapy”

In this appendix we provide details of the network architecture as well as additional results of the proposed method.

A.1 Base Network Architecture

Figure 4: The base architecture used for our networks: a three-dimensional deep convolutional neural network derived from the U-Net [Ronneberger et al.(2015)Ronneberger, Fischer, and Brox] and inspired by [Fan et al.(2018)Fan, Cao, Yap, and Shen]. The number of filters and output sizes are shown below each layer. The output of each layer is cubic, as are the input patches.

A.2 Experimental Results in Terms of DSC and 95%HD

Output (path)                        Prostate   Seminal vesicles   Rectum   Bladder
Segmentation                           0.84          0.62           0.77      0.93
Registration                           0.86          0.68           0.81      0.84
JRS-Registration                       0.87          0.72           0.84      0.91
Fully Hard Sharing (segmentation)      0.88          0.74           0.86      0.95
Fully Hard Sharing (registration)      0.87          0.72           0.84      0.90
Cross-Stitch (segmentation)            0.88          0.74           0.88      0.95
Cross-Stitch (registration)            0.88          0.73           0.85      0.91
Elastix [Qiao(2017)]                   0.86          0.53           0.74      0.76
JRS-GAN [Elmahdy et al.(2019b)]        0.87          0.67           0.83      0.92
Hybrid [Elmahdy et al.(2019a)]         0.89          0.72           0.87      0.95
Table 3: Median DSC values for the different approaches on the HMC dataset.
Output (path)                        Prostate   Seminal vesicles   Rectum   Bladder
Segmentation                           0.77          0.28           0.68      0.93
Registration                           0.88          0.74           0.77      0.82
JRS-Registration                       0.89          0.79           0.79      0.88
Fully Hard Sharing (segmentation)      0.88          0.65           0.81      0.93
Fully Hard Sharing (registration)      0.88          0.75           0.80      0.87
Cross-Stitch (segmentation)            0.89          0.73           0.81      0.93
Cross-Stitch (registration)            0.89          0.80           0.80      0.87
Elastix [Qiao(2017)]                   0.91          0.82           0.76      0.87
Hybrid [Elmahdy et al.(2019a)]         0.89          0.81           0.82      0.90
Table 4: Median DSC values for the different approaches on the independent EMC test set.
Output (path)                        Prostate   Seminal vesicles   Rectum   Bladder
Segmentation                            4.4           7.3           13.3       4.0
Registration                            4.0           4.3            9.4      12.1
JRS-Registration                        3.1           3.7            8.1      10.6
Fully Hard Sharing (segmentation)       3.0           3.6            8.9       3.0
Fully Hard Sharing (registration)       3.2           3.2            9.1       9.7
Cross-Stitch (segmentation)             3.0           3.9            7.2       2.3
Cross-Stitch (registration)             3.0           3.6            8.6       9.7
Elastix [Qiao(2017)]                    3.7           5.6            9.8      13.6
JRS-GAN [Elmahdy et al.(2019b)]         3.0           4.6            8.4       7.6
Hybrid [Elmahdy et al.(2019a)]          2.8           3.1            6.1       3.3
Table 5: Median 95%HD (mm) values for the different approaches on the HMC dataset.
Output (path)                        Prostate   Seminal vesicles   Rectum   Bladder
Segmentation                            9.3          15.4           29.0      10.0
Registration                            4.2           4.3           12.0      20.2
JRS-Registration                        3.2           4.0           12.0      18.6
Fully Hard Sharing (segmentation)       4.1           6.8           13.6       5.5
Fully Hard Sharing (registration)       4.0           4.0           13.0      17.4
Cross-Stitch (segmentation)             4.0           5.0           14.0       4.4
Cross-Stitch (registration)             3.2           3.3           12.0      16.2
Elastix [Qiao(2017)]                    2.9           3.2           11.3      10.4
Hybrid [Elmahdy et al.(2019a)]          3.4           3.1            8.6       6.6
Table 6: Median 95%HD (mm) values for the different approaches on the independent EMC test set.